System Resource Manager

Components of the infrastructure need resources in order to function. Resources may consist of processes (or execution contexts), communication hardware, or other shared system functionality. The resource manager coordinates the interaction between components that require resources and the target system that controls and allocates those resources.

Activities provided by the resource manager include:

  • resource discovery
  • allocation and deallocation of resources
  • resource usage information

Because the target system may allocate resources through a variety of mechanisms, the resource manager provides an abstraction over these services. All interaction with external job schedulers and runtime systems goes through the resource manager.
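One way to picture this abstraction is as a common interface with one adapter per allocation mechanism. The sketch below is illustrative only: `ResourceBackend`, `StaticHostBackend`, and all method names are hypothetical and not part of any existing design, and the calls are synchronous for brevity even though allocation notification is assumed to be asynchronous.

```python
from abc import ABC, abstractmethod


class ResourceBackend(ABC):
    """Hypothetical adapter for one target-system mechanism (e.g. a job scheduler)."""

    @abstractmethod
    def discover(self):
        """Return the names of the resources this backend can provide."""

    @abstractmethod
    def allocate(self, name, **attributes):
        """Request a resource and return an opaque handle."""

    @abstractmethod
    def deallocate(self, handle):
        """Release a previously allocated resource."""

    @abstractmethod
    def usage(self, handle):
        """Report usage information for an allocation."""


class StaticHostBackend(ResourceBackend):
    """Toy backend with a fixed pool of process slots and no scheduler."""

    def __init__(self, slots):
        self._free = slots
        self._allocations = {}  # handle -> number of processes held
        self._next_handle = 1

    def discover(self):
        return ["process"]

    def allocate(self, name, **attributes):
        if name != "process":
            raise ValueError(f"unknown resource: {name}")
        count = attributes.get("number", 1)
        if count > self._free:
            raise RuntimeError("not enough free slots")
        self._free -= count
        handle = self._next_handle
        self._next_handle += 1
        self._allocations[handle] = count
        return handle

    def deallocate(self, handle):
        # Return the slots held by this handle to the free pool.
        self._free += self._allocations.pop(handle)

    def usage(self, handle):
        return {"processes": self._allocations[handle]}
```

A backend for a real scheduler would implement the same four methods, so that components never talk to the scheduler directly.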

The resource manager assumes the following:

  • notification of successful/unsuccessful allocation is asynchronous
  • resources can be named, and there are a number of predefined resource names
  • in addition to the name of the resource, an allocation can also take an arbitrary number of attributes that further specify the allocation request (e.g. a process allocation request might take "number of processes" as an attribute)
  • allocation may be partly successful, completely successful, or unsuccessful
  • resources should be released when they are no longer needed
  • the allocated resources may become unavailable at any point
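These assumptions can be made concrete with a small sketch. Everything here is hypothetical (`Status`, `AllocationRequest`, `ToyResourceManager`, and the callback signature are invented for illustration); it only shows a named request with attributes, asynchronous notification through a callback, the three possible outcomes, and explicit release.

```python
import enum
import threading
from dataclasses import dataclass, field


class Status(enum.Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass
class AllocationRequest:
    name: str                                       # e.g. "process"
    attributes: dict = field(default_factory=dict)  # e.g. {"number": 4}


class ToyResourceManager:
    """Hypothetical manager that owns a fixed pool of process slots."""

    def __init__(self, free_processes):
        self._free = free_processes
        self._lock = threading.Lock()

    def allocate(self, request, on_done):
        """Return immediately; on_done(status, granted) runs on a worker thread."""
        worker = threading.Thread(target=self._serve, args=(request, on_done))
        worker.start()
        return worker

    def _serve(self, request, on_done):
        if request.name != "process":
            on_done(Status.FAILED, 0)
            return
        wanted = request.attributes.get("number", 1)
        with self._lock:
            granted = min(wanted, self._free)
            self._free -= granted
        if granted == wanted:
            status = Status.COMPLETE
        elif granted > 0:
            status = Status.PARTIAL
        else:
            status = Status.FAILED
        on_done(status, granted)

    def release(self, granted):
        """Give resources back once they are no longer needed."""
        with self._lock:
            self._free += granted


if __name__ == "__main__":
    manager = ToyResourceManager(free_processes=4)
    request = AllocationRequest("process", {"number": 6})
    worker = manager.allocate(request, lambda status, granted: print(status, granted))
    worker.join()  # only so the demo waits for the notification
```

The sketch does not model resources becoming unavailable after allocation; a fuller design would need a second notification path for that case.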

The following default resources and attributes will always be available:

Name      Description                            Attributes
process   Allocation of computational resource   number, location, category

Resource Discovery

Resource Allocation Request

Resource Deallocation

Resource Usage

Specific Resources

Processor allocation

This includes allocating both "compute" nodes and "service" nodes.

  • Determine how the allocation is specified, e.g.:
    • a user-supplied list of hosts and processor counts (no Scheduler is running); in this case no further steps are needed
    • an allocation size to be requested from a Scheduler (e.g. PBS)
    • a user-supplied description of the actual nodes to be requested from a Scheduler (e.g. PBS)
  • Make the request(s) to the Scheduler
  • Create handle(s) referencing the request(s)
  • Wait for request(s) to be satisfied, or to time out
  • Return handle(s) (or failure(s))

Compute nodes and service nodes may need to be obtained in separate requests.
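The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: `FakeScheduler` is a hypothetical stand-in for a real scheduler such as PBS, and the `spec` dictionary layout, `Handle`, and `allocate_processors` are invented for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Handle:
    request_id: int
    kind: str               # "compute" or "service"
    satisfied: bool = False


class FakeScheduler:
    """Hypothetical scheduler: each request is ready after a fixed number of polls."""

    def __init__(self, polls_until_ready):
        self._polls_until_ready = polls_until_ready
        self._polls = {}

    def submit(self, kind, size):
        request_id = len(self._polls) + 1
        self._polls[request_id] = 0
        return request_id

    def is_ready(self, request_id):
        self._polls[request_id] += 1
        return self._polls[request_id] >= self._polls_until_ready


def allocate_processors(spec, scheduler=None, timeout=1.0, poll_interval=0.01):
    """Return a list of handles, or None if the requests time out."""
    # User-supplied hosts and processor counts: no scheduler is involved.
    if "hosts" in spec:
        return [Handle(request_id=0, kind="compute", satisfied=True)]

    # One request per node kind, since compute and service nodes
    # may need to be obtained separately.
    handles = [Handle(scheduler.submit(kind, size), kind)
               for kind, size in spec["sizes"].items()]

    # Wait for every request to be satisfied, or give up at the deadline.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for handle in handles:
            if not handle.satisfied:
                handle.satisfied = scheduler.is_ready(handle.request_id)
        if all(handle.satisfied for handle in handles):
            return handles
        time.sleep(poll_interval)
    return None


if __name__ == "__main__":
    spec = {"sizes": {"compute": 8, "service": 1}}
    print(allocate_processors(spec, FakeScheduler(polls_until_ready=3)))
```

Polling is used here only to keep the sketch short; a design that follows the asynchronous-notification assumption would have the scheduler adapter signal completion instead.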

[Figure: processor_allocation.png]