High-level components
- Execution contexts
- Communication
- Session control
- Persistence
- Security
Execution contexts
Components that manage execution contexts, covering their initiation, instantiation, monitoring, and termination.
- Bootstrapping
- system (infrastructure lifecycle)
- tool
- Execution context management
- Monitoring
- Respond to changes in an execution context (e.g., take action when an execution context dies)
- Initiate changes to execution contexts (e.g., kill an execution context)
- Start execution context on an allocated resource
- Resource management (discovery, allocation, deallocation) of, e.g., locations (nodes, cores) or network resources
- Discovery: find out which resources are available
- Allocation step: on systems where an execution context is started upon allocation, this can return "delayed allocation"; the resources are then actually allocated when the execution context is started
- Query what was actually allocated: in the "delayed allocation" case, this can return "nothing yet"
- Allocate for real and execute: in the "delayed allocation" case, this is where the resources are actually allocated and the execution context is started
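The delayed-allocation flow described in the resource management items can be sketched as follows. This is a minimal, in-memory illustration only: the names `ResourceManager`, `Allocation`, `launch_on_allocate`, and all method names are hypothetical and are not taken from any existing interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Allocation:
    """Hypothetical handle returned by the allocation step."""
    requested: int                        # number of locations asked for
    delayed: bool                         # True if binding happens at launch
    granted: Optional[List[str]] = None   # None stands for "nothing yet"


class ResourceManager:
    """Toy in-memory resource manager (illustrative only)."""

    def __init__(self, nodes: List[str], launch_on_allocate: bool):
        self.free = list(nodes)
        # True models systems that start an execution context upon allocation.
        self.launch_on_allocate = launch_on_allocate

    def discover(self) -> List[str]:
        """Discovery: report the locations that are currently free."""
        return list(self.free)

    def allocate(self, count: int) -> Allocation:
        """Allocation step: may return a delayed allocation."""
        if self.launch_on_allocate:
            return Allocation(requested=count, delayed=True)
        return Allocation(requested=count, delayed=False, granted=self._take(count))

    def query(self, alloc: Allocation) -> Optional[List[str]]:
        """Query what was actually allocated (None until resources are bound)."""
        return alloc.granted

    def execute(self, alloc: Allocation, command: str) -> List[str]:
        """Allocate for real if needed, then start one context per location."""
        if alloc.granted is None:
            alloc.granted = self._take(alloc.requested)
        # A real implementation would launch processes; here we only record them.
        return [f"{node}: {command}" for node in alloc.granted]

    def deallocate(self, alloc: Allocation) -> None:
        """Return any granted locations to the free pool."""
        if alloc.granted:
            self.free.extend(alloc.granted)
        alloc.granted = None

    def _take(self, count: int) -> List[str]:
        if count > len(self.free):
            raise RuntimeError("not enough free resources")
        taken, self.free = self.free[:count], self.free[count:]
        return taken
```

With `launch_on_allocate=True`, `query` returns `None` until `execute` has been called, which mirrors the "nothing yet" answer above.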
Communication
- Point-to-point transfer layer
- Description:
- Transfers data between two endpoints
- Functionality:
- send data (input: "VC" or "endpoint")
- multisend data (send same message to multiple VCs)
- receive data
- Routing layer
- Description:
- Translates (stream, dest) pair into "VC/endpoint"
- If message is to be routed, VC will point to next "hop"
- Uses the Point-to-point transfer layer to send data on each link
- To send to all children, calls the point-to-point multisend function
- When messages are received, route data to next hop if necessary, otherwise forward to ordering layer
- Ordering/reliability layer
- Description:
- Ensures messages are received correctly and in order
- Functionality:
- Reorder messages
- Acks/Naks/timeouts
- Active message layer
- Description:
- Implement the active messaging protocol
- Sends message through routing layer
- Functionality:
- Setup active messaging transfer
- Support for receive side active message callbacks
- Implements rendezvous protocol
- Group operations
- Barrier synchronization
- Allgather
- Broadcast
- Wrappers
- Reduction streams
- Description:
- Wrapper functions to implement MRNet "filter" style functionality over STCI streams
- Point-to-point matching
- Wrapper functions to provide matching for pt-to-pt messages
- Query functions (i.e., broadcast a request and gather the responses)
- Requests: test/wait
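As an illustration of how the routing and ordering/reliability layers might interact, the sketch below forwards a message to its next hop when it is addressed elsewhere and otherwise hands it to a reorder buffer. It is a sketch under stated assumptions: `Router`, `ReorderBuffer`, the message tuple layout, one reorder buffer per stream, and cumulative acks are all hypothetical choices made for this example, not part of the design above.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[int, str, int, bytes]  # (stream, dest, seq, payload), assumed layout


class ReorderBuffer:
    """Releases payloads in sequence-number order (a single sender is assumed)."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: Dict[int, bytes] = {}

    def receive(self, seq: int, payload: bytes) -> Tuple[List[bytes], int]:
        """Return (payloads now deliverable in order, cumulative ack).

        The ack is the next sequence number expected; a sender could
        retransmit after a timeout if its ack does not advance.
        """
        # Sequence numbers that were already delivered are ignored.
        if seq >= self.next_seq:
            self.pending[seq] = payload
        ready: List[bytes] = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready, self.next_seq


class Router:
    """Forwards messages to the next hop or hands them to the ordering layer."""

    def __init__(self, me: str, next_hop: Dict[Tuple[int, str], str],
                 send: Callable[[str, Message], None]) -> None:
        self.me = me
        self.next_hop = next_hop   # (stream, dest) -> VC/endpoint name
        self.send = send           # stand-in for the point-to-point transfer layer
        self.ordering: Dict[int, ReorderBuffer] = {}  # one buffer per stream

    def on_receive(self, msg: Message) -> List[bytes]:
        stream, dest, seq, payload = msg
        if dest != self.me:
            # Not addressed to this node: look up the VC for the next hop.
            self.send(self.next_hop[(stream, dest)], msg)
            return []
        buf = self.ordering.setdefault(stream, ReorderBuffer())
        ready, _ack = buf.receive(seq, payload)
        return ready
```

A message that arrives ahead of its predecessor is held back until the gap is filled, so the caller of `on_receive` only ever sees payloads in order.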
Session control
This component manages sessions: creating new sessions, expanding/contracting sessions, merging sessions, attaching/detaching front-ends, managing streams (create, quiesce, flush, delete), and managing topology.
- Session manager
- Creates sessions
- Destroys sessions
- Expands/contracts sessions
- Attaches/detaches front-ends
- Merges sessions
- Stream manager
- Description:
- Uses topology manager to locate junctions and agents
- Connects junctions, agents, and front-end according to topology
- Controls streams (quiesce, flush)
- Junction manager
- Description:
- Uses file staging component to stage junctions to service locations.
- Keeps track of where junction binaries have been staged and where they should be staged
- Topology manager
- Description:
- Keeps track of logical topology definitions ("create topology")
- Keeps track of logical topology mappings (maps logical nodes to physical locations)
- Keeps track of staged topologies (where junctions and agents are staged and loaded)
- Maps logical topology
- Associates nodes in logical topology description to physical locations
- Stages a mapped topology
- Uses junction manager to stage junctions and agents
- Uses execution context component to load junctions
- File staging manager
- Description:
- Junction and agent binaries are "registered" with the file staging manager.
- Agent and stream managers can use this component to stage agent and junction binaries to compute and service locations (i.e., nodes).
- Agent manager
- Description:
- Uses topology manager to determine where agents are located
- Starts agents
- I/O Forwarding
- Description:
- Forwards stdout and stderr from agents to front-end
- Forwards stdin from front-end to one or more agents
- May compress/aggregate data or add labels to output
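The three kinds of state kept by the topology manager (logical definitions, mappings, and staged topologies) can be illustrated with a small sketch. Everything here is an assumption made for the example: the class and method names are hypothetical, a topology is represented as a parent-to-children dictionary, the mapping is a simple round-robin over the given locations, and the `stage` callback stands in for the junction manager and execution context component.

```python
from typing import Callable, Dict, List, Tuple


class TopologyManager:
    """Toy topology manager: define, map, then stage a logical topology."""

    def __init__(self) -> None:
        self.definitions: Dict[str, Dict[str, List[str]]] = {}  # name -> parent -> children
        self.mappings: Dict[str, Dict[str, str]] = {}           # name -> logical node -> location
        self.staged: Dict[str, List[Tuple[str, str]]] = {}      # name -> (location, node) pairs

    def create_topology(self, name: str, children: Dict[str, List[str]]) -> None:
        """Record a logical topology definition."""
        self.definitions[name] = {p: list(kids) for p, kids in children.items()}

    def logical_nodes(self, name: str) -> List[str]:
        """List logical nodes in order of first appearance."""
        seen: List[str] = []
        for parent, kids in self.definitions[name].items():
            for node in [parent, *kids]:
                if node not in seen:
                    seen.append(node)
        return seen

    def map_topology(self, name: str, locations: List[str]) -> Dict[str, str]:
        """Associate logical nodes with physical locations (round-robin)."""
        if not locations:
            raise ValueError("at least one location is required")
        nodes = self.logical_nodes(name)
        mapping = {node: locations[i % len(locations)] for i, node in enumerate(nodes)}
        self.mappings[name] = mapping
        return mapping

    def stage_topology(self, name: str,
                       stage: Callable[[str, str], None]) -> List[Tuple[str, str]]:
        """Stage every mapped node through the supplied callback and record it."""
        staged: List[Tuple[str, str]] = []
        for node, location in self.mappings[name].items():
            stage(location, node)
            staged.append((location, node))
        self.staged[name] = staged
        return staged
```

Keeping the definition, the mapping, and the staged record separate means the same logical topology could be remapped onto different locations without being redefined.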
Persistence
This component is where persistent state is stored, including anything that would need to be cleaned up, e.g., publish/subscribe state, information for attaching and detaching front-ends, and session and tool agent locations.
- Data storage
- Stores the actual data
- Might be implemented in a distributed/replicated fashion
- Policy server
- Other components query this component regarding policy (e.g., security, recovery, error handling)
- Uses data storage to store and retrieve policy
- Session lookup
- Used when a front-end connects to a session, to obtain connection info and other state associated with the session
- Used by a session for info on merging with another session
- Uses data storage to store and retrieve session info
- Event manager
- Listens for events and responds according to policy (e.g., some component is unresponsive -> clean it up)
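The relationship between data storage, the policy server, and the event manager can be sketched as below. This is only an illustration: the class and method names, the policy key format, the event and action strings, and the default action `"ignore"` are all hypothetical.

```python
from typing import Any, Callable, Dict, Hashable


class DataStorage:
    """In-memory stand-in for the (possibly distributed) data storage."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value

    def get(self, key: Hashable, default: Any = None) -> Any:
        return self._data.get(key, default)


class PolicyServer:
    """Stores and retrieves policy through the data storage."""

    def __init__(self, storage: DataStorage) -> None:
        self.storage = storage

    def set_policy(self, event_type: str, action: str) -> None:
        self.storage.put(("policy", event_type), action)

    def lookup(self, event_type: str) -> str:
        # Events without a stored policy fall back to "ignore" (an assumption).
        return self.storage.get(("policy", event_type), "ignore")


class EventManager:
    """Responds to events by running the handler that policy selects."""

    def __init__(self, policy: PolicyServer,
                 handlers: Dict[str, Callable[[str], None]]) -> None:
        self.policy = policy
        self.handlers = handlers   # action name -> handler taking a component name

    def on_event(self, event_type: str, component: str) -> str:
        action = self.policy.lookup(event_type)
        handler = self.handlers.get(action)
        if handler is not None:
            handler(component)
        return action
```

For example, after `policy.set_policy("unresponsive", "cleanup")`, an `"unresponsive"` event for a component invokes whatever handler was registered under `"cleanup"`.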
Security
This component manages credentials for access to various resources, e.g., checking the credentials of a front-end requesting to connect to a session, issuing credentials to a session when it is created, and managing privilege levels (e.g., when a service agent creates a new session agent, the appropriate privilege level must be set). Works with the policy server.
- Credential manager
- Issue credentials
- Verify credentials
- Revoke credentials
- Session credential manager (one per session)
- Validates requests for front-end connections
- Validates requests to merge sessions
- Resource credential manager
- Interacts with OS/environment to get authorization to create or use resources (e.g., create a process, allocate a node, ssh into a node)
- Infrastructure credential manager
- Validates requests to connect to infrastructure
- Validates requests to infrastructure such as to create new session or delete existing session
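One possible shape for the issue/verify/revoke operations of the credential manager is sketched below using HMAC-signed tokens from the Python standard library. The token format, the `CredentialManager` class, and its method signatures are assumptions made for illustration; the outline above does not prescribe a credential mechanism.

```python
import hashlib
import hmac
import secrets
from typing import Set


class CredentialManager:
    """Toy credential manager based on HMAC-signed tokens (illustrative only)."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # signing key, kept in memory
        self._revoked: Set[str] = set()

    def _sign(self, body: str) -> str:
        return hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()

    def issue(self, subject: str) -> str:
        """Issue a credential of the form 'subject:nonce:tag'."""
        body = f"{subject}:{secrets.token_hex(8)}"
        return f"{body}:{self._sign(body)}"

    def verify(self, credential: str, subject: str) -> bool:
        """Check that the credential is unrevoked, untampered, and for `subject`."""
        if credential in self._revoked:
            return False
        try:
            body, tag = credential.rsplit(":", 1)
            claimed_subject, _nonce = body.rsplit(":", 1)
        except ValueError:
            return False   # malformed credential
        # Compare as bytes in constant time.
        tag_ok = hmac.compare_digest(tag.encode(), self._sign(body).encode())
        return tag_ok and claimed_subject == subject

    def revoke(self, credential: str) -> None:
        """Revoke a credential so that later verification fails."""
        self._revoked.add(credential)
```

A session credential manager could, under the same assumptions, call `verify` before accepting a front-end connection or a merge request.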