Batch Instrumentation Of Running Program

Title

Batch Instrumentation of Running Program

High Level Description

This use case covers situations where the user has instrumented an application program, either by modifying the source code and recompiling and relinking, or by using a tool that inserts the instrumentation at the source code or binary level, with the executable recompiled, relinked, or rewritten as required.

The user then invokes the application, either as an interactive session, e.g. through an IDE such as Eclipse, or as a batch job submission. In either case, the user has no interaction with the instrumentation.

When the application runs, the instrumentation gathers the required data. This data may be sent to a collection front end either while the application is running, or at the end of application execution, at or just before the application calls exit().
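
As a purely illustrative sketch (none of the names below come from this document or any defined tool interface), instrumentation of this kind typically wraps regions of interest with probe calls, accumulates measurements while the application runs, and reports them at or near exit:

    /* Toy illustration of compiled-in instrumentation; every name here is
     * invented for illustration.  Data is gathered as the program runs
     * and reported just before exit.                                      */
    #include <stdio.h>
    #include <time.h>

    static double region_time;                 /* accumulated seconds      */

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static void compute(void)                  /* the "application" work   */
    {
        double t0 = now();                     /* probe at region entry    */
        for (volatile long i = 0; i < 1000000; i++)
            ;
        region_time += now() - t0;             /* probe at region exit     */
    }

    int main(void)
    {
        for (int iter = 0; iter < 10; iter++)
            compute();
        /* In this use case the data would be sent to a collection front
           end here; the toy version just prints it.                       */
        printf("compute region: %.6f s\n", region_time);
        return 0;
    }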

Environment Assumptions

  1. If the instrumentation requires additional resources such as special run time libraries or special node configuration, that setup is outside the scope of the infrastructure. If the user has sufficient privilege, he will handle those requirements prior to running the application. Otherwise, those requirements will be handled by system administration staff.
  2. The system provides a method for the infrastructure to determine the mapping of application tasks to nodes, including application task rank.
  3. The application runs using the user id of the invoking user.
  4. The instrumentation does not require any additional privileges beyond those granted to the application by the system.
  5. Systems with general purpose nodes, with compute and I/O nodes, or with heterogeneous architecture should be supported.
  6. The visualization (GUI) component of the analysis tool may run on a different node (and even a different OS or hardware architecture) than the nodes where the application runs.

STCI Infrastructure Assumptions

  1. The infrastructure is assumed to be up and running before the analysis session begins.
  2. The infrastructure provides an API for agent code linked with an application to determine whether it has successfully connected to the infrastructure (see the sketch following this list).
  3. The infrastructure allows plugins to be loaded and unloaded at any time.
  4. The infrastructure provides a way to control the order of loading or unloading for plugins.
  5. The infrastructure provides a way to enforce dependencies between plugins, where a plugin may require another plugin to be present, or a load request may be rejected if a specific plugin requires that another plugin not be present.
  6. If there are endian or other data representation issues due to heterogeneous architecture, the infrastructure must deal with those issues transparently.
  7. The security component may need to refresh user credentials for the infrastructure from the node where the analysis tool was invoked, since in some security implementations, credentials have a lifetime.
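
Assumption 2 suggests an agent-side interface along the following lines. This is only a sketch under assumed names; stci_agent_connect(), stci_agent_is_connected(), and the STCI_HOST/STCI_PORT environment variables are placeholders, not a defined STCI API.

    /* Hypothetical agent-side API implied by assumption 2: a way for
     * instrumentation code to connect and check whether the connection
     * succeeded.  Names, environment variables, and wire details are
     * invented for illustration only.                                    */
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    typedef struct { int fd; } stci_agent_t;

    /* Connect to the infrastructure node named by STCI_HOST/STCI_PORT.  */
    int stci_agent_connect(stci_agent_t *agent)
    {
        const char *host = getenv("STCI_HOST");
        const char *port = getenv("STCI_PORT");
        struct addrinfo hints, *res;
        agent->fd = -1;
        if (!host || !port)
            return -1;
        memset(&hints, 0, sizeof hints);
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) == 0)
            agent->fd = fd;
        else if (fd >= 0)
            close(fd);
        freeaddrinfo(res);
        return agent->fd >= 0 ? 0 : -1;
    }

    /* The check called by instrumentation before it tries to send data. */
    int stci_agent_is_connected(const stci_agent_t *agent)
    {
        return agent->fd >= 0;
    }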

Specific Use Cases

There are six sub-cases to consider:

  • The user runs the instrumented application interactively using an analysis tool and the data generated by the instrumentation is sent back to the analysis tool in real time as the application runs.
  • The user runs the instrumented application interactively using an analysis tool and the data generated by the instrumentation is sent back to the analysis tool at the end of application execution.
  • The user submits the instrumented application to a batch job scheduler. When the job scheduler runs the job, the data generated by the instrumentation is handled in real time.
  • The user submits the instrumented application to a batch job scheduler. When the job scheduler runs the job, the data generated by the instrumentation is handled as part of application termination.
  • The user invokes the analysis tool to collect performance data from an already running application.
  • The user wants to exit from the analysis tool while an instrumented application is being monitored by that tool, and wants to leave the application running with the instrumentation deactivated.

Interactive Execution, Real Time Data Collection

Since data is being sent back to the analysis tool in real time, high bandwidth and low latency are both important for this case.

  1. The user builds an executable with the required instrumentation.
  2. The user ensures the executable is distributed to the execution nodes.
  3. The user starts the analysis tool.
  4. The analysis tool attempts to connect to the infrastructure.
    • If the analysis tool cannot connect to the infrastructure, it issues an error message and the user must resolve the problem.
  5. The analysis tool passes identification and authentication/authorization credentials to the infrastructure.
  6. The infrastructure validates the identity and permissions of the user using the identity, authentication and authorization information passed by the analysis tool.
    • If the identity or permissions checks fail, the infrastructure returns an appropriate error indication. It is up to the analysis tool to take appropriate action for the error notification, possibly issuing an error message. The analysis tool may be able to recover from the error automatically, or it may be up to the user to resolve the problem.
  7. The application is invoked under analysis tool control.
  8. The analysis tool waits for the system to provide it a mapping of application tasks to nodes.
  9. At some point, the instrumentation code in an application task is called for the first time, at which time it must connect to the infrastructure at an appropriate point for that application task. This execution may overlap with the analysis tool waiting for the mapping of application tasks to nodes.
    1. That connection request needs to be authenticated and authorized by the security model.
    2. The instrumentation code in the application task may need to block execution of some or all threads in the application task until the instrumentation code receives notification via the infrastructure to resume execution. This gives the user or the analysis tool an opportunity, if needed, to ensure that the tool has properly initialized both itself and the infrastructure, including such steps as ensuring plugins are properly loaded by infrastructure nodes and plugin parameters have been set.
    3. If the infrastructure rejects a connection request from instrumentation code, it will be up to the instrumentation code to determine how to handle the failure, possibly taking such actions as terminating the application task (and thus terminating the application) or just disabling instrumentation code and allowing the application to continue to run.
    4. In a multi-threaded application, it is possible for a second thread to make an instrumentation call while the first thread is still trying to establish a connection to the infrastructure. In this case, the instrumentation code must recognize that a connection is in progress and block the second thread until the connection is established by the first thread.
  10. The analysis tool waits for instrumentation code in all tasks it will be collecting data from (this may be all application tasks or only a subset of the application tasks which are of interest to the user or the analysis tool) to connect to the infrastructure.
  11. At this point, the analysis tool understands the application task mapping, all required application tasks have connected to the infrastructure, and initialization is complete. The application may be suspended at this point in time, depending on analysis tool requirements, waiting for notification from the infrastructure to resume execution.
  12. The user specifies the set of plugins to be used. Depending on the analysis tool, the user may have the ability to modify the set of default plugins needed by the analysis tool. The user or the analysis tool may specify the order that plugins are loaded, to facilitate stacking plugins. Plugins may also have specific loading order requirements and dependencies on the presence or absence of other plugins.
  13. The user has the option of changing the parameters for any plugin, where those parameters may be unique to a plugin handling a specific set of application tasks.
  14. The analysis tool requests the infrastructure to load the requested plugins and set the parameters for those plugins. This may be part of the processing by the analysis tool when the user requests that the application be started, or it may be a separate user-initiated activity. Plugins may need to be sent (staged) to the infrastructure nodes where they are needed, or they may already be present at those intermediate nodes.
  15. The analysis tool may allow the user to request that plugins be removed or that their parameters be modified.
  16. Upon user request, the analysis tool requests the infrastructure to resume application execution.
  17. The infrastructure informs the instrumentation code in each application task to resume execution.
  18. At this point, the application is running and the instrumentation code is sending messages to the analysis tool through the front end as necessary.
  19. The instrumentation code generates a message to be sent to the analysis tool through the infrastructure. Each message contains a header that uniquely identifies the message, possibly by message type and sequence number within message type, and identifies its origin. Each message may optionally contain a body (simple notifications may have no body) whose structure is known to the instrumentation code, to the plugins that will process it, and to the analysis tool. The infrastructure treats the message body as an opaque binary object. (A possible header layout is sketched after this list.)
    1. The instrumentation code sends the message to the infrastructure using the connection to its designated infrastructure node.
    2. When an infrastructure node receives a message, it examines the header to determine the message type. If there are one or more plugins registered to handle that message type, those plugins are called in order of loading, as specified by the analysis tool.
    3. If a plugin is called to process a message, it must run with the same userid and credentials as the user that invoked the application and the analysis tool.
    4. When a plugin processes a message, it may modify the message body, it may consume the message (meaning it is not forwarded to additional plugins and/or upward through the infrastructure), or it may generate additional messages.
    5. For simple filters, asynchronous processing of messages may be sufficient. For more sophisticated filtering, the infrastructure will provide barrier and task grouping functions that the plugins can use.
    6. After optional processing of received messages by plugins, the infrastructure sends the resulting messages upward toward the analysis tool.
    7. The top level node in the infrastructure communication tree forwards any messages it receives indirectly (through the infrastructure) from the instrumentation code to the analysis tool.
  20. The user or analysis tool may want to change the parameters for a plugin while the application is running.
    1. The user or analysis tool determines the set of plugins that the message should be sent to.
    2. The analysis tool sends the message to the infrastructure.
    3. The infrastructure routes the message to the infrastructure node(s) where the target plugins reside.
    4. The infrastructure on the target node passes the message to the target plugins.
    5. The plugin processes the message.
    6. The plugin optionally sends an acknowledgment (with or without additional data) back to the analysis tool through the infrastructure.
  21. Eventually, the application terminates, normally or abnormally.
    1. The instrumentation code may send a notification message to the infrastructure prior to task termination. That message is processed no differently than any other message from the instrumentation code, and neither the analysis tool nor the infrastructure should rely on an explicit termination message.
    2. Regardless of how an infrastructure node detects application task termination, by explicit message or otherwise, it must ensure that it cleans up any resources held by the infrastructure on behalf of that application task.
    3. Notification of the application task termination should be passed upward through the infrastructure to the analysis tool so that intermediate infrastructure nodes and the analysis tool can properly clean up any resources that were held on behalf of the application task.
  22. The analysis tool may terminate before the application execution completes, again by normal termination at user request, or by abnormal termination.
  23. The user may want the application to continue execution after the analysis tool exits, or he may want the application to be terminated.
  24. The analysis tool should provide a way for the user to specify the action taken at analysis tool termination, either terminate the application or allow it to continue.
    1. If the user wants the application to terminate, then the analysis tool will terminate the application's controlling process (poe, mpirun, etc). This should cause all application processes to terminate, which in turn results in infrastructure resource cleanup as described above for application termination.
    2. If the user wants to allow the application to continue execution, then the infrastructure must ensure resources are left in place to allow the application to continue execution even if data is no longer being collected. The infrastructure may notify the instrumentation code in the application task that data collection is suspended.
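
One way to picture the message described in step 19 is a fixed header followed by an opaque body, as in the sketch below. The field names and widths are assumptions for illustration; the actual header layout would be defined by the infrastructure.

    /* Illustrative message layout for step 19; field names and sizes are
     * assumptions, not a defined STCI wire format.                       */
    #include <stdint.h>

    struct stci_msg_header {
        uint32_t msg_type;      /* identifies which plugins handle it     */
        uint32_t sequence;      /* sequence number within msg_type        */
        uint32_t origin_rank;   /* application task that produced it      */
        uint32_t body_length;   /* bytes of opaque body; may be 0         */
    };

    struct stci_msg {
        struct stci_msg_header header;
        unsigned char body[];   /* opaque to the infrastructure; its
                                   structure is known only to the
                                   instrumentation, plugins, and tool     */
    };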

Interactive Execution, Data Collection at Application Termination

This use case is similar to the preceding case; the major difference is that the instrumentation code generates the data to be sent back to the analysis tool as part of final processing in the application before the application terminates, or as part of the application termination processing.

Since the data generated by the instrumentation code is generated at application termination, with no user interaction, latency may not be an important consideration. Since instrumentation may generate large amounts of data, bandwidth is still an important consideration.

Since the instrumentation code does not attempt to send the accumulated data until application termination, a connection to the infrastructure is not strictly required until the instrumentation code attempts to send data. However, deferring the connection may result in a wasted run: the application runs to completion, and only then does an error occur that prevents the data from being sent to the analysis tool.

For this reason, the instrumentation code should still be required to get control for the first time as the application begins execution, or as close to that time as possible, in order to establish the infrastructure connections, allow the plugins to be loaded, and allow any parameters required by the plugins to be passed to them. Once the initial setup is complete, the user instructs the analysis tool to begin application execution. The analysis tool in turn requests the infrastructure to notify the application to resume execution. The user has no further interaction with the application until the instrumentation code has sent the data through the infrastructure to the analysis tool.
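
A minimal sketch of this deferred-send pattern, assuming a hypothetical stci_send() call (stubbed here so the example is self-contained) and using atexit() so the accumulated data is flushed just before the application exits:

    /* Sketch of sending accumulated data at application termination.
     * stci_send() is a stand-in for the hypothetical infrastructure
     * send call.                                                         */
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_SAMPLES 4096

    static double samples[MAX_SAMPLES];
    static size_t nsamples;

    /* Stand-in for the infrastructure send call; here it only reports
       how much data would be sent.                                       */
    static int stci_send(const void *buf, size_t len)
    {
        (void)buf;
        fprintf(stderr, "would send %zu bytes to the infrastructure\n", len);
        return 0;
    }

    /* Called by instrumentation probes while the application runs.       */
    void record_sample(double value)
    {
        if (nsamples < MAX_SAMPLES)
            samples[nsamples++] = value;
    }

    static void flush_at_exit(void)
    {
        stci_send(samples, nsamples * sizeof samples[0]);
    }

    /* Called at the first instrumentation call, after the connection to
       the infrastructure has been established.                           */
    void instrumentation_init(void)
    {
        atexit(flush_at_exit);
    }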

A second possible difference in processing is the case where the analysis tool terminates before the instrumentation code has sent the data to the analysis tool. If the application is in termination processing, the instrumentation code could simply terminate the application when it detects that the connections to the infrastructure have been closed. It may be safer, however, to follow the same sequence as in the preceding use case where the analysis tool terminates, especially if it is difficult to ensure that the instrumentation code can safely terminate the application.

Batch Execution

In both batch execution cases, the primary difference is that the analysis tool is replaced by a non-interactive tool, probably controlled by a scripting language. The interaction between the non-interactive tool and the infrastructure is the same as with an interactive analysis tool. With a batch execution model, latency probably is not as important, but since large amounts of data can be generated, bandwidth is still a consideration.

Otherwise, the use cases are the same as described above.

Invoke Analysis Tool to Collect Data From a Running Application

This case has similarities to the interactive execution, real time data collection case. Because the flow of this use case jumps in and out of the steps described for that case, the complete flow is included here rather than a summary of the differences between the two use cases.

  1. The user builds an executable with the required instrumentation.
  2. The user ensures the executable is distributed to the execution nodes.
  3. The user invokes the application, outside the control of the analysis tool, for instance by command line invocation of the executable.
  4. Although instrumentation calls are being made by the application, the instrumentation code must recognize that the analysis tool is not running, and therefore the instrumentation may need to run in passive mode, acting as if it were a NOP.
  5. Some time passes while the application runs.
  6. The same user (userid) that started the application starts the analysis tool.
  7. The analysis tool attempts to connect to the infrastructure.
    • If the analysis tool cannot connect to the infrastructure, it issues an error message and the user must resolve the problem.
  8. The analysis tool passes identification and authentication/authorization credentials to the infrastructure.
  9. The infrastructure validates the identity and permissions of the user using the identity, authentication and authorization information passed by the analysis tool.
    • If the identity or permissions checks fail, the infrastructure returns an appropriate error indication. It is up to the analysis tool to take appropriate action for the error notification, possibly issuing an error message. The analysis tool may be able to recover from the error automatically, or it may be up to the user to resolve the problem.
  10. The analysis tool waits for the system to provide it a mapping of application tasks to nodes. This assumes the application runtime environment provides a means to find the application tasks.
  11. The analysis tool requests the infrastructure to notify the application's instrumentation code that the analysis tool is now active and that the instrumentation code needs to attempt to connect to the infrastructure as soon as possible.
  12. At some point afterwards, the instrumentation code in an application task is called, at which time it must connect to the infrastructure at an appropriate point for that application task. This execution may overlap with the analysis tool waiting for the mapping of application tasks to nodes; once the infrastructure knows the location of a given application task, it may notify that application task's instrumentation code that it should attempt to connect to the infrastructure.
    1. That connection request needs to be authenticated and authorized by the security model.
    2. The instrumentation code in the application task may need to block execution of some or all threads in the application task until the instrumentation code receives notification via the infrastructure to resume execution. This gives the user or the analysis tool an opportunity, if needed, to ensure that the tool has properly initialized both itself and the infrastructure, including such steps as ensuring plugins are properly loaded by infrastructure nodes and plugin parameters have been set.
    3. If the infrastructure rejects a connection request from instrumentation code, it will be up to the instrumentation code to determine how to handle the failure, possibly taking such actions as terminating the application task (and thus terminating the application) or just disabling instrumentation code and allowing the application to continue to run.
    4. In a multi-threaded application, it is possible for a second thread to make an instrumentation call while the first thread is still trying to establish a connection to the infrastructure. In this case, the instrumentation code must recognize that a connection is in progress and block the second thread until the connection is established by the first thread.
  13. The analysis tool waits for instrumentation code in all tasks it will be collecting data from (this may be all application tasks or only a subset of the application tasks which are of interest to the user or the analysis tool) to connect to the infrastructure.
  14. At this point, the analysis tool understands the application task mapping, all required application tasks have connected to the infrastructure, and initialization is complete. The application may be suspended at this point in time, depending on analysis tool requirements, waiting for notification from the infrastructure to resume execution.
  15. The user specifies the set of plugins to be used. Depending on the analysis tool, the user may have the ability to modify the set of default plugins needed by the analysis tool. The user or the analysis tool may specify the order that plugins are loaded, to facilitate stacking plugins. Plugins may also have specific loading order requirements and dependencies on the presence or absence of other plugins.
  16. The user has the option of changing the parameters for any plugin, where those parameters may be unique to a plugin handling a specific set of application tasks.
  17. The analysis tool requests the infrastructure to load the requested plugins and set the parameters for those plugins. This may be part of the processing by the analysis tool when the user requests that the application be started, or it may be a separate user-initiated activity. Plugins may need to be sent (staged) to the infrastructure nodes where they are needed, or they may already be present at those intermediate nodes.
  18. The analysis tool may allow the user to request that plugins be removed or that their parameters be modified.
  19. Upon user request, the analysis tool requests the infrastructure to resume application execution.
  20. The infrastructure informs the instrumentation code in each application task to resume execution.
  21. At this point, the application is running and the instrumentation code is sending messages to the analysis tool through the front end as necessary.
  22. The instrumentation code generates a message to be sent to the analysis tool through the infrastructure. Each message contains a header that uniquely identifies the message, possibly by message type and sequence number within message type, and identifies its origin. Each message may optionally contain a body (simple notifications may have no body) whose structure is known to the instrumentation code, to the plugins that will process it, and to the analysis tool. The infrastructure treats the message body as an opaque binary object.
    1. The instrumentation code sends the message to the infrastructure using the connection to its designated infrastructure node.
    2. When an infrastructure node receives a message, it examines the header to determine the message type. If there are one or more plugins registered to handle that message type, those plugins are called in order of loading, as specified by the analysis tool. (A sketch of this per-type dispatch follows this list.)
    3. If a plugin is called to process a message, it must run with the same userid and credentials as the user that invoked the application and the analysis tool.
    4. When a plugin processes a message, it may modify the message body, it may consume the message (meaning it is not forwarded to additional plugins and/or upward through the infrastructure), or it may generate additional messages.
    5. For simple filters, asynchronous processing of messages may be sufficient. For more sophisticated filtering, the infrastructure will provide barrier and task grouping functions that the plugins can use.
    6. After optional processing of received messages by plugins, the infrastructure sends the resulting messages upward toward the analysis tool.
    7. The top level node in the infrastructure communication tree forwards any messages it receives indirectly (through the infrastructure) from the instrumentation code to the analysis tool.
  23. The user or analysis tool may want to change the parameters for a plugin while the application is running.
    1. The user or analysis tool determines the set of plugins that the message should be sent to.
    2. The analysis tool sends the message to the infrastructure.
    3. The infrastructure routes the message to the infrastructure node(s) where the target plugins reside.
    4. The infrastructure on the target node passes the message to the target plugins.
    5. The plugin processes the message.
    6. The plugin optionally sends an acknowledgment (with or without additional data) back to the analysis tool through the infrastructure.
  24. Eventually, the application terminates, normally or abnormally.
    1. The instrumentation code may send a notification message to the infrastructure prior to task termination. That message is processed no differently than any other message from the instrumentation code, and neither the analysis tool nor the infrastructure should rely on an explicit termination message.
    2. Regardless of how an infrastructure node detects application task termination, by explicit message or otherwise, it must ensure that it cleans up any resources held by the infrastructure on behalf of that application task.
    3. Notification of the application task termination should be passed upward through the infrastructure to the analysis tool so that intermediate infrastructure nodes and the analysis tool can properly clean up any resources that were held on behalf of the application task.
  25. The analysis tool may terminate before the application execution completes, again by normal termination at user request, or by abnormal termination.
  26. The user may want the application to continue execution after the analysis tool exits, or he may want the application to be terminated.
  27. The analysis tool should provide a way for the user to specify the action taken at analysis tool termination, either terminate the application or allow it to continue.
    1. If the user wants the application to terminate, then the analysis tool will terminate the application's controlling process (poe, mpirun, etc). This should cause all application processes to terminate, which in turn results in infrastructure resource cleanup as described above for application termination.
    2. If the user wants to allow the application to continue execution, then the infrastructure must ensure resources are left in place to allow the application to continue execution even if data is no longer being collected. The infrastructure may notify the instrumentation code in the application task that data collection is suspended.
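
The per-message-type plugin dispatch described in step 22 suggests a registration interface roughly like the sketch below. Every name here (stci_plugin_register_handler, the result codes, the toy dispatch table) is an assumption for illustration, not a defined plugin API.

    /* Illustrative sketch of per-message-type plugin dispatch (step 22).
     * All names are assumptions, not a defined STCI plugin API.          */
    #include <stddef.h>
    #include <stdint.h>

    enum stci_plugin_result {
        STCI_MSG_FORWARD,    /* pass the (possibly modified) message on   */
        STCI_MSG_CONSUMED    /* stop: not forwarded to later plugins      */
    };

    typedef enum stci_plugin_result
        (*stci_msg_handler)(uint32_t msg_type, void *body, size_t body_len);

    /* A toy registration table; the real infrastructure would key this by
       message type and honor plugin load order.                          */
    #define MAX_HANDLERS 32
    static struct { uint32_t type; stci_msg_handler fn; } handlers[MAX_HANDLERS];
    static int nhandlers;

    int stci_plugin_register_handler(uint32_t msg_type, stci_msg_handler fn)
    {
        if (nhandlers >= MAX_HANDLERS)
            return -1;
        handlers[nhandlers].type = msg_type;
        handlers[nhandlers].fn = fn;
        nhandlers++;
        return 0;
    }

    /* Called as each message arrives; handlers run in registration order
       until one consumes the message.                                    */
    void stci_dispatch(uint32_t msg_type, void *body, size_t body_len)
    {
        for (int i = 0; i < nhandlers; i++)
            if (handlers[i].type == msg_type &&
                handlers[i].fn(msg_type, body, body_len) == STCI_MSG_CONSUMED)
                break;
    }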

Shutdown Analysis Tool, Disconnecting from Running Application

This case is essentially the same as the last portion of the interactive execution, real time data collection case, where the use case flow starts at the point where the user wants to terminate the analysis tool and leave the application running. The objective here is to reclaim any infrastructure resources which were held on behalf of the analysis tool or the application's instrumentation code, and to allow the application's instrumentation code to recover from a closed connection to the infrastructure and continue to execute.
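
A sketch of how instrumentation code might survive its infrastructure connection going away, assuming the only signal it receives is a failed write on that connection; the names and the single-connection model are illustrative assumptions:

    /* Illustrative handling of a closed infrastructure connection so the
     * application keeps running; the details are assumptions only.       */
    #include <signal.h>
    #include <unistd.h>

    static int stci_fd = -1;          /* connection to infrastructure node */
    static int stci_active = 0;       /* is data collection enabled?       */

    void stci_agent_init(int fd)
    {
        stci_fd = fd;
        stci_active = 1;
        signal(SIGPIPE, SIG_IGN);     /* a closed peer must not kill the app */
    }

    /* Send one instrumentation record; on any failure, drop into passive
       mode so later probes behave as NOPs and the application continues.  */
    void stci_try_send(const void *buf, size_t len)
    {
        if (!stci_active)
            return;
        if (write(stci_fd, buf, len) < 0) {
            close(stci_fd);           /* reclaim the dead connection        */
            stci_fd = -1;
            stci_active = 0;
        }
    }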

Target Computer Systems

The specification should not be tied to a particular operating system or hardware architecture. It is likely that an interactive analysis tool would run on a local laptop or workstation with a network link to a login node which has access to the STCI infrastructure, although the analysis tool could run directly on the login node if the user chooses to do so. Especially when defining the connection between the analysis tool and the login node, issues such as endianness need to be considered. If the infrastructure supports heterogeneous systems, then those issues also need to be considered in the infrastructure communication protocols.
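
Where endianness matters, the usual approach is to place multi-byte header fields in a fixed byte order on the wire. A minimal sketch using the standard htonl()/ntohl() conversions (the header fields themselves are illustrative):

    /* Minimal example of endian-neutral header encoding between nodes of
     * differing byte order, using standard network byte order helpers.   */
    #include <arpa/inet.h>   /* htonl, ntohl */
    #include <stdint.h>

    struct wire_header { uint32_t msg_type, sequence, origin, body_len; };

    /* Convert a header to network byte order before it is sent.          */
    void header_to_wire(const struct wire_header *in, struct wire_header *out)
    {
        out->msg_type = htonl(in->msg_type);
        out->sequence = htonl(in->sequence);
        out->origin   = htonl(in->origin);
        out->body_len = htonl(in->body_len);
    }

    /* Convert a received header back to host byte order.                 */
    void header_from_wire(const struct wire_header *in, struct wire_header *out)
    {
        out->msg_type = ntohl(in->msg_type);
        out->sequence = ntohl(in->sequence);
        out->origin   = ntohl(in->origin);
        out->body_len = ntohl(in->body_len);
    }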

Unresolved Questions

  1. Is the infrastructure up and running all the time? If it is not persistent, what userid starts it? Can an arbitrary user start it, for instance by inetd? What shuts it down? Is it a user or sysadmin action? Does the infrastructure shut down when a reference count of sessions using the infrastructure drops to zero? If the infrastructure is up and running permanently, then what impact do persistent infrastructure daemons have on the running system? Daemons that wake up and poll periodically can adversely affect performance of an application by interfering with communication timing between nodes.
  2. How is application startup and connection to the infrastructure handled? The infrastructure needs to understand application task to node mapping before it can set up the infrastructure connections for data collection. Application task to node mapping may not be known before the application is invoked. This seems to imply that the analysis tool starts the application, and that the instrumentation code built into the binary knows that at the first instrumentation call the instrumentation code has to connect to the infrastructure. It also seems to imply that the instrumentation code must block at the connection attempt until the instrumentation code is notified by the infrastructure to begin execution, and that this should be a barrier operation across all application tasks. Should a requirement be imposed on all instrumentation code that the first instrumentation call be made immediately after main(), or at some other point where all tasks are guaranteed to execute? What happens if an application task blocks for a long time before MPI_Init is called? Does this cause problems for any MPI runtime, such as timeouts? The use cases above present one way to accomplish this, but it imposes requirements on the instrumentation code and perhaps the user. Is this approach too restrictive? Is it not generalized enough to be useful?
  3. These use cases assume that the application starts, then instrumentation code connects to the infrastructure. How does instrumentation code know how to connect to the infrastructure? Is there an infrastructure daemon running on each node? What about OS/hardware where the only process running on a compute node is the application process? Can we assume the existence of a configuration file visible on all nodes, that, among other things, specifies the host name and port where the instrumentation code may connect? If instrumentation must make an initial connection to the infrastructure to query the real connection point for that application process, does this result in an unacceptable scaling bottleneck for the infrastructure?
  4. I am assuming that the instrumentation code does not receive messages from the infrastructure for two reasons. First, this implies that the user has sufficient knowledge of where the application is in its execution to send an appropriate request to the instrumentation. Second, this implies at least one additional thread of execution in the instrumentation code to monitor the connection to the infrastructure for incoming messages. Is this reasonable?
  5. In the alternative to step 9 in the above use case, the instrumentation code sets up a port for the infrastructure to connect to, which works when there is a single application task per node. If there are multiple application tasks per node, then a different scheme is required. A pool of unique port numbers would work as long as port numbers are not a constrained resource. The infrastructure node may have additional work to do to keep track of the multiple pending connection requests.
  6. In a multi-threaded application, does blocking a single thread while a connection to the infrastructure is being established cause the application to behave improperly? Is there a way for the instrumentation code to block all other threads in the application as well?
  7. Can we assume that in the connect-to-running-application cases, the same userid that invoked the application also invokes the analysis tool? If different userids, neither being root, are allowed to start the application and connect to the application, then it seems there are a number of security problems and exposures. Doing this potentially allows the user invoking the analysis tool to see data from the running application that they would not normally be able to see, unless the security subsystem can implement a secure method to appropriately restrict such use to a special class of users such as sysadmins. Also, will system security even allow us to elevate the infrastructure code to the appropriate privileges to access another user's data?
  8. How does instrumentation code in an application determine that it is being called for the first time in an application that is invoked under control of an analysis tool, and thus needs to connect to the infrastructure, versus instrumentation code that is called while the corresponding analysis tool is not running, where the instrumentation code may therefore be expected to operate in passive mode? Perhaps the analysis tool passes a request to the infrastructure to set a particular environment variable before invoking the application, and the instrumentation code then determines the application's mode based on the presence of that environment variable; one possible mechanism along those lines is sketched below.
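
One possible mechanism for question 8, sketched below: the instrumentation picks its mode from an environment variable set before the application starts. The variable name STCI_TOOL_ACTIVE is invented for illustration.

    /* One possible answer to question 8: decide active vs. passive mode
     * from an environment variable set (directly or via the
     * infrastructure) by the analysis tool before the application starts.
     * The variable name is invented for illustration.                    */
    #include <stdlib.h>
    #include <string.h>

    enum instr_mode { INSTR_PASSIVE, INSTR_ACTIVE };

    enum instr_mode instrumentation_mode(void)
    {
        const char *v = getenv("STCI_TOOL_ACTIVE");
        if (v != NULL && strcmp(v, "1") == 0)
            return INSTR_ACTIVE;      /* connect to the infrastructure     */
        return INSTR_PASSIVE;         /* behave as a NOP until told otherwise */
    }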

Contributors

  • Dave Wootton