The work library is intended to simplify the use of multithreading in the context of our software ecosystem.
This library is intended as a thin abstraction layer on top of a multithreading subsystem. The abstraction serves two purposes:

1. To simplify the use of common constructs like "Parallel For".
2. To centralize our dependency on a particular multithreading subsystem (e.g., TBB).
Because of the way multithreading subsystems work and because of the way they need to interact with each other in managing system resources, it is not generally practical for each client to use whatever threading system they like (e.g., TBB for one client, OpenMP for another).
The library defaults to maximum concurrency, i.e., it will attempt to use as many threads as are available on the system. The default concurrency limit is established at static initialization time. If granular thread limits are enabled on the libwork backend (as described in Providing an Alternate Work Implementation), then the PXR_WORK_THREAD_LIMIT environment variable can be set to further limit concurrency, for example in a farm environment. PXR_WORK_THREAD_LIMIT must be set to an integer N, and it is up to your implementation to interpret this value given its thread-limiting granularity. In the default TBB-based backend, setting PXR_WORK_THREAD_LIMIT to N denotes one of the following:

- N = 0: no limit; use maximum concurrency
- N = 1: run serially (single-threaded)
- N > 1: limit concurrency to at most N threads
- N < 0: limit concurrency to all but |N| hardware threads (e.g., -4 leaves four cores free)
If granular thread limits are not enabled, then the PXR_WORK_THREAD_LIMIT environment variable can be set to allow maximum concurrency or to run the program serially. PXR_WORK_THREAD_LIMIT must be set to an integer N, denoting one of the following:

- N = 1: run the program serially
- any other value: maximum concurrency
The concurrency limit can also be set programmatically, using WorkSetConcurrencyLimitArgument() or WorkSetMaximumConcurrencyLimit().
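For example, using the thread-limits API from pxr/base/work (the limit of 4 is purely illustrative):

```cpp
#include <pxr/base/work/threadLimits.h>

PXR_NAMESPACE_USING_DIRECTIVE

void ConfigureThreading()
{
    // Limit libwork to at most 4 threads (value chosen for illustration)...
    WorkSetConcurrencyLimitArgument(4);

    // ...or instead allow libwork to use all available hardware threads.
    WorkSetMaximumConcurrencyLimit();
}
```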
It is preferable to use WorkSetMaximumConcurrencyLimit() when the desire is to use the hardware to its fullest, rather than to specify the maximum concurrency limit manually.
Once you've initialized the library, you can now harness the awesome power of your multi-core machine. Here's a simple example of a Parallel For.
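The original code example is not preserved here; the following sketch shows the shape such an example might take with WorkParallelForN, whose callable receives a (begin, end) index range. The helper names are hypothetical:

```cpp
#include <pxr/base/work/loops.h>

#include <functional>
#include <vector>

PXR_NAMESPACE_USING_DIRECTIVE

// Hypothetical helper: doubles the values in v over the index range [begin, end).
static void _DoubleValues(std::vector<int> *v, size_t begin, size_t end)
{
    for (size_t i = begin; i < end; ++i) {
        (*v)[i] *= 2;
    }
}

void DoubleInParallel(std::vector<int> *v)
{
    using namespace std::placeholders;
    // WorkParallelForN splits [0, n) into subranges and invokes the callable
    // concurrently, binding each subrange to (_1, _2).
    WorkParallelForN(v->size(), std::bind(&_DoubleValues, v, _1, _2));
}
```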
You can avoid the std::bind and provide your own functor object as well.
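For instance, a functor exposing the same (begin, end) call signature can be passed directly (again a hypothetical sketch, not the original example):

```cpp
#include <pxr/base/work/loops.h>

#include <vector>

PXR_NAMESPACE_USING_DIRECTIVE

// Hypothetical functor with the (begin, end) signature WorkParallelForN expects.
class _Doubler {
public:
    explicit _Doubler(std::vector<int> *v) : _v(v) {}
    void operator()(size_t begin, size_t end) const {
        for (size_t i = begin; i < end; ++i) {
            (*_v)[i] *= 2;
        }
    }
private:
    std::vector<int> *_v;
};

void DoubleInParallel(std::vector<int> *v)
{
    WorkParallelForN(v->size(), _Doubler(v));
}
```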
You can provide your own work backend that uses your preferred dispatching system, instead of TBB's task/task_group API, by building your own library that implements the APIs described in the following sections:
Note: In each of the subsections below we list the required API and also outline specific behaviors that the implementations must follow. For general requirements please refer to the API docs under pxr/base/work. For more information on building and linking an alternate work backend to USD please refer to BUILDING.md.
Note that the work abstraction contains a serial implementation of these functions, so you only need to provide a concurrent implementation. Your work backend is not required to include TBB in order to interoperate with TBB range types; to that end, we provide a WorkParallelForTBBRange implementation built on the WorkDispatcher. If you do wish to supply your own WorkImpl_ParallelForTBBRange, then you must also define WORK_IMPL_HAS_PARALLEL_FOR_TBB_RANGE in your implementation.
If the implementation can support granular thread limits (limiting concurrency to values other than 1 and maximum concurrency), you must set WorkImpl_SupportsGranularThreadLimits accordingly. If granular thread limits are supported, it is up to you to define what "granular" entails.
testWorkThreadLimits only checks that the implementation supports some level of granularity, i.e., that the thread limit has been set to a value less than or equal to the greater of the physical concurrency limit given by WorkImpl_GetPhysicalConcurrencyLimit and the requested thread limit. You are responsible for further testing the granularity of your implementation's thread limiting.
In the non-granular case, the implementation should respect the behavior outlined in Initializing and Limiting Multithreading.
You can implement WorkImpl_InitializeThreading if a thread-limiting object needs to be eagerly initialized when PXR_WORK_THREAD_LIMIT is set.
When running a detached task, you must ensure that the program does not end while a detached task could still be running. An example of how this could be done is in /pxr/extras/usd/examples/workTaskflowExample/detachedTask.cpp.
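The referenced example is not reproduced here, but the underlying pattern can be sketched in plain C++: keep a count of in-flight detached tasks and block at shutdown until it reaches zero. All names below are stand-ins, not the libwork or example code:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Count of detached tasks still running, plus a condition variable that
// shutdown code can wait on.
static std::atomic<int> pendingDetachedTasks{0};
static std::mutex waitMutex;
static std::condition_variable waitCv;

// Launch a task on a detached thread, tracking it in the pending count.
void RunDetachedStandIn(std::function<void()> task)
{
    ++pendingDetachedTasks;
    std::thread([task = std::move(task)] {
        task();
        // Atomic pre-decrement: notify waiters when the last task finishes.
        if (--pendingDetachedTasks == 0) {
            std::lock_guard<std::mutex> lock(waitMutex);
            waitCv.notify_all();
        }
    }).detach();
}

// Call before the program exits so no detached task outlives the process.
void WaitForDetachedTasks()
{
    std::unique_lock<std::mutex> lock(waitMutex);
    waitCv.wait(lock, [] { return pendingDetachedTasks.load() == 0; });
}
```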
When executing with scoped parallelism, the callable fn must be executed on the same thread that called WorkImpl_WithScopedParallelism.
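This requirement can be illustrated with a stand-in (not the real backend entry point): whatever parallelism scope the implementation establishes internally, fn itself is invoked on the calling thread, which a caller can verify by comparing thread ids:

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Stand-in for a WorkImpl_WithScopedParallelism-style entry point. A real
// backend would establish an isolated parallelism scope here (e.g., a TBB
// task arena); the key requirement is that fn runs on the calling thread.
template <class Fn>
auto WithScopedParallelismStandIn(Fn &&fn) -> decltype(fn())
{
    return std::forward<Fn>(fn)();
}

// Returns true iff the callable observed the same thread id as the caller.
bool RunsOnCallingThread()
{
    const std::thread::id caller = std::this_thread::get_id();
    return WithScopedParallelismStandIn(
        [&] { return std::this_thread::get_id() == caller; });
}
```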
On top of the documented requirements for WorkDispatcher, you must ensure that WorkImpl_Dispatcher can execute serially and that Run(Callable &&c) still returns immediately when single-threaded. It is common for tasks to spawn more tasks on the same dispatcher, so if the implementation executes the callable in place, the program risks a stack overflow due to the now-recursive nature of these nested calls.
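The hazard can be sketched with a toy serial dispatcher (illustrative names, not the libwork API): Run() queues the callable and returns immediately, and Wait() drains the queue iteratively, so tasks that spawn further tasks grow the queue rather than the call stack:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

class ToySerialDispatcher {
public:
    // Run() returns immediately; the callable is queued, not invoked in place.
    void Run(std::function<void()> c) { _queue.push(std::move(c)); }

    // Wait() drains the queue in a loop. Nested Run() calls made from inside
    // a task simply append to the queue instead of deepening the call stack.
    void Wait() {
        while (!_queue.empty()) {
            std::function<void()> task = std::move(_queue.front());
            _queue.pop();
            task();
        }
    }

private:
    std::queue<std::function<void()>> _queue;
};

// Each task spawns the next until depth reaches zero; an execute-in-place
// dispatcher would recurse depth levels deep here.
int RunNestedTasks(int depth)
{
    ToySerialDispatcher d;
    int count = 0;
    std::function<void(int)> spawn = [&](int n) {
        ++count;
        if (n > 0) {
            d.Run([&, n] { spawn(n - 1); });
        }
    };
    d.Run([&] { spawn(depth); });
    d.Wait();
    return count;  // depth + 1 tasks executed, with a flat stack
}
```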
Currently, OpenUSD and especially OpenExec take advantage of the low-level control over work stealing and scheduling that TBB provides to optimize our code; however, not all work dispatching systems provide that same functionality. If an alternate backend is not able to implement WorkImpl_IsolatingDispatcher, the abstraction will default to WorkImpl_Dispatcher, in which case the performance of OpenUSD and OpenExec may suffer.