Originally, the decoder had a single execution pool, with one
execution context per thread. Execution pools were always intended
to be thread-safe, as long as there were enough execution contexts
in the pool to satisfy all threads.
Due to synchronization issues, the threading part was removed at some
point, and, for decoding, each thread had its own execution pool.
Having a single execution pool per context is hacky, not to mention
wasteful.
Most importantly, we *cannot* associate single shaders across multiple
execution pools for a single application. This means that we cannot
use shaders to either apply film grain, or use this framework for
software-defined decoders.
The recent commits added threading capabilities back to the execution
pool, and the number of contexts in each pool was increased. This was
done with the assumption that the execution pool was singular, which
it was not. This led to increased parallelism and number of frames
in flight, which is taxing on memory.
This commit finally restores proper threading behaviour.
The validation layer has isses that are reported and addressed in the
earlier commit.