ProRes codes chroma blocks in 4:4:4 mode in a different order than luma
blocks, so make both the decoder and the encoder read/write chroma blocks
in the right order.
Reported by Phil Barrett
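
The message above does not spell out either block order. Here is a hedged sketch that assumes luma blocks inside a 16x16 macroblock are coded in raster order (top-left, top-right, bottom-left, bottom-right) while 4:4:4 chroma blocks are coded column by column (top-left, bottom-left, top-right, bottom-right). It computes the pixel offset of the n-th 8x8 block for each case. `BlockPos`, `luma_block_pos` and `chroma444_block_pos` are hypothetical names, not decoder API.

```c
#include <assert.h>

/* Pixel offset of an 8x8 block inside a 16x16 macroblock (hypothetical type). */
typedef struct {
    int x;
    int y;
} BlockPos;

/* Luma (assumed): raster order, i.e. TL, TR, BL, BR. */
static BlockPos luma_block_pos(int n)
{
    assert(n >= 0 && n < 4);
    BlockPos p = { (n & 1) * 8, (n >> 1) * 8 };
    return p;
}

/* Chroma in 4:4:4 (assumed): column by column, i.e. TL, BL, TR, BR. */
static BlockPos chroma444_block_pos(int n)
{
    assert(n >= 0 && n < 4);
    BlockPos p = { (n >> 1) * 8, (n & 1) * 8 };
    return p;
}
```

Under these assumptions, block 1 lands at (8,0) for luma but at (0,8) for 4:4:4 chroma. Reusing the luma order for chroma would therefore swap the top-right and bottom-left chroma blocks, which is the kind of mismatch the fix addresses.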

The operations that use it require it to be promoted to a larger (natural)
type, and thus perform sign extension on it. While an optimal compiler may
account for this, gcc 4.6 (for x86 Windows) fails to do so. Using the
natural integer type provides a 2% speedup on Win64 and 1% on Win32.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
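
The message does not name the variable it changes, so the following is only a hypothetical illustration of the pattern. Both functions compute the same result. In the first, `step` is stored in a narrow signed type, and C's integer promotions convert it to `int` before the multiplication. On x86 that means a sign-extending load (such as `movsx`) unless the compiler hoists or eliminates it. In the second, `step` is already an `int`, so no extension is needed. The names `accumulate_narrow`, `accumulate_int` and `step` are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow version (hypothetical): `step` is promoted from int8_t to int
 * in the expression below, which requires a sign extension. */
static int accumulate_narrow(const int *src, int n, int8_t step)
{
    assert(n >= 0);
    int x = 0;
    for (int i = 0; i < n; i++)
        x += src[i] * step;
    return x;
}

/* Natural version (hypothetical): `step` is already an int,
 * so the multiplication needs no extension. */
static int accumulate_int(const int *src, int n, int step)
{
    assert(n >= 0);
    int x = 0;
    for (int i = 0; i < n; i++)
        x += src[i] * step;
    return x;
}
```

The change is purely a type choice. Results are identical for every value the narrow type can hold, so switching to `int` only removes work the compiler failed to optimize away.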

The Apple ProRes Format Specifications mention a target data size for every
frame, so make sure each frame meets it. This also allows the encoder to
request much smaller packet sizes for output.
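
The message does not say how the encoder meets the target. One simple approach, shown here purely as an assumption, is to search upward from the finest quantiser until the estimated frame size fits. It relies on the fact that a larger quantiser means coarser quantisation and fewer bits. `FrameCostFn`, `pick_quantiser` and `toy_cost` are hypothetical, and the 1/q cost model is a toy. A real encoder would measure sizes by actually coding the slices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: estimated coded size of the frame at quantiser q. */
typedef size_t (*FrameCostFn)(int q, void *opaque);

/* Return the smallest (highest-quality) quantiser in [min_q, max_q) whose
 * estimated size fits target_bits; fall back to max_q if none does. */
static int pick_quantiser(FrameCostFn cost, void *opaque,
                          int min_q, int max_q, size_t target_bits)
{
    assert(min_q >= 1 && min_q <= max_q);
    for (int q = min_q; q < max_q; q++)
        if (cost(q, opaque) <= target_bits)
            return q;
    return max_q;
}

/* Toy cost model for illustration only: size falls off as 1/q. */
static size_t toy_cost(int q, void *opaque)
{
    size_t base = *(const size_t *)opaque;
    return base / (size_t)q;
}
```

With `base = 1000` and a target of 400, q = 2 gives 500 (too large) and q = 3 gives 333, so the search returns 3. Because each frame is held to its target, the output packet can be sized from that target rather than from a worst-case bound. A complete implementation would still need a strategy for frames that do not fit even at the coarsest quantiser.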