Instead, keep them in the bitstream buffer until we read them verbatim,
this saves a memcpy() and a subsequent clearing of the target buffer.
decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from
6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
pull/10/head
Ronald S. Bultje12 years agocommitted byMartin Storsjö
DECLARE_ALIGNED(16,int16_t,mb)[16*48*2];///< as a dct coeffecient is int32_t in high depth, we need to reserve twice the space.
DECLARE_ALIGNED(16,int16_t,mb_luma_dc)[3][16*2];
int16_tmb_padding[256*2];///< as mb is addressed by scantable[i] and scantable is uint8_t we can either check that i is not too large or ensure that there is some unused stuff after mb