Instead of using a combination of bitreader and -writer for copying data,
one can byte-align the (obsolete and removed) bitreader to improve performance.
One can even use memcpy in the normal case.
This improved the time needed for writing the slicedata from 33618 to
2370 decicycles when tested on a video originating from a DVD (4194394
runs).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
before:
419022 decicycles in assemble_fragment, 2047 runs, 1 skips
after:
104621 decicycles in assemble_fragment, 2045 runs, 3 skips
Benched with a 2 minutes long 720x480 DVD mpeg2 sample.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
This saves one malloc + memcpy per packet
The CodedBitstreamFragment buffer is padded to follow the requirements
of AVPacket.
Reviewed-by: jkqxz
Signed-off-by: James Almer <jamrial@gmail.com>
This makes it easier for users of the CBS API to get alloc/free right -
all subelements use the buffer API so that it's clear how to free them.
It also allows eliding some redundant copies: the packet -> fragment copy
disappears after this change if the input packet is refcounted, and more
codec-specific cases are now possible (but not included in this patch).