In some cases, 2 or 3 calls are performed to functions for unusual
widths. Instead, perform 2 calls for different widths to split the
workload.
The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't
be processed that way without modifications: some calls use unaligned
buffers, and having branches to handle this was resulting in no
micro-benchmark benefit.
For block_w == 12 (around 1% of the pixels of the sequence):
Before:
12758 decicycles in epel_uni, 4093 runs, 3 skips
19389 decicycles in qpel_uni, 8187 runs, 5 skips
22699 decicycles in epel_bi, 32743 runs, 25 skips
34736 decicycles in qpel_bi, 32733 runs, 35 skips
After:
11929 decicycles in epel_uni, 4096 runs, 0 skips
18131 decicycles in qpel_uni, 8184 runs, 8 skips
20065 decicycles in epel_bi, 32750 runs, 18 skips
31458 decicycles in qpel_bi, 32753 runs, 15 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Not having allocated it is not a good reason to leave the object
in an undetermined state. Though a particular setting like the
AV_EF_* flags could be useful to control that behaviour.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
In some cases, in particular if several blocks are needed because of
the channel layout (e.g. 2.1), the information used to write the
trailing bits terminating the sample data was not reset.
This would cause potential desync on the decoder, although decoded
samples were actually mostly fine.
Fixes ticket #3879.
MpegEncContext based decoders are only fully initialized after the first
ff_thread_get_buffer() call. The RV30/40 decoders may fail before a frame
buffer was requested. ff_mpeg_update_thread_context() fails on half
initialized MpegEncContexts. Since this can only happen before a the
first frame was decoded there is no need to call
ff_mpeg_update_thread_context().
Based on patches by John Stebbins and tested by John Stebbins.
CC: libav-stable@libav.org
It was only validating that normal data wasn't filling the buffer.
However, extra data may be written afterwards.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, the encoder will try to reduce it down to 150000, but the
decoder will complain starting at 131072 (WV_MAX_SAMPLES). Therefore,
change the loop limit.
Fixes ticket #3881.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Buffers containing copies of the AAC and AC3 header bits were not padded
before parsing, violating init_get_bits() buffer padding requirement,
leading to potential buffer read overflows.
This change adds FF_INPUT_BUFFER_PADDING_SIZE bytes to the bit buffer
for parsing the header in each of aac_parser.c and ac3_parser.c.
Based on patch by: Matt Wolenetz <wolenetz@chromium.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Some streams were found to have what appeared to be truncated SPS.
Their syntax seem to be valid at least until the end of the VUI, so
try that syntax if the parsing would overflow the SPS in the
conforming syntax.
Fixes ticket #3872.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Reduced xmm register count to 7 (As such they are now enabled for x86_32).
* Removed four movdqa (affects the sse2 version only).
* pxor is now used to clear m0 only once.
~5% faster.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
bytestream2_* will not cause buffer overflow, but in that case, this means
the allocation would be incorrect and the encoded result invalid. Therefore,
assert no overflow occurred.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
With huge sampling rates, the table derivation method does not converge fast
enough. While fixing it using e.g. Newton-Rhapson-like methods (the curve is
nicely convex) is possible, it is much simpler to reject these cases.
The value of 96000 was arbitrarily chosen as a realistic value, though
1000000 would still work and converge.
Fixes ticket #3868.
Suggested-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The AVSampleFormat list of sample_fmts_s16p is missing the trailing "P" for planar formats. AV_SAMPLE_FMT_S16 vs AV_SAMPLE_FMT_S16P
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
They should match but they do not always
Fixes assertion failure
no testcase with unmodified source available
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The allocation didn't account for headers, that can be easily 79 bytes.
As a result, buffers allocated for a few samples (e.g. 5 in the original
bug) could be undersized.
Fixed ticket #2881.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>