The VPS referenced by the SPS must always be present, as the maximum value
for sps_max_sub_layers_minus1 is vps_max_sub_layers_minus1. This replaces a
buggy custom range check for that field.
Also, add the missing conformance check for sps_temporal_id_nesting_flag while
at it.
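
A minimal sketch of the two checks described above, assuming reduced
stand-in structures. ExampleVPS, ExampleSPS and example_check_sps() are
hypothetical names, not the actual code, and the nesting rule encoded here
(the flag must be set when the VPS sets it or when there is a single
sub-layer) is stated as an assumption about the conformance constraint:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced parameter sets; not the real structures. */
typedef struct ExampleVPS {
    uint8_t vps_max_sub_layers_minus1;
    uint8_t vps_temporal_id_nesting_flag;
} ExampleVPS;

typedef struct ExampleSPS {
    uint8_t sps_max_sub_layers_minus1;
    uint8_t sps_temporal_id_nesting_flag;
} ExampleSPS;

/* Returns 0 if the SPS is consistent with its VPS, -1 otherwise. */
static int example_check_sps(const ExampleVPS *vps, const ExampleSPS *sps)
{
    /* The referenced VPS supplies the upper bound, so it must exist. */
    if (!vps)
        return -1;

    if (sps->sps_max_sub_layers_minus1 > vps->vps_max_sub_layers_minus1)
        return -1;

    /* Assumed constraint: nesting is mandatory if the VPS signals it or
     * if there is only a single sub-layer. */
    if ((vps->vps_temporal_id_nesting_flag ||
         sps->sps_max_sub_layers_minus1 == 0) &&
        !sps->sps_temporal_id_nesting_flag)
        return -1;

    return 0;
}
```

Taking the bound from the VPS itself, rather than from a separately
maintained constant, is what makes the presence of the VPS a hard
requirement.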
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: out of array access
Fixes: 69098/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MPEG2VIDEO_fuzzer-6107989688778752
Fixes: 69599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MPEG4_fuzzer-4848626296225792.fuzz
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This simplification assumes that the code is correct
Fixes: CID1560036 Logically dead code
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Unlike the 8-bit version, we need two iterations to process this within
128-bit vectors. This adds some extra complexity for pointer arithmetic
and counting down which is unnecessary in the 8-bit variant.
Accordingly, the gains relative to C are just slightly better than half
as good with 128-bit vectors as with 256-bit ones.
T-Head C908 (2 iterations):
h264_idct8_add_9bpp_c: 17.5
h264_idct8_add_9bpp_rvv_i32: 10.0
h264_idct8_add_10bpp_c: 17.5
h264_idct8_add_10bpp_rvv_i32: 9.7
h264_idct8_add_12bpp_c: 17.7
h264_idct8_add_12bpp_rvv_i32: 9.7
h264_idct8_add_14bpp_c: 17.7
h264_idct8_add_14bpp_rvv_i32: 9.7
SpacemiT X60 (single iteration):
h264_idct8_add_9bpp_c: 15.2
h264_idct8_add_9bpp_rvv_i32: 5.0
h264_idct8_add_10bpp_c: 15.2
h264_idct8_add_10bpp_rvv_i32: 5.0
h264_idct8_add_12bpp_c: 14.7
h264_idct8_add_12bpp_rvv_i32: 5.0
h264_idct8_add_14bpp_c: 14.7
h264_idct8_add_14bpp_rvv_i32: 4.7
Fixes: signed integer overflow: 865309950 * 256 cannot be represented in type 'int'
Fixes: 69191/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_OSQ_fuzzer-6310214413385728
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
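
For illustration only: one common way to avoid the signed overflow reported
above is to do the multiplication in unsigned arithmetic, where wrap-around
is well defined. This is a sketch under that assumption, not necessarily the
change that was made; example_scale_by_256() is a hypothetical helper:

```c
#include <stdint.h>

/* Hypothetical helper: multiply by 256 without signed overflow.
 * The product is computed modulo 2^32 in unsigned arithmetic. Converting
 * the result back to int32_t is implementation-defined for values above
 * INT32_MAX, but wraps on the usual two's complement targets. */
static int32_t example_scale_by_256(int32_t v)
{
    return (int32_t)((uint32_t)v * 256U);
}
```

Small inputs behave exactly like the plain multiplication. Inputs such as
865309950 wrap instead of invoking undefined behaviour.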
Found by reviewing code related to CID1604365 Overflowed constant
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is more a style fix than a bugfix (CID1604392 Overflowed constant)
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Found while reviewing CID1608712 Explicit null dereferenced
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Found by code review related to CID1604563 Overflowed return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Found while reviewing code related to CID1604409 Overflowed return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Found by code review related to CID1604386 Overflowed constant
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: left shift of negative value -208
Fixes: 69073/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PRORES_KS_fuzzer-4745020002336768
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
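
For illustration only: left-shifting a negative signed value is undefined
behaviour in C, and a common remedy is to multiply by the corresponding
power of two instead. This is a sketch under that assumption, not
necessarily the change that was made; example_scale_pow2() is a
hypothetical helper:

```c
/* Hypothetical helper: scale v by 2^shift without shifting a negative
 * value. The shift is applied to the positive constant 1, so it is well
 * defined as long as 0 <= shift < 31 and the product fits in an int. */
static int example_scale_pow2(int v, int shift)
{
    return v * (1 << shift);
}
```

Compilers typically turn the multiplication back into a shift instruction,
so the change costs nothing at run time.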
Otherwise, the slice index is never updated for hwaccel decoding, and the
RPL of every slice, which is constructed using the slice index, always
overlaps the first one.
Fixes hwaccel decoding after 47d34ba7fb
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
The slice address tab is only updated when decoding slice data in software.
Fixes hwaccel decoding after d725c737fe.
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Not a bugfix, but might fix CID1604361 Overflowed constant
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There are two implementations here:
- a generic scalable one processing two columns at a time,
- a specialised one processing one (fixed-size) row at a time.
Unsurprisingly, the generic one works out better with smaller widths.
With larger widths, the gains from filling vectors are outweighed by
the extra cost of strided loads and stores. In other words, memory
accesses become the bottleneck.
T-Head C908:
h264_weight2_8_c: 54.5
h264_weight2_8_rvv_i32: 13.7
h264_weight4_8_c: 101.7
h264_weight4_8_rvv_i32: 27.5
h264_weight8_8_c: 197.0
h264_weight8_8_rvv_i32: 75.5
h264_weight16_8_c: 385.0
h264_weight16_8_rvv_i32: 74.2
SpacemiT X60:
h264_weight2_8_c: 48.5
h264_weight2_8_rvv_i32: 8.2
h264_weight4_8_c: 90.7
h264_weight4_8_rvv_i32: 16.5
h264_weight8_8_c: 175.0
h264_weight8_8_rvv_i32: 37.7
h264_weight16_8_c: 342.2
h264_weight16_8_rvv_i32: 66.0
In vtenc_populate_extradata, the cleanup function vtenc_reset should not
be used when no error occurs, otherwise some color information is lost
(#11036).
This patch checks the status code and conducts the correct cleanup.
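
A minimal sketch of the pattern, with stand-in types. ExampleEncoder,
example_reset() and example_populate_extradata() are hypothetical names and
do not mirror the real VideoToolbox code. The full reset only runs on the
error path, while the success path releases just the temporary session:

```c
/* Hypothetical encoder state; not the real context structure. */
typedef struct ExampleEncoder {
    int session_open;   /* temporary session used to obtain extradata */
    int color_info_set; /* stands in for the color information */
} ExampleEncoder;

/* Full reset: drops everything, including the color information. */
static void example_reset(ExampleEncoder *enc)
{
    enc->session_open   = 0;
    enc->color_info_set = 0;
}

/* simulated_status stands in for the status code of the real calls. */
static int example_populate_extradata(ExampleEncoder *enc, int simulated_status)
{
    int status;

    enc->session_open   = 1;
    enc->color_info_set = 1;

    status = simulated_status;
    if (status != 0) {
        /* Error: nothing is worth keeping, so reset everything. */
        example_reset(enc);
        return status;
    }

    /* Success: release only the temporary session, keep the rest. */
    enc->session_open = 0;
    return 0;
}
```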
Signed-off-by: Hao Guan <hguandl@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
While this *tends* to be faster than plain C, the performance numbers
are all over the place, presumably due to the conditional character of
the main loop.
Some additional micro-optimisations should be feasible after the
underlying h264_idct_add and h264_idct_dc_add functions are also
implemented. Then it will no longer be necessary to strictly abide by
the C ABI.
Performance is (unfortunately) the same as with non-MBAFF, since the
hardware under test does not short-circuit vector tail calculations.
(IMO, a generic solution or work-around should be agreed on, rather
than bespoke approaches all over the place.)
T-Head C908 (cycles):
h264_h_loop_filter_luma_8bpp_c: 297.5
h264_h_loop_filter_luma_8bpp_rvv_i32: 369.2
h264_v_loop_filter_luma_8bpp_c: 862.7
h264_v_loop_filter_luma_8bpp_rvv_i32: 199.7
Performance in the horizontal scenario seems worse than scalar. x86
SSE2 and AVX optimisations are similarly affected. This is presumably
caused by unlucky inputs from checkasm, such that the C code
short-circuits almost all filter calculations.