The max height is currently documented as 16; the max difference per
pixel is 255, and a .8h element can easily contain 16*255, thus keep
accumulating in two .8h vectors, and just do the final accumulationat the
end. This should work for heights up to 256.
This requires a minor register renumbering in ff_pix_abs16_xy2_neon.
Before: Cortex A53 A72 A73 Graviton 3
pix_abs_0_0_neon: 97.7 47.0 37.5 22.7
pix_abs_0_1_neon: 154.0 59.0 52.0 25.0
pix_abs_0_3_neon: 179.7 96.7 87.5 41.2
After:
pix_abs_0_0_neon: 96.0 39.2 31.2 22.0
pix_abs_0_1_neon: 150.7 59.7 46.2 23.7
pix_abs_0_3_neon: 175.7 83.7 81.7 38.2
Signed-off-by: Martin Storsjö <martin@martin.st>
Using absolute-difference-accumulate does use twice the amount of
absolute-difference instructions, but avoids the need for the
uaddl and add instructions, reducing the total number of instructions
by 3.
These can be interleaved in the rest of the calculation, to avoid
tight dependencies at the end. Unfortunately, this is marginally
slower on Cortex A53, but faster on A72 and A73.
Before: Cortex A53 A72 A73 Graviton 3
pix_abs_0_3_neon: 175.7 109.2 92.0 41.2
After:
pix_abs_0_3_neon: 179.7 96.7 87.5 41.2
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes: out of array access
Fixes: 48799/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LAGARITH_fuzzer-4764457825337344
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is a per-file input option that adjusts an input's timestamps
with reference to another input, so that emitted packet timestamps
account for the difference between the start times of the two inputs.
Typical use case is to sync two or more live inputs such as from capture
devices. Both the target and reference input source timestamps should be
based on the same clock source.
If either input lacks starting timestamps, then no sync adjustment is made.
Provide neon implementation for pix_abs16_x2 function.
Performance tests of implementation are below.
- pix_abs_0_1_c: 283.5
- pix_abs_0_1_neon: 39.0
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes: signed integer overflow: 2147483645 + 16 cannot be represented in type 'int'
Fixes: 46993/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-4759025234870272
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This function needs more cleanup and it lacks error handling
Fixes: use of uninitialized memory
Fixes: CID700776
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 47875/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5719393113341952
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 48271/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TIFF_fuzzer-6149705769287680
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 47936/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MPEG4_fuzzer-5745039940124672
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This limit is possibly not reachable due to other restrictions on buffers but
the decoder run table is too small beyond this, so explicitly check for it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 315680096256 * 134215943 cannot be represented in type 'long long'
Fixes: 48713/clusterfuzz-testcase-minimized-ffmpeg_dem_IFF_fuzzer-5886272312311808
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The decoder only outputs pixels for width >1 images, fail early
Fixes: Timeout
Fixes: 48298/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WNV1_fuzzer-6198626319204352
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Data does not have to be decrypted in 16-byte blocks for AES-CTR mode, so
existing buggy code can be hugely simplified.
Fixes ticket #9829.
Signed-off-by: Marton Balint <cus@passwd.hu>
This resulted in the wrong column/row being chosen.
This can be seen best when using xfade on streams with transparency.
For example: in case of a slideleft transition, the first column from
the first input will overwrite the first column of the second stream
throught the transition.
GSoC'22
libavfilter/vf_chromakey_cuda.cu:the CUDA kernel for the filter
libavfilter/vf_chromakey_cuda.c: the C side that calls the kernel and gets user input
libavfilter/allfilters.c: added the filter to it
libavfilter/Makefile: added the filter to it
cuda/cuda_runtime.h: added two math CUDA functions that are used in the filter
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
The earlier code ignored the lower 16 bits and instead used
the highest 8 bits twice.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>