Fixes: out of array access
Fixes: 64081/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-6151006496620544
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2147478526 + 33924 cannot be represented in type 'int'
Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int'
Fixes: 64243/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEGLS_fuzzer-5195717848989696
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Forgot to init c->sao_edge_filter[idx] when idx=0/1/2/3.
After this patch, the speedup of decoding H265 4K 30FPS
30Mbps on 3A6000 is about 7% (42fps==>45fps).
Change-Id: I521999b397fa72b931a23c165cf45f276440cdfb
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Rework the code a bit to speed up the 10-bit bitpacked decoding
routine. This is probably about as fast as I can get it without
switching to assembly language.
Demonstratable with:
./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv
On my development system, it went from 80ms for a 2160p frame
down to 20ms (i.e. a 4X speedup). Good enough for now, I hope...
Comments from Marton:
Originally on my system better performance could be achieved by simply
switching to the cached bitstream reader, but for Devin it was slower than
his direct byte operations.
I changed the order of writing output from u/y/v/y to u/v/y/y, and that made
the code faster than the cached bitstream reader on my system as well.
TIMER measurement of the decode loop on Ryzen 5 3600 with command line:
./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error
Before: 823204127 decicycles in YUV, 256 runs, 0 skips
After: 315070524 decicycles in YUV, 256 runs, 0 skips
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
The specification doesn't mention that clusters cannot have alphabet
sizes greater than 1 << bundle->log_alphabet_size, but the reference
implementation rejects these entropy streams as invalid, so we should
too. Refusing to do so can overflow a stack variable that should be
large enough otherwise.
Fixes#10738.
Found-by: Zeng Yunxiang and Li Zeyuan
Signed-off-by: Leo Izen <leo.izen@gmail.com>
If a sequence of JXL images is encapsulated in a container that has PTS
information, we should use the PTS information from the container. At
this time there is no container that does this, but if JPEG XL support
is ever added to NUT, AVTransport, or some other container, this commit
should allow the PTS information those containers provide to work as
expected.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
These pixel formats have always been supported by libjxl, but at the
time this plugin was written, they were not in FFmpeg yet. Now that
they are in FFmpeg, we should support them.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
These pixel formats have always been supported by libjxl, but at the
time this plugin was written, they were not in FFmpeg yet. Now that
they are in FFmpeg, we should support them.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
FFmpeg doesn't support tv-range RGB throughout most of its pipeline, so
we should keep the warning. However, in case something does support it
we should at least keep it tagged properly. Additionally, the encoder
writes this tag if the space is tagged as such so this makes a round
trip work as it should.
Also, PNG doesn't support nonzero matrices but we only warn and ignore
in that case, so we have no reason to error out for illegal cICP ranges
either (i.e. greater than 1).
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Kacper Michajłow <kasper93@gmail.com>
Unnecessary since acf63d5350adeae551d412db699f8ca03f7e76b9;
also avoids relocations.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This frame will be freed in the next line.
Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The current pack_output function pointer is a property of the decoder,
rather than a constant method provided by the DSP code. Indeed, except
for an unused initialisation, the field is never used in DSP code.
The penultimate loop iteration could pick any vl such that:
vlenb/4 < vl <= vlenb/2
Thus if the total length is not a multiple of vlenb/2, the vfadd.vf
on the penultimate iteration would yield corrupt values for the last
iteration.
To avoid this, force vl = vlen/2 until the last iteration. Unfortunately
this latent bug is not reproducible with either hardware or QEMU as of now.
This skips the round-trip to scalar register for the sliding 'x'
coefficients, improving performance by about 5%. The trick here is that
the vector slide-up instruction preserves elements in destination vector
until the slide offset.
The switch from vfslide1up.vf to vslideup.vi also allows the elimination
of data dependencies on consecutive slides. Since the specifications
recommend sticking to power of two offsets, we could slide as follows:
vslideup.vi v8, v0, 2
vslideup.vi v4, v0, 1
vslideup.vi v12, v8, 1
vslideup.vi v16, v8, 2
However in the device under test, this seems to make performance slightly
worse, so this is left for (in)validation with future better hardware.
This fixes the following build error:
src/libavcodec/d3d12va_decode.c:49:10: error: no previous prototype for function
'ff_d3d12va_get_surface_index' [-Werror,-Wmissing-prototypes]
49 | unsigned ff_d3d12va_get_surface_index(const AVCodecContext *avctx,
| ^
Signed-off-by: Martin Storsjö <martin@martin.st>
Same as d3d11va, this flag enables main still picture profile for
d3d12va. User should add this flag when decoding main still picture
profile.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
The implementation is based on:
https://learn.microsoft.com/en-us/windows/win32/medfound/direct3d-12-video-overview
With the Direct3D 12 video decoding support, we can render or process
the decoded images by the pixel shaders or compute shaders directly
without the extra copy overhead, which is beneficial especially if you
are trying to render or post-process a 4K or 8K video.
The command below is how to enable d3d12va:
ffmpeg -hwaccel d3d12va -i input.mp4 output.mp4
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
From AOSP doc, these values are device and codec specific, but lower
values generally result in more efficient (smaller-sized) encoding.
For example, global_quality 50 on Pixel 6 results a 1080P 30 FPS
HEVC with 3744 kb/s, while global_quality 80 results 28178 kb/s.
Fix#10689
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
With the way the runtime checks are currently set up, every single
openh264 release, no matter how minor, is considered an ABI break and
requires ffmpeg recompilation. This is unnecessarily strict because it
doesn't allow downstream distributions to ship any openh264 bug fix
version updates without breaking ffmpeg's openh264 support.
Years ago, at the time when ffmpeg's openh264 support was merged,
openh264 releases were done without a versioned soname (the library was
just libopenh264.so, unversioned). Since then, starting with version
1.3.0, openh264 has started using versioned sonames and the intent has
been to bump the soname every time there's a new release with an ABI
change.
This patch drops the exact version check and instead adds a minimum
requirement on 1.3.0 to the configure script.
Signed-off-by: Kalev Lember <klember@redhat.com>
Signed-off-by: Martin Storsjö <martin@martin.st>