FFmpeg

Commit Graph

Author	SHA1	Message	Date
Andreas Rheinhardt	0154fb43e3	avcodec/mpegvideo: Allocate encoder-only tables in mpegvideo_enc.c This commit moves the encoder-only allocations of the tables owned solely by the main encoder context to mpegvideo_enc.c. This avoids checks in mpegvideo.c for whether we are dealing with an encoder; it also improves modularity (if encoders are disabled, this code will no longer be included in the binary). And it also avoids having to reset these pointers at the beginning of ff_mpv_common_init() (in case the dst context is uninitialized, ff_mpeg_update_thread_context() simply copies the src context into dst which therefore may contain pointers not owned by it, but this does not happen for encoders at all). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	d0de80ee44	avcodec/mpeg4videodec: Keep data_partitioning in sync between threads Fixes frame-threaded decoding of samples created by concatting a video with data partitioning and a video not using it. (Only the MPEG-4 decoder sets this, so it is synced in mpeg4_update_thread_context() despite being a MpegEncContext-field.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	02ad827226	avcodec/mpegvideo_dec: Combine two loops (I think the check for !reference is unnecessary.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
James Almer	0446282320	APIchanges: fix library version in the latest entry Also add commit hash and date while at it. Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
James Almer	352799dca8	The vuya pixel format was recently added, so this lavc workaround is no longer needed. Signed-off-by: James Almer <jamrial@gmail.com>	2 years ago
Timo Rothenpieler	aca569aad2	swscale/input: add rgbaf16 input support This is by no means perfect, since at least ddagrab will return scRGB data with values outside of 0.0f to 1.0f for HDR values. Its primary purpose is to be able to work with the format at all.	2 years ago
Timo Rothenpieler	f2de911818	swscale: add opaque parameter to input functions	2 years ago
Timo Rothenpieler	ef2c2a2220	avutil/half2float: use native _Float16 if available _Float16 support was available on arm/aarch64 for a while, and with gcc 12 was enabled on x86 as long as SSE2 is supported. If the target arch supports f16c, gcc emits fairly efficient assembly, taking advantage of it. This is the case on x86-64-v3 or higher. Same goes on arm, which has native float16 support. On x86, without f16c, it emulates it in software using sse2 instructions. This has shown to perform rather poorly: _Float16 full SSE2 emulation: frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x _Float16 f16c accelerated (Zen2, --cpu=znver2): frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x classic half2float full software implementation: frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x Hence an additional check was introduced, that only enables use of _Float16 on x86 if f16c is being utilized. On aarch64, a similar uplift in performance is seen: RPi4 half2float full software implementation: frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x RPi4 _Float16: frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x Since arm/aarch64 always natively support 16 bit floats, it can always be considered fast there. I'm not aware of any additional platforms that currently support _Float16. And if there are, they should be considered non-fast until proven fast.	2 years ago
Timo Rothenpieler	6dc79f1d04	avutil/half2float: move non-inline init code out of header	2 years ago
Timo Rothenpieler	f3fb528cd5	avutil/half2float: move tables to header-internal structs Having to put the knowledge of the size of those arrays into a multitude of places is rather smelly.	2 years ago
Timo Rothenpieler	cb8ad005bb	avutil/half2float: adjust conversion of NaN IEEE-754 differentiates two different kind of NaNs. Quiet and Signaling ones. They are differentiated by the MSB of the mantissa. For whatever reason, actual hardware conversion of half to single always sets the signaling bit to 1 if the mantissa is != 0, and to 0 if it's 0. So our code has to follow suite or fate-testing hardware float16 will be impossible.	2 years ago
Timo Rothenpieler	b42925264a	avutil: move half-precision float helper to avutil	2 years ago
Martin Storsjö	f921c58335	checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX This avoids triggering overflows in the filters, and avoids stray test failures in the approximate functions on x86; due to rounding differences, one implementation might overflow while another one doesn't. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Derek Buitenhuis	e1e981c65e	mov: Compare frag times in correct time base when seeking a stream without a corresponding sidx Some muxers, such as GPAC, create files with only one sidx, but two streams muxed into the same fragments pointed to by this sidx. Prevously, in such a case, when we seeked in such files, we fell back to, for example, using the sidx associated with the video stream, to seek the audio stream, leaving the seekhead in the wrong place. We can still do this, but we need to take care to compare timestamps in the same time base. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2 years ago
Andreas Rheinhardt	8bec225c3c	swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx() Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	afd9da24d9	avcodec/mpegvideo_dec: Don't sync AVCodecContext fields manually They are already synced generically in update_context_from_thread() in pthread_frame.c. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	22e157c1c6	avcodec/mpegvideo_dec: Remove commented-out cruft The fields in question were removed in `759001c534`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Chema Gonzalez	59225b459f	doc: fix binary values of SI prefixes Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	3553b70d6d	avcodec/ffv1enc: Remove redundant wrapper Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	7e9a790441	avcodec/ffv1enc: Don't create and keep unnecessary reference Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	f76cef5c51	avcodec/get_buffer: Don't get AVPixFmtDescriptor unnecessarily It is unused since `3575a495f6` (and the error message is dangerous: av_get_pix_fmt_name(format) returns NULL iff av_pix_fmt_desc_get(format) returns NULL and using a NULL string for %s would be UB). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	e506843183	avcodec/mpegpicture: Reset fields explicitly instead of memsetting them Improves the grepability of the code. (Furthermore, I hope that no compiler will really call memset for 28 bytes.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	f0ea5094af	avcodec/h263dec: Don't set frame parameters redundantly This frame will be reset later in ff_mpv_frame_start() anyway. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Andreas Rheinhardt	74d623914f	avcodec/h263dec: Remove redundant code to set cur_pic_ptr It is done later in ff_mpv_frame_start() (and nobody uses current_picture_ptr between setting it in ff_mpv_frame_start()). (The reason the vsynth*-h263-obmc ref files change is because the call to ff_find_unused_picture() now happens after the older pictures have been unreferenced in ff_mpv_frame_start(), so that their slots in the picture array can be immediately reused; the obmc code is somehow buggy and changes its output depending on the earlier contents of the motion_val buffer.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Alan Kelly	da0a37bab7	checkasm/sw_scale: hscale does not requires cpuflag test. This is done in ff_shuffle_filter_coefficients. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Alan Kelly	a38293e444	libswscale: Enable hscale_avx2 for all input sizes. ff_shuffle_filter_coefficients shuffles the tail as required. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Alan Kelly	a6724285fd	sws: allow avx2 hscale to process inputs of any size. The main loop processes blocks of 16 pixels. The tail processes blocks of size 4. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2 years ago
Alan Kelly	51a34e8525	sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
J. Dekker	ce2f47318b	lavc/aarch64: hevc_add_res add 12bit variants hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c: 3820.7 hevc_add_res_32x32_12_neon: 261.0 Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
Martin Storsjö	48be6616d0	aarch64: me_cmp: Remove a leftover unnecessary instruction This was missed in `a2e45ad407`. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	70efa4d011	lavc/aarch64: Add neon implementation for pix_abs8 Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below. - pix_abs_1_0_c: 101.2 - pix_abs_1_0_neon: 22.5 - sad_1_c: 101.2 - sad_1_neon: 22.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	74312e80d7	lavc/aarch64: Add neon implementation for sse8 Provide optimized implementation of sse8 function for arm64. Performance comparison tests are shown below. - sse_1_c: 130.7 - sse_1_neon: 29.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	a2e45ad407	lavc/aarch64: Add neon implementation for pix_abs16_y2 Provide optimized implementation of pix_abs16_y2 function for arm64. Performance comparison tests are shown below. pix_abs_0_2_c: 317.2 pix_abs_0_2_neon: 37.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	d7abb7d143	lavc/aarch64: Add neon implementation for sse4 Provide neon implementation for sse4 function. Performance comparison tests are shown below. - sse_2_c: 80.7 - sse_2_neon: 31.0 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	ad251fd262	lavc/aarch64: Add neon implementation for sse16 Provide neon implementation for sse16 function. Performance comparison tests are shown below. - sse_0_c: 268.2 - sse_0_neon: 43.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	60109d5b3d	aarch64: me_cmp: Fix the indentation of function declarations Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Gyan Doshi	d5544f6457	ffprobe: restore reporting error code for failed inputs `c11fb46731` led to a regression whereby the return code for missing input or input probe is overridden by writer close return code and hence not conveyed in the exit code.	2 years ago
Andreas Rheinhardt	444d80bd87	avcodec/me_cmp: Remove now incorrect av_assert2() Since `d69d12a5b9` these av_assert2() (or more exactly, the ones in hadamard8_diff8x8_c() and hadamard8_intra8x8_c()) are hit. So just remove all of these asserts. (If the test were improved to know which functions expect h == 8 and which support any value, the asserts could be readded at the appropriate places.) Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Martin Storsjö	1eaa575cf1	tools: Make sure to create the tools directory before building decode_simple.o This directory dependency is normally added implicitly by rules in ffbuild/common.mak; for tools it's created by a rule for TOOLOBJS. TOOLOBJS is populated implicitly from TOOLS, and decode_simple.o doesn't end up there because it's an odd occurrance of a lone object file in the tools subdirectory, not belonging to any other tool. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	d69d12a5b9	checkasm: motion: Test different h parameters Previously, the checkasm test always passed h=8, so no other cases were tested. Out of the me_cmp functions, in practice, some functions are hardcoded to always assume a 8x8 block (ignoring the h parameter), while others do use the parameter. For those with hardcoded height, both the reference C function and the assembly implementations ignore the parameter similarly. The documentation for the functions indicate that heights between w/2 and 2*w, within the range of 4 to 16, should be supported. This patch just tests random heights in that range, without knowing what width the current function actually uses. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	dc55e63578	x86: Don't hardcode the height to 8 in sad8_xy2_mmx The height is hardcoded in some of the me_cmp functions, but not in all of them. But in the case of all other functions, it's hardcoded in the same place in SIMD functions as in the C reference functions, while this one function differs from the behaviour of the C code. (Before `542765ce3e`, there were a couple other sad8_*_mmx functions with similar hardcoded height.) Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	21c2c57ba5	checkasm: Provide enough alignment in the new yuv2plane1 test This fixes the checkasm test in some setups on x86. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
J. Dekker	aa9eabb7a5	lavc/aarch64: reformat add_res funcs Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
J. Dekker	ea6ecb12aa	checkasm/hevc_add_res: add 12bit test Also fix the bug where in every other byte only the lower 2 bits were used in the 8bit test. Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
Swinney, Jonathan	0d7caa5b09	swscale/aarch64: add vscale specializations This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. On AWS c7g (Graviton 3, Neoverse V1) instances: before after yuv2yuvX_2_0_512_accurate_neon: 558.8 268.9 yuv2yuvX_4_0_512_accurate_neon: 637.5 434.9 yuv2yuvX_8_0_512_accurate_neon: 1144.8 806.2 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1853.7 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	3e708722a2	swscale/aarch64: vscale optimization Use scalar times vector multiply accumlate instructions instead of vector times vector to remove the need for replicating load instructions which are slightly slower. On AWS c7g (Graviton 3, Neoverse V1) instances: yuv2yuvX_8_0_512_accurate_neon: 1144.8 987.4 yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	4dcd191a50	checkasm: updated tests for sw_scale Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differs slightly). Test with and without SWS_ACCURATE_RND - if this flag isn't set, the output must match the C reference exactly, otherwise it is allowed to be off by 2. Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND is set - apparently this discrepancy hasn't been noticed in other exact tests before. Add a test for yuv2plane1. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Timo Rothenpieler	317f5252c0	doc/APIchanges: add missing rgbaf16 pixfmt entry	2 years ago
Anton Khirnov	ab31473830	fftools/ffmpeg: store a separate copy of input codec parameters Use it instead of AVStream.codecpar in the main thread. While AVStream.codecpar is documented to only be updated when the stream is added or avformat_find_stream_info(), it is actually updated during demuxing. Accessing it from a different thread then constitutes a race. Ideally, some mechanism should eventually be provided for signalling parameter updates to the user. Then the demuxing thread could pick up the changes and propagate them to the decoder.	2 years ago
Swinney, Jonathan	75ffca7eef	libswscale/aarch64: add another hscale specialization This specialization handles the case where filtersize is 4 mod 8, e.g. 12, 20, etc. Aarch64 was previously using the c function for this case. This implementation speeds up that case significantly. hscale_8_to_15__fs_12_dstW_512_c: 6234.1 hscale_8_to_15__fs_12_dstW_512_neon: 1505.6 Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago

1 2 3 4 5 ...

107897 Commits (cc5a5c986047d38b53c0f12a227b04487624e7cb) All Branches Search

107897 Commits (cc5a5c986047d38b53c0f12a227b04487624e7cb)

All Branches