FFmpeg

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	7e42d5f0ab	aarch64: vp8: Optimize vp8_idct_add_neon for aarch64 The previous version was a pretty exact translation of the arm version. This version does do some unnecessary arithmetic (it does more operations on vectors that are only half filled; it does 4 uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead of packing data together (which could be done for free in the arm version). This gives a decent speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on Cortex A73. Before: Cortex A53 A72 A73 vp8_idct_add_neon: 79.7 67.5 65.0 After: vp8_idct_add_neon: 67.7 64.8 66.7 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	49f9c4272c	aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon The original arm version didn't do saturation here. This probably doesn't make any difference for performance, but reduces the differences. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	37394ef01b	aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 This makes it similar to put_epel16_v6, and gives a large speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on A73. Before: Cortex A53 A72 A73 vp8_put_epel16_h6v6_neon: 2211.4 1586.5 1431.7 After: vp8_put_epel16_h6v6_neon: 1736.9 1522.0 1448.1 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	cef914e083	arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2 This makes it similar to put_epel16_v6, and gives a 10-25% speedup of this function. Before: Cortex A7 A8 A9 A53 A72 vp8_put_epel16_h6v6_neon: 3058.0 2218.5 2459.8 2183.0 1572.2 After: vp8_put_epel16_h6v6_neon: 2670.8 1934.2 2244.4 1729.4 1503.9 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	e39a9212ab	aarch64: vp8: Port bilin functions from arm version Cortex A53 A72 A73 vp8_put_bilin4_h_c: 303.8 102.2 161.8 vp8_put_bilin4_h_neon: 100.0 40.9 41.2 vp8_put_bilin4_hv_c: 322.8 201.0 305.9 vp8_put_bilin4_hv_neon: 156.8 72.6 77.0 vp8_put_bilin4_v_c: 304.7 101.7 166.5 vp8_put_bilin4_v_neon: 82.7 41.2 33.0 vp8_put_bilin8_h_c: 1192.7 352.5 623.8 vp8_put_bilin8_h_neon: 213.5 70.2 87.8 vp8_put_bilin8_hv_c: 1098.6 769.2 1041.9 vp8_put_bilin8_hv_neon: 324.0 123.5 146.0 vp8_put_bilin8_v_c: 1193.9 350.4 617.7 vp8_put_bilin8_v_neon: 183.9 60.7 64.7 vp8_put_bilin16_h_c: 2353.1 671.2 1223.3 vp8_put_bilin16_h_neon: 261.9 140.7 145.0 vp8_put_bilin16_hv_c: 2453.2 1470.9 2355.2 vp8_put_bilin16_hv_neon: 383.9 196.0 217.0 vp8_put_bilin16_v_c: 2349.3 669.8 1251.2 vp8_put_bilin16_v_neon: 202.9 110.7 96.2 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	58d1549227	aarch64: vp8: Port epel4 functions from arm version Cortex A53 A72 A73 vp8_put_epel4_h4_c: 631.4 291.7 367.8 vp8_put_epel4_h4_neon: 241.0 131.0 155.7 vp8_put_epel4_h4v4_c: 967.5 529.3 667.7 vp8_put_epel4_h4v4_neon: 429.3 241.8 279.7 vp8_put_epel4_h4v6_c: 1374.7 657.5 864.5 vp8_put_epel4_h4v6_neon: 515.5 295.5 334.7 vp8_put_epel4_h6_c: 851.0 421.0 486.0 vp8_put_epel4_h6_neon: 321.5 195.0 217.7 vp8_put_epel4_h6v4_c: 1111.3 621.1 781.2 vp8_put_epel4_h6v4_neon: 539.2 328.0 365.3 vp8_put_epel4_h6v6_c: 1561.3 763.3 999.7 vp8_put_epel4_h6v6_neon: 645.5 401.0 434.7 vp8_put_epel4_v4_c: 663.8 298.3 357.0 vp8_put_epel4_v4_neon: 116.0 81.5 72.5 vp8_put_epel4_v6_c: 870.5 437.0 507.4 vp8_put_epel4_v6_neon: 147.7 108.8 92.0 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	cc7ba00c35	aarch64: vp8: Port missing epel8 functions from arm version Cortex A53 A72 A73 vp8_put_epel8_h4_c: 2594.8 1159.6 1374.8 vp8_put_epel8_h4_neon: 506.4 244.2 314.0 vp8_put_epel8_h6_c: 3445.8 1677.1 1811.3 vp8_put_epel8_h6_neon: 634.4 371.7 433.0 vp8_put_epel8_v4_c: 2614.0 1174.8 1378.0 vp8_put_epel8_v4_neon: 321.0 221.7 235.8 vp8_put_epel8_v6_c: 3635.5 1703.0 2079.2 vp8_put_epel8_v6_neon: 416.9 317.0 295.5 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	52c9b0a6c0	aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version Cortex A53 A72 A73 vp8_luma_dc_wht_c: 115.7 75.7 90.7 vp8_luma_dc_wht_neon: 60.7 41.2 45.7 vp8_idct_dc_add4uv_c: 376.1 262.9 282.5 vp8_idct_dc_add4uv_neon: 52.0 29.0 37.0 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	c513fcd7d2	aarch64: vp8: Fix a typo in a comment Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	f1011ea28a	aarch64: vp8: Reorder the function pointer inits to match the arm original Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	b4b27dce95	aarch64: vp8: Move the vp8dsp makefile entries to the right places Even if NEON would be disabled, the init functions should be built as they are called as long as ARCH_AARCH64 is set. These functions are part of a generic DSP subsystem, not tied directly to one decoder. (They should be built if the vp7 decoder is enabled, even if the vp8 decoder is disabled.) Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	ad32f7b126	aarch64: vp8: Remove superfluous includes This fixes building with MSVC, which lacks unistd.h. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	85bfaa4949	aarch64: vp8: Use the proper aarch64 form for conditional branches The previous form also does seem to assemble on current tools, but I think it might fail on some older aarch64 tools. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	2eeac79936	aarch64: vp8: Fix assembling with armasm64 Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	26d7af4c38	aarch64: vp8: Fix assembling with clang This also partially fixes assembling with MS armasm64 (via gas-preprocessor). The movrel macro invocations need to pass the offset via a separate parameter. Mach-o and COFF relocations don't allow a negative offset to a symbol, which is handled properly if the offset is passed via the parameter. If no offset parameter is given, the macro evaluates to something like "adrp x17, subpel_filters-16+(0)", which older clang versions also fail to parse (the older clang versions only support one single offset term, although it can be a parenthesis). Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Magnus Röös	0801853e64	libavcodec: vp8 neon optimizations for aarch64 Partial port of the ARM Neon for aarch64. Benchmarks from fate: benchmarking with Linux Perf Monitoring API nop: 58.6 checkasm: using random seed 1760970128 NEON: - vp8dsp.idct [OK] - vp8dsp.mc [OK] - vp8dsp.loopfilter [OK] checkasm: all 21 tests passed vp8_idct_add_c: 201.6 vp8_idct_add_neon: 83.1 vp8_idct_dc_add_c: 107.6 vp8_idct_dc_add_neon: 33.8 vp8_idct_dc_add4y_c: 426.4 vp8_idct_dc_add4y_neon: 59.4 vp8_loop_filter8uv_h_c: 688.1 vp8_loop_filter8uv_h_neon: 216.3 vp8_loop_filter8uv_inner_h_c: 649.3 vp8_loop_filter8uv_inner_h_neon: 195.3 vp8_loop_filter8uv_inner_v_c: 544.8 vp8_loop_filter8uv_inner_v_neon: 131.3 vp8_loop_filter8uv_v_c: 706.1 vp8_loop_filter8uv_v_neon: 141.1 vp8_loop_filter16y_h_c: 668.8 vp8_loop_filter16y_h_neon: 242.8 vp8_loop_filter16y_inner_h_c: 647.3 vp8_loop_filter16y_inner_h_neon: 224.6 vp8_loop_filter16y_inner_v_c: 647.8 vp8_loop_filter16y_inner_v_neon: 128.8 vp8_loop_filter16y_v_c: 721.8 vp8_loop_filter16y_v_neon: 154.3 vp8_loop_filter_simple_h_c: 387.8 vp8_loop_filter_simple_h_neon: 187.6 vp8_loop_filter_simple_v_c: 384.1 vp8_loop_filter_simple_v_neon: 78.6 vp8_put_epel8_h4v4_c: 3971.1 vp8_put_epel8_h4v4_neon: 855.1 vp8_put_epel8_h4v6_c: 5060.1 vp8_put_epel8_h4v6_neon: 989.6 vp8_put_epel8_h6v4_c: 4320.8 vp8_put_epel8_h6v4_neon: 1007.3 vp8_put_epel8_h6v6_c: 5449.3 vp8_put_epel8_h6v6_neon: 1158.1 vp8_put_epel16_h6_c: 6683.8 vp8_put_epel16_h6_neon: 831.8 vp8_put_epel16_h6v6_c: 11110.8 vp8_put_epel16_h6v6_neon: 2214.8 vp8_put_epel16_v6_c: 7024.8 vp8_put_epel16_v6_neon: 799.6 vp8_put_pixels8_c: 112.8 vp8_put_pixels8_neon: 78.1 vp8_put_pixels16_c: 131.3 vp8_put_pixels16_neon: 129.8 This contains a fix to include guards by Carl Eugen Hoyos. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Luca Barbato	899ee03088	Unbreak travis on macos	6 years ago
Diego Biurrun	f8df5e2f31	tests: Add a convenience function for video-only lavf tests Rename a test in the process for consistency and simplicity and remove the remnants of the now-unused lavf regression test scripts.	6 years ago
Diego Biurrun	618d02c1fa	tests: Convert lavf container tests to non-legacy test scripts Rename some tests in the process for consistency and simplicity.	6 years ago
Diego Biurrun	896fe15dbb	tests: Convert lavf pixfmt conversion tests to non-legacy test scripts Also split monolithic lavf-pixfmt test into individual tests.	6 years ago
Diego Biurrun	a957e9379d	tests: Convert lavf image tests to non-legacy test scripts Rename some tests in the process for consistency and simplicity.	6 years ago
Diego Biurrun	eb8a811599	tests: Convert audio-only lavf tests to non-legacy test scripts Rename some tests in the process for consistency and simplicity.	6 years ago
Diego Biurrun	a70eac7a9b	tests: Convert image2pipe tests to non-legacy test scripts	6 years ago
Diego Biurrun	5846b496f0	tests: Use a predefined function for lavf-rm test	6 years ago
Diego Biurrun	dad5fd59f3	tests: Enable CRC test for yuv4mpeg	6 years ago
Diego Biurrun	8629149816	tests: Drop duplicate variable declaration	6 years ago
Diego Biurrun	e22ffb3805	tests: Unify output directory creation	6 years ago
Diego Biurrun	7e5bde93a1	build: Rename OBJDIRS variable to OUTDIRS These directories are not just for object files.	6 years ago
Sven Dueking	90b15f60bf	srt: Set srto_sender flag to sender srt socket SRT API Documentation: This flag is superfluous if both parties are at least version 1.3.0 (this shall be enforced by setting this value to SRTO_MINVERSION if you expect that it be true) and therefore support HSv5 handshake, where the SRT extended handshake is done with the overall handshake process. This flag is however obligatory if at least one party may be using SRT below version 1.3.0 and does not support HSv5.	6 years ago
Janne Grunau	156ea66c91	h264/x86: sign extend int stride in deblock functions Fixes checkasm errors after adding the h264 deblock tests.	6 years ago
Martin Storsjö	eec93e5709	libopenh264dec: Use a newer decoding entry point function The "new" entry point actually has existed since OpenH264 1.4 in 2015 and is the recommended decoding entry point. The name of this function, DecodeFrameNoDelay, is rather backwards considering that it doesn't return the latest decoded frame immediately, but actually does proper delaying and reordering of frames. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Janne Grunau	28a8b5413b	h264/aarch64: add intra loop filter neon asm Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported (x264 uses nv12 chroma) and optimized. Cycle count for checkasm --bench on a Snapdragon 820e: h264_h_loop_filter_luma_intra_8bpp_c: 60.0 h264_h_loop_filter_luma_intra_8bpp_neon: 54.2 h264_v_loop_filter_luma_intra_8bpp_c: 148.3 h264_v_loop_filter_luma_intra_8bpp_neon: 73.8 h264_h_loop_filter_chroma_intra_8bpp_c: 27.8 h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4 h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8 h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7 h264_v_loop_filter_chroma_intra_8bpp_c: 45.8 h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3	6 years ago
Janne Grunau	846c3d6aca	h264/aarch64: optimize neon loop filter Exit as soon as possible if no filtering will be done. Improves the checkasm --bench cycle count on a Snapdragon 820e: h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5 h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3 h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5 h264_v_loop_filter_luma_8bpp_neon: 62.9 -> 60.9 h264_h_loop_filter_chroma_8bpp_c: 30.2 -> 30.3 h264_h_loop_filter_chroma_8bpp_neon: 51.6 -> 25.7 h264_v_loop_filter_chroma_8bpp_c: 57.3 -> 57.3 h264_v_loop_filter_chroma_8bpp_neon: 28.0 -> 24.0	6 years ago
Janne Grunau	d7f4f5c4a1	checkasm/h264: add loop filter tests	6 years ago
Janne Grunau	bb515e3a73	h264/aarch64: sign extend int stride in loop filter asm	6 years ago
Martin Storsjö	41cf3e3b1c	arm: Create proper .rdata sections for COFF As .rodata isn't one of the default created sections for COFF, it was created as a read-write data section. By using the default .rdata section name for COFF, it automatically becomes a read-only data section. The existing ".section .rodata" works as intended for ELF though. This is based on an original patch and diagnosis by Tom Tan <Tom.Tan@microsoft.com>. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
James Almer	ca44fa5d7f	avcodec/libdav1d: properly free all output picture references Dav1dPictures contain more than one buffer reference, so we're forced to use the API properly to free them all. Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
Luca Barbato	90adbf4abf	cook: Use the correct table for 6-bit stereo coupling Thanks to Kostya for digging it out and telling me.	6 years ago
James Almer	70ab2778be	libdav1d: update API usage to the first stable release The color fields were moved to another struct, and a way to propagate timestamps and other input metadata was introduced, so the packet fifo can be removed. Add support for 12bit streams, an option to disable film grain, and read the profile from the sequence header referenced by the output picture instead of guessing based on output pix_fmt. Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
James Almer	56f50183f3	libdav1d: fix build after a recent API break Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
Linjie Fu	e716323fa8	qsvenc: Add VDENC support for H264 and HEVC Add VDENC (lowpower mode) support for QSV H264 and HEVC. It's an experimental function (like lowpower in vaapi) with some limitations: - CBR/VBR require HuC, which should be explicitly loaded via i915 module parameter (i915.enable_guc=2 for Linux kernel version >= 4.16) - HEVC VDENC was supported >= ICE LAKE. Use option "-low_power 1" to enable VDENC. Signed-off-by: Linjie Fu <linjie.fu@intel.com>	6 years ago
James Almer	9bf9358b61	avcodec: libdav1d AV1 decoder wrapper. Originally written by Ronald S. Bultje, with fixes, optimizations and improvements by James Almer. Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
Carl Eugen Hoyos	f149a4a5fc	swscale: Add GRAY10 Based on `ab839054` by Luca Barbato. Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
Carl Eugen Hoyos	ee3f62a90c	pixfmt: Add GRAY10 Based on `7471352f` by Luca Barbato. Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
Martin Storsjö	80f85a95da	libx264: Pass the reordered_opaque field through the encoder libx264 does have a field for opaque data to pass along with frames through the encoder, but it is a pointer, while the libavcodec reordered_opaque field is an int64_t. Therefore, allocate an array within the libx264 wrapper, where reordered_opaque values in flight are stored, and pass a pointer to this array to libx264. Update the public libavcodec documentation for the AVCodecContext field to explain this usage, and add a codec capability that allows detecting whether an encoder handles this field. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Martin Storsjö	a3a501df24	libavutil: Undeprecate the AVFrame reordered_opaque field This was marked as deprecated (but only in the doxygen, not with an actual deprecation attribute) in `81c623fae0` in 2011, but was undeprecated in `ad1ee5fa7`. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
James Almer	8d80046a0f	libaom: remove references to yuva444p pixfmt Support for it was apparently never in the codebase, and the enum value was recently removed from the public headers [1] [1] https://aomedia.googlesource.com/aom/+/f1570f0c2f70832dd170285f8de60bd2379c8efa Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
James Almer	cacb62f9cb	Revert "decode: copy the output parameters from the last bsf in the chain back to the AVCodecContext" This reverts commit `662558f985`. The avcodec_parameters_to_context() call was freeing and reallocating AVCodecContext->extradata, essentially taking ownership of it, which according to the doxy is user owned. This is an API break and has produced crashes in some library users like Firefox. Revert until a better solution is found to internally propagate the filtered extradata back into the decoder context. Signed-off-by: James Almer <jamrial@gmail.com>	6 years ago
Zhong Li	1ff6cb2ca6	lavc/qsvenc_jpeg: set a default quality Keep alignment with vaapi mjpeg encoder. Signed-off-by: Zhong Li <zhong.li@intel.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	6 years ago
Zhong Li	4c5e77e0bf	lavc/qsvenc_jpeg: add async_depth support Currently qsv (m)jpeg encoding is broken. Regression introduced by the commit (id: c1bcd3): fix async support, which requires the minimum async_depth to be 1, instead of the previous zero. But the default async_depth of qsv (m)jpeg encoding is still initialized (mostly) as zero. This patch also obviously improves qsv (m)jpeg encoding performance because the default async_depth is changed to 4. Signed-off-by: Zhong Li <zhong.li@intel.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	6 years ago


45224 Commits (7e42d5f0ab2aeac811fd01e122627c9198b13f01)