FFmpeg

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	dd299a2d6d	arm: vp9: Add NEON loop filters This work is sponsored by, and copyright, Google. The implementation tries to have smart handling of cases where no pixels need the full filtering for the 8/16 width filters, skipping both calculation and writeback of the unmodified pixels in those cases. The actual effect of this is hard to test with checkasm though, since it tests the full filtering, and the benefit depends on how many filtered blocks use the shortcut. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_loop_filter_h_4_8_neon: 2.72 2.68 1.78 3.15 vp9_loop_filter_h_8_8_neon: 2.36 2.38 1.70 2.91 vp9_loop_filter_h_16_8_neon: 1.80 1.89 1.45 2.01 vp9_loop_filter_h_16_16_neon: 2.81 2.78 2.18 3.16 vp9_loop_filter_mix2_h_44_16_neon: 2.65 2.67 1.93 3.05 vp9_loop_filter_mix2_h_48_16_neon: 2.46 2.38 1.81 2.85 vp9_loop_filter_mix2_h_84_16_neon: 2.50 2.41 1.73 2.85 vp9_loop_filter_mix2_h_88_16_neon: 2.77 2.66 1.96 3.23 vp9_loop_filter_mix2_v_44_16_neon: 4.28 4.46 3.22 5.70 vp9_loop_filter_mix2_v_48_16_neon: 3.92 4.00 3.03 5.19 vp9_loop_filter_mix2_v_84_16_neon: 3.97 4.31 2.98 5.33 vp9_loop_filter_mix2_v_88_16_neon: 3.91 4.19 3.06 5.18 vp9_loop_filter_v_4_8_neon: 4.53 4.47 3.31 6.05 vp9_loop_filter_v_8_8_neon: 3.58 3.99 2.92 5.17 vp9_loop_filter_v_16_8_neon: 3.40 3.50 2.81 4.68 vp9_loop_filter_v_16_16_neon: 4.66 4.41 3.74 6.02 The speedup vs C code is around 2-6x. The numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Disabling the early-exit in the asm doesn't give a fair comparison either though, since the C code only does the necessary calculations for each row. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-9x. This is pretty similar in runtime to the corresponding routines in libvpx. (This is comparing vpx_lpf_vertical_16_neon, vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon and vp9_loop_filter_v_16_16_neon - note that the naming of horizontal and vertical is flipped between the libraries.) In order to have stable, comparable numbers, the early exits in both asm versions were disabled, forcing the full filtering codepath. Cortex A7 A8 A9 A53 vp9_loop_filter_h_16_8_neon: 597.2 472.0 482.4 415.0 libvpx vpx_lpf_vertical_16_neon: 626.0 464.5 470.7 445.0 vp9_loop_filter_v_16_8_neon: 500.2 422.5 429.7 295.0 libvpx vpx_lpf_horizontal_edge_8_neon: 586.5 414.5 415.6 383.2 vp9_loop_filter_v_16_16_neon: 905.0 784.7 791.5 546.0 libvpx vpx_lpf_horizontal_edge_16_neon: 1060.2 751.7 743.5 685.2 Our version is consistently faster on A7 and A53, marginally slower on A8, and sometimes faster, sometimes slower on A9 (marginally slower in all three tests in this particular test run). Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Diego Biurrun	f7d183f084	libxvid: Check return value of write() call libavcodec/libxvid_rc.c:106:9: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]	8 years ago
Diego Biurrun	e5e8a26dcf	libxvid: Use proper context in av_log() calls	8 years ago
Diego Biurrun	12db2832e4	libxvid: Require availability of mkstemp() The replacement code uses tempnam(), which is dangerous. Such a fringe feature is not worth the trouble.	8 years ago
Martin Storsjö	a67ae67083	arm: vp9: Add NEON itxfm routines This work is sponsored by, and copyright, Google. For the transforms up to 8x8, we can fit all the data (including temporaries) in registers and just do a straightforward transform of all the data. For 16x16, we do a transform of 4x16 pixels in 4 slices, using a temporary buffer. For 32x32, we transform 4x32 pixels at a time, in two steps of 4x16 pixels each. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_add_neon: 3.39 5.83 4.17 4.01 vp9_inv_adst_adst_8x8_add_neon: 3.79 4.86 4.23 3.98 vp9_inv_adst_adst_16x16_add_neon: 3.33 4.36 4.11 4.16 vp9_inv_dct_dct_4x4_add_neon: 4.06 6.16 4.59 4.46 vp9_inv_dct_dct_8x8_add_neon: 4.61 6.01 4.98 4.86 vp9_inv_dct_dct_16x16_add_neon: 3.35 3.44 3.36 3.79 vp9_inv_dct_dct_32x32_add_neon: 3.89 3.50 3.79 4.42 vp9_inv_wht_wht_4x4_add_neon: 3.22 5.13 3.53 3.77 Thus, the speedup vs C code is around 3-6x. This is mostly marginally faster than the corresponding routines in libvpx on most cores, tested with their 32x32 idct (compared to vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's favour since their version doesn't clear the input buffer like ours does (although the effect of that on the total runtime probably is negligible.) Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_add_neon: 18436.8 16874.1 14235.1 11988.9 libvpx vpx_idct32x32_1024_add_neon 20789.0 13344.3 15049.9 13030.5 Only on the Cortex A8, the libvpx function is faster. On the other cores, ours is slightly faster even though ours has got source block clearing integrated. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Mark Thompson	fd0fae6037	pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts When decoding with threads enabled, the get_format callback will be called with one of the per-thread codec contexts rather than with the outer context. If a hwaccel is in use too, this will add a reference to the hardware frames context on that codec context, which will then propagate to all of the other per-thread contexts for decoding. Once the decoder finishes, however, the per-thread contexts are not freed normally, so these references leak.	8 years ago
Martin Storsjö	11623217e3	arm: vp9mc: Use a different helper register for PIC loads This fixes crashes since `557c1675cf` in linux PIC builds. Previously, movrelx silently used r12 as helper register, which doesn't work when r12 is the destination register. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	6a62795d40	aarch64: h264idct: Use the offset parameter to movrel Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	557c1675cf	arm: vp9mc: Minor adjustments from review of the aarch64 version This work is sponsored by, and copyright, Google. The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9. Cortex A7 A8 A9 A53 orig: vp9_put_8tap_smooth_64h_neon: 20270.0 14447.3 19723.9 10910.9 new: vp9_put_8tap_smooth_64h_neon: 20165.8 14466.5 19730.2 10668.8 Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions This work is sponsored by, and copyright, Google. These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_neon: 27.2 23.7 vp9_avg8_neon: 56.5 54.7 vp9_avg16_neon: 169.9 167.4 vp9_avg32_neon: 585.8 585.2 vp9_avg64_neon: 2460.3 2294.7 vp9_avg_8tap_smooth_4h_neon: 132.7 125.2 vp9_avg_8tap_smooth_4hv_neon: 478.8 442.0 vp9_avg_8tap_smooth_4v_neon: 126.0 93.7 vp9_avg_8tap_smooth_8h_neon: 241.7 234.2 vp9_avg_8tap_smooth_8hv_neon: 690.9 646.5 vp9_avg_8tap_smooth_8v_neon: 245.0 205.5 vp9_avg_8tap_smooth_64h_neon: 11273.2 11280.1 vp9_avg_8tap_smooth_64hv_neon: 22980.6 22184.1 vp9_avg_8tap_smooth_64v_neon: 11549.7 10781.1 vp9_put4_neon: 18.0 17.2 vp9_put8_neon: 40.2 37.7 vp9_put16_neon: 97.4 99.5 vp9_put32_neon/armv8: 346.0 307.4 vp9_put64_neon/armv8: 1319.0 1107.5 vp9_put_8tap_smooth_4h_neon: 126.7 118.2 vp9_put_8tap_smooth_4hv_neon: 465.7 434.0 vp9_put_8tap_smooth_4v_neon: 113.0 86.5 vp9_put_8tap_smooth_8h_neon: 229.7 221.6 vp9_put_8tap_smooth_8hv_neon: 658.9 621.3 vp9_put_8tap_smooth_8v_neon: 215.0 187.5 vp9_put_8tap_smooth_64h_neon: 10636.7 10627.8 vp9_put_8tap_smooth_64hv_neon: 21076.8 21026.9 vp9_put_8tap_smooth_64v_neon: 9635.0 9632.4 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	a4cfcddcb0	vp9: Make the subpel filters non-static Make them aligned, to allow efficient access to them from simd. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Anton Khirnov	84f225684c	pthread_frame: properly propagate the hw frame context across frame threads	8 years ago
Diego Biurrun	72a19f4013	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	8 years ago
Diego Biurrun	67deba8a41	Use avpriv_report_missing_feature() where appropriate	8 years ago
Vittorio Giovara	47a795727f	hevc: Support extradata changes from multiple stsd Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	8 years ago
Vittorio Giovara	2fe30b4743	hevc: Allow parsing external extradata buffers	8 years ago
Vittorio Giovara	5be2153111	hevc: Move hevc_decode_extradata before frame decoding Avoids a forward-declaration in the following commit. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	8 years ago
Vittorio Giovara	bed2c4b265	lavc: Add hevc main10 profile to avconv cli	8 years ago
Vittorio Giovara	17dac56b8f	lavu: Rename ycgco color space appropriately Planes are ordered as the name suggests now. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	8 years ago
Diego Biurrun	0361e4dcb4	h264_qpel: x86: Move function with only one instance out of template macro libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]	8 years ago
Andreas Cadhalpun	43de8b328b	lzf: update pointer p after realloc This fixes heap-use-after-free detected by AddressSanitizer. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	8 years ago
Anton Khirnov	4ab61cd983	qsv{enc,dec}: extend the internal frame allocator Handle the internal frame requests, which is required by the HEVC encoding plugin. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	8 years ago
Anton Khirnov	00aeedd841	qsv{dec,enc}: use a struct as a memory id with internal memory allocator This will allow implementing the allocator more fully, which is needed by the HEVC encoder plugin with video memory input. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	8 years ago
Anton Khirnov	404e51478e	qsv{dec,enc}: always use an internal mfxFrameSurface1 For encoding, this avoids modifying the input surface, which we are not allowed to do. This will also be useful in the following commits. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	8 years ago
Hendrik Leppkes	fabfbfe571	dxva2: fix surface selection when compiled with both d3d11va and dxva2 Fixes a regression introduced in `be630b1e08` Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Derek Buitenhuis	db0b3dccb3	libx265: Add option to force IDR frames This is in the same vein as `380146924e`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Diego Biurrun	3cba09e522	x86: Drop stray semicolons after function definitions libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic] libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]	8 years ago
Martin Storsjö	392caa65df	arm: vp9mc: Insert a literal pool at the middle of the file This fixes errors like this when building non-pic binaries with armv6 as baseline: Error: invalid literal constant: pool needs to be closer Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Diego Biurrun	67351924fa	Drop unreachable break and return statements	8 years ago
Diego Biurrun	6354957a95	dnxhdenc: Have function pointer prototype match implementation libavcodec/dnxhdenc.c(326) : warning C4028: formal parameter 1 different from declaration libavcodec/dnxhdenc.c(329) : warning C4028: formal parameter 1 different from declaration	8 years ago
Diego Biurrun	c778eb15b8	pixblockdsp: Have function pointer prototype match implementation libavcodec/pixblockdsp.c(58) : warning C4028: formal parameter 1 different from declaration libavcodec/pixblockdsp.c(63) : warning C4028: formal parameter 1 different from declaration libavcodec/pixblockdsp.c(66) : warning C4028: formal parameter 1 different from declaration	8 years ago
Diego Biurrun	99ddeddc7f	ituh263dec: Have function signature match across declaration and definition libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 1 different from declaration libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 2 different from declaration	8 years ago
Diego Biurrun	13fcdfb976	svq3: Drop unused function dctcoef_get() libavcodec/svq3.c:627:29: warning: unused function 'dctcoef_get' [-Wunused-function]	8 years ago
Diego Biurrun	ee59f05408	intrax8: Have function signature match across declaration and definition libavcodec/intrax8.c(776) : warning C4028: formal parameter 1 different from declaration	8 years ago
Martin Storsjö	1a469a5e42	options_table: Remove a now unnecessary include of config.h The include of config.h was added in 2012 in `1d9c2dc8`, due to the use of CONFIG_SNOW_ENCODER ifdefs within options_table.h. When the snow codec was dropped later (in `a0c5917f8` in 2013), this include no longer served any purpose. options_table.h is included in builds for the host as well, when building documentation. config.h should not be included in code that is built for the host, since it can contain workarounds for the target compiler/environment, like adding a missing define of restrict or defining getenv(x) to NULL for environments that lack getenv. The seemingly innocent include reordering in `2025d37871` broke builds that have getenv(x) defined to NULL in config.h (Windows CE and Windows Phone/RT), since libavcodec/options_table.h includes config.h, while libavformat/options_table.h ends up bringing in more system headers, and those system headers can contain a proper definition of getenv, which clashes with the getenv define in config.h. This was avoided earlier as long as libavformat/options_table.h (or avformat.h) was included before libavcodec/options_table.h. This fixes builds for Windows Phone/RT and CE. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	ffbd1d2b00	arm: vp9: Add NEON optimizations of VP9 MC functions This work is sponsored by, and copyright, Google. The filter coefficients are signed values, where the product of the multiplication with one individual filter coefficient doesn't overflow a 16 bit signed value (the largest filter coefficient is 127). But when the products are accumulated, the resulting sum can overflow the 16 bit signed range. Instead of accumulating in 32 bit, we accumulate the largest product (either index 3 or 4) last with a saturated addition. (The VP8 MC asm does something similar, but slightly simpler, by accumulating each half of the filter separately. In the VP9 MC filters, each half of the filter can also overflow though, so the largest component has to be handled individually.) Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_avg4_neon: 1.71 1.15 1.42 1.49 vp9_avg8_neon: 2.51 3.63 3.14 2.58 vp9_avg16_neon: 2.95 6.76 3.01 2.84 vp9_avg32_neon: 3.29 6.64 2.85 3.00 vp9_avg64_neon: 3.47 6.67 3.14 2.80 vp9_avg_8tap_smooth_4h_neon: 3.22 4.73 2.76 4.67 vp9_avg_8tap_smooth_4hv_neon: 3.67 4.76 3.28 4.71 vp9_avg_8tap_smooth_4v_neon: 5.52 7.60 4.60 6.31 vp9_avg_8tap_smooth_8h_neon: 6.22 9.04 5.12 9.32 vp9_avg_8tap_smooth_8hv_neon: 6.38 8.21 5.72 8.17 vp9_avg_8tap_smooth_8v_neon: 9.22 12.66 8.15 11.10 vp9_avg_8tap_smooth_64h_neon: 7.02 10.23 5.54 11.58 vp9_avg_8tap_smooth_64hv_neon: 6.76 9.46 5.93 9.40 vp9_avg_8tap_smooth_64v_neon: 10.76 14.13 9.46 13.37 vp9_put4_neon: 1.11 1.47 1.00 1.21 vp9_put8_neon: 1.23 2.17 1.94 1.48 vp9_put16_neon: 1.63 4.02 1.73 1.97 vp9_put32_neon: 1.56 4.92 2.00 1.96 vp9_put64_neon: 2.10 5.28 2.03 2.35 vp9_put_8tap_smooth_4h_neon: 3.11 4.35 2.63 4.35 vp9_put_8tap_smooth_4hv_neon: 3.67 4.69 3.25 4.71 vp9_put_8tap_smooth_4v_neon: 5.45 7.27 4.49 6.52 vp9_put_8tap_smooth_8h_neon: 5.97 8.18 4.81 8.56 vp9_put_8tap_smooth_8hv_neon: 6.39 7.90 5.64 8.15 vp9_put_8tap_smooth_8v_neon: 9.03 11.84 8.07 11.51 vp9_put_8tap_smooth_64h_neon: 6.78 9.48 4.88 10.89 vp9_put_8tap_smooth_64hv_neon: 6.99 8.87 5.94 9.56 vp9_put_8tap_smooth_64v_neon: 10.69 13.30 9.43 14.34 For the larger 8tap filters, the speedup vs C code is around 5-14x. This is significantly faster than libvpx's implementation of the same functions, at least when comparing the put_8tap_smooth_64 functions (compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from libvpx). Absolute runtimes from checkasm: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_64h_neon: 20150.3 14489.4 19733.6 10863.7 libvpx vpx_convolve8_horiz_neon: 52623.3 19736.4 21907.7 25027.7 vp9_put_8tap_smooth_64v_neon: 14455.0 12303.9 13746.4 9628.9 libvpx vpx_convolve8_vert_neon: 42090.0 17706.2 17659.9 16941.2 Thus, on the A9, the horizontal filter is only marginally faster than libvpx, while our version is significantly faster on the other cores, and the vertical filter is significantly faster on all cores. The difference is especially large on the A7. The libvpx implementation does the accumulation in 32 bit, which probably explains most of the differences. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	2e55e26b40	vp9: Flip the order of arguments in MC functions This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Diego Biurrun	baab87c4f3	bink: Have function pointer prototype match implementation libavcodec/binkdsp.c(156) : warning C4028: formal parameter 1 different from declaration	8 years ago
Diego Biurrun	4cf2ffb7c4	idct: Have function pointer prototype match implementation libavcodec/idctdsp.c(175) : warning C4028: formal parameter 2 different from declaration	8 years ago
Diego Biurrun	39cea6570c	aactab: Move extern keyword to the front of array declarations libavcodec/aactab.h:49:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]	8 years ago
Luca Barbato	801ac7156d	qsv: Be informative when reporting that no data has been consumed	8 years ago
Diego Biurrun	30015305f3	Use avpriv_request_sample() where appropriate	8 years ago
Diego Biurrun	3ec6f855d0	srt: Adjust signedness of sscanf format strings Fixes several warnings from -Wformat.	8 years ago
Diego Biurrun	7a2b2b6a92	dxtory: Drop nonsense ISO C printf conversion specifiers for standard types	8 years ago
Diego Biurrun	c454dfcff9	Use ISO C printf conversion specifiers where appropriate	8 years ago
Diego Biurrun	fbe425c8d2	hap: Adjust printf length modifiers to match variable types libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ [-Wformat=] libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]	8 years ago
Diego Biurrun	1263b2039e	Adjust printf conversion specifiers to match variable signedness	8 years ago
Diego Biurrun	47756f51fe	dnxhdenc: Drop pointless, commented-out debug output	8 years ago
Diego Biurrun	0574780d7a	h264_loopfilter: Do not print value of uninitialized variable libavcodec/h264_loopfilter.c:531:111: warning: variable 'edge' is uninitialized when used here [-Wuninitialized]	8 years ago
Diego Biurrun	2555269985	mpegaudio: Do not print value of uninitialized variable libavcodec/mpegaudiodec_template.c:885:97: warning: variable 'x' is uninitialized when used here [-Wuninitialized]	8 years ago
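
The message of `ffbd1d2b00` describes accumulating the 8-tap filter products in 16 bit and adding the largest product last with a saturated addition. The following is a minimal scalar sketch of that idea in C, not the actual NEON code from the commit. The function names, the rounding shift by 7 and the coefficient values used in the usage note are assumptions made for illustration, and the sketch assumes that the sum of the seven smaller products fits in 16 bits.

```c
#include <stdint.h>

/* Saturating 16-bit signed addition (scalar stand-in for a NEON saturating add). */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Round, shift down by 7 and clip to 0..255 (assumed output scaling). */
static uint8_t round_shift_clip(int32_t sum)
{
    int32_t v = sum + 64;
    if (v < 0) return 0;
    v >>= 7;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Reference: accumulate all eight products in 32 bit. */
static uint8_t filter8_ref(const uint8_t *src, const int8_t *f)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += src[i] * f[i];
    return round_shift_clip(sum);
}

/* 16-bit variant: add the seven smaller products first, then add the
 * largest product (index `largest`, 3 or 4 in the commit message) last
 * with saturation. Assumes the partial sum fits in int16_t. */
static uint8_t filter8_sat16(const uint8_t *src, const int8_t *f, int largest)
{
    int16_t acc = 0;
    for (int i = 0; i < 8; i++)
        if (i != largest)
            acc = (int16_t)(acc + src[i] * f[i]);
    /* A single product is at most 127 * 255 = 32385, so it fits in 16 bits. */
    acc = sat_add16(acc, (int16_t)(src[largest] * f[largest]));
    return round_shift_clip(acc);
}
```

With the hypothetical coefficients `{ -1, 3, -7, 127, 8, -3, 1, 0 }` and the pixels `{ 0, 255, 0, 255, 255, 0, 255, 0 }`, the exact sum is 35445, which exceeds the signed 16-bit range. The saturated sum of 32767 still rounds and clips to 255, the same as the 32-bit reference, because any sum that saturates would have been clipped to the 8-bit limit anyway.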

21218 Commits (c5e01d91702b082ac1b5c2101f1d84dd5017e4ad)