FFmpeg

Commit Graph

Author	SHA1	Message	Date
Michael Niedermayer	e477f09d0b	avcodec/pngdec: Check trns more completely Fixes out of array access Fixes: 546/clusterfuzz-testcase-4809433909559296 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Rostislav Pehlivanov	084f3addda	opus_rc: rename total_bits_used to total_bits and #define some constants Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	8 years ago
Michael Niedermayer	b1e2192007	avcodec/interplayvideo: Move parameter change check up Fixes out of array read Fixes: 544/clusterfuzz-testcase-5936536407244800.f8bd9b24_8ba77916_70c2c7be_3df6a2ea_96cd9f14 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	831274fba4	avcodec/flacdsp: Avoid undefined operations in non debug builds This fixes ubsan warnings in non debug builds by using unsigned operations in debug builds the correct signed operations are retained so that overflows (which should not occur in valid files and may indicate problems in the DSP code or decoder) can be detected. Alternatively they can be changed to unsigned unconditionally, then its not possible though to detect overflows easily if someone wants to test the DSP code for overflows. The 2nd alternative would be to leave the code as it is and accept that there are undefined operations in the DSP code and that ubsan output is full of them in some cases. Similar changes would be needed in some other DSP routines Suggested-by: Matt Wolenetz <wolenetz@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	fd00203554	avcodec/flacdec: Check for invalid vlcs Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Paul B Mahol	c331be21c4	avcodec/ivi: use init_get_bits8() Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Paul B Mahol	2b707018bc	avcodec/metasound: use init_get_bits8() Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Paul B Mahol	7ecdc03ea3	avcodec/xsubdec: use init_get_bits8() Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Paul B Mahol	f09afb4a90	avcodec/mpc7: use init_get_bits8() Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Paul B Mahol	6a6e20bfdf	avcodec/mpc7: return meaningful error values Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Paul B Mahol	207fa224a0	avcodec/mpc8: use init_get_bits8() Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Michael Niedermayer	39afd0482f	avcodec/ituh263dec: Implement U263s interpretation of H.263 B frames Fixes Ticket1536 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	ad7a3f5294	avcodec/utils: Fix memleak with subtitles and sidedata Fixes: 454/fuzz-3-ffmpeg_SUBTITLE_AV_CODEC_ID_MOV_TEXT_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
James Almer	c8467abbad	x86/rv34dsp: add ff_rv34_idct_dc_add_sse2 Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	ab5c4d006d	x86/vp8dsp: add ff_vp8_idct_dc_add_sse2 Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
addr-see-the-website@aetey.se	712ad6b661	libavcodec/cinepakenc.c: comments cleanup (contents) Change the encoding of the original developer name from ISO-8859-1 to UTF-8. Remove the stale/completed TODO list. Fix two small typos. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	61f70416f8	avcodec/dca_lbr: Fix off by 1 error in freq check Fixes out of array read Fixes: 510/clusterfuzz-testcase-5737865715646464 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Steinar H. Gunderson	08b098169b	speedhq: fix out-of-bounds write Certain alpha run lengths (for SHQ1/SHQ3/SHQ5) could be stored in both long and short versions, and we would only accept the short version, returning -1 (invalid code) for the others. This could cause an out-of-bounds write on malicious input, as discovered by Andreas Cadhalpun during fuzzing. Fix by simply allowing both versions, leaving no invalid codes in the alpha VLC. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	8 years ago
Michael Niedermayer	0126cd95cc	avcodec/ituh263dec: Correct timestamp recovery for B frames Improves u263_b-frames_5.avi Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Paul B Mahol	71ca855c9d	avcodec/wmalosslessdec: remove warning message as bug is fixed Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
bnnm	c61b28e042	avcodec/atrac3: Add multichannel joint stereo ATRAC3 Multichannel joint stereo simply interleaves stereo pairs (6ch: 2ch + 2ch + 2ch), so each pair is decoded separatedly. *** To test my changes, I converted examples to wav with ffmpeg.exe (old and new), and compared them to see they are byte-exact. Regular 2ch files (JS and normal) were straightforward to test. For multichannel, to check each JS pair is correctly decoded separatedly I did: - manually demux 6ch.msf into 3 pairs and convert them (2ch_1.wav + 2ch_2.wav + 2ch_3.wav) - convert the 6ch.msf file to wav (with my changes) - manually demux the 6ch.wav into 3 pairs (6ch_d1.wav + 6ch_d2.wav + 6ch_d3.wav) - compare each pair (ex. 2ch_3.wav vs 6ch_d3.wav): all pairs are byte-exact. The new code just processes each JS pair separatedly, there are no algorithm changes. It could be improved a bit but I'm not sure about typical styles. I've only seen 6ch .MSF (probably the AT3 spec only supports 2ch audio). Signed-off-by: bnnm <bananaman255@gmail.com>	8 years ago
Michael Niedermayer	4f651c723b	avcodec/h263: Remove disabled and wrong code from ff_h263_loop_filter() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	901c625494	avcodec/ituh263dec: Use correct error codes in ff_h263_decode_mb() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	e00c516d1e	avcodec/ituh263dec: Correct indention Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Carl Eugen Hoyos	aecdb14ad9	lavc/error_resilience: Remove two unused variables.	8 years ago
Michael Niedermayer	b28ae1e09b	avcodec/ituh263dec: Implement B frame support with UMV Fixes: u263_b-frames_1.avi Fixes part of Ticket1536 return -1 is used here as it is used in similar code in this function, I intend to replace it by proper error codes in the whole function. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Andreas Cadhalpun	842e98b4d8	pgssubdec: reset rle_data_len/rle_remaining_len on allocation error The code relies on their validity and otherwise can try to access a NULL object->rle pointer, causing segmentation faults. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	8 years ago
Michael Niedermayer	536ac72f46	Revert "Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'" The assumption this is based on is wrong, the code is not always run with bitexact flags This reverts commit `a956164e1e`, reversing changes made to `f6005907fd`. Approved-by: James Almer <jamrial@gmail.com>	8 years ago
Michael Niedermayer	3782656631	avcodec/mjpegdec: Check for for the bitstream end in mjpeg_decode_scan_progressive_ac() Fixes timeout Fixes: 496/clusterfuzz-testcase-5805083497332736 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Clément Bœsch	7c300a8ed4	lavc/hevc: remove a few random spaces to reduce diff with libav	8 years ago
Carl Eugen Hoyos	12f7c091e8	lavc/alac: Export samplerate. Fixes ticket #6096.	8 years ago
Clément Bœsch	3a3554871a	lavc/hevcdsp: fix pretty printing mistake "Issue" introduced in `83976e40e8`.	8 years ago
Matthieu Bouron	2ae8278832	lavc/mjpegdec: consume SOS data even if the frame is discarded Speeds up next marker search when a SOS marker is found but the frame is discarded (which happens in avformat_find_stream_info).	8 years ago
Michael Niedermayer	f28299da8d	avcodec/h264dec: Clear ref_count on slice header processing failure Fixes using freed memory Introduced in `7448019890` Fixes: 471/fuzz-1-ffmpeg_VIDEO_AV_CODEC_ID_H264_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Paul B Mahol	b4a13d442a	avcodecc/ccaption_dec: remove extra word from long codec description Signed-off-by: Paul B Mahol <onemda@gmail.com>	8 years ago
Michael Niedermayer	2080bc3371	avcodec/utils: correct align value for interplay Fixes out of array access Fixes: 452/fuzz-1-ffmpeg_VIDEO_AV_CODEC_ID_INTERPLAY_VIDEO_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Carl Eugen Hoyos	6d6faa2a2d	lavc/svq3: Fail for media key encryption. Tested-by: ami_stuff Fixes a part of ticket #6094.	8 years ago
Michael Niedermayer	9e6a242755	avcodec/vp56: Check for the bitstream end, pass error codes on Fixes timeout Fixes: 446/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_VP6_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Martin Storsjö	9f10cff610	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter This work is sponsored by, and copyright, Google. This is similar to the arm version, but due to the larger registers on aarch64, we can do 8 pixels at a time for all filter sizes. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_loop_filter_h_4_8_10bpp_neon: 213.2 172.6 vp9_loop_filter_h_8_8_10bpp_neon: 281.2 244.2 vp9_loop_filter_h_16_8_10bpp_neon: 657.0 444.5 vp9_loop_filter_h_16_16_10bpp_neon: 1280.4 877.7 vp9_loop_filter_mix2_h_44_16_10bpp_neon: 397.7 358.0 vp9_loop_filter_mix2_h_48_16_10bpp_neon: 465.7 429.0 vp9_loop_filter_mix2_h_84_16_10bpp_neon: 465.7 428.0 vp9_loop_filter_mix2_h_88_16_10bpp_neon: 533.7 499.0 vp9_loop_filter_mix2_v_44_16_10bpp_neon: 271.5 244.0 vp9_loop_filter_mix2_v_48_16_10bpp_neon: 330.0 305.0 vp9_loop_filter_mix2_v_84_16_10bpp_neon: 329.0 306.0 vp9_loop_filter_mix2_v_88_16_10bpp_neon: 386.0 365.0 vp9_loop_filter_v_4_8_10bpp_neon: 150.0 115.2 vp9_loop_filter_v_8_8_10bpp_neon: 209.0 175.5 vp9_loop_filter_v_16_8_10bpp_neon: 492.7 345.2 vp9_loop_filter_v_16_16_10bpp_neon: 951.0 682.7 This is significantly faster than the ARM version in almost all cases except for the mix2 functions. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-3x. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	ceb36b8178	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm This work is sponsored by, and copyright, Google. Compared to the arm version, on aarch64 we can keep the full 8x8 transform in registers, and for 16x16 and 32x32, we can process it in slices of 4 pixels instead of 2. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_inv_adst_adst_4x4_sub4_add_10_neon: 111.0 109.7 vp9_inv_adst_adst_8x8_sub8_add_10_neon: 914.0 733.5 vp9_inv_adst_adst_16x16_sub16_add_10_neon: 5184.0 3745.7 vp9_inv_dct_dct_4x4_sub1_add_10_neon: 65.0 65.7 vp9_inv_dct_dct_4x4_sub4_add_10_neon: 100.0 96.7 vp9_inv_dct_dct_8x8_sub1_add_10_neon: 111.0 119.7 vp9_inv_dct_dct_8x8_sub8_add_10_neon: 618.0 494.7 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 295.1 284.6 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 2303.2 1883.9 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 2984.8 2189.3 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3890.0 2799.4 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1044.4 1012.7 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 13333.7 9695.1 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 18531.3 12459.8 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 24470.7 16160.2 vp9_inv_wht_wht_4x4_sub4_add_10_neon: 83.0 79.7 The larger transforms are significantly faster than the corresponding ARM versions. The speedup vs C code is smaller than in 32 bit mode, probably because the 64 bit intermediates in the C code can be expressed more efficiently in aarch64. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	638eceed47	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC This work is sponsored by, and copyright, Google. This has mostly got the same differences to the 8 bit version as in the arm version. For the horizontal filters, we do 16 pixels in parallel as well. For the 8 pixel wide vertical filters, we can accumulate 4 rows before storing, just as in the 8 bit version. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_10bpp_neon: 35.7 30.7 vp9_avg8_10bpp_neon: 93.5 84.7 vp9_avg16_10bpp_neon: 324.4 296.6 vp9_avg32_10bpp_neon: 1236.5 1148.2 vp9_avg64_10bpp_neon: 4639.6 4571.1 vp9_avg_8tap_smooth_4h_10bpp_neon: 130.0 128.0 vp9_avg_8tap_smooth_4hv_10bpp_neon: 440.0 440.5 vp9_avg_8tap_smooth_4v_10bpp_neon: 114.0 105.5 vp9_avg_8tap_smooth_8h_10bpp_neon: 327.0 314.0 vp9_avg_8tap_smooth_8hv_10bpp_neon: 918.7 865.4 vp9_avg_8tap_smooth_8v_10bpp_neon: 330.0 300.2 vp9_avg_8tap_smooth_16h_10bpp_neon: 1187.5 1155.5 vp9_avg_8tap_smooth_16hv_10bpp_neon: 2663.1 2591.0 vp9_avg_8tap_smooth_16v_10bpp_neon: 1107.4 1078.3 vp9_avg_8tap_smooth_64h_10bpp_neon: 17754.6 17454.7 vp9_avg_8tap_smooth_64hv_10bpp_neon: 33285.2 33001.5 vp9_avg_8tap_smooth_64v_10bpp_neon: 16066.9 16048.6 vp9_put4_10bpp_neon: 25.5 21.7 vp9_put8_10bpp_neon: 56.0 52.0 vp9_put16_10bpp_neon/armv8: 183.0 163.1 vp9_put32_10bpp_neon/armv8: 678.6 563.1 vp9_put64_10bpp_neon/armv8: 2679.9 2195.8 vp9_put_8tap_smooth_4h_10bpp_neon: 120.0 118.0 vp9_put_8tap_smooth_4hv_10bpp_neon: 435.2 435.0 vp9_put_8tap_smooth_4v_10bpp_neon: 107.0 98.2 vp9_put_8tap_smooth_8h_10bpp_neon: 303.0 290.0 vp9_put_8tap_smooth_8hv_10bpp_neon: 893.7 828.7 vp9_put_8tap_smooth_8v_10bpp_neon: 305.5 263.5 vp9_put_8tap_smooth_16h_10bpp_neon: 1089.1 1059.2 vp9_put_8tap_smooth_16hv_10bpp_neon: 2578.8 2452.4 vp9_put_8tap_smooth_16v_10bpp_neon: 1009.5 933.5 vp9_put_8tap_smooth_64h_10bpp_neon: 16223.4 15918.6 vp9_put_8tap_smooth_64hv_10bpp_neon: 32153.0 31016.2 vp9_put_8tap_smooth_64v_10bpp_neon: 14516.5 13748.1 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is around 4-9x. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	48ad3fe1be	aarch64: vp9dsp: Restructure the bpp checks This work is sponsored by, and copyright, Google. This is more in line with how it will be extended for more bitdepths. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	1e5d87eec3	arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter This work is sponsored by, and copyright, Google. This is pretty much similar to the 8 bpp version, but in some senses simpler. All input pixels are 16 bits, and all intermediates also fit in 16 bits, so there's no lengthening/narrowing in the filter at all. For the full 16 pixel wide filter, we can only process 4 pixels at a time (using an implementation very much similar to the one for 8 bpp), but we can do 8 pixels at a time for the 4 and 8 pixel wide filters with a different implementation of the core filter. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_loop_filter_h_4_8_10bpp_neon: 1.83 2.16 1.40 2.09 vp9_loop_filter_h_8_8_10bpp_neon: 1.39 1.67 1.24 1.70 vp9_loop_filter_h_16_8_10bpp_neon: 1.56 1.47 1.10 1.81 vp9_loop_filter_h_16_16_10bpp_neon: 1.94 1.69 1.33 2.24 vp9_loop_filter_mix2_h_44_16_10bpp_neon: 2.01 2.27 1.67 2.39 vp9_loop_filter_mix2_h_48_16_10bpp_neon: 1.84 2.06 1.45 2.19 vp9_loop_filter_mix2_h_84_16_10bpp_neon: 1.89 2.20 1.47 2.29 vp9_loop_filter_mix2_h_88_16_10bpp_neon: 1.69 2.12 1.47 2.08 vp9_loop_filter_mix2_v_44_16_10bpp_neon: 3.16 3.98 2.50 4.05 vp9_loop_filter_mix2_v_48_16_10bpp_neon: 2.84 3.64 2.25 3.77 vp9_loop_filter_mix2_v_84_16_10bpp_neon: 2.65 3.45 2.16 3.54 vp9_loop_filter_mix2_v_88_16_10bpp_neon: 2.55 3.30 2.16 3.55 vp9_loop_filter_v_4_8_10bpp_neon: 2.85 3.97 2.24 3.68 vp9_loop_filter_v_8_8_10bpp_neon: 2.27 3.19 1.96 3.08 vp9_loop_filter_v_16_8_10bpp_neon: 3.42 2.74 2.26 4.40 vp9_loop_filter_v_16_16_10bpp_neon: 2.86 2.44 1.93 3.88 The speedup vs C code measured in checkasm is around 1.1-4x. These numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-4x. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	2ed67eba96	arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm This work is sponsored by, and copyright, Google. This is structured similarly to the 8 bit version. In the 8 bit version, the coefficients are 16 bits, and intermediates are 32 bits. Here, the coefficients are 32 bit. For the 4x4 transforms for 10 bit content, the intermediates also fit in 32 bits, but for all other transforms (4x4 for 12 bit content, and 8x8 and larger for both 10 and 12 bit) the intermediates are 64 bit. For the existing 8 bit case, the 8x8 transform fit all coefficients in registers; for 10/12 bit, when the coefficients are 32 bit, the 8x8 transform also has to be done in slices of 4 pixels (just as 16x16 and 32x32 for 8 bit). The slice width also shrinks from 4 elements to 2 elements in parallel for the 16x16 and 32x32 cases. The 16 bit coefficients from idct_coeffs and similar tables also need to be lenghtened to 32 bit in order to be used in multiplication with vectors with 32 bit elements. This leads to the fixed coefficient vectors needing more space, leading to more cases where they have to be reloaded within the transform (in iadst16). This technically would need testing in checkasm for subpartitions in increments of 2, but that slows down normal checkasm runs excessively. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_sub4_add_10_neon: 4.83 11.36 5.22 6.77 vp9_inv_adst_adst_8x8_sub8_add_10_neon: 4.12 7.60 4.06 4.84 vp9_inv_adst_adst_16x16_sub16_add_10_neon: 3.93 8.16 4.52 5.35 vp9_inv_dct_dct_4x4_sub1_add_10_neon: 1.36 2.57 1.41 1.61 vp9_inv_dct_dct_4x4_sub4_add_10_neon: 4.24 8.66 5.06 5.81 vp9_inv_dct_dct_8x8_sub1_add_10_neon: 2.63 4.18 1.68 2.87 vp9_inv_dct_dct_8x8_sub4_add_10_neon: 4.52 9.47 4.24 5.39 vp9_inv_dct_dct_8x8_sub8_add_10_neon: 3.45 7.34 3.45 4.30 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 3.56 6.21 2.47 4.32 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 5.68 12.73 5.28 7.07 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 4.42 9.28 4.24 5.45 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3.41 7.29 3.35 4.19 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 4.52 8.35 3.83 6.40 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 5.86 13.19 6.14 7.04 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 4.29 8.11 4.59 5.06 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 3.31 5.70 3.56 3.84 vp9_inv_wht_wht_4x4_sub4_add_10_neon: 1.89 2.80 1.82 1.97 The speedup compared to the C functions is around 1.3 to 7x for the full transforms, even higher for the smaller subpartitions. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	a4d4bad75c	arm: Add NEON optimizations for 10 and 12 bit vp9 MC This work is sponsored by, and copyright, Google. The plain pixel put/copy functions are used from the 8 bit version, for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new copy128 is added. Compared with the 8 bit version, the filters can no longer use the trick to accumulate in 16 bit with only saturation at the end, but now the accumulators need to be 32 bit. This avoids the need to keep track of which filter index is the largest though, reducing the size of the executable code for these filters. For the horizontal filters, we only do 4 or 8 pixels wide in parallel (while doing two rows at a time), since we don't have enough register space to filter 16 pixels wide. For the vertical filters, we still do 4 and 8 pixels in parallel just as in the 8 bit case, but we need to store the output after every 2 rows instead of after every 4 rows. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_avg4_10bpp_neon: 2.25 2.44 3.05 2.16 vp9_avg8_10bpp_neon: 3.66 8.48 3.86 3.50 vp9_avg16_10bpp_neon: 3.39 8.26 3.37 2.72 vp9_avg32_10bpp_neon: 4.03 10.20 4.07 3.42 vp9_avg64_10bpp_neon: 4.15 10.01 4.13 3.70 vp9_avg_8tap_smooth_4h_10bpp_neon: 3.38 6.22 3.41 4.75 vp9_avg_8tap_smooth_4hv_10bpp_neon: 3.89 6.39 4.30 5.32 vp9_avg_8tap_smooth_4v_10bpp_neon: 5.32 9.73 6.34 7.31 vp9_avg_8tap_smooth_8h_10bpp_neon: 4.45 9.40 4.68 6.87 vp9_avg_8tap_smooth_8hv_10bpp_neon: 4.64 8.91 5.44 6.47 vp9_avg_8tap_smooth_8v_10bpp_neon: 6.44 13.42 8.68 8.79 vp9_avg_8tap_smooth_64h_10bpp_neon: 4.66 9.02 4.84 7.71 vp9_avg_8tap_smooth_64hv_10bpp_neon: 4.61 9.14 4.92 7.10 vp9_avg_8tap_smooth_64v_10bpp_neon: 6.90 14.13 9.57 10.41 vp9_put4_10bpp_neon: 1.33 1.46 2.09 1.33 vp9_put8_10bpp_neon: 1.57 3.42 1.83 1.84 vp9_put16_10bpp_neon: 1.55 4.78 2.17 1.89 vp9_put32_10bpp_neon: 2.06 5.35 2.14 2.30 vp9_put64_10bpp_neon: 3.00 2.41 1.95 1.66 vp9_put_8tap_smooth_4h_10bpp_neon: 3.19 5.81 3.31 4.63 vp9_put_8tap_smooth_4hv_10bpp_neon: 3.86 6.22 4.32 5.21 vp9_put_8tap_smooth_4v_10bpp_neon: 5.40 9.77 6.08 7.21 vp9_put_8tap_smooth_8h_10bpp_neon: 4.22 8.41 4.46 6.63 vp9_put_8tap_smooth_8hv_10bpp_neon: 4.56 8.51 5.39 6.25 vp9_put_8tap_smooth_8v_10bpp_neon: 6.60 12.43 8.17 8.89 vp9_put_8tap_smooth_64h_10bpp_neon: 4.41 8.59 4.54 7.49 vp9_put_8tap_smooth_64hv_10bpp_neon: 4.43 8.58 5.34 6.63 vp9_put_8tap_smooth_64v_10bpp_neon: 7.26 13.92 9.27 10.92 For the larger 8tap filters, the speedup vs C code is around 4-14x. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	cda9a3e80b	arm: vp9dsp: Restructure the bpp checks This work is sponsored by, and copyright, Google. This is more in line with how it will be extended for more bitdepths. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Michael Niedermayer	755933cb5c	avcodec/mjpegdec: Check remaining bitstream in ljpeg_decode_yuv_scan() Fixes timeout Fixes: 445/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_MJPEG_fuzzer Fixes: 456/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_JPEGLS_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Michael Niedermayer	25f4f08ba5	avcodec/h264dec: Fix regression with "make fate-h264-attachment-631 THREADS=8" This treats the case of no slices like no frames which it basically is. The field is added to the context as other nal related fields are also there and passing the has_slices field per *arguments is ugly and not consistent Found-by: ubitux Approved-by: ubitux Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Pavel Koshevoy	9ea2998586	avcodec/cuvid: fail early if GPU can't handle video resolution CUVID on GeForce GT 730 and GeForce GTX 1060 does not report any error when decoding 8K h264 packets. However, it does return an error during cuvidCreateDecoder call if the indicated video resolution is not supported. Given that stream resolution is typically known as a result of probing it is better to use this information during avcodec_open2 call to fail immediately, rather than proceeding to decode and never receiving any frames from the decoder nor receiving any indication of decode failure. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>	8 years ago
Michael Niedermayer	e371f031b9	avcodec/pngdec: Fix off by 1 size in decode_zbuf() Fixes out of array access Fixes: 444/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_PNG_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago

1 2 3 4 5 ...

36933 Commits (a91cedf79a92377300006b66b8707ae52021eea9)