Ramiro Polla
7e4784e40c
avcodec/mpegvideoencdsp: speed up draw_edges_8_c by inlining it for all used edge widths
...
This commit also restricts w to 4, 8, or 16.
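The idea of inlining one generic routine for each permitted edge width can be sketched in plain C as follows. This is only an illustration, not the code from the commit: the names pad_sides and pad_sides_dispatch are hypothetical, only the left/right replication is shown, and plain `static inline` merely permits the compiler to inline (it does not force it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate the first and last pixel of every row
 * into a border of w bytes on each side. buf points at the first visible
 * pixel; wrap is the stride in bytes and may be negative. */
static inline void pad_sides(uint8_t *buf, ptrdiff_t wrap,
                             int width, int height, int w)
{
    for (int y = 0; y < height; y++) {
        uint8_t *row = buf + y * wrap;
        memset(row - w,     row[0],         (size_t)w); /* left border  */
        memset(row + width, row[width - 1], (size_t)w); /* right border */
    }
}

/* Each case passes w as a compile-time constant, so once the helper is
 * inlined the compiler is free to turn the memset() calls into
 * fixed-size stores. */
void pad_sides_dispatch(uint8_t *buf, ptrdiff_t wrap,
                        int width, int height, int w)
{
    assert(w == 4 || w == 8 || w == 16);
    switch (w) {
    case 4:  pad_sides(buf, wrap, width, height, 4);  break;
    case 8:  pad_sides(buf, wrap, width, height, 8);  break;
    case 16: pad_sides(buf, wrap, width, height, 16); break;
    }
}
```

Restricting w to three values is what keeps the switch small enough for one specialised copy per width to be practical.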
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
before after
draw_edges_8_1724_4_c: 46796.5 7141.7 ( 6.55x)
draw_edges_8_1724_8_c: 43584.5 7216.5 ( 6.04x)
draw_edges_8_1724_16_c: 47007.2 10080.5 ( 4.66x)
draw_edges_128_407_4_c: 11199.0 4185.0 ( 2.68x)
draw_edges_128_407_8_c: 10660.2 4418.0 ( 2.41x)
draw_edges_128_407_16_c: 11800.2 4634.5 ( 2.55x)
draw_edges_1080_31_4_c: 1356.5 634.7 ( 2.14x)
draw_edges_1080_31_8_c: 1972.0 1430.2 ( 1.38x)
draw_edges_1080_31_16_c: 4621.0 4009.7 ( 1.15x)
draw_edges_1920_4_4_c: 834.5 795.2 ( 1.05x)
draw_edges_1920_4_4_negstride_c: 821.7 802.0 ( 1.02x)
draw_edges_1920_4_8_c: 2782.2 2650.7 ( 1.05x)
draw_edges_1920_4_8_negstride_c: 2724.7 2670.0 ( 1.02x)
draw_edges_1920_4_16_c: 6437.5 6327.7 ( 1.02x)
draw_edges_1920_4_16_negstride_c: 6395.2 6349.5 ( 1.01x)
A55:
before after
draw_edges_8_1724_4_c: 52540.4 19739.2 ( 2.66x)
draw_edges_8_1724_8_c: 45386.9 19847.4 ( 2.29x)
draw_edges_8_1724_16_c: 51995.4 23284.7 ( 2.23x)
draw_edges_128_407_4_c: 13401.1 6988.2 ( 1.92x)
draw_edges_128_407_8_c: 12218.4 7527.9 ( 1.62x)
draw_edges_128_407_16_c: 13695.9 8207.2 ( 1.67x)
draw_edges_1080_31_4_c: 3702.9 3110.4 ( 1.19x)
draw_edges_1080_31_8_c: 6015.6 5643.2 ( 1.07x)
draw_edges_1080_31_16_c: 12281.9 11901.4 ( 1.03x)
draw_edges_1920_4_4_c: 3957.9 3970.2 ( 1.00x)
draw_edges_1920_4_4_negstride_c: 3964.1 3825.2 ( 1.04x)
draw_edges_1920_4_8_c: 7757.9 7676.4 ( 1.01x)
draw_edges_1920_4_8_negstride_c: 7923.6 7812.4 ( 1.01x)
draw_edges_1920_4_16_c: 14791.6 15143.9 ( 0.98x)
draw_edges_1920_4_16_negstride_c: 14788.6 15163.4 ( 0.98x)
A76:
before after
draw_edges_8_1724_4_c: 39786.0 4968.5 ( 8.01x)
draw_edges_8_1724_8_c: 32971.5 5069.5 ( 6.50x)
draw_edges_8_1724_16_c: 40056.0 6017.2 ( 6.66x)
draw_edges_128_407_4_c: 9517.2 1210.5 ( 7.86x)
draw_edges_128_407_8_c: 8035.7 1346.2 ( 5.97x)
draw_edges_128_407_16_c: 9946.5 1648.2 ( 6.03x)
draw_edges_1080_31_4_c: 1308.0 660.7 ( 1.98x)
draw_edges_1080_31_8_c: 1785.5 1270.7 ( 1.41x)
draw_edges_1080_31_16_c: 3266.7 2591.5 ( 1.26x)
draw_edges_1920_4_4_c: 1151.0 1090.7 ( 1.06x)
draw_edges_1920_4_4_negstride_c: 1153.7 1096.5 ( 1.05x)
draw_edges_1920_4_8_c: 2220.7 2186.5 ( 1.02x)
draw_edges_1920_4_8_negstride_c: 2218.5 2193.5 ( 1.01x)
draw_edges_1920_4_16_c: 4324.2 4230.0 ( 1.02x)
draw_edges_1920_4_16_negstride_c: 4310.7 4233.0 ( 1.02x)
4 months ago
Ramiro Polla
3bfce2a104
avcodec/x86/mpegvideoencdsp: speed up draw_edges_mmx by using memcpy()
...
The mmx memory copy code is not nearly as efficient as memcpy(), which
would make draw_edges_mmx much slower than draw_edges_8_c.
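A rough C sketch of what copying the top and bottom edges with memcpy() might look like is given here. It is not the code from the commit: the function name pad_top_bottom and its parameter layout are assumptions, and it presumes the side borders of width w have already been filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate the first and last rows (including their
 * side borders of w bytes) h times above and below the picture.
 * buf points at the first visible pixel; wrap is the stride in bytes. */
void pad_top_bottom(uint8_t *buf, ptrdiff_t wrap,
                    int width, int height, int w, int h)
{
    assert(w >= 0 && h >= 0 && width > 0 && height > 0);

    uint8_t *first = buf - w;                       /* start of first row */
    uint8_t *last  = buf + (height - 1) * wrap - w; /* start of last row  */
    size_t   n     = (size_t)width + 2 * (size_t)w; /* bytes per padded row */

    for (int i = 1; i <= h; i++) {
        memcpy(first - i * wrap, first, n); /* rows above the picture */
        memcpy(last  + i * wrap, last,  n); /* rows below the picture */
    }
}
```

Source and destination rows never overlap as long as the magnitude of the stride is at least width + 2 * w, which is why memcpy() rather than memmove() is sufficient in this sketch.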
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
before after
draw_edges_8_1724_4_mmx: 8700.5 8751.8 ( 0.99x)
draw_edges_8_1724_8_mmx: 10441.7 10558.0 ( 0.99x)
draw_edges_8_1724_16_mmx: 10660.7 10799.5 ( 0.99x)
draw_edges_128_407_4_mmx: 4202.2 4099.3 ( 1.03x)
draw_edges_128_407_8_mmx: 4579.0 4511.3 ( 1.02x)
draw_edges_128_407_16_mmx: 5479.7 4729.5 ( 1.16x)
draw_edges_1080_31_4_mmx: 1546.7 658.0 ( 2.35x)
draw_edges_1080_31_8_mmx: 2745.5 1442.5 ( 1.90x)
draw_edges_1080_31_16_mmx: 12511.5 4901.0 ( 2.55x)
draw_edges_1920_4_4_mmx: 2659.0 705.0 ( 3.77x)
draw_edges_1920_4_4_negstride_mmx: 2643.0 729.0 ( 3.63x)
draw_edges_1920_4_8_mmx: 7845.0 2819.0 ( 2.78x)
draw_edges_1920_4_8_negstride_mmx: 7777.0 2747.3 ( 2.83x)
draw_edges_1920_4_16_mmx: 24583.7 6358.3 ( 3.87x)
draw_edges_1920_4_16_negstride_mmx: 24589.0 6367.0 ( 3.86x)
4 months ago
Ramiro Polla
9cdcbb639a
avcodec/x86/mpegvideoencdsp: fix comment for draw_edges_mmx
...
Not only w == 8 and w == 16 are supported, but also w == 4.
4 months ago
Ramiro Polla
8c203ea7c7
avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1
...
                     A55               A76
pix_norm1_c:       484.3             235.2
pix_norm1_neon:    193.8 ( 2.50x)     44.7 ( 5.26x)
pix_norm1_dotprod:  91.8 ( 5.28x)     21.2 (11.09x)
4 months ago
Ramiro Polla
9f68a3712e
avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1
...
                  A55               A76
pix_norm1_c:    478.2             234.2
pix_norm1_neon: 188.2 ( 2.54x)     41.2 ( 5.68x)
pix_sum_c:      304.2             244.0
pix_sum_neon:    77.2 ( 3.94x)     21.5 (11.35x)
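For readers unfamiliar with the two functions being benchmarked, a scalar reference might look like the sketch below. It assumes, as the C versions in FFmpeg do to the best of my knowledge, that both operate on a 16x16 block of 8-bit pixels; the names pix_sum_ref and pix_norm1_ref are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all pixels in a 16x16 block (assumed block size). */
int pix_sum_ref(const uint8_t *pix, ptrdiff_t stride)
{
    assert(pix != NULL);
    int sum = 0;
    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 16; x++)
            sum += pix[x];
    return sum;
}

/* Sum of squared pixels in a 16x16 block (assumed block size).
 * The maximum, 256 * 255 * 255, fits comfortably in a 32-bit int. */
int pix_norm1_ref(const uint8_t *pix, ptrdiff_t stride)
{
    assert(pix != NULL);
    int sum = 0;
    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 16; x++)
            sum += pix[x] * pix[x];
    return sum;
}
```

The sum of squares is a multiply-accumulate over bytes, which is the kind of operation a dot-product instruction is designed for.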
4 months ago
Ramiro Polla
834964ce1a
checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges
4 months ago
Ramiro Polla
f9074427db
avcodec/x86/mpegvideoencdsp: support negative strides in draw_edges_mmx()
4 months ago
Ramiro Polla
98610fe95f
fate/checkasm: run the sw_yuv2yuv test
4 months ago
Zhao Zhili
12cdb30e37
avcodec/videotoolboxenc: Fix leaking of supported_props
...
Two VTCompressionSessionRef objects are created: one for generating
extradata and another for normal encoding. supported_props was
overwritten without being released.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Ramiro Polla
420d443600
swscale/aarch64: cosmetics fix (spaces inside curly braces)
4 months ago
Ramiro Polla
52887683e9
swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
...
                            A55                A76
nv24_yuv420p_128_c:      4956.1             1267.0
nv24_yuv420p_128_neon:   3109.1 ( 1.59x)     640.0 ( 1.98x)
nv24_yuv420p_1920_c:    35728.4            11736.2
nv24_yuv420p_1920_neon:  8011.1 ( 4.46x)    2436.0 ( 4.82x)
nv42_yuv420p_128_c:      4956.4             1270.5
nv42_yuv420p_128_neon:   3074.6 ( 1.61x)     639.5 ( 1.99x)
nv42_yuv420p_1920_c:    35685.9            11732.5
nv42_yuv420p_1920_neon:  7995.1 ( 4.46x)    2437.2 ( 4.81x)
4 months ago
Ramiro Polla
88a563ad18
swscale: export ff_copyPlane so it may be used by simd code
4 months ago
Ramiro Polla
a2e01cade8
checkasm/yuv2yuv: add tests for semiplanar unscaled converters
4 months ago
Ramiro Polla
4eb5594295
swscale: add nv24/nv42 to yuv420p unscaled converter
4 months ago
Zhao Zhili
aa14f9fe63
avcodec/mediacodecdec: Skip dequeue buffer in draining state
...
There is no more packet to queue in draining state.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Zhao Zhili
2e370805da
avfilter/unsharp: Merge header into .c
...
It was shared with opencl implementation.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Stefan Oltmanns
d42cd5b75b
avformat/vapoursynth: load library at runtime
...
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
4 months ago
Stefan Oltmanns
eac611f1a4
avformat/vapoursynth: Update to API version 4
...
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
4 months ago
Ramiro Polla
abb4e13a0a
avutil/aarch64: add AV_COPY128 and AV_ZERO128 macros
4 months ago
Zhao Zhili
40dda881d6
avcodec/filter_units: Fix extradata and packets can have different bitstream format
...
Filter init can change extradata from avcc/hvcc to annexb format.
With different passthrough logic, packets can still be in avcc/hvcc
format. Use the same passthrough logic for init and filter.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Zhao Zhili
523189c744
fftools/ffplay: handle flip in display matrix
...
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Gnattu OC
30f090b4f8
avfilter: inherit input color range for videotoolbox filters
...
The color range should be set to match the input when creating
the VideoToolbox context. Otherwise, the new context will default
to limited range, which creates inconsistencies with full range inputs.
Signed-off-by: Gnattu OC <gnattuoc@me.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
4 months ago
Martin Storsjö
cfe0a36352
libswscale: aarch64: Fix the indentation of some macro invocations
...
Signed-off-by: Martin Storsjö <martin@martin.st>
4 months ago
James Almer
9d15fe77e3
avcodec/container_fifo: add missing stddef.h include
...
Fixes make checkheaders
Signed-off-by: James Almer <jamrial@gmail.com>
4 months ago
James Almer
a754ee0844
avcodec/h2645_parse: replace three bool arguments in ff_h2645_packet_split with a single flags one
...
Signed-off-by: James Almer <jamrial@gmail.com>
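The general pattern of folding several boolean parameters into one flags argument can be sketched as below. The flag and type names are hypothetical and are not taken from the FFmpeg headers; the sketch only shows how a callee could recover the individual options.

```c
#include <assert.h>

/* Hypothetical flag names, one bit per former boolean argument. */
enum {
    SPLIT_FLAG_IS_NALFF      = 1 << 0,
    SPLIT_FLAG_SMALL_PADDING = 1 << 1,
    SPLIT_FLAG_USE_REF       = 1 << 2,
    SPLIT_FLAG_ALL           = (1 << 3) - 1,
};

typedef struct SplitConfig {
    int is_nalff;
    int small_padding;
    int use_ref;
} SplitConfig;

/* Decode the flags word back into separate options. */
SplitConfig split_config_from_flags(int flags)
{
    assert(!(flags & ~SPLIT_FLAG_ALL)); /* reject unknown bits */

    SplitConfig cfg;
    cfg.is_nalff      = !!(flags & SPLIT_FLAG_IS_NALFF);
    cfg.small_padding = !!(flags & SPLIT_FLAG_SMALL_PADDING);
    cfg.use_ref       = !!(flags & SPLIT_FLAG_USE_REF);
    return cfg;
}
```

A call site then reads `split_config_from_flags(SPLIT_FLAG_IS_NALFF | SPLIT_FLAG_USE_REF)` instead of an opaque `(1, 0, 1)`, and new options can be added without touching every caller.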
4 months ago
James Almer
8060644237
avcodec/shorten: Fix discard of ‘const’ qualifier
...
Signed-off-by: James Almer <jamrial@gmail.com>
4 months ago
Martin Storsjö
507c2a5774
libswscale: arm: Don't assume aligned output in yuv2rgb functions
...
This fixes failures in recently added checkasm tests.
While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.
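The commit itself concerns ARM assembly, but the same principle can be shown in C as an analogy: when the destination pointer may not be aligned, a wide store should go through memcpy() rather than a cast to a wider pointer type. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at an address that may not be 4-byte aligned.
 * Casting dst to uint32_t * and dereferencing would be undefined
 * behaviour for a misaligned dst; memcpy() has no such requirement. */
void store_u32(uint8_t *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof(value));
}

/* Matching load that is equally safe for a misaligned source. */
uint32_t load_u32(const uint8_t *src)
{
    assert(src != NULL);
    uint32_t value;
    memcpy(&value, src, sizeof(value));
    return value;
}
```

Compilers typically lower these fixed-size memcpy() calls to a single load or store on targets that tolerate unaligned access.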
Signed-off-by: Martin Storsjö <martin@martin.st>
4 months ago
Anton Khirnov
52471b56ba
lavfi: make FFFilterContext private to generic code
...
Nothing in it needs to be visible to filters.
4 months ago
Anton Khirnov
f19c988911
lavfi/filters: move functions only used by generic code to avfilter_internal.h
4 months ago
Anton Khirnov
6d75d44d90
lavfi: drop internal.h
...
All that remains in it are things that belong in avfilter_internal.h.
Move them there and remove internal.h
4 months ago
Anton Khirnov
90e4af65e1
lavfi/f_streamselect: remove a no-op ff_filter_config_links() call
...
It does not do anything when the links are already configured.
4 months ago
Anton Khirnov
a2314308f2
lavfi/internal: move ff_fmt_is_regular_yuv() declaration to video.h
4 months ago
Anton Khirnov
a83a30e899
lavfi: move ff_parse_{sample_rate,channel_layout}() to audio.[ch]
...
That is a more appropriate place for those functions.
4 months ago
Anton Khirnov
f4bfdf7893
lavfi: move ff_parse_pixel_format() to vf_format, its only caller
...
The only thing this function does beyond calling av_get_pix_fmt() is
falling back onto parsing the argument as a number. No other filters
should need to do this.
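The behaviour described, a name lookup with a fallback to parsing the argument as a number, can be sketched as follows. The three-entry table and the functions lookup_fmt and parse_fmt are stand-ins invented for this example; the real code calls av_get_pix_fmt() and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real pixel format name table. */
static const char *const fmt_names[] = { "yuv420p", "yuyv422", "rgb24" };
#define NB_FMTS ((int)(sizeof(fmt_names) / sizeof(fmt_names[0])))

/* Stand-in for the name lookup: returns the index or -1 if unknown. */
static int lookup_fmt(const char *name)
{
    for (int i = 0; i < NB_FMTS; i++)
        if (!strcmp(fmt_names[i], name))
            return i;
    return -1;
}

/* Try the name first, then fall back to a decimal format number. */
int parse_fmt(const char *arg)
{
    assert(arg != NULL);

    int fmt = lookup_fmt(arg);
    if (fmt >= 0)
        return fmt;

    char *end;
    long v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || v < 0 || v >= NB_FMTS)
        return -1; /* neither a known name nor a valid number */
    return (int)v;
}
```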
4 months ago
Anton Khirnov
1afe42852b
lavfi/internal: move functions used by filters to filters.h
...
internal.h currently mixes interfaces intended to be used by filters
with those that should be limited to generic filter- or graph-level
code.
4 months ago
Rémi Denis-Courmont
d8fb44c0aa
lavc/mpegvideoencdsp: R-V V add_8x8basis
...
T-Head C908:
add_8x8basis_c: 440.6
add_8x8basis_rvv_i32: 70.3
SpacemiT X60:
add_8x8basis_c: 436.3
add_8x8basis_rvv_i32: 40.5
4 months ago
Rémi Denis-Courmont
1907dd7f23
lavc/mpegvideoencdsp: R-V V try_8x8basis
...
T-Head C908:
try_8x8basis_c: 922.5
try_8x8basis_rvv_i32: 135.3
SpacemiT X60:
try_8x8basis_c: 926.1
try_8x8basis_rvv_i32: 103.1
4 months ago
Rémi Denis-Courmont
0fd37c00d7
lavc/mpegvideoencdsp: R-V V pix_norm1
...
T-Head C908:
pix_norm1_c: 480.2
pix_norm1_rvv_i64: 146.9
SpacemiT X60:
pix_norm1_c: 478.2
pix_norm1_rvv_i64: 92.7
4 months ago
Rémi Denis-Courmont
63d016aea5
lavc/mpegvideoencdsp: R-V V pix_sum
...
T-Head C908:
pix_sum_c: 332.2
pix_sum_rvv_i64: 91.2
SpacemiT X60:
pix_sum_c: 321.2
pix_sum_rvv_i64: 60.9
4 months ago
Anton Khirnov
631a725670
lavc/hevcdec: call ff_thread_finish_setup() even if hwaccel is in use
...
Serializing frame threading for non-threadsafe hwaccels is handled at the
generic level, the decoder does not need to care about it.
4 months ago
Anton Khirnov
4b9adb35b6
lavc/hevcdec: simplify output logic
...
Current code is written around the "simple" decode API's limitation that
a single input packet (AU/coded frame) triggers the output of at most
one output frame. However the spec contains two cases where a coded
frame may cause multiple frames to be output (cf. C.5.2.2.2):
* start of a new sequence
* overflowing sps_max_dec_pic_buffering
The decoder currently contains rather convoluted logic to handle these
cases:
* decode/output/per-frame sequence counters,
* HEVC_FRAME_FLAG_BUMPING
* ff_hevc_bump_frame()
* special clauses in ff_hevc_output_frame()
However, with the receive_frame() API none of that is necessary, as we
can just output multiple frames at once. Previously added ContainerFifo
allows that to be done in a straightforward and efficient manner.
4 months ago
Anton Khirnov
79afc45c03
lavc/hevcdec: use a ContainerFifo to hold frames scheduled for output
...
Instead of a single AVFrame.
Will be useful in future commits, where we will want to produce multiple
output frames for a single coded frame.
4 months ago
Anton Khirnov
4bda7f288c
lavc/videotoolbox: drop HEVC cropping from start_frame rather than end_frame
...
HEVCContext.output_frame will be removed in following commits.
Reported-By: Max Bykov
4 months ago
Anton Khirnov
6174818252
lavc: add private container FIFO API
...
It provides a FIFO for "container" objects like AVFrame/AVPacket and
features an integrated FFRefStructPool-based pool to avoid allocating and
freeing them repeatedly.
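The combination of a FIFO with an object pool can be illustrated with the small sketch below. It is not the ContainerFifo API: every name is hypothetical, the pool is a plain free list on top of malloc() rather than FFRefStructPool, and the "container" is reduced to a struct holding one int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CAP 8 /* fixed capacity, chosen arbitrarily for the sketch */

typedef struct Item { int value; } Item;

typedef struct PooledFifo {
    Item  *queue[CAP]; /* ring buffer of queued items         */
    size_t head, count;
    Item  *spare[CAP]; /* items waiting to be reused          */
    size_t nb_spare;
    int    nb_allocs;  /* number of malloc() calls, for demo  */
} PooledFifo;

/* Take an item from the pool, allocating only when the pool is empty. */
static Item *pool_get(PooledFifo *f)
{
    if (f->nb_spare)
        return f->spare[--f->nb_spare];
    Item *it = malloc(sizeof(*it));
    if (it)
        f->nb_allocs++;
    return it;
}

int fifo_write(PooledFifo *f, int value)
{
    assert(f != NULL);
    if (f->count == CAP)
        return -1;
    Item *it = pool_get(f);
    if (!it)
        return -1;
    it->value = value;
    f->queue[(f->head + f->count++) % CAP] = it;
    return 0;
}

int fifo_read(PooledFifo *f, int *value)
{
    assert(f != NULL && value != NULL);
    if (!f->count)
        return -1;
    Item *it = f->queue[f->head];
    f->head = (f->head + 1) % CAP;
    f->count--;
    *value = it->value;
    f->spare[f->nb_spare++] = it; /* return the item to the pool */
    return 0;
}

void fifo_free(PooledFifo *f)
{
    assert(f != NULL);
    while (f->count) {
        free(f->queue[f->head]);
        f->head = (f->head + 1) % CAP;
        f->count--;
    }
    while (f->nb_spare)
        free(f->spare[--f->nb_spare]);
}
```

Because an item is only allocated when the pool is empty and the queue is not full, the total number of live items never exceeds CAP, so the spare array cannot overflow.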
4 months ago
Anton Khirnov
2fdecbb239
lavc/hevcdec: switch to receive_frame()
...
Required by following commits, where we will want to output multiple
frames per packet.
4 months ago
sunyuechi
4e7b5ac48f
lavc/vp9dsp: R-V V mc bilin hv
...
C908 X60
vp9_avg_bilin_4hv_8bpp_c : 10.7 9.5
vp9_avg_bilin_4hv_8bpp_rvv_i32 : 4.0 3.5
vp9_avg_bilin_8hv_8bpp_c : 38.5 34.2
vp9_avg_bilin_8hv_8bpp_rvv_i32 : 7.2 6.5
vp9_avg_bilin_16hv_8bpp_c : 147.2 130.5
vp9_avg_bilin_16hv_8bpp_rvv_i32 : 14.5 12.7
vp9_avg_bilin_32hv_8bpp_c : 574.2 509.7
vp9_avg_bilin_32hv_8bpp_rvv_i32 : 42.5 38.0
vp9_avg_bilin_64hv_8bpp_c : 2321.2 2017.7
vp9_avg_bilin_64hv_8bpp_rvv_i32 : 163.5 131.0
vp9_put_bilin_4hv_8bpp_c : 10.0 8.7
vp9_put_bilin_4hv_8bpp_rvv_i32 : 3.5 3.0
vp9_put_bilin_8hv_8bpp_c : 35.2 31.2
vp9_put_bilin_8hv_8bpp_rvv_i32 : 6.5 5.7
vp9_put_bilin_16hv_8bpp_c : 134.0 119.0
vp9_put_bilin_16hv_8bpp_rvv_i32 : 12.7 11.5
vp9_put_bilin_32hv_8bpp_c : 538.5 464.2
vp9_put_bilin_32hv_8bpp_rvv_i32 : 39.7 35.2
vp9_put_bilin_64hv_8bpp_c : 2111.7 1833.2
vp9_put_bilin_64hv_8bpp_rvv_i32 : 138.5 122.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
4 months ago
sunyuechi
9edd2e723b
lavc/vp9dsp: R-V V mc bilin h v
...
C908 X60
vp9_avg_bilin_4h_8bpp_c : 5.5 4.7
vp9_avg_bilin_4h_8bpp_rvv_i32 : 1.7 1.5
vp9_avg_bilin_4v_8bpp_c : 5.5 4.7
vp9_avg_bilin_4v_8bpp_rvv_i32 : 1.5 1.2
vp9_avg_bilin_8h_8bpp_c : 20.0 17.7
vp9_avg_bilin_8h_8bpp_rvv_i32 : 3.0 2.7
vp9_avg_bilin_8v_8bpp_c : 20.7 18.7
vp9_avg_bilin_8v_8bpp_rvv_i32 : 3.0 2.7
vp9_avg_bilin_16h_8bpp_c : 78.2 69.7
vp9_avg_bilin_16h_8bpp_rvv_i32 : 7.0 6.2
vp9_avg_bilin_16v_8bpp_c : 98.5 73.2
vp9_avg_bilin_16v_8bpp_rvv_i32 : 7.0 6.0
vp9_avg_bilin_32h_8bpp_c : 325.5 275.5
vp9_avg_bilin_32h_8bpp_rvv_i32 : 23.0 20.5
vp9_avg_bilin_32v_8bpp_c : 342.2 290.0
vp9_avg_bilin_32v_8bpp_rvv_i32 : 21.7 19.5
vp9_avg_bilin_64h_8bpp_c : 1263.7 1095.7
vp9_avg_bilin_64h_8bpp_rvv_i32 : 91.2 81.2
vp9_avg_bilin_64v_8bpp_c : 1331.7 1155.2
vp9_avg_bilin_64v_8bpp_rvv_i32 : 91.2 81.0
vp9_put_bilin_4h_8bpp_c : 4.5 4.0
vp9_put_bilin_4h_8bpp_rvv_i32 : 1.0 1.0
vp9_put_bilin_4v_8bpp_c : 4.7 4.2
vp9_put_bilin_4v_8bpp_rvv_i32 : 1.0 1.0
vp9_put_bilin_8h_8bpp_c : 16.7 15.0
vp9_put_bilin_8h_8bpp_rvv_i32 : 2.2 2.0
vp9_put_bilin_8v_8bpp_c : 17.5 15.7
vp9_put_bilin_8v_8bpp_rvv_i32 : 2.2 2.0
vp9_put_bilin_16h_8bpp_c : 65.2 58.0
vp9_put_bilin_16h_8bpp_rvv_i32 : 6.0 5.5
vp9_put_bilin_16v_8bpp_c : 69.2 61.7
vp9_put_bilin_16v_8bpp_rvv_i32 : 5.7 5.2
vp9_put_bilin_32h_8bpp_c : 273.2 229.0
vp9_put_bilin_32h_8bpp_rvv_i32 : 19.7 17.7
vp9_put_bilin_32v_8bpp_c : 290.5 243.7
vp9_put_bilin_32v_8bpp_rvv_i32 : 18.7 16.7
vp9_put_bilin_64h_8bpp_c : 1040.5 910.5
vp9_put_bilin_64h_8bpp_rvv_i32 : 82.5 73.0
vp9_put_bilin_64v_8bpp_c : 1108.5 971.0
vp9_put_bilin_64v_8bpp_rvv_i32 : 82.2 73.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
4 months ago
Marvin Scholz
8f36c6f2e7
MAINTAINERS: add CC preference for myself
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
5 months ago
Michael Niedermayer
7e5410eadb
avformat/iamf_parse: clear padding
...
Fixes: use of uninitialized value
Fixes: 70929/clusterfuzz-testcase-minimized-ffmpeg_dem_IAMF_fuzzer-5931276639469568
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
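A generic way to avoid reading uninitialized padding is to zero it when the buffer is filled, as in the sketch below. The helper alloc_padded and the constant INPUT_PADDING are assumptions made for illustration; the actual change in iamf_parse is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical padding size, standing in for the library's constant. */
#define INPUT_PADDING 64

/* Copy size bytes from src into a new buffer followed by zeroed padding,
 * so that readers which overread past size see defined values. */
uint8_t *alloc_padded(const uint8_t *src, size_t size)
{
    assert(src != NULL || size == 0);

    uint8_t *buf = malloc(size + INPUT_PADDING);
    if (!buf)
        return NULL;
    if (size)
        memcpy(buf, src, size);
    memset(buf + size, 0, INPUT_PADDING); /* clear the padding */
    return buf;
}
```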
5 months ago
Michael Niedermayer
67947f2a1c
avcodec/hevc/ps: use unsigned shift
...
Fixes: left shift of 1 by 31 places cannot be represented in type 'int'
Fixes: 70726/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-6149928703819776
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
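The class of bug named in the report can be shown with a minimal example. With a 32-bit int, `1 << 31` shifts into the sign bit and is undefined behaviour in C, whereas shifting an unsigned operand is well defined. The function name bit_mask is made up for this sketch and is not from the HEVC parameter set code.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with only bit n set, for n in [0, 31].
 * The operand is unsigned, so n == 31 is well defined. */
uint32_t bit_mask(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << n;
}
```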
5 months ago