FFmpeg

Commit Graph

Author	SHA1	Message	Date
Logan Lyu	55f28eb627	lavc/aarch64: new optimization for 8-bit hevc_qpel_hv checkasm bench: put_hevc_qpel_hv4_8_c: 422.1 put_hevc_qpel_hv4_8_i8mm: 101.6 put_hevc_qpel_hv6_8_c: 756.4 put_hevc_qpel_hv6_8_i8mm: 225.9 put_hevc_qpel_hv8_8_c: 1189.9 put_hevc_qpel_hv8_8_i8mm: 296.6 put_hevc_qpel_hv12_8_c: 2407.4 put_hevc_qpel_hv12_8_i8mm: 552.4 put_hevc_qpel_hv16_8_c: 4021.4 put_hevc_qpel_hv16_8_i8mm: 886.6 put_hevc_qpel_hv24_8_c: 8992.1 put_hevc_qpel_hv24_8_i8mm: 1968.9 put_hevc_qpel_hv32_8_c: 15197.9 put_hevc_qpel_hv32_8_i8mm: 3209.4 put_hevc_qpel_hv48_8_c: 32811.1 put_hevc_qpel_hv48_8_i8mm: 7442.1 put_hevc_qpel_hv64_8_c: 58106.1 put_hevc_qpel_hv64_8_i8mm: 12423.9 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	97a9d12657	lavc/aarch64: new optimization for 8-bit hevc_qpel_v checkasm bench: put_hevc_qpel_v4_8_c: 138.1 put_hevc_qpel_v4_8_neon: 41.1 put_hevc_qpel_v6_8_c: 276.6 put_hevc_qpel_v6_8_neon: 60.9 put_hevc_qpel_v8_8_c: 478.9 put_hevc_qpel_v8_8_neon: 72.9 put_hevc_qpel_v12_8_c: 1072.6 put_hevc_qpel_v12_8_neon: 203.9 put_hevc_qpel_v16_8_c: 1852.1 put_hevc_qpel_v16_8_neon: 264.1 put_hevc_qpel_v24_8_c: 4137.6 put_hevc_qpel_v24_8_neon: 586.9 put_hevc_qpel_v32_8_c: 7579.1 put_hevc_qpel_v32_8_neon: 1036.6 put_hevc_qpel_v48_8_c: 16355.6 put_hevc_qpel_v48_8_neon: 2326.4 put_hevc_qpel_v64_8_c: 33545.1 put_hevc_qpel_v64_8_neon: 4126.4 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	265450b89e	lavc/aarch64: new optimization for 8-bit hevc_epel_hv checkasm bench: put_hevc_epel_hv4_8_c: 213.7 put_hevc_epel_hv4_8_i8mm: 59.4 put_hevc_epel_hv6_8_c: 350.9 put_hevc_epel_hv6_8_i8mm: 130.2 put_hevc_epel_hv8_8_c: 548.7 put_hevc_epel_hv8_8_i8mm: 136.9 put_hevc_epel_hv12_8_c: 1126.7 put_hevc_epel_hv12_8_i8mm: 302.2 put_hevc_epel_hv16_8_c: 1925.2 put_hevc_epel_hv16_8_i8mm: 459.9 put_hevc_epel_hv24_8_c: 4301.9 put_hevc_epel_hv24_8_i8mm: 1024.9 put_hevc_epel_hv32_8_c: 7509.2 put_hevc_epel_hv32_8_i8mm: 1680.4 put_hevc_epel_hv48_8_c: 16566.9 put_hevc_epel_hv48_8_i8mm: 3945.4 put_hevc_epel_hv64_8_c: 29134.2 put_hevc_epel_hv64_8_i8mm: 6567.7 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	22c7291506	lavc/aarch64: new optimization for 8-bit hevc_epel_v checkasm bench: put_hevc_epel_v4_8_c: 79.9 put_hevc_epel_v4_8_neon: 25.7 put_hevc_epel_v6_8_c: 151.4 put_hevc_epel_v6_8_neon: 46.4 put_hevc_epel_v8_8_c: 250.9 put_hevc_epel_v8_8_neon: 41.7 put_hevc_epel_v12_8_c: 542.7 put_hevc_epel_v12_8_neon: 108.7 put_hevc_epel_v16_8_c: 939.4 put_hevc_epel_v16_8_neon: 169.2 put_hevc_epel_v24_8_c: 2104.9 put_hevc_epel_v24_8_neon: 307.9 put_hevc_epel_v32_8_c: 3713.9 put_hevc_epel_v32_8_neon: 524.2 put_hevc_epel_v48_8_c: 8175.2 put_hevc_epel_v48_8_neon: 1197.2 put_hevc_epel_v64_8_c: 16049.4 put_hevc_epel_v64_8_neon: 2094.9 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	772865717b	lavc/aarch64: new optimization for 8-bit hevc_epel_pixels and hevc_qpel_pixels checkasm bench: put_hevc_pel_pixels4_8_c: 33.7 put_hevc_pel_pixels4_8_neon: 20.2 put_hevc_pel_pixels6_8_c: 61.4 put_hevc_pel_pixels6_8_neon: 25.4 put_hevc_pel_pixels8_8_c: 121.4 put_hevc_pel_pixels8_8_neon: 16.9 put_hevc_pel_pixels12_8_c: 199.9 put_hevc_pel_pixels12_8_neon: 40.2 put_hevc_pel_pixels16_8_c: 355.9 put_hevc_pel_pixels16_8_neon: 43.4 put_hevc_pel_pixels24_8_c: 774.7 put_hevc_pel_pixels24_8_neon: 78.9 put_hevc_pel_pixels32_8_c: 1345.2 put_hevc_pel_pixels32_8_neon: 152.2 put_hevc_pel_pixels48_8_c: 2963.7 put_hevc_pel_pixels48_8_neon: 309.4 put_hevc_pel_pixels64_8_c: 5236.2 put_hevc_pel_pixels64_8_neon: 514.2 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	a4877f1ec1	aarch64: Only enable extensions in the intended files/regions This eases actual development of the assembly functions, by only allowing extension instructions within the sections that explicitly enable them, instead of having all extensions enabled everywhere. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	1762975ba1	libavcodec/aarch64/hevc: Require consistent use of trailing semicolon Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	a76b409dd0	aarch64: Reindent all assembly to 8/24 column indentation libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally uses a layered indentation style to visually show how different unrolled/interleaved phases fit together. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	7f905f3672	aarch64: Make the indentation more consistent Some functions have slightly different indentation styles; try to match the surrounding code. libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally uses a layered indentation style to visually show how different unrolled/interleaved phases fit together. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	93cda5a9c2	aarch64: Lowercase UXTW/SXTW and similar flags Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	184103b310	aarch64: Consistently use lowercase for vector element specifiers Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Andreas Rheinhardt	6f7bf64dbc	avcodec: Remove DCT, FFT, MDCT and RDFT They were replaced by TX from libavutil; the tremendous work to get to this point (both creating TX as well as porting the users of the components removed in this commit) was completely performed by Lynne alone. Removing the subsystems from configure may break some command lines, because the --disable-fft etc. options are no longer recognized. Co-authored-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	1 year ago
Logan Lyu	8fa83ad70f	lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_hv checkasm bench: put_hevc_qpel_uni_hv4_8_c: 489.2 put_hevc_qpel_uni_hv4_8_i8mm: 105.7 put_hevc_qpel_uni_hv6_8_c: 852.7 put_hevc_qpel_uni_hv6_8_i8mm: 268.7 put_hevc_qpel_uni_hv8_8_c: 1345.7 put_hevc_qpel_uni_hv8_8_i8mm: 300.4 put_hevc_qpel_uni_hv12_8_c: 2757.4 put_hevc_qpel_uni_hv12_8_i8mm: 581.4 put_hevc_qpel_uni_hv16_8_c: 4458.9 put_hevc_qpel_uni_hv16_8_i8mm: 860.2 put_hevc_qpel_uni_hv24_8_c: 9582.2 put_hevc_qpel_uni_hv24_8_i8mm: 2086.7 put_hevc_qpel_uni_hv32_8_c: 16401.9 put_hevc_qpel_uni_hv32_8_i8mm: 3217.4 put_hevc_qpel_uni_hv48_8_c: 36402.4 put_hevc_qpel_uni_hv48_8_i8mm: 7082.7 put_hevc_qpel_uni_hv64_8_c: 62713.2 put_hevc_qpel_uni_hv64_8_i8mm: 12408.9 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	23ca61b7de	lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_v checkasm bench: put_hevc_qpel_uni_v4_8_c: 146.2 put_hevc_qpel_uni_v4_8_neon: 43.2 put_hevc_qpel_uni_v6_8_c: 303.9 put_hevc_qpel_uni_v6_8_neon: 69.7 put_hevc_qpel_uni_v8_8_c: 495.2 put_hevc_qpel_uni_v8_8_neon: 74.7 put_hevc_qpel_uni_v12_8_c: 1100.9 put_hevc_qpel_uni_v12_8_neon: 222.4 put_hevc_qpel_uni_v16_8_c: 1955.2 put_hevc_qpel_uni_v16_8_neon: 269.2 put_hevc_qpel_uni_v24_8_c: 4571.9 put_hevc_qpel_uni_v24_8_neon: 832.4 put_hevc_qpel_uni_v32_8_c: 8226.4 put_hevc_qpel_uni_v32_8_neon: 1035.7 put_hevc_qpel_uni_v48_8_c: 18324.2 put_hevc_qpel_uni_v48_8_neon: 2321.2 put_hevc_qpel_uni_v64_8_c: 37659.4 put_hevc_qpel_uni_v64_8_neon: 4122.2 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	b7a3150bc5	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_hv checkasm bench: put_hevc_epel_uni_hv4_8_c: 204.7 put_hevc_epel_uni_hv4_8_i8mm: 70.2 put_hevc_epel_uni_hv6_8_c: 378.2 put_hevc_epel_uni_hv6_8_i8mm: 131.9 put_hevc_epel_uni_hv8_8_c: 637.7 put_hevc_epel_uni_hv8_8_i8mm: 137.9 put_hevc_epel_uni_hv12_8_c: 1301.9 put_hevc_epel_uni_hv12_8_i8mm: 314.2 put_hevc_epel_uni_hv16_8_c: 2203.4 put_hevc_epel_uni_hv16_8_i8mm: 454.7 put_hevc_epel_uni_hv24_8_c: 4848.2 put_hevc_epel_uni_hv24_8_i8mm: 1065.2 put_hevc_epel_uni_hv32_8_c: 8517.4 put_hevc_epel_uni_hv32_8_i8mm: 1898.4 put_hevc_epel_uni_hv48_8_c: 19591.7 put_hevc_epel_uni_hv48_8_i8mm: 4107.2 put_hevc_epel_uni_hv64_8_c: 33880.2 put_hevc_epel_uni_hv64_8_i8mm: 6568.7 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	c0374f77f4	lavc/aarch64: move macros calc_epelh, calc_epelh2, load_epel_filterh Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	7ce5a2f640	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v checkasm bench: put_hevc_epel_uni_hv64_8_i8mm: 6568.7 put_hevc_epel_uni_v4_8_c: 88.7 put_hevc_epel_uni_v4_8_neon: 32.7 put_hevc_epel_uni_v6_8_c: 185.4 put_hevc_epel_uni_v6_8_neon: 44.9 put_hevc_epel_uni_v8_8_c: 333.9 put_hevc_epel_uni_v8_8_neon: 44.4 put_hevc_epel_uni_v12_8_c: 728.7 put_hevc_epel_uni_v12_8_neon: 119.7 put_hevc_epel_uni_v16_8_c: 1224.2 put_hevc_epel_uni_v16_8_neon: 139.7 put_hevc_epel_uni_v24_8_c: 2531.2 put_hevc_epel_uni_v24_8_neon: 329.9 put_hevc_epel_uni_v32_8_c: 4739.9 put_hevc_epel_uni_v32_8_neon: 562.7 put_hevc_epel_uni_v48_8_c: 10618.7 put_hevc_epel_uni_v48_8_neon: 1256.2 put_hevc_epel_uni_v64_8_c: 19169.9 put_hevc_epel_uni_v64_8_neon: 2179.2 Co-Authored-By: J. Dekker <jdek@itanimul.li> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Casey Smalley	b98ee1a355	aarch64/hevc: Replace br return with ret This patch changes the return instruction in the tr_32x4 macro from BR to RET. Function returns should always use the RET instruction instead of BR, to avoid interfering with branch prediction. On devices that support BTI, this is observable as a landing pad is required when branching with BR. The change fixes fate-hevc-hdr-vivid-metadata when on hardware with BTI support. Signed-off-by: Casey Smalley <casey.smalley@arm.com> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Reimar Döffinger	dcff15692d	hevcdsp_idct_neon.S: Avoid unnecessary mov. ret can be given an argument instead. This is also consistent with how other assembler code in FFmpeg does it. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	1 year ago
Rémi Denis-Courmont	82cb4b1c05	lavc/aarch64: remove bogus HAVE_VFP guard The IMDCT offset is only relevant for NEON optimisations. There are no VFP optimisations here that would justify the HAVE_VFP flag. In practice, this makes no difference because HAVE_NEON is practically always true for standard Armv8 platforms.	1 year ago
Logan Lyu	9557bf26b3	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv put_hevc_epel_uni_w_hv4_8_c: 254.6 put_hevc_epel_uni_w_hv4_8_i8mm: 102.9 put_hevc_epel_uni_w_hv6_8_c: 411.6 put_hevc_epel_uni_w_hv6_8_i8mm: 221.6 put_hevc_epel_uni_w_hv8_8_c: 669.4 put_hevc_epel_uni_w_hv8_8_i8mm: 214.9 put_hevc_epel_uni_w_hv12_8_c: 1412.6 put_hevc_epel_uni_w_hv12_8_i8mm: 481.4 put_hevc_epel_uni_w_hv16_8_c: 2425.4 put_hevc_epel_uni_w_hv16_8_i8mm: 647.4 put_hevc_epel_uni_w_hv24_8_c: 5384.1 put_hevc_epel_uni_w_hv24_8_i8mm: 1450.6 put_hevc_epel_uni_w_hv32_8_c: 9470.9 put_hevc_epel_uni_w_hv32_8_i8mm: 2497.1 put_hevc_epel_uni_w_hv48_8_c: 20930.1 put_hevc_epel_uni_w_hv48_8_i8mm: 5635.9 put_hevc_epel_uni_w_hv64_8_c: 36682.9 put_hevc_epel_uni_w_hv64_8_i8mm: 9712.6 Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	d48c89701c	lavc/aarch64: new optimization for 8-bit hevc_epel_h put_hevc_epel_h4_8_c: 67.1 put_hevc_epel_h4_8_i8mm: 21.1 put_hevc_epel_h6_8_c: 147.1 put_hevc_epel_h6_8_i8mm: 45.1 put_hevc_epel_h8_8_c: 237.4 put_hevc_epel_h8_8_i8mm: 72.1 put_hevc_epel_h12_8_c: 527.4 put_hevc_epel_h12_8_i8mm: 115.4 put_hevc_epel_h16_8_c: 943.6 put_hevc_epel_h16_8_i8mm: 153.9 put_hevc_epel_h24_8_c: 2105.4 put_hevc_epel_h24_8_i8mm: 384.4 put_hevc_epel_h32_8_c: 3631.4 put_hevc_epel_h32_8_i8mm: 519.9 put_hevc_epel_h48_8_c: 8082.1 put_hevc_epel_h48_8_i8mm: 1110.4 put_hevc_epel_h64_8_c: 14400.6 put_hevc_epel_h64_8_i8mm: 2057.1 put_hevc_qpel_h4_8_c: 124.9 put_hevc_qpel_h4_8_neon: 43.1 put_hevc_qpel_h4_8_i8mm: 33.1 put_hevc_qpel_h6_8_c: 269.4 put_hevc_qpel_h6_8_neon: 90.6 put_hevc_qpel_h6_8_i8mm: 61.4 put_hevc_qpel_h8_8_c: 477.6 put_hevc_qpel_h8_8_neon: 82.1 put_hevc_qpel_h8_8_i8mm: 99.9 put_hevc_qpel_h12_8_c: 1062.4 put_hevc_qpel_h12_8_neon: 226.9 put_hevc_qpel_h12_8_i8mm: 170.9 put_hevc_qpel_h16_8_c: 1880.6 put_hevc_qpel_h16_8_neon: 302.9 put_hevc_qpel_h16_8_i8mm: 251.4 put_hevc_qpel_h24_8_c: 4221.9 put_hevc_qpel_h24_8_neon: 893.9 put_hevc_qpel_h24_8_i8mm: 626.1 put_hevc_qpel_h32_8_c: 7437.6 put_hevc_qpel_h32_8_neon: 1189.9 put_hevc_qpel_h32_8_i8mm: 959.1 put_hevc_qpel_h48_8_c: 16838.4 put_hevc_qpel_h48_8_neon: 2727.9 put_hevc_qpel_h48_8_i8mm: 2163.9 put_hevc_qpel_h64_8_c: 29982.1 put_hevc_qpel_h64_8_neon: 4777.6 Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	668eb4c00e	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v put_hevc_epel_uni_w_v4_8_c: 116.1 put_hevc_epel_uni_w_v4_8_neon: 48.6 put_hevc_epel_uni_w_v6_8_c: 248.9 put_hevc_epel_uni_w_v6_8_neon: 80.6 put_hevc_epel_uni_w_v8_8_c: 383.9 put_hevc_epel_uni_w_v8_8_neon: 91.9 put_hevc_epel_uni_w_v12_8_c: 806.1 put_hevc_epel_uni_w_v12_8_neon: 202.9 put_hevc_epel_uni_w_v16_8_c: 1411.1 put_hevc_epel_uni_w_v16_8_neon: 289.9 put_hevc_epel_uni_w_v24_8_c: 3168.9 put_hevc_epel_uni_w_v24_8_neon: 619.4 put_hevc_epel_uni_w_v32_8_c: 5632.9 put_hevc_epel_uni_w_v32_8_neon: 1161.1 put_hevc_epel_uni_w_v48_8_c: 12406.1 put_hevc_epel_uni_w_v48_8_neon: 2476.4 put_hevc_epel_uni_w_v64_8_c: 22001.4 put_hevc_epel_uni_w_v64_8_neon: 4343.9 Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	0c604b1913	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h put_hevc_epel_uni_w_h4_8_c: 126.1 put_hevc_epel_uni_w_h4_8_i8mm: 41.6 put_hevc_epel_uni_w_h6_8_c: 222.9 put_hevc_epel_uni_w_h6_8_i8mm: 91.4 put_hevc_epel_uni_w_h8_8_c: 374.4 put_hevc_epel_uni_w_h8_8_i8mm: 102.1 put_hevc_epel_uni_w_h12_8_c: 806.1 put_hevc_epel_uni_w_h12_8_i8mm: 225.6 put_hevc_epel_uni_w_h16_8_c: 1414.4 put_hevc_epel_uni_w_h16_8_i8mm: 333.4 put_hevc_epel_uni_w_h24_8_c: 3128.6 put_hevc_epel_uni_w_h24_8_i8mm: 713.1 put_hevc_epel_uni_w_h32_8_c: 5519.1 put_hevc_epel_uni_w_h32_8_i8mm: 1118.1 put_hevc_epel_uni_w_h48_8_c: 12364.4 put_hevc_epel_uni_w_h48_8_i8mm: 2541.1 put_hevc_epel_uni_w_h64_8_c: 21925.9 put_hevc_epel_uni_w_h64_8_i8mm: 4383.6 Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	e652e7dcda	lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels put_hevc_pel_uni_pixels4_8_c: 35.9 put_hevc_pel_uni_pixels4_8_neon: 7.6 put_hevc_pel_uni_pixels6_8_c: 46.1 put_hevc_pel_uni_pixels6_8_neon: 20.6 put_hevc_pel_uni_pixels8_8_c: 53.4 put_hevc_pel_uni_pixels8_8_neon: 11.6 put_hevc_pel_uni_pixels12_8_c: 89.1 put_hevc_pel_uni_pixels12_8_neon: 25.9 put_hevc_pel_uni_pixels16_8_c: 106.4 put_hevc_pel_uni_pixels16_8_neon: 20.4 put_hevc_pel_uni_pixels24_8_c: 137.6 put_hevc_pel_uni_pixels24_8_neon: 47.1 put_hevc_pel_uni_pixels32_8_c: 173.6 put_hevc_pel_uni_pixels32_8_neon: 54.1 put_hevc_pel_uni_pixels48_8_c: 268.1 put_hevc_pel_uni_pixels48_8_neon: 117.1 put_hevc_pel_uni_pixels64_8_c: 346.1 put_hevc_pel_uni_pixels64_8_neon: 205.9 Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	e79686be96	lavc/aarch64: new optimization for 8-bit hevc_qpel_h hevc_qpel_uni_w_hv Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	15972cce8c	lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_w_h Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Logan Lyu	0b7356c1b4	lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels and qpel_uni_w_v Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
xufuji456	bd2f00f665	codec/aarch64/hevc: add transform_luma_neon got 56% speed up (run_count=1000, CPU=Cortex A53) transform_4x4_luma_neon: 45 transform_4x4_luma_c: 103 Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
xufuji456	00a062b8d5	codec/aarch64/hevc:add idct_32x32_neon got 73% speed up (run_count=1000, CPU=Cortex A53) idct_32x32_neon: 4826 idct_32x32_c: 18236 idct_32x32_neon: 4824 idct_32x32_c: 18149 idct_32x32_neon: 4937 idct_32x32_c: 18333 Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
J. Dekker	b564ad8eac	lavc/aarch64: add hevc deblock chroma 8-12bit Benched on Ampere Altra: hevc_h_loop_filter_chroma8_c: 367.7 hevc_h_loop_filter_chroma8_neon: 31.0 hevc_h_loop_filter_chroma10_c: 396.7 hevc_h_loop_filter_chroma10_neon: 27.5 hevc_h_loop_filter_chroma12_c: 377.0 hevc_h_loop_filter_chroma12_neon: 31.7 hevc_v_loop_filter_chroma8_c: 369.0 hevc_v_loop_filter_chroma8_neon: 55.0 hevc_v_loop_filter_chroma10_c: 389.0 hevc_v_loop_filter_chroma10_neon: 54.0 hevc_v_loop_filter_chroma12_c: 389.5 hevc_v_loop_filter_chroma12_neon: 53.0 Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
J. Dekker	37cde570bc	lavc/aarch64: add clip N macro Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
xufuji456	4b4de07721	libavcodec/hevc: add hevc idct4x4 neon of aarch64 Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	ec7fa13eb0	aarch64: hevcdsp_idct: Reuse preexisting macros for transposes Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Lynne	e0661fc805	dca_core: convert to lavu/tx Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the arm32 and aarch64 changes.	2 years ago
J. Dekker	9bed814e1d	lavc/aarch64: add hevc horizontal qpel/uni/bi checkasm --benchmark on Ampere Altra (Neoverse N1): put_hevc_qpel_bi_h4_8_c: 170.7 put_hevc_qpel_bi_h4_8_neon: 64.5 put_hevc_qpel_bi_h6_8_c: 373.7 put_hevc_qpel_bi_h6_8_neon: 130.2 put_hevc_qpel_bi_h8_8_c: 662.0 put_hevc_qpel_bi_h8_8_neon: 138.5 put_hevc_qpel_bi_h12_8_c: 1529.5 put_hevc_qpel_bi_h12_8_neon: 422.0 put_hevc_qpel_bi_h16_8_c: 2735.5 put_hevc_qpel_bi_h16_8_neon: 560.5 put_hevc_qpel_bi_h24_8_c: 6015.7 put_hevc_qpel_bi_h24_8_neon: 1636.0 put_hevc_qpel_bi_h32_8_c: 10779.0 put_hevc_qpel_bi_h32_8_neon: 2204.5 put_hevc_qpel_bi_h48_8_c: 24375.0 put_hevc_qpel_bi_h48_8_neon: 4984.0 put_hevc_qpel_bi_h64_8_c: 42768.0 put_hevc_qpel_bi_h64_8_neon: 8795.7 put_hevc_qpel_h4_8_c: 149.0 put_hevc_qpel_h4_8_neon: 55.7 put_hevc_qpel_h6_8_c: 321.2 put_hevc_qpel_h6_8_neon: 106.0 put_hevc_qpel_h8_8_c: 578.7 put_hevc_qpel_h8_8_neon: 133.2 put_hevc_qpel_h12_8_c: 1279.0 put_hevc_qpel_h12_8_neon: 391.7 put_hevc_qpel_h16_8_c: 2286.2 put_hevc_qpel_h16_8_neon: 519.7 put_hevc_qpel_h24_8_c: 5100.7 put_hevc_qpel_h24_8_neon: 1546.2 put_hevc_qpel_h32_8_c: 9022.0 put_hevc_qpel_h32_8_neon: 2060.2 put_hevc_qpel_h48_8_c: 20293.5 put_hevc_qpel_h48_8_neon: 4656.7 put_hevc_qpel_h64_8_c: 36037.0 put_hevc_qpel_h64_8_neon: 8262.7 put_hevc_qpel_uni_h4_8_c: 162.2 put_hevc_qpel_uni_h4_8_neon: 61.7 put_hevc_qpel_uni_h6_8_c: 355.2 put_hevc_qpel_uni_h6_8_neon: 114.2 put_hevc_qpel_uni_h8_8_c: 651.0 put_hevc_qpel_uni_h8_8_neon: 135.7 put_hevc_qpel_uni_h12_8_c: 1412.5 put_hevc_qpel_uni_h12_8_neon: 402.7 put_hevc_qpel_uni_h16_8_c: 2551.0 put_hevc_qpel_uni_h16_8_neon: 533.5 put_hevc_qpel_uni_h24_8_c: 5782.2 put_hevc_qpel_uni_h24_8_neon: 1578.7 put_hevc_qpel_uni_h32_8_c: 10586.5 put_hevc_qpel_uni_h32_8_neon: 2102.2 put_hevc_qpel_uni_h48_8_c: 23812.0 put_hevc_qpel_uni_h48_8_neon: 4739.5 put_hevc_qpel_uni_h64_8_c: 42958.7 put_hevc_qpel_uni_h64_8_neon: 8366.5 Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
Reimar Döffinger	38cd829dce	aarch64: Implement stack spilling in a consistent way. Currently it is done in several different ways, which might cause needless dependencies or in case of tx_float_neon.S is incorrect. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2 years ago
Grzegorz Bernacki	8f4b000c37	lavc/aarch64: Add neon implementation for vsse_intra8 Provide optimized implementation for vsse_intra8 for arm64. Performance tests are shown below. - vsse_5_c: 87.7 - vsse_5_neon: 26.2 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Grzegorz Bernacki	bad67cb9fd	lavc/aarch64: Provide optimized implementation of vsse8 for arm64. Provide optimized implementation of vsse8 for arm64. Performance comparison tests are shown below. - vsse_1_c: 141.5 - vsse_1_neon: 32.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Grzegorz Bernacki	faea56c9c7	lavc/aarch64: Provide neon implementation of nsse8 Add vectorized implementation of nsse8 function. Performance comparison tests are shown below. - nsse_1_c: 256.0 - nsse_1_neon: 82.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Grzegorz Bernacki	f401a2af21	lavc/aarch64: Add neon implementation for pix_abs8 functions. Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below: pix_abs_1_1_c: 162.5 pix_abs_1_1_neon: 27.0 pix_abs_1_2_c: 174.0 pix_abs_1_2_neon: 23.5 pix_abs_1_3_c: 203.2 pix_abs_1_3_neon: 34.7 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	8089fe072e	aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	6f2ad7f951	aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon This avoids one redundant load per row; pix3 from the previous iteration can be used as pix2 in the next one. Before: Cortex A53 A72 A73 pix_abs_0_2_neon: 138.0 59.7 48.0 After: pix_abs_0_2_neon: 109.7 50.2 39.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Andreas Rheinhardt	9beba05311	avcodec/fmtconvert: Remove unused AVCodecContext parameter Unused since `d74a8cb7e4`. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Hubert Mazur	b2732115dd	lavc/aarch64: Add neon implementation for pix_median_abs8 Provide optimized implementation for pix_median_abs8 function. Performance comparison tests are shown below. - median_sad_1_c: 277.0 - median_sad_1_neon: 82.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	e9a6170213	lavc/aarch64: Add neon implementation for vsad8_intra Provide optimized implementation for vsad8_intra function. Performance comparison tests are shown below. - vsad_5_c: 94.7 - vsad_5_neon: 20.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	0ee535b1db	lavc/aarch64: Add neon implementation for pix_median_abs16 Provide optimized implementation for pix_median_abs16 function. Performance comparison tests are shown below. - median_sad_0_c: 720.5 - median_sad_0_neon: 127.2 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Rémi Denis-Courmont	b52034270a	lavc/vorbisdsp: use ptrdiff_t rather than intptr_t ... for a difference between pointers.	2 years ago
Andreas Rheinhardt	a54e53a1c4	avcodec/vp8dsp: Constify src in vp8_mc_func Reviewed-by: Peter Ross <pross@xvid.org> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2 years ago
Hubert Mazur	06b98e396a	lavc/aarch64: Provide neon implementation of nsse16 Add vectorized implementation of nsse16 function. Performance comparison tests are shown below. - nsse_0_c: 682.2 - nsse_0_neon: 116.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago

361 Commits (c59a96fd08620bd8239c218f2e0dfb8429c81c3c)
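
Many of the commit messages above quote checkasm benchmark figures as `name: value` pairs, with a `_c` reference entry next to a `_neon` or `_i8mm` entry. As a rough aid for reading them, the following is a minimal sketch that extracts those pairs and divides each C figure by the matching optimized figure. It assumes lower values mean faster code; the helper names `parse_bench` and `speedups` are hypothetical and not part of FFmpeg or checkasm, and the sample numbers are copied from commits 55f28eb627 and d48c89701c in the table.

```python
import re

# Sample figures copied verbatim from two commit messages in the log above.
BENCH = """
put_hevc_qpel_hv4_8_c: 422.1 put_hevc_qpel_hv4_8_i8mm: 101.6
put_hevc_qpel_hv64_8_c: 58106.1 put_hevc_qpel_hv64_8_i8mm: 12423.9
put_hevc_qpel_h4_8_c: 124.9 put_hevc_qpel_h4_8_neon: 43.1 put_hevc_qpel_h4_8_i8mm: 33.1
"""

# Suffixes used for optimized variants in the messages above.
OPTIMIZED_SUFFIXES = ("neon", "i8mm")


def parse_bench(text):
    """Return {function_name: value} for every 'name: value' pair in text."""
    pairs = re.findall(r"(\w+):\s*([0-9]+(?:\.[0-9]+)?)", text)
    return {name: float(value) for name, value in pairs}


def speedups(results):
    """Map each optimized entry to (C figure / optimized figure)."""
    out = {}
    for name, value in results.items():
        base, _, suffix = name.rpartition("_")
        reference = base + "_c"
        if suffix in OPTIMIZED_SUFFIXES and reference in results:
            out[name] = results[reference] / value
    return out


if __name__ == "__main__":
    for name, ratio in sorted(speedups(parse_bench(BENCH)).items()):
        print(f"{name}: {ratio:.2f}x")
```

The percentage claims in the older messages appear to use a different convention, the relative reduction `1 - optimized / reference`: for bd2f00f665 that gives 1 - 45/103, about 56%, and for 00a062b8d5 it gives 1 - 4826/18236, about 73%, which matches the quoted figures.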