FFmpeg

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	b1ee2af843	aarch64: Print the SVE vector length in libavutil/tests/cpu.c This makes this aspect more visible in test logs. Signed-off-by: Martin Storsjö <martin@martin.st>	3 months ago
Martin Storsjö	e6eabb7ce7	aarch64: Add CPU feature flags for SVE and SVE2 Add code for detecting the feature on Linux and Windows. Signed-off-by: Martin Storsjö <martin@martin.st>	3 months ago
Martin Storsjö	e6e56fd7a7	configure: Add detection of assembler support for SVE/SVE2 Signed-off-by: Martin Storsjö <martin@martin.st>	3 months ago
Martin Storsjö	067abbfe9d	aarch64: Detect I8MM on Windows via SVE-I8MM There's no direct processor feature constant for I8MM alone, but there is a flag for SVE-I8MM (added in WinSDK 10.0.26100 and recent versions of mingw-w64). If SVE-I8MM is available, we can assume that I8MM is available. While HW supporting these features isn't yet commonly running Windows, this at least allows detecting and running the I8MM codepaths in Windows builds in Wine (possibly running in QEMU). Signed-off-by: Martin Storsjö <martin@martin.st>	3 months ago
Brad Smith	a3f79fd22a	aarch64: Implement support for elf_aux_info(3) on FreeBSD and OpenBSD FreeBSD 12.0+, OpenBSD -current and what will be OpenBSD 7.6 support elf_aux_info(3). Signed-off-by: Brad Smith <brad@comstyle.com>	4 months ago
Brad Smith	fe4b9ef69f	avutil/cpu_internal: Provide ff_getauxval() wrapper for getauxval() Initially used for getauxval() but will be used to add support for other API, such as elf_aux_info(). Signed-off-by: Brad Smith <brad@comstyle.com>	4 months ago
Ramiro Polla	abb4e13a0a	avutil/aarch64: add AV_COPY128 and AV_ZERO128 macros	4 months ago
Brad Smith	41190da9e1	aarch64: Add OpenBSD runtime detection of dotprod and i8mm using sysctl Signed-off-by: Brad Smith <brad@comstyle.com>	6 months ago
Martin Storsjö	ab8f7030bc	aarch64: Use cntvct_el0 as timer register on Android and macOS The default timer register pmccntr_el0 usually requires enabling access with e.g. a kernel module (while it is accessible by default on Windows). On Linux, the default for checkasm benchmarks is to use perf (if suitable headers are available) though. On macOS, using cntvct_el0 gives measurements with the same magnitude as mach_absolute_time (which is used currently), but possibly with a little less overhead/noise. Signed-off-by: Martin Storsjö <martin@martin.st>	6 months ago
Rémi Denis-Courmont	c5f69719bc	lavu/bswap: remove some inline assembler C code or compiler built-ins are preferable over inline assembler for byte-swaps as it allows for better optimisations (e.g. instruction scheduling) which would otherwise be impossible. As with `f64c2e710f` for x86 and Arm, this removes the inline assembler on GCC (and Clang) since we now require recent enough compiler versions. This indeed seems to work on AArch64, SuperH and, if Zbb is enabled, RISC-V. (AVR32 was not tested since it has no known working compilers at this time.)	7 months ago
Zhao Zhili	6a18c0bc87	avutil/aarch64: Skip define AV_READ_TIME for apple It will fallback to mach_absolute_time inside libavutil/timer.h Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	7 months ago
Martin Storsjö	8339a45400	aarch64: Factorize code for CPU feature detection on Apple platforms Signed-off-by: Martin Storsjö <martin@martin.st>	9 months ago
Martin Storsjö	e30369bc1c	aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux This makes the code much simpler (especially for adding support for other instruction set extensions), avoids needing inline assembly for this feature, and generally is more of the canonical way to do this. The CPU feature detection was added in `493fcde50a`, using HWCAP_CPUID. The argument for using that, was that HWCAP_CPUID was added much earlier in the kernel (in Linux v4.11), while the HWCAP flags for individual features always come later. This allows detecting support for new CPU extensions before the kernel exposes information about them via hwcap flags. However in practice, there's probably quite little advantage in this. E.g. HWCAP2_I8MM was added in Linux v5.10 - long after HWCAP_CPUID, but there's probably very little practical cases where one would run a kernel older than that on a CPU that supports those instructions. Additionally, we provide our own definitions of the flag values to check (as they are fixed constants anyway), with names not conflicting with the ones from system headers. This reduces the number of ifdefs needed, and allows detecting those features even if building with userland headers that are lacking the definitions of those flags. Also, slightly older versions of QEMU, e.g. 6.2 in Ubuntu 22.04, do expose support for these features via HWCAP flags, but the emulated cpuid registers are missing the bits for exposing e.g. I8MM. (This issue is fixed in later versions of QEMU though.) Signed-off-by: Martin Storsjö <martin@martin.st>	10 months ago
Reimar Döffinger	0ea184fc39	libavutil/aarch64/cpu.c: HWCAPS requires inline asm support. Fixes compilation with tcc, which does not have aarch64 inline asm support.	1 year ago
Martin Storsjö	f05948ada4	aarch64: Simplify the linux runtime cpu detection code Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't defined. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	a4877f1ec1	aarch64: Only enable extensions in the intended files/regions This eases actual development of the assembly functions, by only allowing extension instructions within the sections that explicitly enable them, instead of having all extensions enabled everywhere. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	0679e85331	aarch64: Stop using asm/hwcap.h for the HWCAP_* detection Including sys/auxv.h should be enough (it pulls in bits/hwcap.h, which provides the same defines). Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	cada4597ca	aarch64: Manually tweak vertical alignment/indentation in tx_float_neon.S Favour left aligned columns over right aligned columns. In principle either style should be ok, but some of the cases easily lead to incorrect indentation in the surrounding code (see a couple of cases fixed up in the preceding patch), and show up in automatic indentation correction attempts. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	7f905f3672	aarch64: Make the indentation more consistent Some functions have slightly different indentation styles; try to match the surrounding code. libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally uses a layered indentation style to visually show how different unrolled/interleaved phases fit together. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	184103b310	aarch64: Consistently use lowercase for vector element specifiers Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Rémi Denis-Courmont	f032234953	aarch64: remove VFP feature check This is not actually used for anything. The configure check causes the CPU feature flag to be set, but nothing consumes it at all. While AArch64 does have VFP, it is only used for the scalar C code. Conversely, it is still possible to disable VFP, by changing the C compiler flags as before (though that only makes sense for a hypothetical non-standard Armv8 platform without VFP). Note that this retains the "vfp" option flag, for backward compatibility and on the very remote but theoretically possible chance that FFmpeg actually makes use of it in the future. AV_CPU_FLAG_VFP is retained as it is actually used by AArch32.	1 year ago
Martin Storsjö	c76643021e	aarch64: Add Windows runtime detection of the dotprod instructions For Windows, there's no publicly defined constant for checking for the i8mm extension yet. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	9b0052200a	aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl For now, there's not much value in this since Clang doesn't support enabling the dotprod or i8mm features with either .arch_extension or .arch (it has to be enabled by the base arch flags passed to the compiler). But it may be supported in the future. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	493fcde50a	aarch64: Add Linux runtime cpu feature detection using HWCAP_CPUID Based partially on code by Janne Grunau. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	397cb623c8	aarch64: Add cpu flags for the dotprod and i8mm extensions Set these available if they are available unconditionally for the compiler. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	fb1b88af77	configure: aarch64: Support assembling the dotprod and i8mm arch extensions These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are supported, and check if the instructions can be assembled. Current clang versions fail to support the dotprod and i8mm features in the .arch_extension directive, but do support them if enabled with -march=armv8.4-a on the command line. (Curiously, lowering the arch level with ".arch armv8.2-a" doesn't make the extensions unavailable if they were enabled with -march; if that changes, Clang should also learn to support these extensions via .arch_extension for them to remain usable here.) Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Lynne	87bae6b018	lavu/tx: refactor to explicitly track and convert lookup table order Necessary for generalizing PFAs.	2 years ago
Reimar Döffinger	38cd829dce	aarch64: Implement stack spilling in a consistent way. Currently it is done in several different ways, which might cause needless dependencies or in case of tx_float_neon.S is incorrect. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2 years ago
Lynne	a89025f74d	aarch64/tx_float: fix compilation Forgot to add the new function arguments.	2 years ago
Rémi Denis-Courmont	164021423a	aarch64: relax byte-swap assembler constraints There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select the same register for both operands either way. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Lynne	f932b89ea3	lavu/tx: implement aarch64 NEON SIMD FFT The fastest fast Fourier transform in not just the west, but the world, now for the most popular toy ISA. On a high level, it follows the design of the AVX2 version closely, with the exception that the input is slightly less permuted as we don't have to do lane switching with the input on double 4pt and 8pt. On a low level, the lack of subadd/addsub instructions REALLY penalizes any attempt at writing an FFT. That single register matters a lot, and reloading it simply takes unacceptably long. In x86 land, vendors would've noticed developers need this. In ARM land, you get a badly designed complex multiplication instruction we cannot use, that's not present on 95% of devices. Because only compilers matter, right? Future optimization options are very few, perhaps better register management to use more ld1/st1s. All timings below are in cycles: A53: Length | C | New (lavu) | Old (lavc) | FFTW ------ |-------------|-------------|-------------|----- 4 | 842 | 420 | 1210 | 1460 8 | 1538 | 1020 | 1850 | 2520 16 | 3717 | 1900 | 3700 | 3990 32 | 9156 | 4070 | 8289 | 8860 64 | 21160 | 9931 | 18600 | 19625 128 | 49180 | 23278 | 41922 | 41922 256 | 112073 | 53876 | 93202 | 101092 512 | 252864 | 122884 | 205897 | 207868 1024 | 560512 | 278322 | 458071 | 453053 2048 | 1295402 | 775835 | 1038205 | 1020265 4096 | 3281263 | 2021221 | 2409718 | 2577554 8192 | 8577845 | 4780526 | 5673041 | 6802722 Apple M1 New - Total for len 512 reps 2097152 = 1.459141 s Old - Total for len 512 reps 2097152 = 2.251344 s FFTW - Total for len 512 reps 2097152 = 1.868429 s New - Total for len 1024 reps 4194304 = 6.490080 s Old - Total for len 1024 reps 4194304 = 9.604949 s FFTW - Total for len 1024 reps 4194304 = 7.889281 s New - Total for len 16384 reps 262144 = 10.374001 s Old - Total for len 16384 reps 262144 = 15.266713 s FFTW - Total for len 16384 reps 262144 = 12.341745 s New - Total for len 65536 reps 8192 = 1.769812 s Old - Total for len 65536 reps 8192 = 4.209413 s FFTW - Total for len 65536 reps 8192 = 3.012365 s New - Total for len 131072 reps 4096 = 1.942836 s Old - Segfaults FFTW - Total for len 131072 reps 4096 = 3.713713 s Thanks to wbs for some simplifications, assembler fixes and a review and to jannau for giving it a look.	2 years ago
Martin Storsjö	c3fea6d83b	aarch64: Only emit the PAC/BTI note section when targeting ELF This avoids build errors if such features are enabled while targeting another binary format. (Using such features on other platforms might require some other form of signaling/setup though, but the ELF specific .note section isn't applicable at least.) Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Andre Kempe	248986a0db	arm64: Add Armv8.3-A PAC support to assembly files This patch adds optional support for Arm Pointer Authentication Codes. PAC support is turned on or off at compile time using additional compiler flags. Unless any of these is enabled explicitly, no additional code will be emitted at all. Signed-off-by: André Kempe <andre.kempe@arm.com> Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Jonathan Wright	08b4716a9e	aarch64: Add Armv8.5-A BTI support Add Branch Target Identifiers (BTIs) to all functions defined in AArch64 assembly files. Most of the BTI landing pads are added automatically by the 'function' macro. BTI support is turned on or off at compile time based on the presence of the __ARM_FEATURE_BTI_DEFAULT feature macro. A binary compiled with BTI support can be executed on an Armv8-A processor without BTI support because the instructions are defined in NOP space. Signed-off-by: Jonathan Wright <jonathan.wright@arm.com> Signed-off-by: Elijah Ahmad <elijah.ahmad@arm.com> Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Martin Storsjö	ee040a7fc2	arm/aarch64: Use mach_absolute_time as timer on apple platforms This is much less precise than the cycle counter register, but the cycle counter register is not available on apple platforms (and on linux, it requires a kernel module for allowing user mode access). Signed-off-by: Martin Storsjö <martin@martin.st>	4 years ago
Martin Storsjö	07948f3d38	aarch64: Explicitly forbid using the x18 register On windows and darwin (and modern android), the x18 register is reserved and shouldn't be modified by user code, while it is freely available on linux. Strictly avoid it, to keep the assembly code portable. This would have helped catch the issue fixed in `872790b1f9` immediately. Signed-off-by: Martin Storsjö <martin@martin.st>	5 years ago
Peter Collingbourne	9bcb1cb6ed	Add assembly support for -fsanitize=hwaddress tagged globals. As of LLVM r368102, Clang will set a pointer tag in bits 56-63 of the address of a global when compiling with -fsanitize=hwaddress. This requires an adjustment to assembly code that takes the address of such globals: the code cannot use the regular R_AARCH64_ADR_PREL_PG_HI21 relocation to refer to the global, since the tag would take the address out of range. Instead, the code must use the non-checking (_NC) variant of the relocation (the link-time check is substituted by a runtime check). This change makes the necessary adjustment in the movrel macro, where it is needed when compiling with -fsanitize=hwaddress. Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Martin Storsjö Reviewed-by: Janne Grunau	5 years ago
Martin Storsjö	41cf3e3b1c	arm: Create proper .rdata sections for COFF As .rodata isn't one of the default created sections for COFF, it was created as a read-write data section. By using the default .rdata section name for COFF, it automatically becomes a read-only data section. The existing ".section .rodata" works as intended for ELF though. This is based on an original patch and diagnosis by Tom Tan <Tom.Tan@microsoft.com>. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Diego Biurrun	4cf84e254a	Drop some unnecessary config.h #includes	7 years ago
Martin Storsjö	69ac24e556	aarch64: Get rid of a stray double space The extra space got included as part of the expansion of ELF, which later interfered with gas-preprocessor which earlier only stripped out leftover lines starting with '#' if the line started with that char. Signed-off-by: Martin Storsjö <martin@martin.st>	7 years ago
James Almer	3d828c9fd5	cpu: split flag checks per arch in av_cpu_max_align() Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	7 years ago
James Almer	3b345d389b	avutil/cpu: split flag checks per arch in av_cpu_max_align() Signed-off-by: James Almer <jamrial@gmail.com>	7 years ago
Martin Storsjö	7b7760ad6e	aarch64: Fix negative movrel offsets for windows On windows, the offset for the relocation doesn't get stored in the relocation itself, but as an unsigned immediate in the opcode. Therefore, negative offsets have to be handled via a separate sub instruction, just as on MachO. Signed-off-by: Martin Storsjö <martin@martin.st>	7 years ago
Martin Storsjö	dda45c087b	aarch64: Add parentheses around the offset parameter in movrel This fixes building with clang for linux with PIC enabled. This is cherrypicked from libav commit `8847eeaa14`. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	8847eeaa14	aarch64: Add parentheses around the offset parameter in movrel This fixes building with clang for linux with PIC enabled. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	7fe898dbb9	aarch64: Add an offset parameter to the movrel macro With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 This is cherry-picked from libav commit `c44a8a3eab`. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
Martin Storsjö	c44a8a3eab	aarch64: Add an offset parameter to the movrel macro With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Timothy Gu	44304ae322	all: Add missing header guards	9 years ago
Janne Grunau	64034849da	arm64: add cycle counter support The ISB (instruction synchronization barrier) might be too heavy for START/STOPTIMER use but should be more accurate in checkasm where the timing overhead is subtracted.	9 years ago
Martin Storsjö	780cd20b00	aarch64: Use .data.rel.ro for const data with relocations This reverts commit `c00365b46d` in addition to using a different section. Signed-off-by: Martin Storsjö <martin@martin.st>	10 years ago
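
Commit `e30369bc1c` above describes detecting CPU features on Linux from the regular hwcap flags, using locally defined flag constants so that the check does not depend on the system headers. A minimal sketch of that idea follows. It is not the code in libavutil/aarch64/cpu.c: every `SKETCH_*` name and the `sketch_decode_hwcaps()` function are hypothetical, and the bit positions are assumed to match the Linux arm64 uapi hwcap definitions.

```c
/* Locally defined copies of the kernel's hwcap bits (assumed values from
 * the Linux arm64 uapi headers). Hypothetical names, chosen so they cannot
 * clash with HWCAP_* definitions from system headers. */
#define SKETCH_HWCAP_ASIMDDP (1UL << 20)  /* dotprod, in AT_HWCAP  */
#define SKETCH_HWCAP_SVE     (1UL << 22)  /* SVE,     in AT_HWCAP  */
#define SKETCH_HWCAP2_SVE2   (1UL << 1)   /* SVE2,    in AT_HWCAP2 */
#define SKETCH_HWCAP2_I8MM   (1UL << 13)  /* I8MM,    in AT_HWCAP2 */

/* Hypothetical feature flags returned by this sketch. */
enum {
    SKETCH_CPU_DOTPROD = 1 << 0,
    SKETCH_CPU_I8MM    = 1 << 1,
    SKETCH_CPU_SVE     = 1 << 2,
    SKETCH_CPU_SVE2    = 1 << 3,
};

/* Translate raw AT_HWCAP / AT_HWCAP2 words into feature flags. */
static int sketch_decode_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;

    if (hwcap & SKETCH_HWCAP_ASIMDDP)
        flags |= SKETCH_CPU_DOTPROD;
    if (hwcap & SKETCH_HWCAP_SVE)
        flags |= SKETCH_CPU_SVE;
    if (hwcap2 & SKETCH_HWCAP2_I8MM)
        flags |= SKETCH_CPU_I8MM;
    if (hwcap2 & SKETCH_HWCAP2_SVE2)
        flags |= SKETCH_CPU_SVE2;

    return flags;
}
```

On Linux the two inputs would typically come from `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)` in `<sys/auxv.h>`. Keeping the decoding separate from the query means the same function can be fed values obtained through another API such as `elf_aux_info()`.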


80 Commits (534eef2acea153b25303f406c6de2efae067a5e7)
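
Commit `c5f69719bc` in the list above prefers C code or compiler built-ins over inline assembler for byte swaps, because the compiler can then schedule and optimise the operation. A minimal sketch of that approach follows. The function name `sketch_bswap32()` is hypothetical and this is not FFmpeg's `av_bswap32()`; the built-in `__builtin_bswap32()` is assumed to be available on GCC and Clang, with plain C as the fallback.

```c
#include <stdint.h>

/* Byte-swap a 32-bit value without inline assembler. */
static inline uint32_t sketch_bswap32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* The compiler can lower this to a single instruction (e.g. REV on
     * AArch64) and schedule it freely around neighbouring code. */
    return __builtin_bswap32(x);
#else
    /* Portable fallback: move each byte to its mirrored position. */
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
#endif
}
```

For example, `sketch_bswap32(0x11223344u)` yields `0x44332211u`, and applying the function twice returns the original value.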