FFmpeg

Commit Graph

Author	SHA1	Message	Date
Rémi Denis-Courmont	f032234953	aarch64: remove VFP feature check This is not actually used for anything. The configure check causes the CPU feature flag to be set, but nothing consumes it at all. While AArch64 does have VFP, it is only used for the scalar C code. Conversely, it is still possible to disable VFP, by changing the C compiler flags as before (though that only makes sense for an hypothetical non-standard Armv8 platform without VFP). Note that this retains the "vfp" option flag, for backward compatibility and on the very remote but theoretically possible chance that FFmpeg actually makes use of it in the future. AV_CPU_FLAG_VFP is retained as it is actually used by AArch32.	1 year ago
Martin Storsjö	c76643021e	aarch64: Add Windows runtime detection of the dotprod instructions For Windows, there's no publicly defined constant for checking for the i8mm extension yet. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	9b0052200a	aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl For now, there's not much value in this since Clang doesn't support enabling the dotprod or i8mm features with either .arch_extension or .arch (it has to be enabled by the base arch flags passed to the compiler). But it may be supported in the future. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	493fcde50a	aarch64: Add Linux runtime cpu feature detection using HWCAP_CPUID Based partially on code by Janne Grunau. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	397cb623c8	aarch64: Add cpu flags for the dotprod and i8mm extensions Set these available if they are available unconditionally for the compiler. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Martin Storsjö	fb1b88af77	configure: aarch64: Support assembling the dotprod and i8mm arch extensions These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are supported, and check if the instructions can be assembled. Current clang versions fail to support the dotprod and i8mm features in the .arch_extension directive, but do support them if enabled with -march=armv8.4-a on the command line. (Curiously, lowering the arch level with ".arch armv8.2-a" doesn't make the extensions unavailable if they were enabled with -march; if that changes, Clang should also learn to support these extensions via .arch_extension for them to remain usable here.) Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Lynne	87bae6b018	lavu/tx: refactor to explicitly track and convert lookup table order Necessary for generalizing PFAs.	2 years ago
Reimar Döffinger	38cd829dce	aarch64: Implement stack spilling in a consistent way. Currently it is done in several different ways, which might cause needless dependencies or in case of tx_float_neon.S is incorrect. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2 years ago
Lynne	a89025f74d	aarch64/tx_float: fix compilation Forgot to add the new function arguments.	2 years ago
Rémi Denis-Courmont	164021423a	aarch64: relax byte-swap assembler constraints There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select the same register for both operands either way. Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Lynne	f932b89ea3	lavu/tx: implement aarch64 NEON SIMD FFT The fastest fast Fourier transform in not just the west, but the world, now for the most popular toy ISA. On a high level, it follows the design of the AVX2 version closely, with the exception that the input is slightly less permuted as we don't have to do lane switching with the input on double 4pt and 8pt. On a low level, the lack of subadd/addsub instructions REALLY penalizes any attempt at writing an FFT. That single register matters a lot, and reloading it simply takes unacceptably long. In x86 land, vendors would've noticed developers need this. In ARM land, you get a badly designed complex multiplication instruction we cannot use, that's not present on 95% of devices. Because only compilers matter, right? Future optimization options are very few, perhaps better register management to use more ld1/st1s. All timings below are in cycles: A53: Length \| C \| New (lavu) \| Old (lavc) \| FFTW ------ \|-------------\|-------------\|-------------\|----- 4 \| 842 \| 420 \| 1210 \| 1460 8 \| 1538 \| 1020 \| 1850 \| 2520 16 \| 3717 \| 1900 \| 3700 \| 3990 32 \| 9156 \| 4070 \| 8289 \| 8860 64 \| 21160 \| 9931 \| 18600 \| 19625 128 \| 49180 \| 23278 \| 41922 \| 41922 256 \| 112073 \| 53876 \| 93202 \| 101092 512 \| 252864 \| 122884 \| 205897 \| 207868 1024 \| 560512 \| 278322 \| 458071 \| 453053 2048 \| 1295402 \| 775835 \| 1038205 \| 1020265 4096 \| 3281263 \| 2021221 \| 2409718 \| 2577554 8192 \| 8577845 \| 4780526 \| 5673041 \| 6802722 Apple M1 New - Total for len 512 reps 2097152 = 1.459141 s Old - Total for len 512 reps 2097152 = 2.251344 s FFTW - Total for len 512 reps 2097152 = 1.868429 s New - Total for len 1024 reps 4194304 = 6.490080 s Old - Total for len 1024 reps 4194304 = 9.604949 s FFTW - Total for len 1024 reps 4194304 = 7.889281 s New - Total for len 16384 reps 262144 = 10.374001 s Old - Total for len 16384 reps 262144 = 15.266713 s FFTW - Total for len 16384 reps 262144 = 12.341745 s New - Total for len 65536 reps 8192 = 1.769812 s Old - Total for len 65536 reps 8192 = 4.209413 s FFTW - Total for len 65536 reps 8192 = 3.012365 s New - Total for len 131072 reps 4096 = 1.942836 s Old - Segfaults FFTW - Total for len 131072 reps 4096 = 3.713713 s Thanks to wbs for some simplifications, assembler fixes and a review and to jannau for giving it a look.	2 years ago
Martin Storsjö	c3fea6d83b	aarch64: Only emit the PAC/BTI note section when targeting ELF This avoids build errors if such features are enabled while targeting another binary format. (Using such features on other platforms might require some other form of signaling/setup though, but the ELF specific .note section isn't applicable at least.) Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Andre Kempe	248986a0db	arm64: Add Armv8.3-A PAC support to assembly files This patch adds optional support for Arm Pointer Authentication Codes. PAC support is turned on or off at compile time using additional compiler flags. Unless any of these is enabled explicitly, no additional code will be emitted at all. Signed-off-by: André Kempe <andre.kempe@arm.com> Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Jonathan Wright	08b4716a9e	aarch64: Add Armv8.5-A BTI support Add Branch Target Identifiers (BTIs) to all functions defined in AArch64 assembly files. Most of the BTI landing pads are added automatically by the 'function' macro. BTI support is turned on or off at compile time based on the presence of the __ARM_FEATURE_BTI_DEFAULT feature macro. A binary compiled with BTI support can be executed on an Armv8-A processor without BTI support because the instructions are defined in NOP space. Signed-off-by: Jonathan Wright <jonathan.wright@arm.com> Signed-off-by: Elijah Ahmad <elijah.ahmad@arm.com> Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Martin Storsjö	ee040a7fc2	arm/aarch64: Use mach_absolute_time as timer on apple platforms This is much less precise than the cycle counter register, but the cycle counter register is not available on apple platforms (and on linux, it requires a kernel module for allowing user mode access). Signed-off-by: Martin Storsjö <martin@martin.st>	4 years ago
Martin Storsjö	07948f3d38	aarch64: Explicitly forbid using the x18 register On windows and darwin (and modern android), the x18 register is reserved and shouldn't be modified by user code, while it is freely available on linux. Strictly avoid it, to keep the assembly code portable. This would have helped catch the issue fixed in `872790b1f9` immediately. Signed-off-by: Martin Storsjö <martin@martin.st>	5 years ago
Peter Collingbourne	9bcb1cb6ed	Add assembly support for -fsanitize=hwaddress tagged globals. As of LLVM r368102, Clang will set a pointer tag in bits 56-63 of the address of a global when compiling with -fsanitize=hwaddress. This requires an adjustment to assembly code that takes the address of such globals: the code cannot use the regular R_AARCH64_ADR_PREL_PG_HI21 relocation to refer to the global, since the tag would take the address out of range. Instead, the code must use the non-checking (_NC) variant of the relocation (the link-time check is substituted by a runtime check). This change makes the necessary adjustment in the movrel macro, where it is needed when compiling with -fsanitize=hwaddress. Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Martin Storsjö Reviewed-by: Janne Grunau	5 years ago
Martin Storsjö	41cf3e3b1c	arm: Create proper .rdata sections for COFF As .rodata isn't one of the default created sections for COFF, it was created as a read-write data section. By using the default .rdata section name for COFF, it automatically becomes a read-only data section. The existing ".section .rodata" works as intended for ELF though. This is based on an original patch and diagnose by Tom Tan <Tom.Tan@microsoft.com>. Signed-off-by: Martin Storsjö <martin@martin.st>	6 years ago
Diego Biurrun	4cf84e254a	Drop some unnecessary config.h #includes	7 years ago
Martin Storsjö	69ac24e556	aarch64: Get rid of a stray double space The extra space got included as part of the expansion of ELF, which later interfered with gas-preprocessor which earlier only stripped out leftover lines starting with '#' if the line started with that char. Signed-off-by: Martin Storsjö <martin@martin.st>	7 years ago
James Almer	3d828c9fd5	cpu: split flag checks per arch in av_cpu_max_align() Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	7 years ago
James Almer	3b345d389b	avutil/cpu: split flag checks per arch in av_cpu_max_align() Signed-off-by: James Almer <jamrial@gmail.com>	7 years ago
Martin Storsjö	7b7760ad6e	aarch64: Fix negative movrel offsets for windows On windows, the offset for the relocation doesn't get stored in the relocation itself, but as an unsigned immediate in the opcode. Therefore, negative offsets have to be handled via a separate sub instruction, just as on MachO. Signed-off-by: Martin Storsjö <martin@martin.st>	7 years ago
Martin Storsjö	dda45c087b	aarch64: Add parentheses around the offset parameter in movrel This fixes building with clang for linux with PIC enabled. This is cherrypicked from libav commit `8847eeaa14`. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	8847eeaa14	aarch64: Add parentheses around the offset parameter in movrel This fixes building with clang for linux with PIC enabled. Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Martin Storsjö	7fe898dbb9	aarch64: Add an offset parameter to the movrel macro With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 This is cherry-picked from libav commit `c44a8a3eab`. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
Martin Storsjö	c44a8a3eab	aarch64: Add an offset parameter to the movrel macro With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 Signed-off-by: Martin Storsjö <martin@martin.st>	8 years ago
Timothy Gu	44304ae322	all: Add missing header guards	9 years ago
Janne Grunau	64034849da	arm64: add cycle counter support The ISB (instruction synchronization barrier) might be too heavy for START/STOPTIMER use but should be more accurate in checkasm where the timing overhead is subtracted.	9 years ago
Martin Storsjö	780cd20b00	aarch64: Use .data.rel.ro for const data with relocations This reverts commit `c00365b46d` in addition to using a different section. Signed-off-by: Martin Storsjö <martin@martin.st>	10 years ago
Janne Grunau	a238b83b13	aarch64: use MACH-O const data asm directive in const macro	10 years ago
Janne Grunau	d5a5598198	build: check if AS supports the '.func' directive Not supported by Clang's integrated assembler. Since it just adds debug information it can safely be omitted.	11 years ago
Janne Grunau	68a06b3a63	aarch64: use '#' for whole line asm comments Both gnu as and clang treat lines starting with '#' as comments if they aren't consumed by the C-style preprocessor. Using '//' does not work with clang since comments are removed before macro expansion.	11 years ago
Janne Grunau	6a0fa4d86f	aarch64: remove optional :pg_hi21: for adrp instruction Clang's integrated assembler does not support it.	11 years ago
Janne Grunau	fd2981ea92	aarch64: add darwin style PAGE/PAGEOFF relocations	11 years ago
Martin Storsjö	08cd92144e	aarch64: Use the correct syntax for relocations This fixes building in PIC mode with gas. The examples in the gas manual showed using a # here even though gas itself actually didn't support that syntax (and the gas test suite only tests it without the extra hash sign). CC: libav-stable@libav.org Signed-off-by: Martin Storsjö <martin@martin.st>	11 years ago
Janne Grunau	8675bcb0ad	aarch64: add armv8 CPU flag	11 years ago
Janne Grunau	dbd12523a4	aarch64: float_dsp NEON assembler Ported from arm NEON and added vector_dmul_scalar. Functions between 1.5 and 5 times faster than the C implementations using Apple's clang-503.0.19 on A7.	11 years ago
Janne Grunau	9c029f67ca	aarch64: use EXTERN_ASM consistently for exported symbols Based on `e3fec3f095` for arm.	11 years ago
Janne Grunau	fe96769bed	aarch64: port neon clobber test from arm	11 years ago
Janne Grunau	b7b17ed66e	aarch64: add cpuflags support for NEON and VFP NEON and VFP are currently mandatory for all ARMv8 profiles. Both are handled as extensions as far as cpuflags are concerned. This is consistent with handling x86_64 which always has SSE2, but still handles it as an extension.	11 years ago
Janne Grunau	d0cd2a8c46	aarch64: bswap inline assembly Signed-off-by: Janne Grunau <janne-libav@jannau.net>	11 years ago
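
The byte-swap commits in the log above (`d0cd2a8c46` and `164021423a`) concern inline-assembly operand constraints. The following is a minimal sketch of the relaxed form, not the code from either commit: the names `sketch_bswap32`, `sketch_bswap32_c` and `sketch_bswap32_selfcheck` are hypothetical, and the non-AArch64 fallback is added here only so the example builds anywhere. With separate output ("=r") and input ("r") operands the compiler is free to pick distinct registers, so no extra MOV is needed when the input is reused after the swap.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference implementation, used as fallback and as a cross-check. */
static inline uint32_t sketch_bswap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: byte-swap a 32-bit value. */
static inline uint32_t sketch_bswap32(uint32_t x)
{
#if defined(__aarch64__) && defined(__GNUC__)
    uint32_t r;
    /* Relaxed constraints: output and input are independent operands.
     * A tied form such as "+r"(x) would force both into one register. */
    __asm__("rev %w0, %w1" : "=r"(r) : "r"(x));
    return r;
#else
    return sketch_bswap32_c(x);
#endif
}

/* Checks that the selected path agrees with the portable reference. */
static inline void sketch_bswap32_selfcheck(void)
{
    assert(sketch_bswap32(0x11223344u) == 0x44332211u);
    assert(sketch_bswap32(0x11223344u) == sketch_bswap32_c(0x11223344u));
}
```

As the commit message notes, the compiler will usually pick the same register for both operands anyway, so the difference only shows up when the original value is still live after the swap.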


60 Commits (463a472426cc665e64e1c6f6677bb2142abe85e3)
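
Several of the commits listed above add runtime detection of the dotprod and i8mm extensions. The sketch below illustrates the general idea on Linux only, and it deliberately swaps in a simpler technique than the one named in commit `493fcde50a`: instead of going through HWCAP_CPUID, it reads the feature bits straight from the auxiliary vector with `getauxval`. The flag names, the function name `sketch_detect_flags`, and the locally defined bit constants are assumptions made for illustration; on any other platform the function simply reports no features.

```c
#include <assert.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2 */
#endif

/* Hypothetical flag values for this sketch (not FFmpeg's AV_CPU_FLAG_*). */
#define SKETCH_FLAG_DOTPROD (1u << 0)
#define SKETCH_FLAG_I8MM    (1u << 1)

/* Linux AArch64 hwcap bits, defined locally (assumed values) so the
 * sketch does not depend on how recent the system headers are. */
#define SKETCH_HWCAP_ASIMDDP (1ul << 20)  /* dotprod, in AT_HWCAP  */
#define SKETCH_HWCAP2_I8MM   (1ul << 13)  /* i8mm,    in AT_HWCAP2 */

/* Returns a bitmask of SKETCH_FLAG_* values detected at runtime. */
static unsigned sketch_detect_flags(void)
{
    unsigned flags = 0;
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hwcap  = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);

    if (hwcap & SKETCH_HWCAP_ASIMDDP)
        flags |= SKETCH_FLAG_DOTPROD;
    if (hwcap2 & SKETCH_HWCAP2_I8MM)
        flags |= SKETCH_FLAG_I8MM;
#endif
    /* Other platforms: nothing detected in this sketch. */
    return flags;
}
```

A caller would test the returned mask, for example `sketch_detect_flags() & SKETCH_FLAG_DOTPROD`, before dispatching to a dotprod code path. Apple and Windows need different mechanisms, as the corresponding commit messages describe.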