There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.
In most cases, this makes no differences, as the compiler will seleect
the same register for both operands either way.
Signed-off-by: Martin Storsjö <martin@martin.st>
getauxval is marginally faster, and works even when procfs is not mounted
support on Linux was added in glibc 2.16
support on Android was added in 4.4 (API 20)
fixes#6578
Signed-off-by: Aman Karmani <aman@tmm1.net>
This is much less precise than the cycle counter register, but
the cycle counter register is not available on apple platforms
(and on linux, it requires a kernel module for allowing user mode
access).
Signed-off-by: Martin Storsjö <martin@martin.st>
As .rodata isn't one of the default created sections for COFF, it was
created as a read-write data section. By using the default .rdata
section name for COFF, it automatically becomes a read-only data section.
The existing ".section .rodata" works as intended for ELF though.
This is based on an original patch and diagnose by Tom Tan
<Tom.Tan@microsoft.com>.
Signed-off-by: Martin Storsjö <martin@martin.st>
Prior to Xcode 9.3, the clang built-in assembler didn't support
altmacro, and gas-preprocessor was used for assembling for arm/darwin.
For thumb functions, gas-preprocessor took care of adding the .thumb_func
directives, but when now being able to assemble without gas-preprocessor,
we need to add these directives ourselves.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is the same combination of .section directives as used in
aarch64/asm.S.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
Add av_sat_sub32 and av_sat_dsub32 as the subtraction analogues to
av_sat_add32/av_sat_dadd32.
Also clarify the formulas for dadd32/dsub32.
Signed-off-by: Andrew D'Addesio <modchipv12@gmail.com>
When targeting windows, the .arch directive isn't available.
So far, when building for windows, we've always used gas-preprocessor,
both when using msvc's armasm and when using clang. Lately, clang/llvm
has implemented the last missing piece (altmacro support) for building
our assembly without gas-preprocessor. This means that we now build
for arm/windows with clang without any extra compatibility layer.
Signed-off-by: Martin Storsjö <martin@martin.st>
We reset .Lpic_gp to zero at the start of each function, which means
that the logic within movrelx for clearing gp when necessary will
be missed.
This fixes using movrelx in different functions with a different
helper register.
This is cherry-picked from libav commit
824e8c2840.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
We reset .Lpic_gp to zero at the start of each function, which means
that the logic within movrelx for clearing gp when necessary will
be missed.
This fixes using movrelx in different functions with a different
helper register.
Signed-off-by: Martin Storsjö <martin@martin.st>
When targeting COFF (windows), clang doesn't support this
directive (while binutils supports it for all targets).
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes builds with --disable-vfp.
Checking for the armv6 cpu flag is incorrect, since vfpv2 isn't
armv6 specific.
Signed-off-by: Martin Storsjö <martin@martin.st>
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.
Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
The C functions return uint8/16_t but that is effectively int not unsigned int
Fixes fate-filter-tblend
We do not return uint8/16_t as that would require the compiler to truncate the
values, slowing it down.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Without this check it causes SIGILL crashes on ARMv5.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
When all the codepaths using manually set .arch/.fpu code is
behind runtime detection, the elf attributes should be suppressed.
This allows tools to know that the final built binary doesn't
strictly require these extensions.
Signed-off-by: Martin Storsjö <martin@martin.st>
add ARM code for implementing av_clip_intp2 using the ssat instruction
on Cortex-A8, av_clip_intp2_arm() is faster than av_clip_intp2_c() and
the generic av_clip(), about -19%
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
this allows disabling and enabling it
it also prevents crashes if vfpv3 and neon are disabled which previously
would have enabled the flag
And last but not least one can enable setend on cpus like cortex-a8 where
its fast but disabled by default
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1542.8 43.7 1470.5 41.5 100.0% +4.9%
butterflies_float 130.0 11.9 70.2 12.1 100.0% +85.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1542.8 43.7 1470.5 41.5 100.0% +4.9%
butterflies_float 130.0 11.9 70.2 12.1 100.0% +85.2%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
When running on a 64 bit kernel, /proc/cpuinfo lists different
optional features than on 32 bit kernels (because some of them
are mandatory in the 64 bit implemenations).
The kernel does list the old features properly if they are queried
via /proc/self/auxv though - however this file is not always readable
(e.g. on most android systems). The getauxval function could also
provide the same info as /proc/self/auxv even if this file isn't
readable, but this function is not always available (and thus would
need to be loaded with dlsym for compatibility with older android
versions).
The android cpufeatures library does this slightly differently,
by assuming that these are available if the "CPU architecture"
line is >= 8, see [1] for details.
It has been suggested to include the old, non-optional features in
/proc/cpuinfo as well, but that suggested patch never was merged.
See [2] for the discussion around this suggestion.
[1] https://android-review.googlesource.com/91380
[2] http://marc.info/?l=linux-arm-kernel&m=139087240101974
Signed-off-by: Martin Storsjö <martin@martin.st>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
This matches the other eabi attribute in the same file. This is
required in order to build for arm/hardfloat with other object
file formats than ELF.
Signed-off-by: Martin Storsjö <martin@martin.st>
On recent android versions, /proc/self/auxw is unreadable
(unless the process is running running under the shell uid or
in debuggable mode, which makes it hard to notice). See
http://b.android.com/43055 and
https://android-review.googlesource.com/51271 for more information
about the issue.
This makes sure e.g. neon optimizations are enabled at runtime in
android apps even when built in release mode, if configured to
use the runtime detection.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>