Diego Biurrun
dae1d507af
x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
12 years ago
Diego Biurrun
320e1d0df3
x86: ABSB2: port to cpuflags
12 years ago
Diego Biurrun
094a7405e5
x86: ABSB: port to cpuflags
12 years ago
Diego Biurrun
51969a652c
x86: ABS2: port to cpuflags
12 years ago
Diego Biurrun
5b4dfbffc2
x86: ABS1: port to cpuflags
12 years ago
Ronald S. Bultje
a34d9ad969
lavc: merge latest x86inc.asm fixes with x264
...
Unbreak NASM support.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
12 years ago
Janne Grunau
0995ad8db4
x86inc: fully concatenate tokens to fix macro expansion for nasm
...
Fixes build errors with nasm introduced in 6f40e9f070 for stack
memory alignment. Noticed by BugMaster.
12 years ago
Ronald S. Bultje
140367aff9
x86inc: fix stack alignment on win64
...
Signed-off-by: Martin Storsjö <martin@martin.st>
12 years ago
Ronald S. Bultje
6f40e9f070
x86inc: support stack mem allocation and re-alignment in PROLOGUE
...
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
12 years ago
Justin Ruggles
1c012e6bfb
x86: float_dsp: fix loading of the len parameter on x86-32
12 years ago
Justin Ruggles
ecc8b02194
x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32
...
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
12 years ago
Justin Ruggles
b30a363331
x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
12 years ago
Justin Ruggles
ac7eb4cb20
float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
...
Include x86-optimized versions for SSE2 and AVX.
12 years ago
Diego Biurrun
490df522c7
x86: cpu: Drop unused HAVE_RWEFLAGS condition
...
The test for rweflags was dropped in a previous commit.
12 years ago
Justin Ruggles
947f933687
x86: float_dsp: add SSE version of vector_fmul_scalar()
12 years ago
Diego Biurrun
87af05c575
x86: SPLATD: port to cpuflags
12 years ago
Diego Biurrun
26301caaa1
x86: mmx2 ---> mmxext in asm constructs
12 years ago
Diego Biurrun
2b479bcab0
build: Drop AVX assembly ifdefs
...
An assembler able to cope with AVX instructions is now required.
12 years ago
Diego Biurrun
f0d124f005
x86inc: Set program_name outside of x86inc.asm
...
This reduces the local difference to the x264 upstream version.
12 years ago
Diego Biurrun
4b60fac419
x86: PALIGNR: port to cpuflags
12 years ago
Diego Biurrun
dbb37e7711
x86: PABSW: port to cpuflags
12 years ago
Diego Biurrun
0a7a94f2e5
x86: Refactor PSWAPD fallback implementations and port to cpuflags
12 years ago
Diego Biurrun
26f01bd106
x86: PMINUB: port to cpuflags
12 years ago
Diego Biurrun
61bc2bc7d4
x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
...
"mmxext" is a more sensible name and more common in outside projects.
12 years ago
Diego Biurrun
012f73e271
x86inc: Only define program_name if the macro is unset
...
This allows overriding the value from outside of the file.
12 years ago
Dave Yeo
9c167914a1
x86: Fix assembly with NASM
...
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
12 years ago
Diego Biurrun
588fafe7f3
x86: MMX2 ---> MMXEXT in macro names
12 years ago
Diego Biurrun
6860b4081d
x86: include x86inc.asm in x86util.asm
...
This is necessary to allow refactoring some x86util macros with cpuflags.
12 years ago
Ronald S. Bultje
08b028c18d
Remove INIT_AVX from x86inc.asm.
12 years ago
Diego Biurrun
a7329e5fc2
x86: get_cpu_flags: add necessary ifdefs around function body
...
ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined
elsewhere in the file. Surrounding the function body with ifdefs allows
building even when cpuid is not defined. An empty cpuflags mask is
returned in this case.
12 years ago
Diego Biurrun
f6fbce761e
x86: Drop CPU detection intrinsics
...
Now that there is CPU detection in YASM, there will always be one of
inline or external assembly enabled, which obviates the need to fall
back on CPU detection through compiler intrinsics.
12 years ago
Diego Biurrun
1f6d86991f
x86: Add YASM implementations of cpuid and xgetbv from x264
...
This allows detecting CPU features with builds that have neither
gcc inline assembly nor the right compiler intrinsics enabled.
12 years ago
Diego Biurrun
54b243141e
x86: cpu: Break out test for cpuid capabilities into separate function
12 years ago
Diego Biurrun
cc5e9e5ff0
x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection
12 years ago
Diego Biurrun
e0c6cce447
x86: Replace checks for CPU extensions and flags by convenience macros
...
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
13 years ago
Justin Ruggles
7327525997
x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
...
The SWAP macro does not work for explicit xmm/ymm usage, so instead just move
the scalar value from xmm2 to xmm0.
13 years ago
Diego Biurrun
f82c4fb27f
x86: Add convenience macros to check for CPU extensions and flags
13 years ago
Diego Biurrun
17337f54c0
x86: Split inline and external assembly #ifdefs
13 years ago
Diego Biurrun
a886b279a0
x86: cosmetics: Comment some #endifs for better readability
13 years ago
Loren Merritt
7a1944b907
vf_hqdn3d: x86 asm
...
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
13 years ago
Justin Ruggles
6092dafb5a
lavr: x86: optimized 6-channel s16 to fltp conversion
13 years ago
Mans Rullgard
5b170c0bea
x86: remove FASTDIV inline asm
...
GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster than 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Martin Storsjö
33e112847d
Add more missing includes after removing the implicit common.h
...
Signed-off-by: Martin Storsjö <martin@martin.st>
13 years ago
Martin Storsjö
70766c2182
Add some more missing includes after removing the implicit common.h
...
Signed-off-by: Martin Storsjö <martin@martin.st>
13 years ago
Mans Rullgard
070a402b60
x86: move MANGLE() and related macros to libavutil/x86/asm.h
...
These x86-specific macros do not belong in generic code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
c318626ce2
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
...
This puts x86-specific things in the x86/ subdirectory where they
belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
edd8226795
x86: fix build with nasm 2.08
...
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro. This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
13 years ago
Mans Rullgard
180d43bc67
x86: use nop cpu directives only if supported
...
nasm does not support 'CPU foonop' directives. This adds a configure
test for the directive and uses it only if supported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
7238265052
x86: fix rNmp macros with nasm
...
For some reason, nasm requires this. No harm done to yasm.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago
Mans Rullgard
a3df4781f4
x86: add colons after labels
...
nasm prints a warning if the colon is missing.
Signed-off-by: Mans Rullgard <mans@mansr.com>
13 years ago