This patch modifies HEVC mc MIPS-SIMD optimized code according to improved version of generic macros.
Overall, this patch is just upgrading the code with styling changes and will bring it in sync with MIPS-SIMD optimized latest codebase at our end.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch moves HEVC code of uni mc cases to new file hevc_mc_uni_msa.c.
(There are total 5 sub-modules of HEVC mc functions, if we add all these modules in one single file, its size would be huge (~750k) & difficult to maintain, so splitting it in multiple files)
This patch also adds new HEVC header file libavcodec/mips/hevc_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch modifies H264 loopfilter, weighted & bi-weighted prediction MIPS-SIMD optimized code according to improved version of generic macros.
Also there are minor code alignment changes.
Overall, this patch is just upgrading the code with styling changes and will bring it in sync with MIPS-SIMD optimized latest codebase at our end.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
HAVE_LOONGSON is replaced by HAVE_LOONGSON3. Even Loongson-2E and 2F support
Loongson SIMD instructs but have low performance for decoding. We plan to focus
on optimizing Loongson-3A1000, 3B1500 and 3A1500, and modify the configure file
to support Loongson-2 series later by adding HAVE_LOONGSON2.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mainly consists of replacing all the pointer arithmatic 'addiu'
instructions with PTR_ADDIU which will handle the differences in pointer
sizes when compiled on 64 bit mips systems.
The header asmdefs.h contains the PTR_ macros which expend to the correct mips
instructions to manipulate registers containing pointers.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There are no independant uses of mips32r2 instructions except for the
FPU parts. Due to the heavy use of mips32r2 specifc fpu extensions, I
am guessing the original author intended MIPSFPU to imply MIPS32R2 anyway.
Since these fpu instructions are available on mips64 (non-r2), enable them
there as well.
Also remove the last occurence of HAVE_MIPS32R2 (which is coupled to
HAVE_MIPSFPU anyway).
mips32r2 is left in the list of options form compatability so that using
--disable-mips32r2 doesn't break anything.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Removing these removes the dependency of this code on mips32r2 which would
allow it to be used on processors which have FPU instructions, but not r2
instructions (like the mips64el debian port for instance).
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
On mips64, the registers t[4-7] do not exist. Instead of using a lot of #ifdef
or defines to handle differing register names, use variables and let GCC
allocate the registers automatically (like in the other mips assembly files).
In get_band_cost_ESC_mips, t4 and t5 were renamed to t6 and t7 to avoid a
variable name conflict.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is obviously needed for 64-bit support.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Change register constraint on the v variable from = to +. This was causing GCC
to think that the v variable was never read and therefore not initialize it.
This fixes about 20 fate failures on mips64el.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The float_copy and fmul_and_reverse functions are refactored out from the
multiple copies in this file.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The optimized C version of this code actually runs faster than this
version, so remove it.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Remove some assembly that the compiler can easily handle optimally on its own.
GCC produces almost identical assembly.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Q_fract should have be declared as 'const float*'.
Also fix the constness of some local variables affected by this.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
GCC is perfectly happy generating optimized multiplication code on its own for
64-bit arches. GCC refuses to optimize the loongson code when in 32-bit mode,
so I've left that.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Iterative implementation of 32 bit fixed point split-radix FFT.
Max FFT that can be calculated currently is 2^12.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Floating point FFT (nips optimized) breaks when hard coded tables are
not enabled because MIPS optimization of floating point FFT uses only
ff_init_ff_cos_tabs(16) which is not enabled by default in that case.
This patch is fixing it.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
List of clobbered registers fixed and added where it is lacking.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Add dependencies on HAVE_INLINE_ASM for files and parts of code
where it is necessary.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Reviewed-by: Vitor Sessak <vitor1001@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
FFT in MIPS implementation is working iteratively instead
of "recursively" calling functions for smaller FFT sizes.
Some of DSP and format convert utils functions are also optimized.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Reviewed-by: Vitor Sessak <vitor1001@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
__volatile__ can cause problems with some compilers and volatile is a standard keyword.
Found-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
MP3 fixed and floating point decoders are optimized
for MIPS architecture.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Reviewed-by: Vitor Sessak <vitor1001@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
AMR NB and WB decoders are optimized for MIPS architecture.
Appropriate Makefiles are changed accordingly.
Cnfigure script is changed in order to support optimizations.
Optimizations are enabled by default when compiling is done for
mips architecture.
Appropriate cflags are automatically set.
Support for several mips CPUs is added in configure script.
New ffmpeg options are added for disabling optimizations.
The FFMPEG option --disable-mipsfpu disables MIPS floating point
optimizations.
The FFMPEG option --disable-mips32r2 disables MIPS32R2
optimizations.
The FFMPEG option --disable-mipsdspr1 disables MIPS DSP ASE R1
optimizations.
The FFMPEG option --disable-mipsdspr2 disables MIPS DSP ASE R2
optimizations.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Reviewed-by: Vitor Sessak <vitor1001@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>