This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC uni mc epel functions.
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC uniw mc functions (qpel as well as epel) in new file hevc_mc_uniw_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC bi mc functions (qpel as well as epel) in new file hevc_mc_bi_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Adds HEVC specific macros (needed for this patch) in libavcodec/mips/hevc_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch moves HEVC code of uni mc cases to new file hevc_mc_uni_msa.c.
(There are total 5 sub-modules of HEVC mc functions, if we add all these modules in one single file, its size would be huge (~750k) & difficult to maintain, so splitting it in multiple files)
This patch also adds new HEVC header file libavcodec/mips/hevc_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch includes restructuring of existing macros and addition of more generic macros.
This change was necessary to avoid repeated review comments in remaining patches which we were about to submit.
Also this patch reduces number of code lines due to maximum use of generic macros, allows better code alignment & readability etc.
These modifications in commonly used .libavutil/mips/generic_macros_msa.h. impacts the already accepted code, hence re-submitting it in 2/4,3/4 & 4/4.
Overall, this patch set is just upgrading the code with styling changes and will bring it in sync with MIPS-SIMD optimized latest codebase at our end.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Commit dfa9208074 ("mips/float_dsp: fix a bug in vector_fmul_window_mips")
fixed vector_fmul_window_mips by unrolling the loop only 4 times, but also
removed the outer C loop and replaced it with assembly branches and pointer
arithmetic. When submitting my 64-bit porting patch I missed this new
assembly which also needed porting.
This patch fixes a bus error in the fate-float-dsp test when run on 64-bit
mips.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Unfortunately android < api 21 (lollipop) doesn't have the sgidefs.h header,
the easiest way around this is to just use the preprocessor definitions from
gcc / clang.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mainly consists of replacing all the pointer arithmatic 'addiu'
instructions with PTR_ADDIU which will handle the differences in pointer
sizes when compiled on 64 bit mips systems.
The header asmdefs.h contains the PTR_ macros which expend to the correct mips
instructions to manipulate registers containing pointers.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Loop was unrolled eight times although in heder there is assumption
that len is multiple of 4.
This is fixed, and assembly code is rewritten to be more optimal and
to simplify clobber list.
Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
__volatile__ can cause problems with some compilers and volatile is a standard keyword.
Found-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
AMR NB and WB decoders are optimized for MIPS architecture.
Appropriate Makefiles are changed accordingly.
Cnfigure script is changed in order to support optimizations.
Optimizations are enabled by default when compiling is done for
mips architecture.
Appropriate cflags are automatically set.
Support for several mips CPUs is added in configure script.
New ffmpeg options are added for disabling optimizations.
The FFMPEG option --disable-mipsfpu disables MIPS floating point
optimizations.
The FFMPEG option --disable-mips32r2 disables MIPS32R2
optimizations.
The FFMPEG option --disable-mipsdspr1 disables MIPS DSP ASE R1
optimizations.
The FFMPEG option --disable-mipsdspr2 disables MIPS DSP ASE R2
optimizations.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Reviewed-by: Vitor Sessak <vitor1001@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Just like gcc 4.6 and later on ARM, gcc 4.8 on MIPS generates
inefficient code when a known-unaligned location is used as a
memory input operand. This applies the same fix as has been
previously done to the ARM version of the code.
Signed-off-by: Mans Rullgard <mans@mansr.com>
GCC actually handles unaligned accesses correctly in all cases
except, absurdly, 32-bit loads on mips64. The remaining asm is
thus not needed, and removing it results in better code.
Signed-off-by: Mans Rullgard <mans@mansr.com>