Loongson3's extention instructions (prefixed with gs) are widely used
in our MMI codebase. However, these instructions are not avilable on
Loongson-2E/F while MMI code should work on these processors.
Previously we introduced mmiutils marcos to provide backward compactbility
but newly commited code didn't follow that. In this patch I revised the
codebase and converted all these instructions into MMI marcos to get
Loongson2 supproted again.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Clang is more strict on the type of asm operands, float or double
type variable should use constraint 'f', integer variable should
use constraint 'r'.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1.'xor,or,and' to 'pxor,por,pand'. In the case of operating FPR,
gcc supports both of them, clang only supports the second type.
2.'dsrl,srl' to 'ssrld,ssrlw'. In the case of operating FPR, gcc
supports both of them, clang only supports the second type.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Remove invalid operation in the case x and y all equal 0,
this refine made about 2% speedup for H264 decode on loongson platform.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Reoptimize function ff_put_h264_chroma_mc8_mmi and ff_avg_h264_chroma_mc8_mmi.
Performance of h264 decoding improved about 5%(from 69fps to 73fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1.MMI_ load/store macros are defined in libavutil/mips/mmiutils.h
2.Replace some unnecessary unaligned access with aligned operator
3.The MMI_ load/store is compatible with cpu loongson2e/2f which not support instructions start with gs
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1. no longer use the register names directly and optimized code format
2. to be compatible with O32, specify type of address variable with mips_reg and handle the address variable with PTR_ operator
3. use uld and mtc1 to workaround cpu 3A2000 gslwlc1 bug (gslwlc1 instruction extension bug in O32 ABI)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>