./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
-s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
-cpuflags 0 -v error -
32-bit mul, power8 only.
~2x speedup:
rgb24
24431 UNITS in yuv2packed2, 16384 runs, 0 skips
13783 UNITS in yuv2packed2, 16383 runs, 1 skips
bgr24
24396 UNITS in yuv2packed2, 16384 runs, 0 skips
14059 UNITS in yuv2packed2, 16384 runs, 0 skips
rgba
26815 UNITS in yuv2packed2, 16383 runs, 1 skips
12797 UNITS in yuv2packed2, 16383 runs, 1 skips
bgra
27060 UNITS in yuv2packed2, 16384 runs, 0 skips
13138 UNITS in yuv2packed2, 16384 runs, 0 skips
argb
26998 UNITS in yuv2packed2, 16384 runs, 0 skips
12728 UNITS in yuv2packed2, 16381 runs, 3 skips
bgra
26651 UNITS in yuv2packed2, 16384 runs, 0 skips
13124 UNITS in yuv2packed2, 16384 runs, 0 skips
This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.