|
|
|
@ -8,7 +8,7 @@ I - Introduction |
|
|
|
|
|
|
|
|
|
The PowerPC architecture and its SIMD extension AltiVec offer some |
|
|
|
|
interesting tools to evaluate performance and improve the code. |
|
|
|
|
This document try to explain how to use those tools with FFmpeg. |
|
|
|
|
This document tries to explain how to use those tools with FFmpeg. |
|
|
|
|
|
|
|
|
|
The architecture itself offers two ways to evaluate the performance of |
|
|
|
|
a given piece of code: |
|
|
|
@ -16,31 +16,31 @@ a given piece of code: |
|
|
|
|
1) The Time Base Registers (TBL) |
|
|
|
|
2) The Performance Monitor Counter Registers (PMC) |
|
|
|
|
|
|
|
|
|
The firsts are always available, always active, but they're not very |
|
|
|
|
accurate : the registers increment by one every four *bus* cycle. On |
|
|
|
|
my 667 Mhz tibook (ppc7450) , this means once every twenty *processor* |
|
|
|
|
cycle. So we won't use that. |
|
|
|
|
The first ones are always available, always active, but they're not very |
|
|
|
|
accurate: the registers increment by one every four *bus* cycles. On |
|
|
|
|
my 667 Mhz tiBook (ppc7450), this means once every twenty *processor* |
|
|
|
|
cycles. So we won't use that. |
|
|
|
|
|
|
|
|
|
The PMC are much more useful : not only they can report cycle-accurate |
|
|
|
|
The PMC are much more useful: not only can they report cycle-accurate |
|
|
|
|
timing, but they can also be used to monitor many other parameters, |
|
|
|
|
such as the number of AltiVec stalls for every kind of instructions, |
|
|
|
|
such as the number of AltiVec stalls for every kind of instruction, |
|
|
|
|
or instruction cache misses. The downside is that not all processors |
|
|
|
|
support the PMC (all G3, all G4 and the 970 do support them), and |
|
|
|
|
they're inactive by default - you need to activate them with a |
|
|
|
|
dedicated tool. Also, the number of available PMC depend on the |
|
|
|
|
procesor : the various 604 have 2, the various 75x (aka. G3) have 4, |
|
|
|
|
anbd the various 74xx (aka G4) have 6. |
|
|
|
|
dedicated tool. Also, the number of available PMC depends on the |
|
|
|
|
procesor: the various 604 have 2, the various 75x (aka. G3) have 4, |
|
|
|
|
and the various 74xx (aka G4) have 6. |
|
|
|
|
|
|
|
|
|
*WARNING*: The powerpc 970 is not very well documented, and its PMC |
|
|
|
|
registers are 64bits wide. To properly notify the code, you *must* |
|
|
|
|
tune for the 970 (using --tune=970), or the code will assume 32bits |
|
|
|
|
*WARNING*: The PowerPC 970 is not very well documented, and its PMC |
|
|
|
|
registers are 64 bits wide. To properly notify the code, you *must* |
|
|
|
|
tune for the 970 (using --tune=970), or the code will assume 32 bit |
|
|
|
|
registers. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
II - Enabling FFmpeg PowerPC performance support |
|
|
|
|
|
|
|
|
|
This need to be done by hand. First, you need to configure FFmpeg as |
|
|
|
|
usual, plus using the "--powerpc-perf-enable". for instance : |
|
|
|
|
This needs to be done by hand. First, you need to configure FFmpeg as |
|
|
|
|
usual, but add the "--powerpc-perf-enable" option. For instance: |
|
|
|
|
|
|
|
|
|
##### |
|
|
|
|
./configure --prefix=/usr/local/ffmpeg-cvs --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable |
|
|
|
@ -48,8 +48,8 @@ usual, plus using the "--powerpc-perf-enable". for instance : |
|
|
|
|
|
|
|
|
|
This will configure FFmpeg to install inside /usr/local/ffmpeg-cvs, |
|
|
|
|
compiling with gcc-3.3 (you should try to use this one or a newer |
|
|
|
|
gcc), and tuning for the PowerPC7450 (i.e. the newer G4 ; as a rule of |
|
|
|
|
thumb, those at 550Mhz and more). It will also enables the PMCs. |
|
|
|
|
gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of |
|
|
|
|
thumb, those at 550Mhz and more). It will also enable the PMC. |
|
|
|
|
|
|
|
|
|
You may also edit the file "config.h" to enable the following line: |
|
|
|
|
|
|
|
|
@ -59,24 +59,24 @@ You may also edit the file "config.h" to enable the following line: |
|
|
|
|
|
|
|
|
|
If you enable this line, then the code will not make use of AltiVec, |
|
|
|
|
but will use the reference C code instead. This is useful to compare |
|
|
|
|
performance between the two versions of the code. |
|
|
|
|
performance between two versions of the code. |
|
|
|
|
|
|
|
|
|
Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h" : |
|
|
|
|
Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h": |
|
|
|
|
|
|
|
|
|
##### |
|
|
|
|
#define POWERPC_NUM_PMC_ENABLED 4 |
|
|
|
|
##### |
|
|
|
|
|
|
|
|
|
If you have a G4 cpus, you can enable all 6 PMCs. DO NOT enable more |
|
|
|
|
PMCs than available on your cpu ! |
|
|
|
|
If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more |
|
|
|
|
PMC than available on your CPU! |
|
|
|
|
|
|
|
|
|
Then, simply compile ffmpeg as usual (make && make install). |
|
|
|
|
Then, simply compile FFmpeg as usual (make && make install). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
III - Using FFmpeg PowerPC performance support |
|
|
|
|
|
|
|
|
|
This FFmeg can be used exactly as usual. But before exiting, Ffmpeg |
|
|
|
|
This FFmeg can be used exactly as usual. But before exiting, FFmpeg |
|
|
|
|
will dump a per-function report that looks like this: |
|
|
|
|
|
|
|
|
|
##### |
|
|
|
@ -99,16 +99,16 @@ PowerPC performance report |
|
|
|
|
##### |
|
|
|
|
|
|
|
|
|
In this example, PMC1 was set to record CPU cycles, PMC2 was set to |
|
|
|
|
record AltiVec Permute Stall Cycle, and PMC3 was set to record AltiVec |
|
|
|
|
record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec |
|
|
|
|
Issue Stalls. |
|
|
|
|
|
|
|
|
|
The function "gmc1_altivec" was monitored 255302 times, and the |
|
|
|
|
minimum execution time was 231 processor cycles. The max and average |
|
|
|
|
aren't much use, as it's very likely the OS interrupted execution for |
|
|
|
|
reasons of it's own :-( |
|
|
|
|
reasons of its own :-( |
|
|
|
|
|
|
|
|
|
With the exact same setting and source file, but using the reference C |
|
|
|
|
code we get : |
|
|
|
|
With the exact same settings and source file, but using the reference C |
|
|
|
|
code we get: |
|
|
|
|
|
|
|
|
|
##### |
|
|
|
|
PowerPC performance report |
|
|
|
@ -134,27 +134,27 @@ the fastest C execution in this example. It's not perfect but it's not |
|
|
|
|
bad (well I wrote this function so I can't say otherwise :-). |
|
|
|
|
|
|
|
|
|
Once you have that kind of report, you can try to improve things by |
|
|
|
|
finding what goes wrong and fixing it ; in the example above, one |
|
|
|
|
shoud try to diminish the number of AltiVec stalls, as this *may* |
|
|
|
|
improve performances. |
|
|
|
|
finding what goes wrong and fixing it; in the example above, one |
|
|
|
|
should try to diminish the number of AltiVec stalls, as this *may* |
|
|
|
|
improve performance. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
IV) Enabling the PMC in MacOS X |
|
|
|
|
IV) Enabling the PMC in Mac OS X |
|
|
|
|
|
|
|
|
|
This is easy. Use "Monster" and "monster". Those tools come from |
|
|
|
|
Apple's CHUD package, and can be found hidden in the developer web |
|
|
|
|
site & ftp site. "MONster" is the graphical application, use it to |
|
|
|
|
site & FTP site. "MONster" is the graphical application, use it to |
|
|
|
|
generate a config file specifying what each register should |
|
|
|
|
monitor. Then use the command-line application "monster" to use that |
|
|
|
|
config file, and enjoy the results. |
|
|
|
|
|
|
|
|
|
Note that "MONster" can be used for many other stuff, but it's |
|
|
|
|
Note that "MONster" can be used for many other things, but it's |
|
|
|
|
documented by Apple, it's not my subject. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
V) Enabling the PMC in Linux |
|
|
|
|
V) Enabling the PMC on Linux |
|
|
|
|
|
|
|
|
|
I don't know how to do it, sorry :-) Any idea very much welcome. |
|
|
|
|
|
|
|
|
|