mirror of https://github.com/FFmpeg/FFmpeg.git
This functionality is better accessed through tools like oprofile.
Originally committed as revision 23808 to svn://svn.ffmpeg.org/ffmpeg/trunk
parent a788196e20
commit 2829ce4b40
13 changed files with 0 additions and 513 deletions
@@ -1,172 +0,0 @@
FFmpeg & evaluating performance on the PowerPC Architecture HOWTO

(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>



I - Introduction

The PowerPC architecture and its SIMD extension AltiVec offer some
interesting tools to evaluate performance and improve the code.
This document tries to explain how to use those tools with FFmpeg.

The architecture itself offers two ways to evaluate the performance of
a given piece of code:

1) The Time Base Registers (TBL)
2) The Performance Monitor Counter Registers (PMC)

The first ones are always available, always active, but they're not very
accurate: the registers increment by one every four *bus* cycles. On
my 667 MHz tiBook (ppc7450), this means once every twenty *processor*
cycles. So we won't use them.
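
The twenty-cycle figure follows from the ratio between the processor
clock and the bus clock. A minimal sketch of the arithmetic in C,
assuming a 133 MHz bus (the bus speed is not stated above; it is inferred
from the four-to-twenty ratio) and a hypothetical helper name:

```c
/* Hypothetical helper, not part of FFmpeg: number of processor cycles
 * that elapse between two increments of the time base.
 * cpu_mhz and bus_mhz are clock frequencies in MHz;
 * bus_cycles_per_tick is how many bus cycles one increment takes. */
double cpu_cycles_per_tb_tick(double cpu_mhz, double bus_mhz,
                              unsigned bus_cycles_per_tick)
{
    return bus_cycles_per_tick * (cpu_mhz / bus_mhz);
}
```

Calling cpu_cycles_per_tb_tick(667.0, 133.0, 4) gives roughly 20, which
is the granularity quoted above.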

The PMC are much more useful: not only can they report cycle-accurate
timing, but they can also be used to monitor many other parameters,
such as the number of AltiVec stalls for every kind of instruction,
or instruction cache misses. The downside is that not all processors
support the PMC (all G3, all G4 and the 970 do support them), and
they're inactive by default - you need to activate them with a
dedicated tool. Also, the number of available PMC depends on the
processor: the various 604 have 2, the various 75x (a.k.a. G3) have 4,
and the various 74xx (a.k.a. G4) have 6.
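
The per-family counts can be written down as a small lookup, which also
gives a way to check a requested number of counters against what the
processor has. This is only a sketch; the enum and function names are
hypothetical and do not exist in FFmpeg:

```c
/* CPU families named in the text (names are hypothetical). */
enum ppc_family { PPC_604, PPC_75X_G3, PPC_74XX_G4 };

/* Number of PMC registers per family, as listed in the text.
 * Returns 0 for a family this sketch does not know about. */
int ppc_num_pmc(enum ppc_family family)
{
    switch (family) {
    case PPC_604:     return 2;
    case PPC_75X_G3:  return 4;
    case PPC_74XX_G4: return 6;
    }
    return 0;
}

/* Returns 1 if 'enabled' counters can all be read on this family,
 * 0 if the request is negative or exceeds what the family provides. */
int ppc_pmc_count_is_safe(enum ppc_family family, int enabled)
{
    return enabled >= 0 && enabled <= ppc_num_pmc(family);
}
```

For example, ppc_pmc_count_is_safe(PPC_75X_G3, 6) is 0, because a G3
only has 4 counters.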

*WARNING*: The PowerPC 970 is not very well documented, and its PMC
registers are 64 bits wide. To properly notify the code, you *must*
tune for the 970 (using --tune=970), or the code will assume 32-bit
registers.


II - Enabling FFmpeg PowerPC performance support

This needs to be done by hand. First, you need to configure FFmpeg as
usual, but add the "--powerpc-perf-enable" option. For instance:

#####
./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
#####

This will configure FFmpeg to install inside /usr/local/ffmpeg-svn,
compiling with gcc-3.3 (you should try to use this one or a newer
gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
thumb, those at 550 MHz and more). It will also enable the PMC.

You may also edit the file "config.h" to enable (uncomment) the
following line:

#####
// #define ALTIVEC_USE_REFERENCE_C_CODE 1
#####

If you enable this line, then the code will not make use of AltiVec,
but will use the reference C code instead. This is useful to compare
performance between two versions of the code.

Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":

#####
#define POWERPC_NUM_PMC_ENABLED 4
#####

If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
PMC than available on your CPU!

Then, simply compile FFmpeg as usual (make && make install).



III - Using FFmpeg PowerPC performance support

This FFmpeg build can be used exactly as usual. But before exiting,
FFmpeg will dump a per-function report that looks like this:

#####
PowerPC performance report
Values are from the PMC registers, and represent whatever the
registers are set to record.
Function "gmc1_altivec" (pmc1):
min: 231
max: 1339867
avg: 558.25 (255302)
Function "gmc1_altivec" (pmc2):
min: 93
max: 2164
avg: 267.31 (255302)
Function "gmc1_altivec" (pmc3):
min: 72
max: 1987
avg: 276.20 (255302)
(...)
#####

In this example, PMC1 was set to record CPU cycles, PMC2 was set to
record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
Issue Stalls.

The function "gmc1_altivec" was monitored 255302 times, and the
minimum execution time was 231 processor cycles. The max and average
aren't much use, as it's very likely the OS interrupted execution for
reasons of its own :-(
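
To make the four numbers in each report entry concrete, here is a
minimal sketch in C of how such min/max/avg values can be accumulated
from counter readings taken before and after each call. The struct and
function names are hypothetical, and starting "min" at the largest
possible value is an assumption of this sketch; only the rule of
dropping a sample whose stop value is below its start value is taken
from the header's own macro.

```c
#include <limits.h>

/* Hypothetical statistics record for one function and one counter. */
struct pmc_stats {
    unsigned long long min, max, sum, num;
};

void pmc_stats_init(struct pmc_stats *s)
{
    s->min = ULLONG_MAX; /* assumption: first sample always becomes min */
    s->max = 0;
    s->sum = 0;
    s->num = 0;
}

/* Record one measurement from the counter values read before (start)
 * and after (stop) the monitored code. */
void pmc_stats_add(struct pmc_stats *s,
                   unsigned long long start, unsigned long long stop)
{
    unsigned long long diff;

    if (stop < start) /* counter wrapped or was reset: drop the sample */
        return;
    diff = stop - start;
    if (diff < s->min)
        s->min = diff;
    if (diff > s->max)
        s->max = diff;
    s->sum += diff;
    s->num++;
}

/* The "avg" value; the number in parentheses in the report is num. */
double pmc_stats_avg(const struct pmc_stats *s)
{
    return s->num ? (double)s->sum / (double)s->num : 0.0;
}
```

An interruption by the OS can only add cycles to a sample, never remove
any, which is why the minimum is the value to trust: it is the run that
was disturbed the least.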

With the exact same settings and source file, but using the reference C
code we get:

#####
PowerPC performance report
Values are from the PMC registers, and represent whatever the
registers are set to record.
Function "gmc1_altivec" (pmc1):
min: 592
max: 2532235
avg: 962.88 (255302)
Function "gmc1_altivec" (pmc2):
min: 0
max: 33
avg: 0.00 (255302)
Function "gmc1_altivec" (pmc3):
min: 0
max: 350
avg: 0.03 (255302)
(...)
#####

592 cycles, so the fastest AltiVec execution is about 2.5x faster than
the fastest C execution in this example. It's not perfect but it's not
bad (well, I wrote this function so I can't say otherwise :-).
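
The 2.5x figure is just the ratio of the two "min" values for pmc1. As
a small sketch in C (the function name is hypothetical):

```c
/* Ratio of the fastest reference C run to the fastest AltiVec run,
 * both given as minimum cycle counts from the pmc1 entries. */
double min_cycle_speedup(unsigned long long c_min,
                         unsigned long long altivec_min)
{
    return (double)c_min / (double)altivec_min;
}
```

With the values above, min_cycle_speedup(592, 231) is a little over
2.56. The minimums are compared rather than the averages because the
averages include runs that were interrupted by the OS.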

Once you have that kind of report, you can try to improve things by
finding what goes wrong and fixing it; in the example above, one
should try to diminish the number of AltiVec stalls, as this *may*
improve performance.
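
One rough way to judge whether stalls are worth chasing is to compare a
stall counter with the cycle counter from the same report. The sketch
below is only an illustration with a hypothetical function name; since
the averages are disturbed by OS interruptions, the result should be
read as an order of magnitude, not as a precise measurement.

```c
/* Fraction of the measured cycles attributed to a stall counter.
 * Both arguments are "avg" values taken from the same report. */
double stall_share(double avg_stall_cycles, double avg_cpu_cycles)
{
    if (avg_cpu_cycles <= 0.0) /* nothing measured: avoid dividing by 0 */
        return 0.0;
    return avg_stall_cycles / avg_cpu_cycles;
}
```

For the AltiVec report above, stall_share(267.31, 558.25) is about
0.48, i.e. permute stalls show up in nearly half of the average cycle
count.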



IV - Enabling the PMC in Mac OS X

This is easy. Use "MONster" and "monster". Those tools come from
Apple's CHUD package, and can be found hidden in the developer web
site & FTP site. "MONster" is the graphical application; use it to
generate a config file specifying what each register should
monitor. Then use the command-line application "monster" to use that
config file, and enjoy the results.

Note that "MONster" can be used for many other things, but it's
documented by Apple and it's not my subject.

If you are using CHUD 4.4.2 or later, you'll notice that MONster is
no longer available. It's been superseded by Shark, where
configuration of PMCs is available as a plugin.



V - Enabling the PMC on Linux

On Linux you may use oprofile from http://oprofile.sf.net. Depending on
the version and the CPU, you may need to apply a patch[1] to access a set
of the possible counters from the userspace application. You can always
define them using the kernel interface /dev/oprofile/* .

[1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch

--
Romain Dolbeau <romain@dolbeau.org>
Luca Barbato <lu_zero@gentoo.org>
@@ -1,154 +0,0 @@
/*
 * Copyright (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#ifndef AVCODEC_PPC_DSPUTIL_PPC_H
#define AVCODEC_PPC_DSPUTIL_PPC_H

#include "config.h"

#if CONFIG_POWERPC_PERF
void powerpc_display_perf_report(void);
/* the 604* have 2, the G3* have 4, the G4s have 6,
   and the G5 are completely different (they MUST use
   ARCH_PPC64, and let's hope all future 64-bit PPC
   will use the same PMCs...) */
#define POWERPC_NUM_PMC_ENABLED 6
/* if you add to the enum below, also add to the perfname array
   in dsputil_ppc.c */
enum powerpc_perf_index {
    altivec_fft_num = 0,
    altivec_gmc1_num,
    altivec_dct_unquantize_h263_num,
    altivec_fdct,
    altivec_idct_add_num,
    altivec_idct_put_num,
    altivec_put_pixels16_num,
    altivec_avg_pixels16_num,
    altivec_avg_pixels8_num,
    altivec_put_pixels8_xy2_num,
    altivec_put_no_rnd_pixels8_xy2_num,
    altivec_put_pixels16_xy2_num,
    altivec_put_no_rnd_pixels16_xy2_num,
    altivec_hadamard8_diff8x8_num,
    altivec_hadamard8_diff16_num,
    altivec_avg_pixels8_xy2_num,
    powerpc_clear_blocks_dcbz32,
    powerpc_clear_blocks_dcbz128,
    altivec_put_h264_chroma_mc8_num,
    altivec_avg_h264_chroma_mc8_num,
    altivec_put_h264_qpel16_h_lowpass_num,
    altivec_avg_h264_qpel16_h_lowpass_num,
    altivec_put_h264_qpel16_v_lowpass_num,
    altivec_avg_h264_qpel16_v_lowpass_num,
    altivec_put_h264_qpel16_hv_lowpass_num,
    altivec_avg_h264_qpel16_hv_lowpass_num,
    powerpc_perf_total
};
enum powerpc_data_index {
    powerpc_data_min = 0,
    powerpc_data_max,
    powerpc_data_sum,
    powerpc_data_num,
    powerpc_data_total
};
extern unsigned long long perfdata[POWERPC_NUM_PMC_ENABLED][powerpc_perf_total][powerpc_data_total];

#if !ARCH_PPC64
#define POWERP_PMC_DATATYPE unsigned long
#define POWERPC_GET_PMC1(a) __asm__ volatile("mfspr %0, 937" : "=r" (a))
#define POWERPC_GET_PMC2(a) __asm__ volatile("mfspr %0, 938" : "=r" (a))
#if (POWERPC_NUM_PMC_ENABLED > 2)
#define POWERPC_GET_PMC3(a) __asm__ volatile("mfspr %0, 941" : "=r" (a))
#define POWERPC_GET_PMC4(a) __asm__ volatile("mfspr %0, 942" : "=r" (a))
#else
#define POWERPC_GET_PMC3(a) do {} while (0)
#define POWERPC_GET_PMC4(a) do {} while (0)
#endif
#if (POWERPC_NUM_PMC_ENABLED > 4)
#define POWERPC_GET_PMC5(a) __asm__ volatile("mfspr %0, 929" : "=r" (a))
#define POWERPC_GET_PMC6(a) __asm__ volatile("mfspr %0, 930" : "=r" (a))
#else
#define POWERPC_GET_PMC5(a) do {} while (0)
#define POWERPC_GET_PMC6(a) do {} while (0)
#endif
#else /* ARCH_PPC64 */
#define POWERP_PMC_DATATYPE unsigned long long
#define POWERPC_GET_PMC1(a) __asm__ volatile("mfspr %0, 771" : "=r" (a))
#define POWERPC_GET_PMC2(a) __asm__ volatile("mfspr %0, 772" : "=r" (a))
#if (POWERPC_NUM_PMC_ENABLED > 2)
#define POWERPC_GET_PMC3(a) __asm__ volatile("mfspr %0, 773" : "=r" (a))
#define POWERPC_GET_PMC4(a) __asm__ volatile("mfspr %0, 774" : "=r" (a))
#else
#define POWERPC_GET_PMC3(a) do {} while (0)
#define POWERPC_GET_PMC4(a) do {} while (0)
#endif
#if (POWERPC_NUM_PMC_ENABLED > 4)
#define POWERPC_GET_PMC5(a) __asm__ volatile("mfspr %0, 775" : "=r" (a))
#define POWERPC_GET_PMC6(a) __asm__ volatile("mfspr %0, 776" : "=r" (a))
#else
#define POWERPC_GET_PMC5(a) do {} while (0)
#define POWERPC_GET_PMC6(a) do {} while (0)
#endif
#endif /* ARCH_PPC64 */
#define POWERPC_PERF_DECLARE(a, cond) \
    POWERP_PMC_DATATYPE \
        pmc_start[POWERPC_NUM_PMC_ENABLED], \
        pmc_stop[POWERPC_NUM_PMC_ENABLED], \
        pmc_loop_index;
#define POWERPC_PERF_START_COUNT(a, cond) do { \
    POWERPC_GET_PMC6(pmc_start[5]); \
    POWERPC_GET_PMC5(pmc_start[4]); \
    POWERPC_GET_PMC4(pmc_start[3]); \
    POWERPC_GET_PMC3(pmc_start[2]); \
    POWERPC_GET_PMC2(pmc_start[1]); \
    POWERPC_GET_PMC1(pmc_start[0]); \
    } while (0)
#define POWERPC_PERF_STOP_COUNT(a, cond) do { \
    POWERPC_GET_PMC1(pmc_stop[0]); \
    POWERPC_GET_PMC2(pmc_stop[1]); \
    POWERPC_GET_PMC3(pmc_stop[2]); \
    POWERPC_GET_PMC4(pmc_stop[3]); \
    POWERPC_GET_PMC5(pmc_stop[4]); \
    POWERPC_GET_PMC6(pmc_stop[5]); \
    if (cond) { \
        for(pmc_loop_index = 0; \
            pmc_loop_index < POWERPC_NUM_PMC_ENABLED; \
            pmc_loop_index++) { \
            if (pmc_stop[pmc_loop_index] >= pmc_start[pmc_loop_index]) { \
                POWERP_PMC_DATATYPE diff = \
                    pmc_stop[pmc_loop_index] - pmc_start[pmc_loop_index]; \
                if (diff < perfdata[pmc_loop_index][a][powerpc_data_min]) \
                    perfdata[pmc_loop_index][a][powerpc_data_min] = diff; \
                if (diff > perfdata[pmc_loop_index][a][powerpc_data_max]) \
                    perfdata[pmc_loop_index][a][powerpc_data_max] = diff; \
                perfdata[pmc_loop_index][a][powerpc_data_sum] += diff; \
                perfdata[pmc_loop_index][a][powerpc_data_num] ++; \
            } \
        } \
    } \
} while (0)
#else /* CONFIG_POWERPC_PERF */
// those are needed to avoid empty statements.
#define POWERPC_PERF_DECLARE(a, cond) int altivec_placeholder __attribute__ ((unused))
#define POWERPC_PERF_START_COUNT(a, cond) do {} while (0)
#define POWERPC_PERF_STOP_COUNT(a, cond) do {} while (0)
#endif /* CONFIG_POWERPC_PERF */

#endif /* AVCODEC_PPC_DSPUTIL_PPC_H */