Both the fixed as well as the floating point mpegaudio decoders use
LUTs of type int8_t and uint32_t with 32K entries each; these tables
are completely the same, yet they are not shared. This commit makes
them shared. When both fixed as well as floating point decoders are
enabled, this saves 160KiB from the bss segment for a normal build
(translating into 160KiB less memory usage if both a shared as well as
a floating point decoder have actually been used) and 160KiB from the
binary for a build with hardcoded tables.
It also means that the code to create said LUTs is no longer duplicated
(for a normal build).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Each invocation of this function is only entered once, so using a static
array makes no sense (and given that the whole array is reinitialized at
the beginning of this function, it wouldn't even make sense if the
function were called multiple times).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The mpegaudio_tablegen header contains code to initialize several
tables; it is included in both the fixed as well as the floating point
mpegaudio decoders and some of these tables are only used by the fixed
resp. floating point decoders; yet both types are always initialized,
leaving the compiler to figure out that one of them is unused.
GCC 9.3 fails at this (even with -O3):
$ readelf -s mpegaudiodec_fixed.o|grep _float
28: 0000000000001660 32768 OBJECT LOCAL DEFAULT 4 expval_table_float
An actually unused table (expval_table_fixed/float) of size 32KiB is kept
and initialized (the reason for this is probably that this table is read
from, namely to initialize another table: exp_table_fixed/float; of course
the float resp. fixed tables are not used in the fixed resp. floating point
decoder).
Therefore #ifdef the unneeded tables away.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This further speeds up runtime initialization, with identical generated tables.
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
34441423 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
new:
10776291 decicycles in mpegaudio_tableinit, 8192 runs, 0 skips
Most low hanging fruit is taken care of here. For some idea, note that
83,064 array elements totalling 233,722 bytes need to be initialized.
Thus, with this patch, we average ~ 12.9 cycles per element or ~ 4.6
cycles per byte.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This does some miscellaneous stuff mainly avoiding the usage of pow to
achieve significant speedups. This is not speed critical, but is
unnecessary latency and cycles wasted for a user.
All tables tested and are identical to the old ones
(bit-exact even in floating point case).
Sample benchmark (x86-64, Haswell, GNU/Linux):
old:
102329530 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
new:
34111900 decicycles in mpegaudio_tableinit, 1 runs, 0 skips
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
cbrtf() took floats but it represented 1/3 exactly
and even if not more precission should be better in theory
for the table generation
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
You cannot count on it being present on all systems, and you
cannot include libm.h in a host tool, so just hard code a baseline
implementation.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
You cannot count on them being present on all systems, and you
cannot include libm.h in a host tool, so just hard code baseline
implementations.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
You cannot count on it being present on all systems, and you
cannot include libm.h in a host tool, so just hard code a baseline
implementation.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
You cannot count on them being present on all systems, and you
cannot include libm.h in a host tool, so just hard code baseline
implementations.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This adds a _fixed suffix to the fixed-point versions of things
with both float and fixed-point variants. This makes it more
consistent with other dual-implementation things, e.g. fft.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The low quality mode is off by default and never tested. The high
quality mode is also plenty fast enough.
Signed-off-by: Mans Rullgard <mans@mansr.com>
config.h must not be included in that file. The table generator runs
on the host system, but config.h describes the target.
Originally committed as revision 20620 to svn://svn.ffmpeg.org/ffmpeg/trunk