This patch refactors the AAC coders to reuse code
between the MIPS port and the regular, portable C code.
There were two main functions that had to use
hand-optimized versions of quantization code:
- search_for_quantizers_twoloop
- codebook_trellis_rate
Those two were split into their own template header
files so they can be inlined inside both the MIPS port
and the generic code. In each context, they'll link
to their specialized implementations, and thus be
optimized by the compiler.
This approach I believe is better than maintaining
several copies of each function. As past experience has
proven, having to keep those in sync was error prone.
In this way, they will remain in sync by default.
Also, an implementation of the dequantized output
argument for the optimized quantize_and_encode
functions is included in the patch. While the current
implementation of search_for_pred still isn't using
it, future iterations of main prediction probably will.
It should not imply any measurable performance hit while
not being used.
Makes more sense as users usually set the -cutoff option
to low pass filter the signal. The encoder will still
overshoot slightly when encoding normal coefficients; however,
that's normal.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit changes a few things about the noise substitution
logic:
- Brings back the quantization factor (reduced to 3) during
scalefactor index calculations.
- Rejects any zeroed bands. They should be inaudible and it's
a waste transmitting the scalefactor indices for these.
- Uses swb_offsets instead of incrementing a 'start' with every
window group size.
- Rejects all PNS during short windows.
Overall improves quality. There was a plan to use the lfg system
to create the random numbers instead of using whatever the decoder
uses but for now this works fine. Entropy is far from important here.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit once again improves the PNS implementation by scaling the
thresholds with frequency. The thresholds get looser as the frequency
increases since higher frequencies are basically noise to human ears.
Also, this introduces quantization error correction for PNS. Should
the error be too much, no PNS will be used. The energy_ratio is used
to regulate the actual encoded PNS energy: if the generated PNS
energy is higher than the energy from the psy system, energy_ratio
is used to correct it so that hopefully once requantized and
transmitted the value in the decoder will be closer to what the
encoder has.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit rewrites the PNS implementation and significantly
improves sonic quality.
The previous implementation marked an excessive number
of SFBs to predict when there was no need for this, and this
resulted in quite a large number of artifacts. Also the
quantization was incorrect (av_clip(4+log2f(...))) which
led to 3x the intensity for PNS values leading to even more
artifacts.
This commit rewrites the PNS search function and introduces
a major change: the PNS values are synthesized and are compared
to the current coefficients in addition to passing through
the revised checks to see whether PNS can be used.
This decreases distortions and makes the current PNS implementation
mainly focused on replacing any low-power non-zero bands as well
as adding any zeroed bands back.
The current encoder's performance is enough (especially with
IS) so PNS isn't really required except to fill in the occasional
few bands as well as extend any zeroed high frequency, so this
combination which is already enabled by default works
to get as much quality as it can within the bits allowed.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
The specifications explicitly state to use roundf() which
also rounds half-integer values away from zero.
This does fix a few IS artifacts.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit abandons the way the specifications state to
quantize the coefficients, makes use of the new LPC float
functions and is much better.
The original way of converting non-normalized float samples
to int32_t which our LPC system expects was wrong and it was
wrong to assume the coefficients that are generated are also
valid. It was essentially a full garbage-in, garbage-out
system and it definitely shows when looking at spectrals
and listening. The high frequencies were very overattenuated.
The new LPC function performs the analysis directly.
The specifications state to quantize the coefficients into
four bit index values using an asin() function which of course
had to have ugly ternary operators because the function turns
negative if the coefficients are negative which when encoding
causes invalid bitstream to get generated.
This deviates from this by using the direct TNS tables, which
are fairly small since you only have 4 bits at most for index
values. The LPC values are directly quantized against the tables
and are then used to perform filtering after the requantization,
which simply fetches the array values.
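
A minimal sketch of quantizing a value directly against a table, as
described above. The function names and the example table are
assumptions made for illustration; the table does not hold the real
TNS values, and the actual encoder code may search differently.

```c
#include <assert.h>
#include <math.h>

/* Placeholder table for illustration only, NOT the real TNS table. */
static const float example_tab[4] = { -0.5f, 0.0f, 0.5f, 0.9f };

/* Return the index of the table entry closest to `value`. */
static int quant_nearest(float value, const float *table, int size)
{
    int best = 0;
    float best_err;

    assert(table && size > 0);
    best_err = fabsf(value - table[0]);
    for (int i = 1; i < size; i++) {
        float err = fabsf(value - table[i]);
        if (err < best_err) {
            best_err = err;
            best     = i;
        }
    }
    return best;
}

/* Requantization is then a plain array lookup. */
static float requant(int idx, const float *table, int size)
{
    assert(table && idx >= 0 && idx < size);
    return table[idx];
}
```

With at most 4 bits per index the table has at most 16 entries, so a
linear search like this one stays cheap.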
The end result is that TNS works much better now and doesn't
attenuate anything but the actual signal, i.e. TNS removes
quantization errors and does its job correctly now.
It might be enabled by default soon since it doesn't hurt and
helps reduce nastiness at low bitrates.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit completely alters the algorithm of prediction.
The original commit which introduced prediction was completely
incorrect to even remotely care about what the actual coefficients
contain or whether any options were enabled. Not my actual fault.
This commit treats prediction the way the decoder does and expects
to do: like lossy encryption. Everything related to prediction now
happens at the very end but just before quantization and encoding
of coefficients. On the decoder side, prediction happens before
anything has had a chance to even access the coefficients.
Also the original implementation had problems because it actually
touched the band_type of special bands which already had their
scalefactor indices marked and it's a wonder the assertion wasn't
triggered when transmitting those.
Overall, this now drastically increases audio quality and you should
think about enabling it if you don't plan on playing anything encoded
on really old low power ultra-embedded devices since they might not
support decoding of prediction or AAC-Main. Though the specifications
were written ages ago and as times change so do the FLOPS.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit finalizes AAC-Main profile encoding support
by implementing all mandatory and optional tools available
in the specifications and current decoders.
The AAC-Main profile requires that prediction support be
present (although decoders don't require it to be enabled)
for an encoder to be deemed capable of AAC-Main encoding,
as well as TNS, PNS and IS, all of which were implemented
with previous commits or earlier this year.
Users are encouraged to test the new functionality using either
-profile:a aac_main or -aac_pred 1, the former of which will enable
the prediction option by default and the latter will change the
profile to AAC-Main. No other options shall be changed by enabling
either, it's currently up to the users to decide what's best.
The current implementation works best using M/S and/or IS,
so users are also welcome to enable both options and any
other options (TNS, PNS) for maximum quality.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit implements temporal noise shaping support in the
encoder, along with an -aac_tns option to toggle it on or off
(off by default for now). TNS will increase audio quality
and reduce quantization noise by applying a multitap FIR filter
across allowed coefficients and transmit side information to the
decoder so it could create an inverse filter.
Users are encouraged to test the new functionality by enabling
-aac_tns 1 during encoding.
No major bugs are observable at this time so after a while if no
new problems appear and if the current implementation is deemed
of high enough quality and stability it will be enabled by default,
possibly at the same time the encoder has its experimental flag
removed and becomes the standard aac encoder in ffmpeg.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit moves the intensity stereo implementation
out from aaccoder and into a separate file. This was
possible using the previous commits.
This commit also drastically improves the IS implementation
by making it phase invariant, i.e. it will always choose the
best possible phase regardless of whether M/S coding is on
or most of the coefficients have identical phases.
This also increases the quality and reduces any distortions
introduced by enabling intensity stereo.
Users are encouraged to test it out using the -aac_is 1
parameter as it has always been.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit moves the quantizer to a separate header file.
This allows the quantizer to be used from separate files outside
of aaccoder without having to put another function pointer and will
result in a slight speedup as the compiler can do more optimizations.
This is required for commits following.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit moves the resetting of special bands (above RESERVED_BT)
to the main frame encoding function rather than the way it was done
previously in their corresponding search_for_... functions.
The reason why special bands need to be reset is that while normal
bands get chosen for every frame by the coder (twoloop by default)
the coders do not touch any special sfbs and will therefore
make them persist throughout the file.
If we zero them out any bands left unmarked will be chosen by
the second part of the coder (the trellis function in aaccoder.c).
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
This commit only changes the coding style to a saner way
of accessing coefficients (makes more sense to get the
memory address of a coefficient and start from there
rather than adding arbitrary numbers to offset a pointer).
Some compilers might detect an out of bounds access easier.
Also the way M/S and IS coefficients are calculated has been
changed, but should still have the same result (with the exception
that IS now applies from the normal coefficients rather than the
pristine ones, this is needed for upcoming commits).
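
To illustrate the addressing style described above, here is a short
sketch (the function and parameter names are hypothetical, not taken
from the encoder): the address of the band's first coefficient is taken
once and the band is then indexed from zero.

```c
#include <assert.h>

/* Sum of squares over one band. The band pointer is obtained by taking the
 * address of its first coefficient instead of offsetting the base pointer
 * on every access. */
static float band_energy(const float *coeffs, int start, int len)
{
    const float *band;
    float energy = 0.0f;

    assert(coeffs && start >= 0 && len >= 0);
    band = &coeffs[start];
    for (int i = 0; i < len; i++)
        energy += band[i] * band[i];
    return energy;
}
```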
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
As well as tables littered everywhere, functions were spread
out all across the encoder's files. This moves them to a single
place where they can be used by either the encoder's main files
or additional encoder files. Additionally, it changes the type
of some to 'inline' to enable us to simply put them in a header
file and possibly gain some speed due to compiler optimizations.
Signed-off-by: Claudio Freire <klaussfreire@gmail.com>
This commit moves any tables specific to the encoder from aacenc
and aaccoder to a separate file called 'aacenctab.c/.h'.
This was done as a clean up attempt as the encoder was filled with
tables pasted in between functions which made it confusing to follow
and track where each table and definition had been used.
This commit solves this by simply exporting the smaller tables out to
the aacenctab.h while the larger ones are compiled using aacenctab.c
and are referenced from the header file.
Signed-off-by: Claudio Freire <klaussfreire@gmail.com>
This commit removes a redundant argument from the functions in aaccoder.
The argument lambda was redundant as it was just a copy of s->lambda,
to which all functions have access anyway. This cleans up the function
pointers a bit which is helpful as there are a lot of other search_for_*
functions under development and with them populated it gets messy.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Avoid clipping due to quantization noise to produce audible
artifacts, by detecting near-clipping signals and both attenuating
them a little and encoding escape-encoded bands (usually the
loudest) rounding towards zero instead of nearest, which tends to
decrease overall energy and thus clipping.
Currently fate tests measure numerical error so this change makes
tests using asynth (which are near clipping) report higher error
not less, because of window attenuation. Yet, they sound better,
not worse (albeit subtle, other samples aren't subtle at all).
Only measuring psychoacoustically weighted error would make for
a representative test, so that will be left for a future patch.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit moves the generation of ff_aac_pow34sf_tab[] out of the
encoder and into the table generator. The original commit log for
this table in 2011 actually mentions that it should be moved outside
but this never happened.
This is the first commit which cleans up the encoder a little.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This commit implements intensity stereo coding support
to the native aac encoder. This is a way to increase the efficiency
of the encoder by zeroing the right channel's spectral coefficients
(in a channel pair) and rederiving them in the decoder using information
from the scalefactor indices of special band types. This commit
conforms to the official ISO 13818-7 specifications, although due to
their ambiguity certain deviations have been taken to ensure maximum
sound quality. This commit has been extensively tested and has been shown
to not result in audible audio artifacts except in extreme cases.
This commit also adds an option, aac_is, which has the value of
0 by default. Intensity Stereo is part of the scalable aac profile
and is thus non-default.
The way IS coding works is that it rederives the right channel's
spectral coefficients from the left channel via the scalefactor
index values left in the right channel. Since an entire band's
spectral coefficients do not need to be coded, the encoder's
efficiency jumps up and it unzeroes some high frequency values
which it previously did not have enough bits to encode. That way
less information is lost than the information lost by rederiving
the spectral coefficients with some error. This is why the
filesize of files encoded with IS does not decrease significantly.
Users wishing that IS coding should reduce filesize are expected
to reduce their encoding bitrates appropriately.
This is V2 of the commit. The old version did not mark ms_mask as
0 since M/S and IS coding are incompatible, which resulted in
distortions with M/S coding enabled. This version also improves
phase detection by measuring it for every spectral coefficient in
the band and using a simple majority rule to determine whether the
coefficients are in or out of phase. Also, the energy values per
spectral coefficient were changed as to reflect the
official specifications.
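
A minimal sketch of the majority rule for phase detection mentioned
above. The function name, the treatment of zero products and the
tie-breaking towards "in phase" are assumptions for illustration, not
necessarily what the patch does.

```c
#include <assert.h>

/* Returns +1 if most coefficient pairs in the band have the same sign
 * (in phase) and -1 if most have opposite signs (out of phase).
 * Pairs whose product is zero are ignored; ties count as in phase. */
static int band_phase(const float *L, const float *R, int len)
{
    int in_phase = 0, out_of_phase = 0;

    assert(L && R && len > 0);
    for (int i = 0; i < len; i++) {
        float p = L[i] * R[i];
        if (p > 0.0f)
            in_phase++;
        else if (p < 0.0f)
            out_of_phase++;
    }
    return in_phase >= out_of_phase ? 1 : -1;
}
```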
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit finalizes the PNS implementation previously added to the encoder
by moving it to a separate function search_for_pns() and thus making it
coder-generic. This new implementation makes use of the spread field of
the psy bands and the lambda quality feedback parameter. The spread of the
spectrum in a band prevents PNS from being used excessively and thus preserves
more phase information in high frequencies. The lambda parameter allows
the number of PNS-marked bands to vary based on the lambda parameter and the
amount of bits available, making better choices on which bands are to be marked
as noise. Comparisons with the previous PNS implementation can be found
here: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/
This is V2 of the patch, the changes from the previous version being that this
version uses the new band->spread metric from aacpsy and normalizes the
energy using the group size. These changes were suggested by Claudio Freire
on the mailing list. Another change is the use of lambda to alter the
frequency threshold. This change makes the actual threshold frequencies
vary within ±2 kHz of what's specified, depending on frame encoding performance.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit undoes commit c5d4f87e81
and removes PNS band marking from the twoloop coder, which has
been reimplemented in a better way in this series of patches.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit enables the function added with commit 7c10b87 and uses that
new function for setting any special scalefactor indices. This commit does
not change the behaviour of the encoder since no bands are being marked as
either NOISE_BT (due to the previous PNS implementation removed in the
previous commit) or INTENSITY_BT2/INTENSITY_BT.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There were some mistakes in the code for M/S stereo, this commit fixes them.
The start variable was not being reset for every window and every access to
the coefficients was incorrect as well. This fixes that by properly
addressing the coefficients using both windows and setting the start on every window to zero.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds support for the coding of intensity stereo scalefactor indices.
It does not do any marking of such bands and as such does no functional changes
to the encoder. It removes any old twoloop specific code for PNS and moves it
into a separate function which handles setting of scalefactor indices for
PNS and IS bands.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit adds support for both PNS and IS (intensity stereo) codebooks to the
encode_window_bands_info() quantizer, used by the faast, faac and anmr non-default,
native coders. This does not mean that both extensions now work with those coders,
some are simply unsuited and will trigger an assertion in the encoder while
others simply ignore the changed scalefactor indices and band types.
This commit simply adds support for encoding said band types with the alternative
coders. Future commits to the coders will be required to make them suitable.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit extends the trellis quantizer (used by the default twoloop coder)
to accept and correctly encode codebooks needed for intensity stereo and perceptual noise substitution.
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This commit implements the perceptual noise substitution AAC extension. This is a proof of concept
implementation, and as such, is not enabled by default. This is the fourth revision of this patch,
made after some problems were pointed out. Any changes made since the previous revisions have been indicated.
In order to extend the encoder to use an additional codebook, the array holding each codebook has been
modified with two additional entries - 13 for the NOISE_BT codebook and 12 which has a placeholder function.
The cost system was modified to skip the 12th entry using an array to map the input and outputs it has. It
also does not accept using the 13th codebook for any band which is not marked as containing noise, thereby
restricting its ability to arbitrarily choose it for bands. The use of arrays allows the system to be easily
extended to allow for intensity stereo encoding, which uses additional codebooks.
The 12th entry in the codebook function array points to a function which stops the execution of the program
by calling an assert with an always 'false' argument. It was pointed out in an email discussion with
Claudio Freire that having a 'NULL' entry can result in unexpected behaviour and could be used as
a security hole. There is no danger of this function being called during encoding due to the codebook maps introduced.
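
A rough sketch of the idea described above: a map that lists only the
usable codebooks (skipping the 12th entry), a check that restricts the
noise codebook to bands marked as noise, and a stub that asserts if the
reserved entry is ever reached. The array and function names are
assumptions for illustration and may differ from the actual patch.

```c
#include <assert.h>
#include <stdint.h>

enum { RESERVED_BT = 12, NOISE_BT = 13 };

/* Usable codebooks only: entry 12 is deliberately absent. */
static const uint8_t cb_out_map[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13 };
#define CB_MAP_SIZE ((int)(sizeof(cb_out_map) / sizeof(cb_out_map[0])))

/* May codebook `cb` be considered for a band? The noise codebook is only
 * accepted for bands already marked as noise. */
static int cb_allowed(int cb, int band_is_noise)
{
    assert(cb >= 0 && cb <= NOISE_BT);
    if (cb == RESERVED_BT)
        return 0;
    if (cb == NOISE_BT)
        return band_is_noise;
    return 1;
}

/* Placeholder for the reserved slot: fails loudly instead of leaving a
 * NULL function pointer in the table. */
static float cost_reserved(void)
{
    assert(!"reserved codebook must never be selected");
    return 0.0f;
}
```

Because the search only iterates over cb_out_map, the stub is never
reached in normal operation; it exists purely as a safety net.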
Another change from version 1 of the patch is the addition of an argument to the encoder, '-aac_pns' to
enable and disable the PNS. This currently defaults to disable the PNS, as it is experimental.
The switch will be removed in the future, when the algorithm to select noise bands has been improved.
The current algorithm simply compares the energy to the threshold (multiplied by a constant) to determine
noise, however the FFPsyBand structure contains other useful figures to determine which bands carry noise more accurately.
Some of the sample files provided triggered an assertion when the parameter to tune the threshold was set to
a value of '2.2'. Claudio Freire reported the problem's source could be in the range of the scalefactor
indices for noise and advised to measure the minimal index and clip anything above the maximum allowed
value. This has been implemented and all the files which used to trigger the assertion now encode without error.
The third revision of the patch also removes unneeded variables and comparisons. All of them were
redundant and were of little use for when the PNS implementation would be extended.
The fourth revision moved the clipping of the noise scalefactors outside the second loop of the two-loop
algorithm in order to prevent their redundant calculations. Also, freq_mult has been changed to a float
variable due to the fact that rounding errors can prove to be a problem at low frequencies.
Considerations were taken whether the entire expression could be evaluated inside the expression,
but in the end it was decided that it would be for the best if just the type of the variable were
to change. Claudio Freire reported the two problems. There is no change of functionality
(except for low sampling frequencies) so the spectral demonstrations at the end of this commit's message were not updated.
Finally, the way energy values are converted to scalefactor indices has changed since the first commit,
as per the suggestion of Claudio Freire. This may still have some drawbacks, but unlike the first commit
it works without having redundant offsets and outputs what the decoder expects to have, in terms of the
ranges of the scalefactor indices.
Some spectral comparisons: https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Original.png (original),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_NO.png (encoded without PNS),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS1.2.png (encoded with PNS, const = 1.2),
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/Difference1.png (spectral difference).
The constant is the value which multiplies the threshold when it gets compared to the energy; larger
values mean more noise will be substituted by PNS values. Example when const = 2.2:
https://trac.ffmpeg.org/attachment/wiki/Encode/AAC/PNS_2.2.png
Reviewed-by: Claudio Freire <klaussfreire@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch fixes a pointer arithmetic bug in adjust_frame_information that resulted in heavily corrupted audio when using M/S encoding. Also, a backup copy of untransformed coefficients has to be kept around or attempts at re-processing the frame (which happens when heavily overspending bits during transients) will result in re-encoding of the coefficients and subsequent corruption of the resulting stream.
A/B testing shows the bug as corrected, but still cannot prove that M/S coding is a win at least in numbers. Limited listening tests do show improvement on M/S encoded samples in lower bitrates, but they're hidden among the other artifacts that remain to be corrected in the encoder.
Some of the regressions flagged in the report do show poor stereo image (but not buggy), so M/S encoding is clearly not good enough yet to be defaulted to auto.
In numbers, Patched against Unpatched, stereo_mode auto:
Files: 114
Bitrates: 6
Tests: 683
Serious Regressions: 0 (0%)
Regressions: 0 (0%)
Improvements: 227 (33%)
Big improvements: 92 (13%)
Worst regression - mybloodrusts.wv - 256k
- StdDev: 28.61 pSNR: -0.43 maxdiff: 1372.00
Best improvement - 60.wv - 384k
- StdDev: -369.57 pSNR: 45.02 maxdiff: -13322.00
Average - StdDev: -80.56 pSNR: 2.49 maxdiff: -8858.00
Patched against Unpatched stereo_mode ms_off shows no difference.
Patched stereo_mode auto vs Unpatched stereo_mode ms_off shows a small average improvement, just not too significant:
Serious Regressions: 0 (0%)
Regressions: 10 (1%)
Improvements: 45 (6%)
Big improvements: 2 (0%)
Worst regression - Illinois.wv - 256k
- StdDev: 33.20 pSNR: -2.03 maxdiff: 477.00
Best improvement - song_of_circomstances.flac - 384k
- StdDev: -3.97 pSNR: 7.61 maxdiff: -826.00
Average - StdDev: -10.25 pSNR: 0.20 maxdiff: -281.00
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes a case where multichannel bitrate isn't accurately
targeted by psy model alone, never achieving the target bitrate.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes a case where multichannel bitrate isn't accurately
targeted by psy model alone, never achieving the target bitrate.
Now fixed.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Do not pointlessly call ff_alloc_packet multiple times,
and fix an infinite loop by clamping the maximum
number of bits to target in the algorithm that does
not use lambda.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Do not pointlessly call ff_alloc_packet2 multiple times,
and fix an infinite loop by clamping the maximum
number of bits to target in the algorithm that does
not use lambda.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>