x86 is 4% faster on P3
C sign stuff + x86 code for everything else is also faster then before (sorry forgot to test pure C)
... and if i replace the second occurance of the sign decoding in decode_residual by the asm too then everything gets slower iam starting to think that it might be best to write the whole function in asm, playing this avoid random deoptimizations game with gcc is not fun at all
Originally committed as revision 6732 to svn://svn.ffmpeg.org/ffmpeg/trunk
//FIXME the x86 code from this file should be moved into i386/h264 or cabac something.c/h (note ill kill you if you move my code away from under my fingers before iam finished with it!)
//FIXME use some macros to avoid duplicatin get_cabac (cant be done yet as that would make optimization work hard)