This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm version is ~10 cycles faster (182->172) on x86-64 and ~8 cycles faster (201->193) on x86-32.