Log Message: |
MMXed the calculation of SSE for 8x8 16-bit blocks. This helps quite
a lot in VHQ=4 mode.
My tests with trellis:chroma_me show:
- ~20% speed improvement for VHQ=4.
- at least 5% when using VHQ=1.
Of course this speedup shrinks when more CPU-intensive features are
used. CruNcher, who used gmc/qpel, noticed "only" a ~5% speed
improvement.
NB: I'm of course talking about overall speed improvement. Such a
small patch for such a big improvement :-)
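
For reference, the quantity being sped up can be sketched in plain C as
below. This is only an illustrative scalar version, not the code from this
patch: the function name sse8x8_16bit_ref is hypothetical, and it assumes
both blocks are stored as 64 contiguous 16-bit values.

```c
#include <stdint.h>

/* Scalar reference sketch (hypothetical name, assumed contiguous layout):
 * sum of squared differences between two 8x8 blocks of 16-bit values. */
static uint64_t sse8x8_16bit_ref(const int16_t *a, const int16_t *b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 64; i++) {
        /* Widen before subtracting so neither the difference
         * nor its square can overflow. */
        int64_t d = (int64_t)a[i] - (int64_t)b[i];
        sum += (uint64_t)(d * d);
    }
    return sum;
}
```

A SIMD version computes the same sum but handles several 16-bit values per
instruction, which is where the gain comes from.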
|