dav1d 1.2.0 release

3 May 2023

dav1d 1.2.0

If you’ve followed my blog posts, or Phoronix, over the last few years, you will have seen that we’ve focused on SIMD optimizations for all the supported platforms. dav1d 1.2.0 is no different.

History of optimizations in dav1d

At the beginning, we focused on normal 8-bit assembly, until we reached release 0.7.1 where we started doing a lot of work on high-bitdepth.
The major part of that work was finished with 0.9.1, where we reached 140 000 lines of asm inside the project. In dav1d 0.9.2, we did a lot of smaller tasks for film grain, SSE4 and miscellaneous improvements on all platforms, and concluded most of those high-bitdepth (hbd) optimizations.

For the 1.0.0 release of dav1d, we focused on the API, the threading and the usability of the library. We also started working seriously on AVX-512 optimizations (and added, of course, small improvements everywhere).

dav1d 1.1.0 and 1.2.0 both focus on the following optimizations:

  • z1/z2/z3 for all bitdepths (both 8bpc and high bitdepth) on ARM (NEON) and x86 (mostly SSSE3)
  • AVX-512 optimizations.

z1/z2/z3

So, if you read what I wrote about 0.9.1, you might think that those optimizations are not very useful. But they are :)

z1/z2/z3 are what we call intra-tools, meaning they only act inside the same image (as opposed to inter-tools, which refer to other frames). In terms of pure CPU usage, those tools account for less of the decoding time than other tools, notably the various filters; and they are also way more difficult to optimize.
However, when we’re talking about AVIF, the image format, those intra-tools can become more prominent than when dealing with video.

As for AVX-512, while previous CPU generations had a lot of issues with it, notably downclocking, this is less of an issue on the newer CPUs that support this instruction set. It can yield significant improvements in decoding time.

Lines of Code

So, very often people ask me how much handwritten assembly there is now in dav1d.

Today, I’ve counted, for the 1.2.0 release:

  • 144 000 lines of x86 asm (SSE2 to AVX-512, 32b and 64b)
  • 59 500 lines of ARM NEON asm (ARMv7 32b to ARMv8 64b).

This brings the total to more than 200 000 lines of asm with this release!

Enjoy!

Jean-Baptiste Kempf