Quality Metrics and Texture Compression

We were on a call with a customer, who happens to be an expert in video compression, updating him on Basis progress. We read off the quality stats we'd gathered and showed him the images our compressor produced so he could judge the quality for himself.

"Great, I look forward to seeing what our human testers say."

This made me smile: he totally gets it. Putting images in front of people is the best way to judge quality in a codec. It was crucial to how JPEG was developed and tested over the years. At the end of the day, we aren't showing images to computers; we're showing them to other humans. And the fascinating aspect here is that human visual perception is so complex that we have not been able to devise a quality metric that judges it accurately.

We experimented with a new compression technique yesterday. I cringed when I saw some of the quality metrics, when I saw how many pixels had been altered. But then I looked at the image.

The image actually looked better and higher quality.

It was mind-boggling. Neither Rich nor I could figure out why this was true.

One thing to note is that Basis always outputs a GPU format, which is lossy in itself. Having the right kind of artifacts in Basis actually boosts perceived image quality compared to the normally compressed GPU format, on top of being half the compressed size!

What's out there now?

Let's take a look at two popular metrics, PSNR and SSIM.

Keep in mind that texture quality is not the same thing as typical image quality. We're not just dealing with photographs, but also normal maps, specular maps, depth data, light fields, texture arrays, video, and so much more. On top of that, the texture is often stretched and mapped over complex objects. We have to be very careful that any kind of quality metric we choose works across all these applications.

PSNR stands for "peak signal-to-noise ratio". It compares your compressed image to the original and outputs a single number, in decibels, describing how much noise has corrupted the original image. Higher values mean less noise.
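To make that concrete, here is a minimal sketch of the usual PSNR calculation in Python. It assumes 8-bit channel values flattened into plain lists, and the function names are just for illustration; this is not how Basis computes it internally.

```python
import math


def mean_squared_error(original, compressed):
    """Average squared difference between two equal-length pixel sequences."""
    if len(original) != len(compressed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((a - b) ** 2 for a, b in zip(original, compressed))
    return total / len(original)


def psnr(original, compressed, peak=255.0):
    """PSNR in decibels; `peak` is the largest possible sample value (255 for 8-bit)."""
    mse = mean_squared_error(original, compressed)
    if mse == 0:
        # Identical images: no noise at all, so the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10((peak * peak) / mse)


original = [10, 50, 90, 130, 170, 210]
slightly_off = [11, 49, 91, 129, 171, 209]   # every sample off by 1
badly_off = [30, 30, 110, 110, 190, 190]     # every sample off by 20

print(psnr(original, slightly_off))
print(psnr(original, badly_off))
```

The first value printed is the higher of the two, since less error means a higher PSNR. Note that the number only knows how large the error is, not where it sits or whether a person would notice it.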

SSIM is more complex. It seeks to improve upon PSNR by comparing the two images and examining the "structural similarity," aiming to better simulate human perception.
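As a rough illustration, the sketch below computes a simplified, single-window version of the SSIM formula over a whole image at once. Real SSIM implementations slide a small (often Gaussian-weighted) window over the image and average the local scores, so treat this as an assumption-laden toy rather than a reference implementation. The function name is made up for this example; the constants 0.01 and 0.03 are the conventional defaults.

```python
def global_ssim(x, y, peak=255.0):
    """Single-window SSIM over two equal-length pixel sequences.

    Simplification: statistics are taken over the whole image instead of
    over local windows, and population (divide-by-n) variance is used.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    # Small constants that keep the division stable when means or
    # variances are close to zero.
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2

    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


ramp = [10, 50, 90, 130, 170, 210]
brighter = [v + 5 for v in ramp]      # same structure, slightly brighter
inverted = [255 - v for v in ramp]    # structure flipped

print(global_ssim(ramp, ramp))
print(global_ssim(ramp, brighter))
print(global_ssim(ramp, inverted))
```

A uniform brightness shift barely moves the score because the structure is unchanged, while inverting the ramp drives it negative. That sensitivity to structure rather than raw per-pixel error is the part PSNR doesn't capture.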

PSNR is nice because it's much easier to understand: how much error/noise has been introduced to your image? SSIM also has a sharper curve. If something is mediocre quality, SSIM will often slam it down to a lower value. You can see this demonstrated in these graphs. This property of SSIM can be very beneficial when graphing quality attributes and trying to draw conclusions from them, but it can sometimes be problematic when using SSIM to tune algorithms (where mediocre quality might not deserve to be weighted so negatively).

These metrics are fascinating because they do a pretty good job, but neither simulates the human visual system well enough. If your error's grossly off, something's probably wrong. Most likely. You should still have someone look at it.

In practice, we use both PSNR and SSIM to tune Basis at times when it's impractical to pore over thousands of images ourselves. We complement this by spending hours and hours poring over the images, examining quality, tweaking the code, and making sure it's up to standards.

To conclude, if you're serious about texture compression you need to be like our customer and have a wide variety of human testers look at and rate the textures.

I'm also very excited to see what new image metrics we get in the future, and how these current metrics evolve. This is a fascinating opportunity to blend the fields of computer graphics, compression, and cognitive science. I'm especially looking forward to how this will affect generic texture compression, and not just photographic images.
