Hashing algorithms map data to an arbitrary fixed-size value. Most hashing algorithms actively try to avoid collisions – e.g., minimizing the probability of two different keys having the same hash. Perceptual hashes do the opposite – they maximize collisions by creating some dimension of data locality – similar keys have similar hashes.

A simple implementation of perceptual hashing of an image might be the following:

  1. First, minimize the image dimensions, convert to grayscale.
  2. Calculate the average grayscale pixel value
  3. For each pixel, encode 1 if the color is lighter than the average, otherwise 0.

By removing some of the dimensions (downsampling the color, size, and resolution), you can

Of course, perpetual hashing algorithms are vulnerable to adversarial attacks. For example, steganography methods make it trivial to encode nearly unrecognizable image metadata.