We generate a new benchmark of synthetic image triplets that span a wide range of mid-level variations, labeled with human similarity judgments.
The dots below each image indicate which image is judged most similar to the reference by humans, by several existing metrics, and by our new metric, DreamSim.
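
To make the comparison in the caption concrete, the following is a minimal sketch of how a metric's choice on a triplet could be computed and compared with the human choice. It assumes each triplet consists of a reference and two candidate images, and that each image has already been mapped to a feature vector. The cosine distance, the toy vectors, and the names `cosine_distance`, `pick_most_similar`, and `agreement_with_humans` are illustrative assumptions, not DreamSim's actual implementation or API.

```python
import numpy as np


def cosine_distance(u, v):
    """Cosine distance between two 1-D feature vectors (assumed stand-in metric)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


def pick_most_similar(ref, cand_a, cand_b, distance=cosine_distance):
    """Return 0 if cand_a is at least as close to ref as cand_b, else 1."""
    return 0 if distance(ref, cand_a) <= distance(ref, cand_b) else 1


def agreement_with_humans(triplets, human_choices, distance=cosine_distance):
    """Fraction of triplets where the metric picks the same candidate as humans."""
    picks = [pick_most_similar(r, a, b, distance) for r, a, b in triplets]
    return float(np.mean(np.array(picks) == np.array(human_choices)))


# Hypothetical toy features: (reference, candidate A, candidate B) per triplet.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # metric picks A (index 0)
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),  # metric picks B (index 1)
]
toy_human_choices = [0, 0]  # hypothetical human votes, for illustration only

score = agreement_with_humans(toy_triplets, toy_human_choices)
```

Any distance function with the same two-argument signature can be passed as `distance`, so several metrics can be compared against the same human choices.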