Gates Building, Room 100, 3:15-4:30

Jan 29, 1999

Range searches -- nearest neighbor search

Weights to match customer needs

joint work with Yossi Rubner and Leonidas J. Guiba

{rubner,guibas,tomasi}@cs.stanford.edu

**Abstract**

A new distance between two distributions, called the Earth Mover's Distance (EMD) is introduced. The EMD reflects the minimal amount of work that must be performed to transform one distribution into the other by moving "distribution mass" around. This is a special case of the transportation problem from linear optimization, for which efficient algorithms are available. The EMD also allows for partial matching. When used to compare distributions that have the same overall mass, the EMD is a true metric, and has easy-to-computer lower bounds. In this talk I focus on applications to image databases, especially color and texture. The EMD can be used to exhibit the structure of color-distribution and texture spaces by means of Multi-Dimensional Scaling displays. I also propose a novel approach to the problem of navigating through a collection of color images, which leads to a new paradigm for image database search.

**Color Signatures**

The color information of each image is reduced to a compact representation
that we call the *signature* of the image. In general, the signature
contains a varying number of points in a Euclidean space where a weight
is attached to each point. In the case of color images, the points represent
clusters of similar colors and the weight of each point is the fraction
of the image area with that color.

To compute the signature of a color image, we first slightly smooth each band of the image's RGB representation to reduce possible color quantization and dithering artifactrs. We then transform the image into the CIE-LAB color space [G. Wyszecki and W.S. Styles, "Color Science: Concepts and Methods, Wiley, NY, 1982]. We coalesce this distribution into clusters of similar colors using a novel two-stage algorithm based on a k-d tree.

The signatures thus obtained are compact: the color distribution of an entire image is summarized by a handful of points, typically eight to twelve. Because of the clustering algorithm used, signatures represent well the color distribution of the image. Since signatures represent distributions in the CIE-LAB color space, they are perceptually significant, in that Euclidean distances between points are strongly correlated with percepual differences.

**Distance Between Color Signatures**

We define the distance between two signatures to be the minimum amount
of 'work' needed to transform one signature into the other. The work needed
to move a point, or a fraction of a point, to a new location is the portion
of the weight being moved, multiplied by the Euclidean distance between
the old and the new locations. When changing one signature to another,
the work is the sum of the work done by moving the weights of the individual
points of the source signature to those of the destination siignature.
We allow the weight of a single source signature point to be partitioned
among several destination signature points, and vice versa. We call this
distance function the *earth mover's distance.* The computation is
formalized as a linear programming problem.

**Database Visualization**

We also propose the use of multi-dimensional scaling (MDS) techniques to embed a group of images as points in a two- or three-dimensional Euclidean space so that their distances are preserved as much as possible. The MDS problem is: Given a set of n objects together with a matrix of the distances between each pair, to find a set of n points in d-dimensional space whose distances between pairs of points are as close as possible to the original distances.

For example, given a set of cities and the distances between each pair, it is possible using MDS techniques to derive a two-dimensional map of the cities.

Such geometric embeddings allow the user to perceive the dominant axes of variation in the displayed image group. In particular, displays of 2-d MDS embeddings can be used to organize and refine the results of a nearest-neighbor query in a perceptually intuitive way. By iterating this process, the user is able to quickly navigate to the portion of the image space of interest.

Using our color metric, it is possible to display an entire image database in either two- or three- dimensional space.