My brother pointed me to a Slashdot post about using "digital fingerprints" to catch intellectual property thieves.
Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as 'digital fingerprinting,' which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer's content based on the appearance of as little as a few sentences of text or a few seconds of audio or video.
That's a giant search problem, and it won't scale well. Each "fingerprint" (or more technically, "feature") extracted from the input data must be searched for in a database of the web's content. Assuming a piece of intellectual property yields thousands of features that have to be found together, in the right order, you're looking at thousands of seconds of searching for each song, picture, video, book, or whatever you're trying to protect. (A Google search typically takes between 0.1 and 1 second.) So they're going to need processing and storage capacity on par with Google's to do their matching.
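To make the idea concrete, here's a minimal sketch of how text fingerprinting and matching might work. The company hasn't published its method, so this is an assumption: I'm using hashed word shingles as the "features" and an inverted index standing in for the web-scale database. All the names and parameters are mine.

```python
import hashlib

def fingerprints(text, k=5):
    """Slide a k-word window over the text and hash each window.
    Each hash is one 'feature'; a document yields many of them."""
    words = text.lower().split()
    return [hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
            for i in range(len(words) - k + 1)]

def build_index(corpus):
    """Inverted index: feature hash -> set of document ids containing it.
    At web scale this is the part that needs Google-sized storage."""
    index = {}
    for doc_id, text in corpus.items():
        for fp in fingerprints(text):
            index.setdefault(fp, set()).add(doc_id)
    return index

def match(snippet, index, threshold=3):
    """Count how many of the snippet's features each indexed document
    shares; documents above the threshold are flagged as likely copies.
    (The threshold is arbitrary here; a real system would also check
    that the features appear in the right order.)"""
    hits = {}
    for fp in fingerprints(snippet):
        for doc_id in index.get(fp, ()):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    return [doc for doc, n in hits.items() if n >= threshold]
```

Notice that every feature of every protected work triggers an index lookup, which is where the thousands-of-lookups-per-work cost comes from.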
On the plus side, they'll probably only need to check each piece of IP once a week or once a month; catching infringements faster than that wouldn't serve much purpose. This isn't a difficult project from an algorithmic standpoint, but setting up the hardware will be daunting.