Duplicate content is a persistent, nagging problem for many webmasters. Employees of Yahoo, Google, and IBM jointly presented a paper at the 10th International Conference on Extending Database Technology in Munich this March. Titled "Indexing Shared Content in Information Retrieval Systems" (pdf), it discusses how search engines can limit the size of their indexes by reducing the amount of duplicate content they store.
There is also a post by Todd Malicoat (aka Stuntdubl), which is one of the more comprehensive guides to duplicate content. Both the paper and the post are well worth a read.