Fair Use Week Day 2: The *Feist*-y Reason That Text and Data Mining is Fair Use

Cross-posted at the Fair Use Week Blog at Harvard.

Happy Fair Use Week! This is a happy week, indeed, for me, because fair use is my favorite copyright doctrine. But my favorite copyright decision just may be Feist v. Rural Telephone Co., a case about…telephone books!

Among the many wonderful qualities of the Feist opinion is the bright neon line that it draws between the purpose of copyright (to give incentives for the creation and distribution of creative, expressive works) and what way, way, WAY too many people think is copyright’s purpose: to ensure that someone who works hard to make something gets paid every time someone else uses it. If you understand why Feist draws that line, you’ll understand why text and data mining is clearly a fair use. (See, I got there! Now hang in a little longer and I’ll get back to fair use in a minute…)

The idea that whoever makes something should control it, or get paid whenever it gets used, is sometimes called “labor-desert theory,” and it sounds pretty tempting. There’s even an Enlightenment philosopher that people invoke to support it: John Locke, who is said to have argued that when someone takes something from “the commons” and mixes it with their labor, the result is a delicious property gumbo, and it is theirs.

It’s been a minute since I last read Locke, so I can’t promise that’s the most faithful representation of his thinking. But I can tell you it is a pretty faithful representation of the arguments that some copyright holders and property rights enthusiasts make in favor of long, strong copyright. They talk about how hard it is to make a movie, how much time and energy must be devoted to various forms of creative work, how many jobs are required to make the creative economy hum, and so on.

That may all be true, but the fact (ha!) is that how hard you work to make something is irrelevant to the question of whether copyright protects it. Why? Well, it is an axiom of US copyright law that the author’s monopoly protects her expressive contributions to a work, but does not protect any facts (or ideas) that might be embedded in the work.

For example, where two authors write about the same underlying historical event, the first author may prevent the second author from copying too much of her expressive prose (these were the facts of the pioneering fair use decision Folsom v. Marsh, in which verbatim copying from an exhaustive biography of George Washington to create a second, shorter biography was found to be infringing), but she certainly can’t prevent the second author from relying on facts uncovered in her research (as, for example, in Miller v. Universal, where an author’s “research” on a famous kidnapping case was held not to be the proper subject of copyright protection as against a second author). Facts are not created by anyone (pace post-modernism etc.), and are no one’s property, according to copyright law. And, crucially, wrapping facts in a crunchy, flaky layer of your copyrighted expression is not enough to give you rights in the underlying facts.

Despite the bedrock status of this proposition, and its seemingly clear embodiment in the statute at § 102(b) of the Copyright Act, courts had trouble resisting the impulse to reward “sweat of the brow” or “industrious collection” by granting copyright protection to facts first revealed in a work of authorship. It wasn’t until the 1991 resolution of a dispute over the wholesale copying of names and numbers in telephone directories in Feist that the Supreme Court gave us a strong, clear articulation of both the principle and its deep Constitutional foundations:

The mere fact that a work is copyrighted does not mean that every element of the work may be protected. Originality remains the sine qua non of copyright; accordingly, copyright protection may extend only to those components of a work that are original to the author. [citations omitted] Thus, if the compilation author clothes facts with an original collocation of words, he or she may be able to claim a copyright in this written expression. Others may copy the underlying facts from the publication, but not the precise words used to present them.

[snip]

It may seem unfair that much of the fruit of the compiler’s labor may be used by others without compensation. As Justice Brennan has correctly observed, however, this is not “some unforeseen byproduct of a statutory scheme.” Harper & Row, 471 U. S., at 589 (dissenting opinion). It is, rather, “the essence of copyright,” ibid., and a constitutional requirement. The primary objective of copyright is not to reward the labor of authors, but “[t]o promote the Progress of Science and useful Arts.” Art. I, § 8, cl. 8. Accord, Twentieth Century Music Corp. v. Aiken, 422 U. S. 151, 156 (1975). To this end, copyright assures authors the right to their original expression, but encourages others to build freely upon the ideas and information conveyed by a work. Harper & Row, supra, at 556-557. This principle, known as the idea/expression or fact/expression dichotomy, applies to all works of authorship. …This result is neither unfair nor unfortunate. It is the means by which copyright advances the progress of science and art. (Emphases added.)

The Supreme Court subsequently called this distinction part of the “traditional contours of copyright” and a “built-in First Amendment safety valve.” This is, in other words, about as fundamental a proposition as there can be in copyright law, grounded in both the Copyright Clause and the First Amendment of the Constitution. To the extent that fact and expression in a protected work can be separated, the facts are free for the taking. Whether it’s a phonebook or a newspaper article, expression is protected, but facts are free.

As it turns out, one of the most powerful ways to extract and use all the facts embedded in a wide variety of creative works, separating them from the expression in which they subsist, is text and data mining. But to perform text and data mining, a computer has to do things that ordinarily require the copyright holder’s permission: copying the full text of the works into a computer and, in many cases, displaying to the public contextual snippets that substantiate your claims. All this takes place thanks to technology that the Founders certainly couldn’t have foreseen, and that even the drafters of the 1976 Copyright Act might not have anticipated. Enter fair use, with the flexibility required to adapt to a changing world.

While there was already plenty of smart writing on the issue, and a long line of cases pointing in the right direction, the question of whether using computers to read in-copyright texts and extract facts from them is fair use got its fullest, and perhaps final, answer when Judge Pierre Leval wrote the Second Circuit’s opinion in the Google Books case. Google Books was the result of a massive digitization effort in which university libraries (including ours) provided millions of books for Google to digitize and crawl, just as it crawls websites, to help people find books. (Libraries got to keep the digital copies, which we deposited with the HathiTrust Digital Library.) Leval more or less created the modern fair use doctrine in a law review article first published 30 years ago, so it was fitting that he was the judge to finally give a broad blessing to text and data mining. In his opinion, Judge Leval answers two fundamental questions:

Is Google’s purpose transformative? That is, is it different from the author’s original expressive purpose, and does it “serve[] copyright’s goal of enriching public knowledge” by using the protected material to “communicate[] something new and different from the original or expand[] its utility”?
Does Google’s use provide the public with a “substitute” in the market for the original works in a way that does “meaningful” or “significant” harm to the market for those works?

The ethos of Feist informs both questions in a fundamental way. First, Judge Leval finds Google’s purpose to be transformative because of its fundamentally factual, informative character. The core purposes of Google Book Search (to locate relevant books by providing facts about the occurrence of search terms inside books, and to reveal facts about the occurrence of words and phrases throughout the entire corpus of books) are of course radically different from the expressive purpose(s) of any particular book. Not only is that purpose different, but it is consonant with the design of copyright itself, which is tailored to facilitate the free circulation of facts. It also serves the ultimate purpose of copyright, which is to “promote the Progress of Science” (where “Science” means all manner of learning and culture). Google Books is transformative because it is Feist-y: it liberates facts from expression in a way that adds to the world’s knowledge and doesn’t implicate the expressive monopoly of authors.

Which brings us to the question of market harm and substitution, which is also filtered through a Feist-ian lens. Beyond the obvious, and fundamentally important, point that Google Book Search results are not a substitute for access to the underlying books (snippets are too small, and they are impossible to reassemble into the original work), the court must contend with two other market-based challenges.

First, the Authors Guild argues that some users will find the information they need in snippets, which will forestall sales of the relevant works (either directly to researchers, or to libraries that serve them). The court’s response here is fundamentally Feist-ian: so what? That is, to the extent that the snippet reveals a fact that obviates a researcher’s need to buy a copy of the book containing that fact, that is all to the good.

Leval observes, by way of example, that a student looking for the year Franklin D. Roosevelt was first stricken by polio can find it in a snippet from Richard Thayer Goldberg’s The Making of Franklin D. Roosevelt (1981) that is returned from a Google Book Search query. The student will not have to buy Goldberg’s book, or even check it out from a library, to find this fact. And that’s fine; this is not a “harm” that copyright cares about. Judge Leval writes:

[The author’s] copyright does not extend to the facts communicated by his book. It protects only the author’s manner of expression.… Google would be entitled, without infringement of [the author’s] copyright, to answer the student’s query about the year Roosevelt was afflicted, taking the information from Goldberg’s book. The fact that, in the case of the student’s snippet search, the information came embedded in three lines of Goldberg’s writing, which were superfluous to the searcher’s needs, would not change the taking of an unprotected fact into a copyright infringement.

Or, as Justice O’Connor says in Feist, “This result is neither unfair nor unfortunate.”

The Authors Guild also argued that Google’s scanning harms a “derivative” market, namely the market for creating search databases and displaying snippets. At first glance, this may be the Guild’s most compelling argument. Maybe Google Book Search users never see the entire work, but of course Google itself necessarily does copy the full text, so the status of Google’s use behind the curtain could be less clear.

Judge Leval doesn’t think so. To the contrary, he says, “There is no merit to this argument.” Why? Because

“The copyright resulting from the Plaintiffs’ authorship of their works does not include an exclusive right to furnish the kind of information about the works that Google’s programs provide to the public. For substantially the same reasons, the copyright that protects Plaintiffs’ works does not include an exclusive derivative right to supply such information through query of a digitized copy.”

Judge Leval goes on to argue that the right to create derivative works is limited to works that “re-present the protected aspects of the original work, i.e., its expressive content, converted into an altered form.” As has already been established, the Google Book Search project does no such thing. Indeed, Judge Leval distinguishes Google Book Search from other projects that have sought permission to display shorter portions of books or songs (as in ringtones) by observing that,

Unlike the reading experience that the Google Partners program or the Amazon Search Inside the Book program provides [or the listening experience that Ringtones provide], the snippet function does not provide searchers with any meaningful experience of the expressive content of the book. (emphasis added)

So, the fact/expression dichotomy, defended most memorably in Feist, does a lot of work in the Google Books opinion. And that is a good thing, because it grounds the right to text and data mine in fundamental copyright and Constitutional principles with roots as deep and broad as the fair use doctrine itself.

The Taper

Copyright and Information Policy at the UVA Library
