Google Scholar Blog: Caselaw is Set Free, What Next?

The next article in our 10th Anniversary Series is by Thomas Bruce. He is the director of the Legal Information Institute at Cornell. He co-founded the LII in 1992. Today, its legal collections are used widely and have inspired the Free Access to Law Movement, which has helped citizens worldwide learn about the laws that govern them. Thomas is also the author of Cello, the first Web browser for Microsoft Windows. -- Anurag Acharya

Caselaw is Set Free, What Next?

Thomas Bruce, Director, Legal Information Institute, Cornell

A lawyer story

Google Scholar’s caselaw collection is a victory for open access to legal information and the democratization of law. It would be more than worthy of celebration from that standpoint alone. But caselaw is above all an obsession of lawyers, and I’d like to start by telling the tale from their point of view.

Five years ago, when Google Scholar added judicial opinions to its portfolio, it created an immediate sensation among lawyers. Small-office and solo practitioners were the most vocal about it; they had always had a difficult time affording the services of commercial publishers, even in print. And now there was access to a significant chunk of material that had previously been lodged firmly behind paywalls. It was linked and searchable, and still better, it offered a version of the citation-tracking and evaluation features that lawyers knew and loved in expensive commercial systems. It had first-class sorting and filtering features. It had Bluebook-form citations for each case (pretty much the epitome of something that nobody but lawyers knows or cares about, but a very thoughtful touch indeed). Nobody in the open-access arena had tried such a thing, and probably only Google could have. One commentator said that, “Google fired (arguably) the loudest...salvo in the battle for free access to caselaw… and it apparently came as a tweet”.

Scholar’s immediate impact on the legal profession was owed in large part to its technical virtuosity. It was an unusual display of ingenuity used to democratize services and features whose value had mostly been known only to lawyers. But, for the legal profession, it was happening in the middle of a long-brewing, near-perfect storm. Since at least the early 1990s, clients had complained about surcharges that law firms added to legal research costs. By 2000, there was growing refusal to reimburse legal-research fees at all; clients felt that the firm’s online charges were just a part of overhead, like water and electricity. That was not an isolated gripe; rather, it was a visible crack in a business model that we now know had been eroding for quite some time. By one estimate, the 2008 implosion of the financial-services industry destroyed over a third of the legal employment in New York. A lot of firms changed radically or disappeared altogether in the aftermath. You could talk, in dry academic terms, about downward price pressure on the industry. One suspects that the feeling was more like riding in an elevator whose cables had been cut.

There had been free offerings of caselaw online for some time, starting with a BBS offered by the Cleveland Freenet in 1989; the first web-based effort started here at Cornell in 1992, and was followed by a full edition of all Federal statutes in 1994. Elsewhere -- notably in Canada and Australia -- open-access systems offered by third parties had evolved into the de facto national standard. And government was catching up, with many law creators publishing their materials online, for free.

Free services had never been the first choice of lawyers in the US. Some of the reasons were rational -- free services often lacked features that lawyers depend on, most provided very little in the way of commentary or annotation, and in any case they were highly distributed. There was no “one-stop shopping” in the world of open access to law, just a lot of websites offering different collections. The irrational reasons were, if anything, even more interesting and far more influential, though much more deeply buried in lawyer psyches. Lawyers are notoriously conservative in their work methods, and many law librarians even more so. Anything that was both new and noncommercial was inherently suspect. And the commercial services had had more than a century to reinforce the idea that size and comprehensiveness were the only measures of quality that mattered.

Even so, it’s hard to convey the degree to which lawyers mistrust distributed systems. As John Lederer once remarked, “Lawyers don’t buy books -- they buy systems of books”, and so it was with electronic products as well. It was easy for lawyers to dismiss what they saw as isolated pockets of legal information offered by volunteers at wildly different levels of added value, and marketers of commercial services had been quick to emphasize these qualities. That said, in the year prior to the addition of caselaw to Scholar, Cornell’s website had delivered well over 81 million pageviews to nearly 14 million unique visitors. 4.5 million of those pageviews went to the Federal Rules of Civil Procedure, a collection unlikely to be used by anyone but lawyers.

Comes now Google, a company with unparalleled capacity and legendary technical skills, offering a large collection of caselaw under one roof, with a workable citator and advanced search functionality. That was a big story, and it was often reported as “Google takes on commercial legal-research behemoths”. It was free access offered from a source that could not be dismissed as somehow beneath notice or unlikely to survive. Google’s offerings in Scholar thus became a validation of, and a capstone on, the things that open-access advocates had been doing for years. Apart from its inherent value -- which was, and is, huge -- it was a sign that freely accessible legal information was technically advanced and more than sufficient for many if not most professional needs. Most of all, it signaled that free legal information was something to be taken seriously. It sent that signal at a time when circumstances compelled the profession to pay far more attention than it otherwise might have. Scholar not only brought us a new and capable collection, it brought a new level and quality of attention to the entire open-access enterprise.

Everyone else

I began by telling a story about law and lawyers, but of course there’s an even more compelling story about law and everyone else. Laws -- and particularly statutes and regulations -- affect everybody. They describe what’s possible and permissible, what it costs to do business, what we can expect from government and what government can expect from us. On any given day, an open-access legal web site such as ours, or Scholar, is used by people who are helping veterans get the benefits to which they’re entitled, small businesses planning new courses of action, and students at all levels who are learning about the Constitution and our system of government. There are law-enforcement personnel learning about the limits and obligations of their position, hospital managers consulting public-benefits law, and people finding out what they have to do to sell new products in new markets. Those people need access to law. They need to be able to create starting points for themselves, using search to connect words and phrases that they already understand with concepts and explanations that at first they will not understand at all. They need to be able to follow their noses from those poorly-understood things to other pages that will explain them. Making all that possible is the next challenge.

What now?

Google Scholar’s caselaw collection offers features -- such as citators -- that are a step toward the “system of books” that would fully integrate primary legal sources and commentary into a practical resource for public understanding and professional practice. The legal-information ecosystem on the Web as a whole is moving in that direction. As that progresses, the benefits to everyone affected by law -- which is to say, everyone, period -- will be enormous. We will move beyond making law available on the Web to making it truly accessible on the Web -- not just discoverable, but understandable.

In 1992, starting with important caselaw collections, the open-access community began connecting law to itself. Hyperlinks gave readers a way to seamlessly follow citations -- at least if the cited thing was available online somewhere. And simply seeing to it that the things that ought to be online are online kept us all busy for a very long time (and is still a significant problem, in many places, some of them surprisingly close to home). We need to increase the density of connections between documents by making connections easier for machines (rather than human authors) to create. We need to hugely increase the amount of freely-available material that explains the law. And we need to -- in ways both trivial, and not -- make it possible for people to find the laws that affect them using things they already know.

Regulations provide a really good arena for thinking about such problems, for two reasons. First, they are harder for information systems to deal with. They are inconsistently drafted by a wide variety of people. For example, the Code of Federal Regulations is essentially a compilation of the work of perhaps 200 agencies (nobody really knows exactly how many). And, compared to caselaw, regulations have been relatively neglected by open-access publishers. Most importantly, they are the largest single contact surface between the public and the legal system. Yes, there are Supreme Court cases that are sweeping in their effect on daily life -- roughly half a dozen a year, compared to the thousands and thousands of cases in the Federal system that are just about two people suing two other people over something that only four people care about (and maybe a fifth if you count the judge). Regulations affect lots of people, and they change often. That makes them much more of a challenge for open-access publishers, both technically and economically. It also makes it that much more urgent to provide citizens with improved modes of access and value-added services such as notification of changes and anything and everything that would make compliance easier. Second, regulations are about things, and they are often based on science. And building things that bridge knowledge domains is what information scientists do.

A trivial example may help. Right now, a full-text search for “tylenol” in the US Code of Federal Regulations will find… nothing. Mind you, Tylenol is regulated, but it’s regulated as “acetaminophen”. But if we link up the data here in Cornell’s CFR collection with data in the DrugBank pharmaceutical collection, we can automatically determine that the user needs to know about acetaminophen -- and we can do that with any name-brand drug in which acetaminophen is a component. By classifying regulations using the same system that science librarians use to organize papers in agriculture, we can determine which scientific papers may form the rationale for particular regulations, and link the regulations to the papers that explain the underlying science. These techniques, informed by emerging approaches in natural-language processing and the Semantic Web, hold great promise.
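
The brand-name lookup described above can be sketched roughly as follows. This is a minimal illustration, not a description of how Cornell's CFR collection or DrugBank actually work: the `BRAND_INGREDIENTS` table, the `REGULATIONS` snippets, and the function names `expand_query` and `search` are all hypothetical stand-ins. The idea is simply that a query term is expanded with the generic ingredients it maps to before the full-text search runs.

```python
import re

# Hypothetical brand-to-ingredient table, standing in for data that a
# pharmaceutical source could supply. Entries are illustrative only.
BRAND_INGREDIENTS = {
    "tylenol": ["acetaminophen"],
    "excedrin": ["acetaminophen", "aspirin", "caffeine"],
}

# Toy stand-ins for regulation text; this is not actual CFR language.
REGULATIONS = {
    "reg-1": "Labeling requirements for products containing acetaminophen.",
    "reg-2": "Standards for caffeine content in beverages.",
    "reg-3": "Rules for grain storage facilities.",
}


def expand_query(term):
    """Return the lowercased term plus any generic ingredients it maps to."""
    key = term.lower()
    return [key] + BRAND_INGREDIENTS.get(key, [])


def search(term, documents=REGULATIONS):
    """Return sorted ids of documents containing any expanded term as a whole word."""
    terms = expand_query(term)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return sorted(
        doc_id for doc_id, text in documents.items() if pattern.search(text)
    )


print(search("tylenol"))  # the brand name never appears, but reg-1 is found
```

In this toy data no document contains the word "tylenol", so a literal search would come back empty; the expansion step is what lets the searcher trade a name she already knows for the term the regulation actually uses.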

All successful information-seeking processes permit the searcher to exchange something she already knows for something she wants to know. By using technology to vastly expand the number of things that can meaningfully and precisely be submitted for search, we can dramatically improve results for a wide swath of users. In our shop, we refer to this as the process of “getting from barking dog to nuisance”, an in-joke that centers around mapping a problem expressed in real-world terms to a legal concept. Making those mappings on a wide scale is a great challenge. If we had those mappings, we could answer a lot of everyday questions for a lot of people.

As I hinted earlier, search is often just the start; it shows the way to the trailhead, but the information-seeker must then follow a path that leads to commentary and deeper explanation of what the search engine offers easily. Building that path is a problem that rests critically on integration across multiple websites and collections. Metadata-publishing standards and linked-data approaches are helping; we look forward, for example, to a set of specific legal extensions to schema.org that will make it easier for people and machines to follow their noses from what search provides to the understanding that they really need. It will be a long job.

But that is a tale for another day, perhaps another ten years in the future. It’s exciting to see how far we’ve come. Scholar, and its legal collection, are a tremendous gift to those who want to know about the law, and a platform for those of us who want to go further.

Monday, October 20, 2014
