The next article in our 10th Anniversary Series is by Thomas Bruce. He is the director of the
Legal Information Institute at Cornell. He co-founded the LII in 1992. Today, its legal collections are used widely and have inspired the
Free Access to Law Movement, which has helped citizens worldwide learn about the laws that govern them. Thomas is also the author of
Cello, the first Web browser for Microsoft Windows. -- Anurag Acharya
Caselaw is Set Free, What Next?
Thomas Bruce, Director, Legal Information Institute, Cornell
A lawyer story
Google Scholar’s caselaw collection is a victory for open access to
legal information and the democratization of law. It would be more
than worthy of celebration from that standpoint alone. But caselaw is
above all an obsession of lawyers, and I’d like to start by telling
the tale from their point of view.
Five years ago, when Google Scholar added judicial opinions to its
portfolio, it created an immediate sensation among lawyers.
Small-office and solo practitioners were the most vocal about it; they
had always had a difficult time affording the services of commercial
publishers, even in print. And now there was access to a significant
chunk of material that had previously been lodged firmly behind
paywalls. It was linked and searchable, and still better, it offered a
version of the citation-tracking and evaluation features that lawyers
knew and loved in expensive commercial systems. It had first-class
sorting and filtering features. It had Bluebook-form citations for
each case (pretty much the epitome of something that nobody but
lawyers knows or cares about, but a very thoughtful touch
indeed). Nobody in the open-access arena had tried such a thing, and
probably only Google could have. One commentator said that, “Google
fired (arguably) the loudest...salvo in the battle for free access to
caselaw… and it apparently came as a tweet”.
Scholar’s immediate impact on the legal profession was owed in large
part to its technical virtuosity. It was an unusual display of
ingenuity used to democratize services and features whose value had
mostly been known only to lawyers. But, for the legal profession, it
was happening in the middle of a long-brewing, near-perfect
storm. Since at least the early 1990s, clients had complained about
surcharges that law firms added to legal research costs. By 2000,
there was growing refusal to reimburse legal-research fees at all;
clients felt that the firm’s online charges were just a part of
overhead, like water and electricity. That was not an isolated gripe;
rather, it was a visible crack in a business model that we now know
had been eroding for quite some time. By one estimate, the 2008
implosion of the financial-services industry destroyed over a third of
the legal employment in New York. A lot of firms changed radically or
disappeared altogether in the aftermath. You could talk, in dry
academic terms, about downward price pressure on the industry. One
suspects that the feeling was more like riding in an elevator whose
cables had been cut.
There had been free offerings of caselaw online for some time,
starting with a BBS system offered by the Cleveland Freenet in 1989;
the first web-based effort started here at Cornell in 1992, and was
followed with a full edition of all Federal statutes in
1994. Elsewhere -- notably in Canada and Australia -- open-access
systems offered by third parties had evolved into the de facto
national standard. And government was catching up, with many law
creators publishing their materials online, for free.
Free services had never been the first choice of lawyers in the
US. Some of the reasons were rational -- free services often lacked
features that lawyers depend on, most provided very little in the way
of commentary or annotation, and in any case they were highly
distributed. There was no “one-stop shopping” in the world of open
access to law, just a lot of websites offering different
collections. The irrational reasons were, if anything, even more
interesting and far more influential, though much more deeply buried
in lawyer psyches. Lawyers are notoriously conservative in their work
methods, and many law librarians even more so. Anything that was both
new and noncommercial was inherently suspect. And the commercial
services had had more
than a century to reinforce the idea that size and comprehensiveness
were the only measures of quality that mattered.
Even so, it’s hard to convey the degree to which lawyers mistrust
distributed systems. As John Lederer once remarked, “Lawyers don’t buy
books -- they buy systems of books”, and so it was with electronic
products as well. It was easy for lawyers to dismiss what they saw as
isolated pockets of legal information offered by volunteers at wildly
different levels of added value, and marketers of commercial
services had been quick to emphasize these qualities. That said, in
the year
prior to the addition of caselaw to Scholar, Cornell’s website had
delivered well over 81 million pageviews to nearly 14 million unique
visitors. Of those pageviews, 4.5 million went to the Federal Rules of
Civil Procedure, a collection unlikely to be used by anyone but
lawyers.
Comes now Google, a company with unparalleled capacity and legendary
technical skills, offering a large collection of caselaw under one
roof, with a workable citator and advanced search functionality. That
was a big story, and it was often reported as “Google takes on
commercial legal-research
behemoths”. It was free access offered from a source that could not be
dismissed as somehow beneath notice or unlikely to survive. Google’s
offerings in Scholar thus became a validation of, and a capstone on,
the things that open-access advocates had been doing for years. Apart
from its inherent value -- which was, and is, huge -- it was a sign
that freely accessible legal information was technically advanced and
more than sufficient for many if not most professional needs. Most of
all, it signaled that free legal information was something to be taken
seriously. It sent that signal at a time when circumstances
compelled the profession to pay far more attention than it otherwise
might have. Scholar not only brought us a new and capable
collection, it brought a new level and quality of attention to the
entire open-access enterprise.
Everyone else
I began by telling a story about law and lawyers, but of course
there’s an even more compelling story about law and everyone
else. Laws -- and particularly statutes and regulations -- affect
everybody. They describe what’s possible and permissible, what it
costs to do business, what we can expect from government and what
government can expect from us. On any given day, an open-access legal
web site such as ours, or Scholar, is used by people who are helping
veterans get the benefits to which they’re entitled, small businesses
planning new courses of action, and students at all levels who are
learning about the Constitution and our system of government. There
are law-enforcement personnel learning about the limits and
obligations of their position, hospital managers consulting
public-benefits law, and people finding out what they have to do to
sell new products in new markets. Those people need access to
law. They need to be able to create starting points for themselves,
using search to connect words and phrases that they already understand
with concepts and explanations that at first they will not understand
at all. They need to be able to follow their noses from those
poorly-understood things to other pages
that will explain them. Making all that possible is the next
challenge.
What now?
Google Scholar’s caselaw collection offers features -- such as
citators -- that are a step toward the “system of books” that would
fully integrate primary legal sources and commentary into a practical
resource for public understanding and professional practice. The
legal-information ecosystem on the Web as a whole is moving in that
direction. As that progresses, the benefits to everyone affected by
law -- which is to say, everyone, period -- will be enormous. We will
move beyond making law available on the Web to making it truly
accessible on the Web -- not just discoverable, but
understandable.
In 1992, starting with important caselaw collections, the open-access
community began connecting law to itself. Hyperlinks gave readers a
way to seamlessly follow citations -- at least if the cited thing was
available online somewhere. And simply seeing to it that the things
that ought to be online are online kept us all busy for a very long
time (and is still a significant problem in many places, some of them
surprisingly close to home). We need to increase the density of
connections
between documents by making connections easier for machines (rather
than human authors) to create. We need to hugely increase the amount
of freely available material that explains the law. And we need to --
in ways both trivial and not -- make it possible for people to find
the laws that affect them using things they already know.
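To make "connections that machines can create" a little more concrete, here is a minimal sketch in Python. It scans plain text for two common citation shapes (Code of Federal Regulations and United States Code) and proposes a link target for each. The URL templates and the function name `find_citations` are assumptions invented for illustration, not the layout of any real site, and the patterns cover only the simplest citation forms.

```python
import re

# Hypothetical URL templates; the path layout is an assumption for illustration.
CFR_URL = "https://example.org/cfr/{title}/{section}"
USC_URL = "https://example.org/uscode/{title}/{section}"

# Deliberately simple patterns: "21 CFR 201.326" and "42 U.S.C. § 1983".
# Real citations come in many more shapes than these two.
CFR_CITE = re.compile(r"\b(\d+)\s+C\.?F\.?R\.?\s+(?:§\s*)?(\d+\.\d+)")
USC_CITE = re.compile(r"\b(\d+)\s+U\.?S\.?C\.?\s+(?:§\s*)?(\d+[a-z]?)")


def find_citations(text):
    """Return (matched_text, url) pairs, in order of appearance in the text."""
    found = []
    for pattern, template in ((CFR_CITE, CFR_URL), (USC_CITE, USC_URL)):
        for match in pattern.finditer(text):
            url = template.format(title=match.group(1), section=match.group(2))
            found.append((match.start(), match.group(0), url))
    found.sort()  # order by position in the text
    return [(cite, url) for _, cite, url in found]


if __name__ == "__main__":
    sample = "See 21 CFR 201.326 and 42 U.S.C. § 1983."
    for cite, url in find_citations(sample):
        print(cite, "->", url)
```

A production linker would need many more patterns and a check that each target actually exists, which is a large part of why this job has taken so long.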
Regulations provide a really good arena for thinking about such
problems, for two reasons. First, they are harder for information
systems to deal with. They are inconsistently drafted by a wide
variety of people. For example, the Code of Federal Regulations is
essentially a compilation of the work of perhaps
200 agencies (nobody really knows exactly how many). And, compared to
caselaw, regulations have been relatively neglected by open-access
publishers. Most importantly, they are the largest single
contact surface between the public and the legal system. Yes, there
are Supreme Court cases that are sweeping in their effect on daily
life -- roughly half a dozen a year, compared to the thousands and
thousands of cases in the Federal system that are just about two
people suing two other people over something that only four people
care about (and maybe a fifth if you count the judge). Regulations
affect lots of people, and they change often. That makes them much
more of a challenge for
open-access publishers, both technically and economically. It also
makes it that much more urgent to provide citizens with improved modes
of access and value-added services such as notification of changes and
anything and everything that would make compliance easier. Second,
regulations are about things, and they are often based on science. And
building things that bridge knowledge domains is what information
scientists do.
A trivial example may help. Right now, a full-text search for
“tylenol” in the US Code of Federal Regulations will find…
nothing. Mind you, Tylenol is regulated, but it’s regulated as
“acetaminophen”. But if we link up the data here in Cornell’s CFR
collection with data in the DrugBank pharmaceutical collection, we
can automatically determine that
the user needs to know about acetaminophen -- and we can do that with
any name-brand drug in which acetaminophen is a component. By
classifying regulations using the same system that science librarians
use to organize papers in agriculture, we
can determine which scientific papers may form the rationale for
particular regulations, and link the regulations to the papers that
explain the underlying science. These techniques, informed by emerging
approaches in natural-language processing and the Semantic Web, hold
great promise.
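The Tylenol example can be sketched in a few lines of Python. This is only an illustration of the idea of expanding a brand name into its ingredients before searching. The lookup table is a hand-written stand-in rather than real DrugBank data, the regulation snippets and their identifiers are invented, and the function names are hypothetical.

```python
# Hand-written stand-in for a brand-to-ingredient lookup. A real system would
# derive this from a pharmaceutical dataset, not from a literal dict.
BRAND_INGREDIENTS = {
    "tylenol": ["acetaminophen"],
    "excedrin": ["acetaminophen", "aspirin", "caffeine"],
}

# Hypothetical regulation snippets, invented for this example.
REGULATIONS = {
    "reg-a": "Labeling requirements for products containing acetaminophen.",
    "reg-b": "Warnings required for products containing aspirin.",
    "reg-c": "Standards for the labeling of bottled water.",
}


def plain_search(term):
    """Naive full-text search: match the user's term exactly as typed."""
    term = term.lower()
    return sorted(doc for doc, text in REGULATIONS.items() if term in text.lower())


def expand_query(term):
    """Add the ingredients of a known brand name to the search terms."""
    term = term.lower()
    return [term] + BRAND_INGREDIENTS.get(term, [])


def expanded_search(term):
    """Full-text search over the original term plus any known ingredients."""
    terms = expand_query(term)
    return sorted(
        doc
        for doc, text in REGULATIONS.items()
        if any(t in text.lower() for t in terms)
    )


if __name__ == "__main__":
    print(plain_search("tylenol"))     # the brand name appears nowhere
    print(expanded_search("tylenol"))  # found by way of acetaminophen
```

The searcher never has to know the word "acetaminophen"; the linked data supplies it on her behalf.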
All successful information-seeking processes permit the searcher to
exchange something she already knows for something she wants to
know. By using technology to vastly expand the number of things that
can meaningfully and precisely be submitted for search, we can
dramatically improve results for a wide swath of users. In our shop,
we refer to this as the process of “getting from barking dog to
nuisance”, an in-joke that centers on mapping a problem expressed in
real-world terms to a legal concept. Making
those mappings on a wide scale is a great challenge. If we had those
mappings, we could answer a lot of everyday questions for a lot of
people.
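A toy version of "barking dog to nuisance" might look like the Python sketch below. It scores legal concepts by how many everyday cue words from the user's description they share. The cue lists are invented for illustration, the function name `suggest_concepts` is hypothetical, and simple word overlap is a stand-in for the natural-language techniques a real mapping would need.

```python
import re

# Invented cue words for two legal concepts; a real mapping would be far
# larger and would not be maintained by hand.
CONCEPT_CUES = {
    "nuisance": {"barking", "dog", "noise", "loud", "smell"},
    "landlord-tenant": {"landlord", "rent", "lease", "eviction", "deposit"},
}


def suggest_concepts(problem):
    """Rank legal concepts by the number of cue words found in the description."""
    words = set(re.findall(r"[a-z]+", problem.lower()))
    scores = {concept: len(words & cues) for concept, cues in CONCEPT_CUES.items()}
    # Keep only concepts with at least one hit; best score first, then by name.
    return sorted(
        (concept for concept, score in scores.items() if score > 0),
        key=lambda concept: (-scores[concept], concept),
    )


if __name__ == "__main__":
    print(suggest_concepts("My neighbor's barking dog keeps me up at night"))
```

Everything interesting lies in building the cue table at scale, which is exactly the challenge described above.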
As I hinted earlier, search is often just the start; it shows the way
to the trailhead, but the information-seeker must then follow a path
that leads to commentary and deeper explanation of what the search
engine offers easily. Building that path is a problem that rests
critically on integration across multiple websites and
collections. Metadata-publishing standards and linked-data approaches
are helping; we look forward, for example, to a set of specific legal
extensions to schema.org that will
make it easier for people and machines to follow their noses from what
search provides to the understanding that they really need. It will be
a long job.
But that is a tale for another day, perhaps another ten years in the
future. It’s exciting to see how far we’ve come. Scholar, and its
legal collection, are a tremendous gift to those who want to know
about the law, and a platform for those of us who want to go further.