Google Scholar will soon be 10 years old. It is amazing how time flies. It seems like just yesterday that Alex and I were scrambling to put everything in place for the launch. To help celebrate the anniversary, we have invited friends and colleagues in scholarly communication to share their thoughts about Scholar, about scholarly communication and about future directions. These will appear in a 10th anniversary blog post series. The first post in the series is by John Sack, the founding director of HighWire Press. - Anurag Acharya
Helping Researchers See Farther Faster
John Sack, Founding Director, HighWire Press
HighWire Press started at Stanford University almost 20 years ago -- we launched the Journal of Biological Chemistry Online in early 1995 -- about the same time that Google's founders were working in the same Stanford Quadrangle on the foundations for Google. It took until 2002 to get our two efforts together and index HighWire-hosted scholarly articles in Google. This project increased usage of the articles by one to two orders of magnitude, even though their abstracts had been fully indexed in PubMed right from the start. Two years later, in 2004, Google Scholar arrived.
In the twenty years since HighWire began, and in the ten years since Google Scholar beat a path to the door of scholarship, what have we achieved? We know the answer to that question from interviews we did in 2002 and again in 2012-2014 with over sixty researchers.
Back in 2002, people still used the word "e-Journal" to describe the electronic version of a "print journal". Researchers told us they needed better ways to locate content across all the different sources of full-text – publisher sites each had their own separate search engines, and PubMed searched only abstracts.
We collectively solved that problem -- publishers took a big leap in providing the Google indexer with access to subscriber-only content. So when HighWire asked Stanford researchers in 2012 interviews about the challenges of searching, they said:
"Finding is easy...
...but reading is hard."
We had solved the search problem so well that people found more than they could handle. This wasn't just a relevance-ranking problem -- useless stuff showing up in search results. There was important material in those results and it needed to be evaluated to satisfy a researcher's sense of thoroughness.
Reading Faster
To “read” many articles in a short period of time, researchers want to be able to absorb the gist of an article quickly, and be able to judge its quality and relevance. In our interviews with researchers, we heard strong support for adding visual abstracts to articles (as the American Chemical Society has been doing for years in all of its journals); for adding "take home messages" to articles indicating the significance of an article in the context of what is known and what the article adds (often found in clinical journals, like the BMJ, but now also appearing in basic-science journals such as PNAS and the JBC); and for a contextualized 'figure reading' experience (such as is found in the Lens viewer introduced in eLife).
All of these help researchers take in an article faster. None of these aids is available from Scholar search results, so readers must visit the sites where the full text is found. This “pogo-sticking” back and forth between search results and articles may seem normal and natural to us in the publishing industry. But as consumers we rely on Google showing augmented search results: if Google stopped showing movie and restaurant “star” ratings and restaurant “$$$” price ranges in its search results, we’d think there was a bug!
How can Google Scholar meet this "read faster" challenge? How search evolves on this front will affect how researchers and publishers do their work of finding audiences.
Contextualization of References
One way to speed scholarly literature research would be to improve the “directedness” of search results -- don't just give me a list of articles, but give me or get me to paragraphs in context. Clearly Scholar knows the context for matching a query's criteria since it shows a snippet from the text. Why not have Scholar and publisher sites collaborate a bit more to help readers get quickly from a result list to the first paragraph that matches a search, then on to the next matching paragraph, and so on?
And if Scholar can do that with search results, perhaps it can also help us with the too-arduous task of going from a citation embedded in an article to the specific part of the cited article that is being referenced. Book references contain page numbers; why should journal articles be less specific?
Perhaps we can see how unhelpful this is by stepping out of our scholarly-publishing tradition and shifting to the consumer context: Imagine if a Google search provided you with a link to only the web site (i.e., home page) rather than to the specific page on a site that matched your search! That's what we settle for with scholarly journal references.
Searching For Images
We know from researcher interviews that in some fields people don't start by reading the article text per se; they "read" the images and then look at the narrative around the images for context. In some fields, figures tell the story -- just as in graphic novels and comic books, I suppose! -- and an article is figures woven together by text. This isn’t only true of disciplines that are visual in the traditional sense; it is perhaps just as true for equations in a physics article, structures in a chemistry article, or tables in a clinical-trial article.
So why not make it possible to search images by searching the figure legend, or text in a figure or table, or closed caption in a video? Google already provides a basic image search. Perhaps if publishers provided Scholar with rights to display low-resolution article images – the visual equivalent of a snippet – we could have a scholarly version of image search.
There are great opportunities for innovation ahead of us. We will need to take some risks, build experiments and collaborate across boundaries between stakeholders. That’s what we have done for the past decade, and look how far we have come -- “finding is easy”!