Optimize notebook presentation on Google Scholar #437

bollwyvl · 2015-03-26T13:30:07Z

While Twitter (#167) is nice for reaching a broad audience, many of the widely linked notebooks we serve are scholarly in nature. At present, none of these are findable on Google Scholar, one of the best tools for getting the right eyeballs on a notebook so that it has a broader impact on science and engineering.

While Google does index nbviewer (really, the only way to find things until we hammer out #405), Google Scholar remains aloof.

Because Google Scholar's fields have de facto status as the way to publish scholarly metadata, supporting them would also make other tools, like Mendeley and Zotero, work immediately.

It looks like we have to do a few things for this to work.

Metadata

The most arduous is getting users to add and maintain rich metadata for each notebook we serve. Going by Scholar's guidelines, we would need to encourage content authors to provide, at a minimum, the following fields:

citation_title or DC.title
citation_author or DC.creator
citation_publication_date or DC.issued
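
As a rough sketch of what nbviewer might emit for these three fields, assuming the notebook carries a title, a list of authors and a date somewhere in its metadata: the function name `scholar_meta_tags` and the metadata keys below are hypothetical, and the `YYYY/M/D` date format is an assumption about what Scholar expects, not something settled in this issue.

```python
from datetime import date
from html import escape


def scholar_meta_tags(title, authors, published):
    """Return <meta> tags for the three minimum fields listed above."""
    tags = [("citation_title", title)]
    # One citation_author tag per author, in author order.
    tags += [("citation_author", name) for name in authors]
    # Written as YYYY/M/D; this date format is an assumption.
    tags.append((
        "citation_publication_date",
        f"{published.year}/{published.month}/{published.day}",
    ))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


# Hypothetical notebook-level metadata keys, for illustration only.
notebook_metadata = {
    "title": "Exploring Widgets & Plots",
    "authors": ["A. Author", "B. Author"],
    "date": date(2015, 3, 26),
}

print(scholar_meta_tags(
    notebook_metadata["title"],
    notebook_metadata["authors"],
    notebook_metadata["date"],
))
```

The same values could presumably be emitted under the DC.* names as well, since the list above treats them as alternatives.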

The challenge here is in improving the notebook metadata UI, a la ipython/ipython#6073, to help users select, populate and check the status of their metadata. Echoing my comments from there: raising the reward to include in-notebook UI awareness of the metadata, i.e. special formatting for title, authors, abstract and publication date that nbviewer would recognize, would make this seem less like drudge-work and help ensure that the metadata stays accurate through forking, etc.

References

Allowing the crawler to find references used in the notebook would add considerably to their impact. Oddly, Scholar doesn't specify a metadata-driven way to do this, instead relying on some semantic markup:

Mark the section of the paper that contains references to other works with a standard heading, such as "References" or "Bibliography", on a line just by itself. Individual references inside this section should be either numbered "1. - 2. - 3." or "[1] - [2] - [3]" in PDF, or put inside an "<ol>" list in HTML. The text of each reference must be a formal bibliographic citation in a commonly used format, without free-form commentary.

This suggests that something like takluyver/cite2c#10 would be valuable.
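
For the HTML case, the markup Scholar describes is small. A minimal sketch, assuming the citations are already available as formatted strings (for instance from a citation tool); the function name `references_html` and the `<h2>` heading level are hypothetical choices, not anything nbviewer does today:

```python
from html import escape


def references_html(citations):
    """Render citation strings as a "References" heading plus an ordered list."""
    items = "\n".join(f"  <li>{escape(text)}</li>" for text in citations)
    # The heading sits on its own line, followed by the numbered list.
    return f"<h2>References</h2>\n<ol>\n{items}\n</ol>"


# Placeholder strings; real entries would be formal citations from the notebook.
print(references_html(["First formal citation.", "Second formal citation."]))
```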

Discoverability

It could well be that notebooks linked in other places would get picked up by Scholar.

However, if we want to optimize the downstream experience, then we need to create a place where the crawler can actually find things in a reasonable fashion. If successful, we would likely fall under the "more than 100k" club immediately. Here's the guidance from the site:

For websites with more than a hundred thousand papers, we recommend that you create an additional browse interface that lists only the articles added in the last two weeks. This smaller set of webpages can be recrawled more frequently than your entire browse interface, which will facilitate timely coverage of your recent papers by the search robots.

So perhaps the first step of getting to indexed notebooks would be to index those that meet the minimum viable metadata standard, and make "recently published" documents easily discoverable from the front page.
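
The selection behind such a "recently published" listing could look roughly like this. It is a sketch only: notebooks are modelled as plain dicts, the `REQUIRED` keys are hypothetical stand-ins for the minimum metadata above, and the 14-day window simply mirrors the two weeks in Scholar's guidance.

```python
from datetime import date, timedelta

# Hypothetical metadata keys standing in for title, author and publication date.
REQUIRED = ("title", "authors", "date")


def recently_published(notebooks, today, window_days=14):
    """Return notebooks with the minimum metadata, published in the window, newest first."""
    cutoff = today - timedelta(days=window_days)
    eligible = [
        nb for nb in notebooks
        # Skip notebooks with any missing or empty required field.
        if all(nb.get(key) for key in REQUIRED) and cutoff <= nb["date"] <= today
    ]
    return sorted(eligible, key=lambda nb: nb["date"], reverse=True)


# Made-up example records, not real notebooks.
example = [
    {"title": "Older notebook", "authors": ["A. Author"], "date": date(2015, 1, 1)},
    {"title": "Recent notebook", "authors": ["B. Author"], "date": date(2015, 3, 20)},
    {"title": "", "authors": ["C. Author"], "date": date(2015, 3, 25)},
]

for nb in recently_published(example, today=date(2015, 3, 26)):
    print(nb["date"].isoformat(), nb["title"])
```

Notebooks missing any of the fields are left out entirely, which matches the idea of indexing only those that meet the minimum standard.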

Quality

Undoubtedly, this could become an avenue for spam or just junk. No doubt a simple robot could generate an infinite stream of properly-formatted, difficult-to-detect junk notebooks, with proper MathJax and figures... in fact more easily than humans!

I have no suggestion as to how we might combat this, as we obviously don't have the manpower to validate every link, or even to put in place a sensible PR-based, crowd-sourced mechanism that would be robust against even a trivial amount of abuse.

bollwyvl added the type:Enhancement label Sep 1, 2015

bollwyvl mentioned this issue Oct 20, 2015

markdown internationalization in nbviewer #507

Closed

rgbkrk mentioned this issue Sep 6, 2016

Fields for author/creator and title jupyter/nbformat#45

Closed

3 tasks
