Reinventing Discovery by Michael Nielsen

This coming June will mark ten years since I started this blog (using Blogger on our own domain– here’s the very first post) and writing about physics on the Internet. This makes me one of the oldest science bloggers in the modern sense– Derek Lowe is the only one I know for sure has been doing this longer than I have, and while Bob Park’s “What’s New” and John Baez’s “This Week’s Finds” have been around longer, they started out as mailing lists, not true weblogs.

As such a long-term denizen of the Internet, I’m pretty much contractually obliged to have an opinion about Michael Nielsen’s new book, Reinventing Discovery: The New Era of Networked Science, now out from Princeton University Press. I know Michael from back in my misspent youth on Usenet, and I’ve heard him talk about this a few times (in person and via the Web), and I’m mentioned briefly in the book, so I was happy to get a review copy.

The central thesis here is an idea that’s become almost cliche: “The Internet changes everything!” The book argues that with the recent revolution in information technologies, it’s become possible to do science in entirely new ways. The ability to connect to vast numbers of other researchers, and even ordinary citizens, who might have useful expertise to bring to bear on a scientific problem opens the possibility of solving problems that would be completely intractable in the traditional mode of scientists working alone or in small groups. Web-based scientific enterprises like the Polymath Project and Galaxy Zoo have generated impressive scientific results, proving new theorems and discovering new classes of astronomical objects by using the Internet to network together large numbers of people to tackle the problem in a distributed way. These projects are potentially the leading edge of a revolution in the way science is done.

In other hands, this can easily tip over the line into the sort of Internet triumphalism that I find really grating. What I appreciate most about this book, then, is what it isn’t. It’s not a grand, sweeping revolutionary manifesto taking this idea and running with it past all sensible limits, a la most of what’s been written about the “Singularity”. While it does highlight the revolutionary potential of the networking of science, it also clearly spells out the limits of the method. The Internet is changing everything, true, but some things will change more than others, and the big payoffs for networked science will come in a handful of fairly specific fields where the research problems have the right characteristics. As someone whose scientific background is in one of the areas least likely to be transformed by these tools (experimental AMO physics), I like seeing these limits acknowledged.

The book is also worth praising for what it is, though, which is a wide-ranging and engagingly written description of some of the coolest things that have been done using Internet technologies. These include things that are obviously scientific, like the projects linked above, but also activities that seem more frivolous, like the distributed chess game Kasparov vs. the World in 1999. These aren’t necessarily pieces that I would have put together, but once identified, they clearly fit, and make a convincing argument for the potential of networked science.

The other aspect of this that will draw a lot of discussion is the chapter on Open Science, which argues for more sharing of scientific information, from research papers (a la the arXiv) to raw data and analysis code. Again, I’m less convinced of the immediate benefits of this, largely because my own field of research is one of the least likely to show benefits, but it is notable for having one of the best arguments for openness that I’ve heard. Anticipating objections that giving work away “for free” is against the interests of scientists, Nielsen points out that we already give our work away for free– nobody gets paid for submitting scientific papers to journals, after all. The reward for publishing a paper is ultimately a social construct: we have agreed to use journal publication as a measure of scientific productivity, and thus the reward for “giving away” data to a journal is the accumulation of prestige within the discipline. While this does indirectly lead to financial rewards through job promotion and so on, the immediate benefit of publishing a paper is status, not money.

As Nielsen notes in the final section, once tools became available to track citations to arXiv preprints, large swathes of theoretical physics shifted to using those as the marker of professional success, with journal publication becoming a secondary activity in some cases. There’s no reason that, given the right tools, we couldn’t also agree to treat the contribution of data to a public archive, or code to a community repository as professional activity in the same way that we count preprints as professional activity. All that’s really needed is a willingness on the part of the scientific community to expand our definition of professional activity. As I’ve written stuff along these lines before, I liked seeing that mentioned explicitly.

This is a fairly short book, so there are, of course, places where I would’ve liked to see more detail. One of the necessary conditions for successful networking of science is a kind of community management– you need ways to make sure that the useful contributions are brought forward and highlighted, which is a tricky problem. While the book highlights some successful examples– the Polymath Project and Kasparov vs. the World being the best of them– a quick scan of comment pages on the Internet will show that this is not a trivial matter. I would’ve liked to see more discussion of the failure modes of this sort of thing, and maybe some connections to the ideas of people like Teresa Nielsen Hayden who have put a lot of work into understanding what it takes to make a great commenting community on the Web. It’s not immediately clear to me that these sorts of networked projects scale well– if their success depends on heroic community management efforts by people with both relevant expertise and the right sort of personality, then the effects might end up being less sweeping than anticipated. I think it’s too early to have good data for how this affects scientific projects, but above a certain threshold of popularity, blog and media comment sites tend to degenerate rapidly into an absolute sewer, and that gives me a little pause.

All in all, though, I highly recommend this book. It’s engagingly and persuasively written, while still being measured in its approach to the subject. If you have any interest in the way science is done in the modern age, and how it will be done in the future, you should pick up a copy.

5 comments

  1. I have limited experience with StackExchange. I used the Physics site pretty regularly for a while, until work got too busy, and haven’t really gotten back to it. That was more an explain-for-general-public sort of thing than a research situation, though.

  2. You’re not old enough to have other reference points, just as I am not old enough to appreciate other — far bigger — changes in communication. An example. The “Pony Express” was western stories at their best in my youth, yet I was reminded just the other day (150th anniversary of its demise) that it lasted only a short time before telegraph completely crossed the country.

    Messages have been traveling across the US at nearly the speed of light for 150 years.

    Is the internet plus WWW really a difference in kind from e-mail lists sent over Bitnet? There isn’t much of a difference between waiting for the daily or hourly transmissions over Bitnet and finding an hour to catch up with a blog reached via a busy T3 network during a busy day. It is still true that if it is really important, you pick up the phone to tell them to read their mail.

    When it came right down to it, Ye Olde Telephone (plus FAX machines and the NY Times) was more than enough to spread info about how to make a high Tc superconductor around the world so fast that it was replicated all over the place within days. You didn’t have to wait for preprints to go through the mail if it was important enough. From that perspective, you could say Federal Express and FAX “changed everything”.

    Cheap photocopying was, IMHO, the biggest revolution in networked science. We were a small group, but we would send out a hundred copies of an important preprint, targeted to people interested in the problem, and get hundreds in return. You couldn’t do that very effectively when you only had a mimeograph. Even if it took a week or two to get to Japan or some parts of Europe, that was a huge increase in communication speed and “networking” compared with having to wait months for a Physical Review Letter or a year or more for a journal article to get refereed by regular mail, typeset, proofed, printed, and show up in your mailbox.

    The invention of TeX by Don Knuth, and of LaTeX by Leslie Lamport on top of it, was just as revolutionary as cheap photocopies, because they enabled electronic transmission of preprints that actually look and read like a journal article. IMO, that was the second revolution in networked science.

    I would say the advantage of the original “x-rated” ancestor of arXiv was more the quasi-journal nature of its library, because it wasn’t any faster than sending a preprint by e-mail. The web library idea grew out of the pre-print libraries maintained by labs like LANL and large research groups, with the added virtue of standardization and citation, as you noted.

    Finally, “open science” is a mindset, not a technology. You choose to send something to arXiv, or not, just as you choose to make raw data or code available, or not. That choice had to be made then, just as it is made now. The decision is similar to the one made when patenting something (thus making it publicly available) rather than keeping it an industrial secret.

  3. “we would send out a hundred copies of an important preprint, targeted to people interested in the problem”

    I remember this technique as well. It has an important limitation: the 100 recipients of your preprint were people that you, or at least senior members of your group, knew. And they, likewise, would send your group copies of their preprints. But what about people from outside your circle who were interested in the problem? They would be at a disadvantage, at least until they found a way to get into your circle. And remember that if you are just starting out, especially if you are in a developing country, the international postage for snail-mailing 100 copies of your preprint may not be a trivial cost, unlike for established groups in the West. Now they can download your preprint via the Web, on the computer that they have to have anyway in order to be a functional lab, and thereby compete on a more level footing with you.

    You can see the change in authorship patterns, if you look for it. In the 1990s, it was rare to see manuscripts from scientists in countries like China, India, or Korea, and the handful that were out there were by scientists who had ties to the West (usually via their Ph.D. advisor). Today I know (at least from their published work, if not personally) senior scientists from all three of those countries who work in their native country. You can argue that national priorities have played a role here, and you would be right, but this change would have been much slower without the Internet.

  4. Congratulations on 10 years!

    Good luck on all fronts. I assume your family has missed having a Halloween baby.

    My son’s birthday is on Guy Fawkes Day…
