How Did the arXiv Succeed?

In which we look again at the question of why, despite the image of physicists as arrogant bastards, biologists turn out to be much less collegial than physicists.

————

While I was away from the blog, there was a spate of discussion of science outreach and demands on faculty time, my feelings about which are a little too complicated to boil down to a blog post in the time I have available. I did notice one thing in Jeanne Garb’s guest blog post at Nature Networks:

Yet, given the current system, most scientists are choosing to keep a closed-notebook policy because they fear getting scooped, which is science jargon for idea(s) and/or data theft. When a scientist is scooped, they are no longer able to report their data as being novel. When the data is not novel, they cannot publish in high impact journals. When they can’t publish in high impact journals, the chances for funding are significantly reduced. When the funding chances are significantly reduced, there is no money to do science. When there is no money to do science, they lose their job and their passion. You get the idea.

It is no wonder that many scientists are careful when they are speaking of their data and experimental designs. It is true even for me, and I am hugely supportive of the open science movement. For instance, I love the idea of Figshare and since I’ve learned about it, I’ve been wanting to upload some data. Yet, something in me kept me from doing so. A lot of my stuff is not yet published – what if I get scooped? The bricks for my scientific foundation have not yet been laid – would posting my data demolish my chances?

Once again, I’m struck by the unlikeliness of the arXiv preprint server, which is, after all, an institutionalized form of sharing your results before they’re published in high-impact journals. This is exactly the sort of thing that makes Garb (and the vast majority of other people in the life sciences, judging by the numerous failed attempts to create an arXiv-like service for those fields) so uncomfortable. You’re putting your research out there early, which might give other researchers a chance to build on your results before they’re published, and possibly scoop you on your next publication. Until relatively recently, arXiv postings even counted as prior publication for the purposes of the top journals, so putting a preprint there could block you from publishing the results in Science or Nature (this may have been relaxed somewhat in recent years, but I’m not really sure).

And yet, somehow, this managed to become an absolutely essential component of academic physics. In some fields, it’s the primary component of academic physics– people in high-energy physics rely on the arXiv almost to the exclusion of traditional journals.

And it’s still kind of boggling to me that this actually happened. Because it’s not like the job situation in academic physics is some idyllic golden age– the only difference between academic physics and academic biology is that physics’s job crunch started twenty or thirty years earlier. We’re not enjoying a vast oversupply of jobs in physics (as the depressed postdoc next to me at the bar at DAMOP could testify), and those lucky enough to get academic jobs are not lacking in pressure to publish in high-impact journals. It’s not even that physicists are a bunch of cuddly hippies, either– the high-energy theory community in which the arXiv first took root has a richly deserved reputation for arrogance and a tendency toward extremely harsh commentary. (The “moronic philosopher” crack by Lawrence Krauss that caused so much recent angst is not entirely unexpected in that crowd– physicists tend to have very definite opinions, and rarely mince words when speaking of those they think have gone astray.)

And yet, somehow, the arXiv not only managed to gain a foothold, but has thrived. It’s even managed to incorporate a lot of the angst-generating elements of academia– Paul Ginsparg showed a graph of submissions as a function of time with an enormous peak around 4pm, produced by people trying to get their paper near the top of the next day’s summary email. People even write automated scripts to ping the arXiv’s server so as to identify the optimal instant for submitting their paper. This is, in some sense, the arXiv’s own equivalent of trying to get into Science or Nature, and the position of a preprint in that summary email has a significant effect on the number of citations it receives.

A lot of pieces talking about the failure of open access policies to catch on more widely point to the success of the arXiv in physics and math as if it were the rule and the failure of the life-science versions were the exception. But, given that physics does not lack for high-stakes job competition or publication pressure, I think this is the wrong way around. It’s not surprising that biologists don’t embrace preprint-sharing; rather, it’s a mystery how the arXiv managed to succeed so brilliantly.

So, what is it about physics that makes preprint sharing work here, while it’s never caught on in the life sciences? Is there some additional structural difference between physics and the life sciences that I’m missing? Or are biologists just bigger assholes than particle physicists?

21 comments

  1. Chad, it seems to me that one of the main structural differences between physics and biology is that in physics (especially in condensed matter) graduate students and postdocs are typically ‘encouraged’ to publish as much as possible in order to have better chances of getting a good job afterwards. That’s why many papers now submitted to the arXiv are in the four-page PRL format, or more generally in the short-letter style appropriate for Nature, Science and PRL. As the community has accepted this behavior, the arXiv’s popularity in physics has increased. I don’t think this kind of behavior exists as much in the life sciences.

    Also, Science Magazine does not accept manuscripts that have been posted on the arXiv before submission.

  2. That’s one I hadn’t heard before. Are students and postdocs in biology not encouraged to publish? What alternate path do they have to getting jobs, then?

  3. ArXiv sends an email with papers listed in order of submission? That seems absurd, and trivially easy to exploit (as is being done, apparently). Why not randomize the order for each recipient, ensuring visibility for all the papers, or at least a single random order to remove the time advantage? The latter helps ensure that the community is all talking about the same subjects, at the cost of disproportionate visibility. Maybe a good compromise is to send different randomized orders to each institution.

  4. I think part of it is that physicists are more computer literate than biologists… or, at least, they were in the 1990s when arXiv.org (then xxx.lanl.gov) became a fixture. Remember that particle physicists invented the web.

  5. In order to understand the success of the arXiv, I think we have to look at the norms that existed before it. Particularly in theoretical physics, physicists used to send paper preprints to their colleagues by mail for commentary. When it began, the arXiv was just an electronic way of doing this that required far less effort on the part of the authors, so it is no surprise that it took off. The real question is why physicists had a preprint-sharing culture to begin with, unlike other fields of science where this is not common.

    In high energy physics, there are essentially two types of work. Either you are involved in a massive experimental collaboration involving dozens of authors or you are a theorist. In either case, there is little chance of being scooped. If you are an experimentalist, chances are that your experiment is the only one of its type in the world because the equipment costs millions if not billions of dollars. Sure, there may be local competition between, say, CMS and ATLAS at CERN, but at the end of the day publications from both groups are going to end up in high-impact journals. On the other hand, if you are a theorist then there is no data for anyone else to steal. There might be numerics from computer simulations, but your competitors won’t typically have access to your code, so they can’t scoop you that way. In fact, sending your preprint to colleagues is itself a way of establishing priority, since you can send it to them before the manuscript is polished enough for journal publication, and you can hope that they will back up your claim to independent discovery if someone else publishes the same idea before your paper is ready.

    To sum up, I think it is the dominance of big expensive experiments and theory in high energy physics that explains the preprint culture. This also works for other theory dominated fields, but not so much for things like experimental atomic physics or condensed matter, wherein many experiments are more readily reproducible by multiple groups. However, the uptake of the arXiv was in fact much slower in these fields, and in condensed matter in particular. These fields had the example of HEP to look towards, and the advantages that the arXiv brought to it would have been much more obvious to them than to scientists in unrelated fields like biology and medicine.

    Although this is my main theory, I think there are other effects going on as well. For example, in medicine, peer review is seen as much more of a gold standard of correctness than it is in physics. Many doctors that I have spoken to (and I have spoken to many because I have a chronic illness) feel that it is unethical to release results prior to peer review because members of the public may take the results out of context and start treating themselves in a dangerous way. Waiting for peer review, and the associated carefully managed press releases and commentaries, is seen as a way of ensuring that patients are given reliable information. There is a certain truth to this argument. I have seen first hand that members of some patient support groups are willing to jump at the first sign of anything that might improve their condition, however speculative. There is definitely a need for the medical community to have some degree of control over the flow of information. Whether holding out everything until after peer review is the best way of doing this is debatable, but I can understand their reasons for doing so.

  6. Couple of corrections: being scooped is not the same as idea theft, rather it is about not being recognized as the first to claim a certain achievement. Secondly, LK is a cosmologist and popular book writer who has, among other things, very strong opinions about much of high energy theory. His recent dust up with philosophers, one of many, can be safely categorized as a personal quirk, not reflecting on anybody else.

    As for the main issue, in my field sending a preprint to the ArXiv is exactly what counts to establish priority as far as the community of scientists is concerned. So, the main concern expressed in the initial quote is simply not there.

  7. I agree with Matt Leifer’s comment above and would just add the following. In math almost all papers are posted on the arXiv before being submitted, and being first on the arXiv counts for purposes of priority. If you have good ideas in your paper that can lead relatively quickly to other papers, chances are good that other people who aren’t as deep in your work as you are won’t be able to follow them up as quickly as you. If you’re writing about something hot, where a lot of strong people are interested and likely to jump, you may write more papers and develop your ideas as far as you can before posting, so someone who’s faster doesn’t scoop you.

    So in summary, people post almost everything on the arXiv, but if they have something very good in a hot area, fear of being scooped can kick in and they may hold off.

  8. Matt Leifer wrote: “To sum up, I think it is the dominance of big expensive experiments and theory in high energy physics that explains the preprint culture. This also works for other theory dominated fields, but not so much for things like experimental atomic physics or condensed matter, wherein many experiments are more readily reproducible by multiple groups. However, the uptake of the arXiv was in fact much slower in these fields, and in condensed matter in particular.”

    I’m not sure that’s really a fear of getting scooped. AMO physics has been slow to adopt the arXiv as well– when I run across an interesting AMO paper, there’s at best a 50/50 chance of finding it on the arXiv– but I think there’s enough difference between labs that it’s not a trivial matter to reproduce someone else’s work. There’s also a fair amount of compartmentalization– certain groups work on certain types of problems, and there’s not that much direct head-to-head competition.

    I think the slow adoption isn’t so much a fear of being scooped as a lack of a critical mass of people posting stuff to the arXiv in the first place. There’s really very little overlap between AMO and high energy physics, so the norms in the two fields have been very different. The big names in the AMO field have never really had any incentive to post stuff there, and as long as they don’t, there’s no need for anybody else to do it, either. I think a big part of the arXiv’s spread in high-energy had to do with its early adoption by key players, so if you wanted to keep abreast of what they were doing, you needed to follow the arXiv, and if you’re following it, you might as well put your own stuff there.

    “As for the main issue, in my field sending a preprint to the ArXiv is exactly what counts to establish priority as far as the community of scientists is concerned. So, the main concern expressed in the initial quote is simply not there.”

    Right, but my point is that that’s a social convention. That is, arXiv posting is what counts to establish priority because high energy physicists have decided that it’s what will count to establish priority. There’s no a priori reason why biologists couldn’t adopt a similar convention, either for the arXiv or for some other similar service. The question is why they haven’t.

  9. I think the social convention is sufficient for HEP because there are no financial, legal or other constraints which are more common in more applied fields. Biologists, at least some of them, stand to win or lose large sums of money based on decisions of priority. This might be part of the explanation why their system of assigning priority is more formalized.

  10. How much of a physics community is there? Biology is not a single community. At Yale there are 13 departments/programs in the biological sciences while there is one physics department and one astronomy department. Different disciplines of biology at major research universities compete against one another. That may play a part.

  11. “How much of a physics community is there? Biology is not a single community. At Yale there are 13 departments/programs in the biological sciences while there is one physics department and one astronomy department. Different disciplines of biology at major research universities compete against one another. That may play a part.”

    I have very limited knowledge of the internal operation of large universities. What I’ve been told by colleagues at those places is that while there may be a single physics department, there is often strong competition between different subfields within the department, over things like the distribution of new hires among fields and the allocation of departmental resources, which can make it work rather like several separate departments.

    In terms of a larger community, there’s a good deal of separation among the subfields of physics, as you can see from the discussion of arXiv practices– in much of high energy physics, the posting of a paper on the arXiv is the most essential step in the process of publication, while in AMO physics, arXiv posting remains entirely optional. Different subfields within the general category of physics have very different research norms and publication practices, and can have very little contact between them. There isn’t exactly cut-throat competition for resources, in large part because a lot of AMO budgets fit in the rounding error of particle astrophysics proposals. But there’s certainly some rivalry there.

  12. I’ve said this before, but it’s worth repeating: Physics, alone of all fields, went through a period where the overwhelming majority of scientists in the field in a major country were working on a single project. I refer, of course, to the Manhattan Project. The tendency toward large groups that Matt refers to above is one of the products of this era. Since people were routinely sharing preprints with large groups of co-authors, it was no great stretch to send copies to colleagues in the subfield who were outside your collaboration. The arXiv was a matter of developing an electronic equivalent of the paper preprint. It made things even more democratic, because now anybody with an internet connection could read preprints, not just the well-connected.

    It’s not as if physics doesn’t have the potential for financial conflicts of interest. Not to the extent that biomedical sciences do, but there certainly are plenty of industrial physicists out there, particularly in areas like AMO.

  13. Well, it did catch on in the social sciences; see http://www.ssrn.com/, which is pretty much the same thing as the arXiv. My understanding is that it’s going well (I have a paper on the server there myself).

    The life sciences are difficult territory because there’s lots of money involved and also a very high responsibility. I mean, think of the “OPERA anomaly”, may it rest in peace. That misinformation, imo, was bad enough, but at least one can’t say it was information relevant to many people’s lives. But suppose you had a similar “anomaly,” with all the press attention, in the life sciences: say, that texting while driving is good for your brain development. Such information really shouldn’t be publicly distributed before it’s been sifted and sorted by peer review and editing.

  14. Ah, btw, I note you’ve had a major update of this website. It’s a huge improvement! The photo in the header is somewhat odd though; it looks like it was cropped out of a 50s horror movie ;o)

  15. I wonder if part of it is that particle theory in particular is a relatively small and perhaps more clubby field than many areas of science. In that case it might be difficult to “scoop” someone without incurring the opprobrium of the community, since everyone is more or less aware of what each other is doing.

    E.g. “I hear that Jones just submitted a paper on supercalifragilisticexpialidocious models of supergravity. That dolt should come up with his own ideas. Everyone knows that Smith has been working on those.”

  16. Grant structure?

    The leaders in preprint activity prior to the arXiv were theoreticians and mathematicians. Grants weren’t critical to them. Grants were critical to HEP experimentalists but on such a massive scale that the NSF Program Officers could keep tabs on the entire field.

    But in biology, particularly medicinal biology, everyone needs grants. NIH has been cutting back, which makes everyone twitchy about whether they’ll get or keep grants. Preliminary data is key to a successful R01 application. It’s no wonder that PIs are reluctant to share it.

  17. Just to follow up on jim’s comment, it’s even more crude than that: NIH will only allow peer-reviewed papers to count in biosketches and as ‘citable’ literature (i.e., not preliminary data).

  18. It’s ironic that one of the most famous scoops I’ve heard of was in physics: the discovery of the neutron. Word had gotten out that Curie’s lab had discovered a new particle, but they didn’t recognize what they had discovered. Chadwick and his team realized that the new particle was neutral and spent the next week racing to duplicate the experiment and publish, hoping that no one at Curie’s lab would realize the particle was neutral and scoop them first.

    Of course, the big breakthrough was recognizing that the particle was indeed neutral.

    I always thought that the biologists and chemists were more secretive because of the long tradition of secrecy in industrial labs, which dated back to the Middle Ages. Since a lot of money could be at stake, the default was to keep things hush hush, announcing only when most advantageous. Physicists were more likely to be publicly funded, so they’d announce all kinds of neat stuff in hopes of drawing patronage, as the patrons would in turn draw distinction from the research they funded. The whole process encouraged open and early disclosure.

  19. Chad wrote, “That is, arXiv posting is what counts to establish priority because high energy physicists have decided that it’s what will count to establish priority. There’s no a priori reason why biologists couldn’t adopt a similar convention, either for the arXiv or for some other similar service. The question is why they haven’t.”

    But the problem is that when there is already a firmly established convention, it is difficult to adopt a new one. If you are in charge of hiring or funding, it seems safer to choose someone who published in Nature or Science than someone who posted a result on the arXiv (or something equivalent), even if it was posted significantly earlier. If you are a researcher with a result to publish, where is the incentive to post it on the arXiv (or something equivalent) if (A) you are not likely to get credit for it and (B) a rival group may steal your idea and publish it in Nature? It is remarkable that the physics community was able to make a smooth transition.

    Also, there definitely is a cultural difference between physics and the life sciences. For some reason, it is more important to publish papers in glamour journals in the life sciences than in physics. You do see some high-impact physics papers in Nature and Science, but they are more often published in specialized journals. Harold Varmus and others proposed something like the arXiv for the life sciences, but met huge resistance. They eventually came up with the PLoS journals, which are open-access, but still just peer-reviewed journals.

    As a biologist with a background in physics, I personally like the arXiv model because I think peer review is far from perfect, I think that the value of a paper shouldn’t be judged by the journal in which it was published, and the arXiv allows faster and more open communication. But the change in the culture is not going to happen soon.

  20. As one of the more elderly folks around here, I’ll merely point out that sharing papers before publication was COMMON in physics, sent by snail mail all over the place. (And when that snail mail went by boat overseas as 4th class mail, it really was at a snail’s pace.) It was natural for e-mail to replace this mechanism, and equally natural to adopt an ftp server as a central storage location.

    It is also no accident that it was started at LANL (on xxx.lanl.gov so it would not be confused with the main http://www.lanl.gov site). LANL, like other national and university labs, had been distributing numbered official preprints and reports for decades. The fact that these were numbered and dated in conjunction with this official distribution process probably contributed to protecting your priority on an idea, just as notarized lab notebooks do.

  21. Something similar to what Kaleberg describes happened with the J/psi. If Ting had not arrived at SLAC when he did, in the day between firm discovery and the paper being mailed to PRL on Long Island, he would have been scooped.

    The history of that week is quite interesting.
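Comment 3 suggests replacing the submission-time ordering of the daily listing with a randomized one. The Python sketch below illustrates the three orderings it mentions. It assumes a day’s listing is simply a list of paper identifiers already sorted by submission time; the function names, the date seed and the group labels are hypothetical, and this is not how the arXiv actually builds its mailings.

```python
import random
from typing import List

# Hypothetical sketch of the listing orders discussed in comment 3.
# A "listing" here is just a list of paper IDs in submission order.


def order_by_submission(papers: List[str]) -> List[str]:
    """Status quo as described in the post: earlier submissions appear higher."""
    return list(papers)


def shared_random_order(papers: List[str], listing_date: str) -> List[str]:
    """One random order per day, identical for every recipient.

    Seeding with the listing date keeps the shuffle reproducible, so everyone
    still sees (and discusses) the same list, but submission time no longer
    decides who ends up on top.
    """
    rng = random.Random(listing_date)  # str seeds are converted deterministically
    shuffled = list(papers)            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


def per_group_random_order(papers: List[str], listing_date: str, group: str) -> List[str]:
    """A different random order for each group of recipients.

    `group` could be an institution name (the compromise suggested in the
    comment) or an individual recipient's address (fully per-recipient order).
    """
    rng = random.Random(f"{listing_date}:{group}")
    shuffled = list(papers)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    todays_papers = ["paper-A", "paper-B", "paper-C", "paper-D"]  # hypothetical IDs
    print(order_by_submission(todays_papers))
    print(shared_random_order(todays_papers, "2012-06-14"))
    print(per_group_random_order(todays_papers, "2012-06-14", "Example University"))
```

Seeding each shuffle from the date (and group) rather than drawing fresh randomness keeps every list stable for the day. The shared order preserves the comment’s point about the whole community talking about the same papers; the per-group order spreads top-of-list visibility more evenly at the cost of that common view.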
