Review and Replication

So, there was this big story in cosmology the other day (Tom Levenson’s write-up is very nice), which has been hailed as one of the greatest discoveries since the last greatest discovery, blah, blah, blah. And now that a few days have passed, we’re starting to see the inevitable backlash, ranging from detailed technical analyses of other possible explanations to more general musings about the nature of peer review. I’m not qualified to evaluate the former, so I’m going to talk a bit about the latter.

The title of that Atlantic post is “‘One of the Greatest Discoveries in the History of Science’ Hasn’t Been Peer-Reviewed—Does It Matter?,” which is probably a good candidate for the rule of thumb that a question in a headline is always answered “No.” Or at least “Not really.” It wanders around a bit, eventually hitting the most important point, namely that for all it is sometimes fetishized in coverage of science, the modern version of “peer review” is a very recent invention. Einstein famously encountered the modern refereeing system only once, and when he got some criticisms from a referee, he huffily withdrew the paper (and then made changes addressing the criticism before submitting to a different journal).

There’s also a decent case to be made that, in high-energy physics in particular, the peer review system has largely been superseded. What really matters these days is the posting of a result on the arXiv, which in most cases happens before the formal, anonymous-referee peer review process. In that community especially, you really have lots of “peer review” happening in parallel: there may be a couple of people looking a paper over for a journal, but there are dozens of others looking it over as the basis for response papers and the like, which is vastly more important in the end.

What ultimately matters, after all, is not review but replication: that is, when somebody else tries to do the same experiment, do they get the same results? This is where issues with statistical flukes usually sort themselves out, and that’s not a problem that can be fixed by any amount of refereeing. A journal referee can look for obvious gaps in the analysis, but will never get to the level of detail of a repeat experiment or a follow-up measurement.
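To put a rough number on how that sorting-out works (this is a minimal back-of-the-envelope sketch, not any particular experiment’s analysis), consider how unlikely it is for the same statistical fluke to show up in two independent experiments:

```python
# Rough illustration (not any experiment's actual analysis): how replication
# suppresses statistical flukes. A "3-sigma" fluctuation shows up by chance in
# roughly 1 in 740 trials (one-sided); two *independent* experiments both
# fluctuating to 3 sigma is about one in half a million.
from scipy.stats import norm

p_single = norm.sf(3.0)   # one-sided tail probability of a 3-sigma fluke
p_both = p_single ** 2    # two independent experiments fluking the same way

print(f"Chance of a 3-sigma fluke in one experiment : {p_single:.2e}")
print(f"Chance of the same fluke in two experiments : {p_both:.2e}")

# Equivalently, combining two independent 3-sigma results (Stouffer's method)
# gives a combined significance of about 3 * sqrt(2), roughly 4.2 sigma.
z_combined = (3.0 + 3.0) / 2 ** 0.5
print(f"Combined significance of two 3-sigma results: {z_combined:.1f} sigma")
```

The point is just that an independent repetition buys you far more protection against flukes than any amount of re-reading the original analysis ever could.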

Which means that the Atlantic article is sort of in the vicinity of an interesting philosophical question, without actually hitting the right point. That is, the interesting question isn’t “What does it mean that this result wasn’t reviewed?” but “What does it mean when you have results that can’t easily be replicated?” The BICEP2 results don’t really trigger that, because while by all reports the measurement is a technical tour de force, there are other experiments doing the same basic thing, and thus within a few years there will be independent confirmation or refutation of the result.

The Higgs boson, cited repeatedly in the Atlantic article because anyone writing about physics in 2014 is contractually obligated to mention that goddamn particle, comes closer, because the LHC is really the only game in town for high-energy physics. There’s no other accelerator in the world with comparable energy that can check the results produced at the LHC, which dances up to the line of being a problem. This is part of why the LHC includes multiple detectors, though: while ATLAS and CMS use the same high-energy proton beams, they are independent detectors, run by separate teams of physicists, and so can provide a cross-check on each other’s results. I suspect that, in the spirit of people who invoke Byzantine loopholes in Bell inequality experiments, somebody could come up with a reason to doubt the Higgs detection based on both detectors being located at the LHC, but it would be kind of strained.

More potentially troubling are the things that are very difficult to do, but not quite sexy enough to produce many parallel efforts. Weirdly, the best example of this that comes to mind is on the theory side, with the high-order calculations of the electron g-factor. This is a technically brilliant and highly demanding task, and there seem to be only a small number of people in the world working on it, mostly associated with Toichiro Kinoshita at Cornell. The last time I saw a talk on this stuff, a couple of DAMOP meetings ago, the speaker ended with a plea for somebody else to take up the same question and provide an independent check. There didn’t seem to be any takers, though, because it’s a fantastically complex task, and not all that intrinsically exciting. They do a lot of internal consistency checks, and apply multiple calculation methods, but pretty much all of the results in this area seem to come from one group. That situation is probably a more legitimate philosophical problem than BICEP2.
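For a sense of scale of what those calculations are chasing, here’s a toy sketch (my own illustration, not the Kinoshita group’s numbers) using only the leading, lowest-order QED term for the electron’s anomalous magnetic moment; the hard part, the higher-order coefficients, is exactly what’s left out:

```python
# Toy sketch of the electron anomalous magnetic moment a_e = (g - 2)/2.
# Only the leading QED term, a_e ~ alpha/(2*pi) (Schwinger, 1948), is used;
# the higher-order coefficients are the fantastically hard part done by the
# Kinoshita group and are deliberately omitted. Values quoted approximately.
import math

alpha = 1 / 137.035999             # fine-structure constant (approximate)
a_leading = alpha / (2 * math.pi)  # Schwinger's leading-order result

a_measured = 0.00115965218         # measured a_e, truncated to a few digits

print(f"Leading-order QED prediction: {a_leading:.8f}")
print(f"Measured value (approx.)    : {a_measured:.8f}")
print(f"Fractional gap left for the higher-order terms: "
      f"{abs(a_measured - a_leading) / a_measured:.2%}")
```

The leading term gets you to within about 0.15% of the measured value; closing the rest of that gap is what the heroic multi-loop calculations are for, and at the moment essentially one group carries that load.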

In the end, as I said, these recent results will all shake out in the normal manner of such things. Other groups will look at other areas of the sky with other telescopes, and either see the same pattern of gravity-wave-induced polarization, or something different. It’ll take a little while, but nothing in this fundamentally challenges the nature of science.

4 comments

  1. A journal referee can look for obvious gaps in the analysis, but will never get to the level of detail of a repeat experiment or a follow-up measurement.

    This is why the peer review process doesn’t detect outright fraud: the reviewer must presume that the authors performed the described experiment(s) and obtained the described result(s). Here, too, I think the arXiv will do a better job: with more people looking at a paper, there are more chances to spot things like the figure duplication that led to Jan-Hendrik Schön’s downfall (according to a book I’ve read on the subject, a Bell Labs postdoc trying to follow up on some of Schön’s work happened to notice that a figure in one paper had the same curve, including “noise”, as a figure in a different paper which purported to show the results of a completely different experiment).

  2. Physics parameterizes theory to observation (parity violations, symmetry breakings, chiral anomalies, Chern-Simons repair of Einstein-Hilbert action). Quantum gravitation and SUSY are empirically sterile, but they must be true. Euclid was the geometry, cartography knew better.

    Postulates are postulated because they cannot be defended. Physics still has a weak postulate, as Newton had two (GR, QM). ECKS gravitation says “chiral spacetime torsion.” Trace spacetime chiral anisotropy, Noether, defective angular momentum conservation, Milgrom acceleration not dark matter. Contrast visually and chemically identical, single crystal test masses in enantiomorphic space groups, e.g., left- and right-handed alpha-quartz in a geometric Eötvös experiment.

    Postulates are corrected external to their axiomatic systems. Physics is rigorously derived. Looking contrary is silly. You cannot fix it until you know how it is broken.

    WMAP: 13.75 Gyr; 72% dark energy, 23% dark matter, 4.6% baryonic matter.
    Planck: 13.82 Gyr; 68.3% dark energy, 26.8% dark matter, 4.9% baryonic matter.
    http://arxiv.org/abs/1306.5534 Dark matter inside Saturn’s orbit is less than 1.7×10^(-10) M_solar. Parameterize that.

  3. Chad — thanks for the props at top. Good stuff on replication. The larger area of difficulty with this is in the biomedical realm, I think. But that’s for another day.

  4. I agree that the problem is worse for biomedical research, both because it’s so much harder to muster real statistical power, and because the consequences are much worse. If a bunch of theoretical physicists go chasing off after a statistical artifact, well, that’s a month or two that they weren’t messing up the global financial system (we kid because we love…). But when spurious medical studies catch on, they can drive terrible life decisions with real public health consequences.
