On Not Talking, for the Right Reasons

Over at Backreaction, Bee has a nice piece on our current age of virality. Toward the end, she discusses some of the ways this applies to science, specifically a quote from this Nature article about collaborative efforts to measure “big G”, and a story about a Chinese initiative to encourage collaboration. She writes of the latter, “Essentially, it seems, they’re giving out salary increases for scientists to think the same as their colleagues.”

And I agree that this can be a problem– there’s a famous paper I can never find looking at the evolution of the accepted value of some physical constant over time, showing that it changed by many standard deviations from the initial measurement, but never all that much in a single step. This strongly suggests that experimenters measuring the value had an idea of the “right” answer, and may have (probably unconsciously) biased their own results in that direction. This sort of thing is why the practice of “blinded” data analysis has become more widespread in recent years (though the steady increase in computing power over that same period has no doubt helped, by making this much easier to arrange). Too much collaboration between groups can have a similar effect– if there are only one or two measurements out there, it’s hard to have confidence that there isn’t some subtle systematic bias that’s confusing things.

At the same time, though, there are major advantages to pooling resources for difficult questions. One of the big stories in AMO physics the last few years has been the dramatic improvement in the search for an electric dipole moment of the electron driven by the ACME collaboration between Yale and Harvard. They haven’t found an EDM, but within the span of about five years they’ve shown that there’s nothing to find at a precision an order of magnitude better than the previous experiments. This came about because the principals of the collaboration– Dave DeMille at Yale, and John Doyle and Jerry Gabrielse at Harvard– realized that they were all independently working on a really hard problem, and could do a lot better if they combined their efforts.

Now, would it be a good thing for science if they also brought in Ed Hinds and Norval Fortson and Eric Cornell? Probably not– there might be some additional efficiency gains, but you would lose out on the sanity check of having some competing measurements out there using different systems. But I think that the net effect of their collaboration to this point has been very positive.

So, there’s a balance to be struck: it’s important to have enough collaboration to smooth the path to difficult measurements, but enough competition that you can trust the results. You don’t want everybody to be talking, but at the same time, you want people to be not-talking for the right reasons.

And I think the big-G collaboration described by Nature is trying to do this the right way:

A big item on the agenda at the meeting will be debating how to choose a small number of these experiments to be conducted by members of a consortium, this time with an unprecedented level of oversight. Each experiment will be repeated by two independent groups, using identical sets of equipment created and tested at a third institution. While the experiments are going on — and there is still time to fix them — experts from outside those two groups will hunt for errors. In the past, says [Terry] Quinn [former director of the International Bureau of Weights and Measures (BIPM) in Paris], who is a driving force behind the NIST meeting, scientists have picked holes in each other’s experiments only after they were published, making it difficult to verify whether those problems were really the cause of an error.

I think this sounds like an appropriate balance. These big-G measurements are extraordinarily difficult to do, and because each experiment is its own self-contained thing, they rely very much on local tacit knowledge of how their systems work, which can allow subtle errors to creep in. More consistent techniques, along with integrated data analysis and error checking, are a good step toward fixing the disagreement in measurements, which is a bit of an embarrassment for people who care about such things.

And, of course, some more talking might help avoid the occasional disaster. After all, the root cause of the recent BICEP2 controversy is that the BICEP2 analysis of their data required them to figure out the contribution of dust to the signal they were seeing, and the information they needed to do that properly was held by the Planck collaboration, their direct competitors. BICEP2 did their analysis with incomplete information– apparently including estimates from values presented at a conference by Planck scientists– and ended up thinking their values were a lot more solid than they apparently are, once all the data from Planck are properly incorporated. Which is kind of embarrassing, and could’ve been headed off by a little more communication. And, in fact, the plan calls for the two experiments to collaborate on the next measurement, for just this sort of reason.

So, you know, there’s a fine line between talking too much, and not talking enough. Figuring out exactly where that falls is a problem much too hard for mere physicists, and probably can only be done in retrospect, many years down the road.

9 thoughts on “On Not Talking, for the Right Reasons”

  1. I think I’ve seen, at least presented in a talk, the evolution of our best measurement of a physical parameter. If memory serves, it was the lifetime of the neutron. As I remember it, though, it wasn’t just a slow evolution, but a punctuated equilibrium. Every so often, somebody would have a new value that was outside the earlier error bars, and then everybody would agree with that for a while.

    Here’s one possible reference:

    http://www.quantumdiaries.org/2009/06/08/history-of-measurement/

  2. What I remember is a much longer paper, possibly on the arxiv, showing several such graphs and talking about the process and so on. I probably have a copy somewhere, but can I find it?

  3. “the ACME collaboration between Yale and Harvard”

    Did they have problems with dissolving and/or exploding vacuum gaskets, or other, more spectacular failures?

    One of my colleagues was associated with a proposal for a project with the acronym ACME. At least in this case, the lead on the proposal was an immigrant who knew the dictionary definition of the word, but not its cultural association with Wile E. Coyote. (The Harvard/Yale team have names that sound American, so they should have been aware of that cultural history, especially given those universities’ reputation as home to many Super Geniuses.) That project was not funded, and neither was the revised version with a different acronym.

    To the main point, lack of communication is a definite problem in my field. We are supposed to make our data publicly available, but there are many ways to screw up the data analysis, and I have seen several. When I have spotted such issues as a paper referee, I have recommended rejection (along with a suggestion that they collaborate with somebody who actually knows that data set). Sometimes, however, the paper slips through.

  4. “Did they have problems with dissolving and/or exploding vacuum gaskets, or other, more spectacular failures?”

    Yes, which is why they formed the collaboration…

    (DeMille’s group was originally working on an EDM search with lead oxide, which involved all sorts of unfortunate inadvertent hot metal chemistry. This was down the hall from my lab when I was a post-doc.)

  5. Perhaps you are thinking of Feynman’s claim that the value of the charge on the electron had been swayed by Millikan’s initial experiments. It is reported in this New Yorker article:

    http://www.newyorker.com/news/news-desk/more-thoughts-on-the-decline-effect

    Here’s the relevant quotation:

    ———————————————–
    In his 1974 commencement address at Caltech, Richard Feynman described why the initial measurement was off, and why it took so long to fix:

    Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It’s a little bit off, because he had the incorrect value for the viscosity of air. It’s interesting to look at the history of measurements of the charge of the electron, after Millikan. If you plot them as a function of time, you find that one is a little bigger than Millikan’s, and the next one’s a little bit bigger than that, and the next one’s a little bit bigger than that, until finally they settle down to a number which is higher.

    Why didn’t they discover that the new number was higher right away? It’s a thing that scientists are ashamed of—this history—because it’s apparent that people did things like this: When they got a number that was too high above Millikan’s, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number closer to Millikan’s value they didn’t look so hard. And so they eliminated the numbers that were too far off, and did other things like that.

    —————————————-

    By the way, the BICEP2 controversy could have been avoided if no one wanted to claim precedence in the race to first results on the microwave background. As long as some people want to be first, there will always be a lack of cooperation.

  6. The dreaded “decline effect”! Jonah Who-Shall-Not-Be-Named!

    I had heard the Feynman thing, but I’m thinking of a history-of-physics article about the changes in these things over time. I’ll dig around for it later.

    I agree that the desire for precedence helped create the BICEP2 thing, and they undoubtedly rushed a bit because of that. But to be fair, it affects both sides of the controversy– that is, the reason the BICEP2 folks were using preliminary numbers from a conference talk to estimate the dust effect is that the Planck team also wanted to be first, and kept their data to themselves.

  7. That is an interesting article! When I hear about that subject, I always think of the page in the Particle Data Group showing the change in various quantities over time, the one linked from the Quantum Diaries @#1.

    I particularly like the ones that look like a damped oscillation.
