There’s a minor scandal in fundamental physics that doesn’t get talked about much, and it has to do with the very first fundamental force discovered, gravity. The scandal is the value of Newton’s gravitational constant G, which is the least well known of the fundamental constants, with a value of 6.674 28(67) x 10^-11 m^3 kg^-1 s^-2. That may seem pretty precise, but the uncertainty (the two digits in parentheses) is scandalously large when compared to something like Planck’s constant at 6.626 068 96(33) x 10^-34 J s. (You can look up the official values of your favorite fundamental constants at this handy page from NIST, by the way…)
To make matters worse, recent measurements of G don’t necessarily agree with each other. In fact, as reported in Nature, the most recent measurement, available in this arxiv preprint, disagrees with the best previous measurement by a whopping ten standard deviations, which is the sort of result you should never, ever see.
This obviously demands some explanation, so:
What’s the deal with this? I mean, how hard can it be to measure gravity? You drop something, it falls, there’s gravity. It’s easy to detect the effect of the Earth’s gravitational pull, but that’s just because the Earth has a gigantic mass, making the force pretty substantial. If you want to know the precise strength of gravity, though, which is what G characterizes, you need to look at the force between two smaller masses, and that’s really difficult to measure.
Why? I mean, why can’t you just use the Earth, and measure a big force? If you want to know the force of gravity to a few parts per million, you would need to know the mass of the Earth to better than a few parts per million, and we don’t know that. A good measurement of G requires you to use test masses whose values you know extremely well, and that means working with smaller masses. Which means really tiny forces– the force between two 1 kg masses separated by 10 cm is 6.6 x 10^-9 N, or about the weight of a single cell.
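If you want to check that number yourself, the arithmetic is just Newton's law of gravitation; here it is as a quick Python sketch, using a rounded value of G:

```python
G = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
m1 = m2 = 1.0   # kg
r = 0.10        # m, i.e. 10 cm separation

F = G * m1 * m2 / r**2
print(f"F = {F:.1e} N")   # ~6.7e-9 N, the ballpark quoted above
```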
OK, I admit, that’s a bit tricky. So how do they do it? There are four papers cited in the Nature news article. I’ll say a little bit about each of them, and how they figure into this story.
The oldest measurement cited by Nature is the torsion balance measurement from 2000 by the Eöt-Wash group at the University of Washington. This is an extremely refined version of the traditional method of measuring G first developed by Henry Cavendish in the late 1700’s.
Let’s assume I’m too lazy to follow that link, and summarize in this post, mmmkay? OK. Cavendish’s method used a “torsion pendulum,” which is a barbell-shaped mass hung at the end of a very fine wire, as seen at right. You put two test masses near the ends of the barbell, and they attract the barbell, causing the wire to twist. The amount of twist you get depends on the force, so by measuring the twist of the wire for different test masses and different separations, you can measure the strength of gravity and its dependence on distance.
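To get a feel for how tiny the resulting twist is, here's a rough Python sketch of the equilibrium deflection of a small Cavendish-style pendulum. All the numbers (the barbell masses, the test masses, the separation, and especially the fiber's torsion constant) are made-up illustrative values, not anyone's actual apparatus:

```python
import math

# Rough Cavendish-style numbers, all made up for illustration
G = 6.674e-11    # m^3 kg^-1 s^-2
m_bar = 0.05     # kg, mass at each end of the barbell
M_test = 1.5     # kg, each attracting test mass
d = 0.05         # m, center-to-center separation of each mass pair
L = 0.10         # m, lever arm from the fiber to each barbell mass
kappa = 1e-7     # N m / rad, torsion constant of the fiber (assumed)

F = G * m_bar * M_test / d**2    # gravitational force on each end
torque = 2 * F * L               # both ends of the barbell contribute
theta = torque / kappa           # equilibrium twist angle of the fiber
print(f"force per end: {F:.1e} N, twist: {math.degrees(theta):.2f} degrees")
# roughly 2e-9 N and a quarter of a degree for these toy numbers
```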
Sounds straightforward enough. It is, in concept. Of course, given the absurdly tiny size of the forces involved, it’s a really fiddly measurement to do. Cavendish himself set the apparatus up inside a sealed room, and then read the twist off from outside, using a telescope. If he was in the room looking at the apparatus, the air currents created by his presence were enough to throw things off.
This remained the standard technique for G measurements for about two centuries, though, because it’s damnably difficult to do better. And the Eöt-Wash group’s version is really astounding.
So, how did they do better? One of the biggest sources of error in the experiment comes from the twisting of the wire. In an ideal world, the response of the wire would be linear– that is, if you double the force, you double the twist. In the real world, though, that’s not a very good assumption, and that makes the force measurement really tricky if the wire twists at all.
The great refinement introduced by the Eöt-Wash group was to not allow the wire to twist. They mounted their pendulum, shown at left, on a turntable, and made small rotations of the mount as the wire started to twist, to prevent the twist from becoming big. Their force measurement was then determined by how much they had to rotate the turntable to compensate for the gravitational force causing a twist of the wire.
They also mounted the attracting masses on a turntable, and rotated it in the opposite direction around the pendulum, to avoid any systematic problems caused by the test masses or their positioning. Their signal was thus an oscillating correction signal, as each test mass passed by their pendulum, and they recorded data for a really long time: their paper reports on six datasets, each containing three days worth of data acquisition.
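To put a very rough number on the kind of effect the turntable has to track, here's a toy Python estimate of the gravitational angular acceleration of a small barbell pendulum. It reuses the same made-up masses and lever arm as the sketch above; the real Eöt-Wash geometry and numbers are different:

```python
G = 6.674e-11    # m^3 kg^-1 s^-2
m_bar = 0.05     # kg, mass at each end of the pendulum
M_test = 1.5     # kg, each attractor mass
d = 0.05         # m, attractor-to-pendulum-end separation
L = 0.10         # m, lever arm from the fiber to each end

torque = 2 * (G * m_bar * M_test / d**2) * L   # gravitational torque on the pendulum
I = 2 * m_bar * L**2                           # moment of inertia of the barbell
alpha = torque / I                             # angular acceleration the turntable must supply
print(f"angular acceleration: {alpha:.1e} rad/s^2")
# ~4e-7 rad/s^2 for these toy numbers
```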
The value they got was 6.674 215 6 ± 0.000092 x 10^-11 m^3 kg^-1 s^-2, far and away the best measurement done to that point.
So what are the other papers? The second one, in chronological order, is a Phys. Rev. D paper from a group in Switzerland, who used a beam balance to make their measurement. They had two identical test masses hung from fine wires, and they alternately weighed each mass while moving enormous “field masses” weighing several metric tons each into different positions, as shown in the figure at right. In the “T” configuration, the upper test mass should appear heavier than the lower test mass, as the large field masses between them pull one down and the other up. In the “A” configuration, the upper test mass should be lighter, as the field masses pull it up while pulling the lower mass down.
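For a sense of scale, here's a crude Python estimate of how much the apparent weight of a test mass changes when a multi-ton field mass is moved nearby. The masses and distance are illustrative guesses, and treating the extended field masses as points is a gross oversimplification of the real geometry:

```python
G = 6.674e-11     # m^3 kg^-1 s^-2
g = 9.81          # m/s^2, local gravitational acceleration
M_field = 7500.0  # kg, one "several metric ton" field mass (illustrative)
m_test = 1.0      # kg, one test mass (illustrative)
d = 0.5           # m, effective field-mass-to-test-mass distance (illustrative)

delta_F = G * M_field * m_test / d**2   # extra vertical pull from one field mass
delta_m = delta_F / g                   # the same thing as an apparent mass change
print(f"extra force: {delta_F:.1e} N -> apparent mass change: {delta_m * 1e9:.0f} micrograms")
# a few hundred micrograms out of a kilogram, i.e. a couple-of-parts-in-10^7 weighing problem
```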
This was another experiment with very long data taking, including this wonderfully deadpan description:
The equipment was fully automated. Measurements lasting up to 7 weeks were essentially unattended. The experiment was controlled from our Zurich office via the internet with data transfer occurring once a day.
Their value, 6.674 252(109)(54) x 10^-11 m^3 kg^-1 s^-2, is in good agreement with the Eöt-Wash group’s result.
If it agrees, why even mention it? It’s an important piece of the story, because it’s a radically different technique, giving the same answer. It’s extremely unlikely that these would accidentally come out to be the same, because the systematic effects they have to contend with are so very different.
Yeah, great. Get to the disagreement. OK, OK. The third measurement, in this PRL by a group in China, uses a pendulum again, but a different measurement technique. They used a rectangular quartz block as their pendulum, suspended at its center, with test masses outside the pendulum. They placed these test masses in one of two positions: near the ends of the pendulum when it was at rest (shown in the figure), or far away from the ends (where the “counterbalancing rings” are in the figure).
The gravitational attraction of the masses in the near configuration makes the pendulum twist at a slightly different rate than in the far configuration, and that’s what they measured. The oscillation period was almost ten minutes, and the difference between the two was around a third of a second, which gives you some idea of how small an effect you get.
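You can turn those two quoted numbers into the size of the underlying effect with a couple of lines of Python. This treats the pendulum as a simple torsion oscillator and assumes a 600-second far-configuration period with the near configuration about a third of a second shorter (the actual sign of the shift depends on the geometry):

```python
# Numbers quoted above: a roughly ten-minute period and about a third of a second shift
T_far = 600.0           # s, oscillation period with the test masses far away (approx.)
T_near = T_far - 0.33   # s, period with the masses near the ends (sign assumed)

# For a torsion pendulum T = 2*pi*sqrt(I / kappa), so the gravitational contribution
# to the restoring torque shows up as a fractional change in the effective kappa:
frac_change = (T_far / T_near) ** 2 - 1
print(f"fractional change in restoring torque: {frac_change:.1e}")
# about 1e-3: G has to be dug out of a part-per-thousand shift in a ten-minute period
```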
Their value was 6.673 49(18) x 10^-11 m^3 kg^-1 s^-2, which has a significantly larger uncertainty than the other two but, even with that, doesn’t agree with them. Which is kind of a problem.
So, how do you deal with that? Well, they obviously had a little trouble getting the paper through peer review– it says it was first submitted in 2006, but not published until 2009. That probably means they needed to go back and re-check a bunch of their analysis to satisfy the referees that they’d done everything correctly. After that, though, all you can do is put the result out there, and see what other people can make of it.
Which brings us to the final paper? Exactly. This is an arxiv preprint, and thus isn’t officially in print yet, but it has been accepted by Physical Review Letters.
They use yet another completely different technique, this one employing free-hanging masses whose position they measure directly using a laser interferometer. They also have two configurations, one with a bunch of source masses between the two hanging masses, the other with the source masses outside the hanging masses. The gravitational attraction of the 120 kg source masses should pull the hanging masses either slightly closer together, or slightly farther apart, depending on the configuration, and this change of position is what they measure.
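As a rough illustration of why a laser interferometer is needed, here's a small-angle pendulum estimate in Python. The 120 kg source mass comes from the description above, but the hanging mass, suspension length, and separation are illustrative guesses rather than the actual numbers from the experiment:

```python
G = 6.674e-11    # m^3 kg^-1 s^-2
g = 9.81         # m/s^2

M_source = 120.0   # kg, one source mass (quoted above)
m_hang = 0.8       # kg, one hanging mass (illustrative guess)
L = 0.7            # m, length of the suspension (illustrative guess)
d = 0.2            # m, source-mass-to-hanging-mass distance (illustrative guess)

F = G * M_source * m_hang / d**2   # horizontal gravitational pull on the hanging mass
x = F * L / (m_hang * g)           # small-angle pendulum deflection under that force
print(f"deflection of one hanging mass: {x * 1e9:.0f} nm")
# of order ten nanometers, hence the laser interferometer
```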
Their value is 6.672 34 ± 0.000 14 x 10^-11 m^3 kg^-1 s^-2, which has nice small error bars– only the Eöt-Wash result is better in that regard– but is way, way off from the other values. Like, ten times the uncertainty off. There’s no obvious reason why this should be the case, either. If anything, the experiment is simpler in concept than any of the others, so you would expect it to be easier to understand. And there aren’t any really glaring flaws in the procedure (it never would’ve been accepted otherwise), so this presents a problem.
So, now what? Well, in the short term, this probably means that the CODATA value for G (the official, approved number used by international physics) will need to be revised to increase the uncertainty. This is kind of embarrassing for metrology, but has happened before– a past disagreement of this type is one of the things that prompted the original Eöt-Wash measurements.
In the medium to long term, you can bet that every group with a bright idea about how to measure G is tooling up to make another run at it. This sort of conflict, like any other problem in physics, will ultimately need to be resolved by new data.
Happily, these experiments cost millions of dollars (or less), not billions, so we can hope for multiple new measurements with different techniques to resolve the discrepancy. It’ll take a good long while, though, given how slowly data comes in for these types of experiment, which will give lots of people time to come up with new theories of what’s really going on here.
Gundlach, J., & Merkowitz, S. (2000). Measurement of Newton’s Constant Using a Torsion Balance with Angular Acceleration Feedback. Physical Review Letters, 85 (14), 2869-2872. DOI: 10.1103/PhysRevLett.85.2869
Schlamminger, S., Holzschuh, E., Kündig, W., Nolting, F., Pixley, R., Schurr, J., & Straumann, U. (2006). Measurement of Newton’s gravitational constant. Physical Review D, 74 (8). DOI: 10.1103/PhysRevD.74.082001
Luo, J., Liu, Q., Tu, L., Shao, C., Liu, L., Yang, S., Li, Q., & Zhang, Y. (2009). Determination of the Newtonian Gravitational Constant G with Time-of-Swing Method. Physical Review Letters, 102 (24). DOI: 10.1103/PhysRevLett.102.240801
Parks, H. V., & Faller, J. E. (2010). A Simple Pendulum Determination of the Gravitational Constant. Physical Review Letters (accepted). arXiv: 1008.3203v2
Nice writeup, and I’m ashamed to nitpick about grammar/spelling, but I believe you may have misspelled the word “Thang” in the title.
Ha! It’s nice to see you physicists have trouble too.
It looks as if different methods give different results (suggesting they all have different biases). How will you lot know which one is correct?
Also, would altitude make a difference?
What usually happens in this sort of situation is that a few measurements using different techniques will turn out to agree with each other reasonably well, and that will come to be accepted as the “real” value. The data from all of these will be re-analyzed, and eventually somebody will find a plausible systematic effect that could’ve thrown the results off.
If I had to guess, I’d say that the Eot-Wash result is probably closer to the final value than the newer measurements. The period measurement seems to invite exactly the sort of weird twisting-wire effects that the original turntable measurements were designed to avoid, and the hanging-pendulum thing is, as far as I know, a very new technique, and the most likely to have some subtle problem that hasn’t been noticed yet because people haven’t been banging on it for decades like the torsion pendulum. That’s just a guess, though.
Why do Eöt-Wash give “6.6742156” the extra digits “56” if, by their own estimates, the “2” might just as well be “1” or even “3”?
When I was an undergraduate, G was known to only three digits, so this is progress. I wonder about three-day experiments, though. At this stage they should be collecting measurements for several years before reporting.
Re: Nathan Myers
Re: the digits, I think there’s a typo in Chad’s writeup and I think you also lost a digit in your reading.
I think the numbers are:
6.674215
with an error bar of
0.000092
(times the appropriate units).
It’s standard these days to give 2 digits of error bar, and write your number out to the same length, so I don’t think there’s anything weird here.
Re: “I wonder about three-day experiments, though. At this stage they should be collecting measurements for several years before reporting.”
I disagree for two reasons.
First, precision measurement is partially about statistical error (which you can improve by taking data longer) and partially about checking for systematic errors (which don’t improve through averaging). To check for systematics, typically one varies conditions and remeasures things to see how the measured value depends on stuff. To do this well, you may need to take data for almost as long under your “altered” conditions as you do under your “good” conditions. And in most experiments, there are dozens of possible systematics to check. So, a good rule of thumb for a precision measurement experiment is that, when estimating your statistical error, you allow for a day (or maybe a week) of data taking, so that you can do your systematic checks on the timescale of a couple of years. So I’m guessing that it’s not that these guys are lazy or got bored; it’s that they’re responsible.
Secondly, if you look at their error budget in the paper, the statistical error accounts for 6 ppm error, and their total error is 14 ppm, so taking data for a few years would only make a small improvement to their error, reducing it from 14 to 12 ppm (assuming things add in quadrature).
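For anyone who wants to see where that comes from, the quadrature arithmetic is a two-liner (the 6 ppm and 14 ppm are the numbers from their error budget):

```python
import math

total_ppm = 14.0   # total uncertainty from the error budget
stat_ppm = 6.0     # statistical part of that budget

# Uncertainties add in quadrature, so the non-statistical floor is:
syst_ppm = math.sqrt(total_ppm**2 - stat_ppm**2)
print(f"systematic floor: {syst_ppm:.1f} ppm")
# ~12.6 ppm: even infinitely long averaging couldn't get below this
```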
If the exact and true value of G is discovered, creating a future time where we say “we should have seen that”, it will be because it stands out in theory or is singularly distinctive in other context.
In the use of log tables, it was often necessary to extrapolate a value between those which were actually listed, by discerning the pattern of change and making a prediction. In multidimensional mappings of the values of fundamental constants expressed as common logs, recognizable as multidimensional log tables, there is one single, unique concurrence of numerical pattern alignments which occurs for a single value of G:
http://www.outlawmapofphysics.com/NGC.pdf
Thanks, AC, that was helpful. I didn’t lose a digit; .000092 is close to .0001. But I wouldn’t have commented on 6.674215. How should we interpret the values “6.674252(109)(54)” and “6.67349(18)” from the other experiments?
I would have thought that rather than use the torsional modulus of a wire (nonlinearity etc. problems included) and then try to compensate for the nonidealities, more modern approaches to the Cavendish method would have used other sources of extremely small and precise torque.
Such as, to name one, opposing lasers from the same direction as the stationary masses.
The question of “measuring G” leads to a related, deeper (and somewhat “philosophical”) question: is G one of those constants with MLT dimensions, like c, whose value it’s impossible to unambiguously talk about in other than relative terms? Hence, you can’t just say e.g. “if c was ten times greater, then ….” (although George Gamow did what he could in the clever, mass-accessible “Mr. Tompkins in Wonderland”). If we imagine c different, then other things have to change along with it, and it’s hard to see how we could define or tell the difference. (More complicated than “what if everything got bigger,” but still a definition issue.) We can presumably triangulate some relations in simple atomic physics while forgetting the nuclear part, such as being able to tell whether the relationships between electron mass, e, h, c, etc. have changed.
Yet even though G has units (L^3/MT^2), I think it is rather intuitively clear what it means to say “what if everything was the same, except G was different.” We can IMHO rather easily imagine what physics would be like (most experiments anyone does: no difference, including G = 0 or even negative, as long as G is not imagined to be much bigger than now) and indeed imagine ways to test G separately. Some folks even thought G might change over long time scales. They must not have imagined an intrinsic contradiction: they conceived of the idea and looked for evidence. (None was found AFAIK.)
It is true we have to imagine mass “staying the same” while G changes, but there are other measures of mass if we accept some triangulation from other features. We can just say, the familiar particles “being the same mass in every other way” as well as constant charge, h, etc. which implies no alteration in atomic physics. Yet I suppose it could be fun to imagine how much you’d need to fiddle to hide an attempted change of G in other ways.
I suspect that G may in fact be only approximately constant and that it’s value does depend weakly on the local composition of matter.
Are there any good reasons to exclude such a possibility?
And yes, I know it demotes GR to an effective theory.
I wonder if you can really do this in the basement:
http://www.fourmilab.ch/gravitation/foobar/
The arXiv abstract for the latest measurement states that the value is in good agreement with the accepted 1986 CODATA value but not, as stated, the latest value.
I guess it’s interesting how an ‘accepted’ value changes with time, and it highlights how difficult it is to make good uncertainty estimates – the CODATA uncertainties don’t overlap, so somebody was being optimistic.
The Eöt-Wash quadrupole balance is an error-correcting marvel in so many ways. Dipole measurements are historically divergent. However, the Eöt-Wash balance has only used a fused silica plate, a low atomic number glass. It would be interesting to repeat the measurement with
1) a single crystal silicon plate of surpassing purity, SOP in the semiconductor industry. Big G pendula might detect magnetic impurity atoms even in low numbers.
2) a high atomic number plate. Pt-Ga,In alloys heat treat to tool steel hardness and stiffness for shaping; Niessing Co., Hoover and Strong Pt SK alloys, Eastern Smelting HTA Pt alloys. Alloys are ~95 wt-% platinum.
3) A space group P3(1)21 single crystal quartz plate. Sawyer Technical Materials LLC grows X-plate quartz to astounding perfection. (Commercial Grade C Z-plate quartz is space group P3(2)21. That is not geometrically interesting, nor is it especially compositionally pure or free of dislocations.)
Given how sensitive these experiments are, there are too many factors at play to have a truly controlled environment.
For example, how would one detect and factor in an anomalous amount of mass (or lack thereof) in one direction? A large and unknown underground cavern 1/2 kilometer underground/away can throw off the experiment if the experiment just happens to be positioned the wrong way.
Perhaps the best way to get uniform data is to perform some of these brilliant experiments in (or immediately outside) of an orbital station.
The arXiv abstract for the latest measurement states that the value is in good agreement with the accepted 1986 CODATA value but not, as stated, the latest value.
The 1986 value had really gigantic error bars, so that’s not terribly surprising. The Eot-Wash value also agrees with the 1986 CODATA value, in a fairly narrow sense.
For example, how would one detect and factor in an anomalous amount of mass (or lack thereof) in one direction? A large and unknown underground cavern 1/2 kilometer underground/away can throw off the experiment if the experiment just happens to be positioned the wrong way.
They know a lot about the local mass distribution. I saw a talk by somebody from the Eot-Wash group that included a picture of their lab. In the middle of the floor a few meters away from the apparatus was a big stack of lead bricks, which he explained were placed there to compensate for the fact that the building they’re in is on the side of a large hill.
It’s a known problem, and something they incorporate into their error analysis.
Nice article. I don’t know much about physics and gravity, but I got an idea from your post. I have learned a few things and will continue reading the site. Bookmarked, and will visit daily.
This is just crazy, they should just go with an approximate gravitation value 😉
Seriously, the fact that there are some differences in measurements means that additional factors are missing from the equation.
I agree with Neil B above. I could never understand how “mass is constant”. There have to be x number of assumptions to get that right. That’s common sense. I don’t see those assumptions being clearly laid out in books and reports!
[Given how sensitive these experiments are, there are too many factors at play to have a truly controlled environment.
For example, how would one detect and factor in an anomalous amount of mass (or lack thereof) in one direction? A large and unknown underground cavern 1/2 kilometer underground/away can throw off the experiment if the experiment just happens to be positioned the wrong way.
Perhaps the best way to get uniform data is to perform some of these brilliant experiments in (or immediately outside) of an orbital station.]
totally agree with you
@Neil @ Nick (18):
Or do all methods in the same lab at the same time. Then they’d all be under the same conditions
Laws of Physics, or Merely Local By-laws?
Written By: Jonathan Vos Post
Date Published: September 9, 2010
http://www.hplusmagazine.com/editors-blog/laws-physics-or-merely-local-laws
@ Anonymous Coward:
The “mispelling” of the word “Thang” is actually a play on words about gravity and Dr. Dre/Snoop Dogg’s rap song “Ain’t Nuthin but a G Thang”. Here the joke is that calculating a precise value for the gravitational constant G really is Nuthin But a G Thang.
At some point the results will have to be repeatable with a standardised machine. Hallmark! No, there has been no progress yet within CODATA in going from one value of the gravitational constant to the next, as this is extremely difficult. After Luther and Towler’s result (CODATA 1986) there was a collapse to 3-4 digits (CODATA 1998) due to different results creating a large uncertainty fluctuation. It looks like this may happen again, but it will not be as bad. So a new gravitational constant number in CODATA is not necessarily a sign of a step forward just yet. Actually, I like this trending back down toward Luther-Towler, as their result may actually be repeatable and their machine was simple. Also, just because two different apparatuses get a similar number is not cause to celebrate, as this has happened time and again in the history of this number. The other values in CODATA, such as masses and the fine structure constant, are actually progressing along.
Well, we have a couple of other problems to contend with, such as the fact that there are measurable “tide” effects from the Earth-Moon-Sun three-body system, which are time varying. Their magnitude is on the order of multiple microGals to tens of microGals. This time-varying signal needs to be incorporated in terms of frequency and amplitude to ensure that the long-period measurements are actually statistically canceling out these effects, and not biasing the results.
I’m surprised not to see any accounting for the large tidal effects of Jupiter and Mars along with the usual Sun and Moon. As we all know, each time the outer planets are in conjunction all manner of catastrophes occur here on Earth (earthquakes, tsunamis, buildings toppling…). A little basic astrology could easily account for the error in these measurements.