Classic Edition: Needles in Haystacks Are Easy

The fourth and final post in my 2003 series attempting to explain experimental particle physics to the lay reader. This one talks about the specifics of the “pentaquark” experiment that was announced that year, which provided the inspiration for the whole thing.

It should be noted that that discovery is by no means certain, but I’m still fairly happy with the explanatory aspects of these posts. I’m certainly not bothered enough to re-write them.

So feel free to ignore pentaquark-specific comments in these reposts. If you’d like a more recent experimental hook for this, Tommaso Dorigo has you covered.

So what’s the deal with the graph at the bottom of this page? The short answer is, it’s a graph of the number of particles detected versus the energy of the collision, and the narrow peak highlighted in red indicates the new “pentaquark” particle. Its horizontal position tells you the mass of the particle (how much energy you needed to make it), while its height tells you how likely it is to be made (how many collisions produced that particle).

To unpack that a little, I need to sketch a bit of how the data analysis process actually works. The raw data from these experiments comes in gigantic chunks– the CLAS detector generates something like a terabyte of data per day, giving the results of thousands or millions of collisions. This provides a gigantic metaphorical haystack which the experimenters sift for needles. (It’s a slight simplification to think of the data as lists of particles and energies, but not a damaging one– it’s a lie-to-children, as it were.)

If you have a particular reaction or particle you want to study, the first step is to identify the products– what sort of particles will be detected after the reaction takes place or the particle decays. Then you write a computer program to sift through the massive piles of data to find collisions with those results.

To take a concrete example, if you wanted to study neutral pion production, you might ask the computer to pull out those collisions where two protons collided, and two protons were detected. That will whittle your big dataset down from, say, a million events to, say, ten thousand (numbers are fictitious). You can then comb through those ten thousand (again, with a computer program) to figure out how many of them are really what you’re after– look for cases where the energy loss is enough to account for a neutral pion, say, or cases where there were also two photons detected. That lets you eliminate collisions where the colliding protons just glanced off each other without producing anything else, and also cases where they produced other particles– two pions instead of one, say, or something more exotic.
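The sifting step described above can be sketched in code. This is a toy version only: the event format (a list of detected-particle names plus a missing-energy figure), the function name, and all the numbers are invented for illustration, and real analysis code at a lab like Jefferson Lab is enormously more involved.

```python
# Toy event filter, illustrating the sifting described above.
# Each "event" is a dict listing the particles detected and the
# missing energy; all names and numbers here are made up.
PION_MASS_MEV = 135.0  # rest mass of the neutral pion, roughly 135 MeV

def candidate_pion_events(events, tolerance=20.0):
    """Keep events where two protons were detected and the missing
    energy is close to the neutral-pion rest mass."""
    selected = []
    for event in events:
        if event["particles"].count("proton") != 2:
            continue  # wrong products: we want exactly two protons
        if abs(event["missing_energy"] - PION_MASS_MEV) > tolerance:
            continue  # energy loss doesn't match a neutral pion
        selected.append(event)
    return selected

events = [
    {"particles": ["proton", "proton"], "missing_energy": 138.0},  # candidate
    {"particles": ["proton", "proton"], "missing_energy": 2.0},    # glancing collision
    {"particles": ["proton", "pion+"], "missing_energy": 140.0},   # wrong products
]
print(len(candidate_pion_events(events)))  # only the first event survives
```

The two `continue` lines are the code versions of the two cuts in the text: throw out events with the wrong products, then throw out events where the energy books don’t balance the way a pion would require.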

That narrows it down to, say, five thousand events, which then need to be checked to verify that they’re really the right reactions– making sure that the protons detected are really protons from the same reaction and not stray particles from some other collision, and that the photons were produced in the target region and aren’t just stray cosmic rays. By the time you do all that sorting out, you’re left with a few thousand events that you can plot up and use to determine the mass of the pion and the cross-section for pion production (“cross-section” being physics-speak for “likelihood of a given reaction”) as described above.
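The last step, turning a count of events into a cross-section, uses the standard relation that the number of events seen equals the cross-section times the integrated luminosity (how many chances there were for a collision) times the detection efficiency (what fraction of real events survive all the cuts above). A minimal sketch, with completely invented numbers:

```python
# Toy counts-to-cross-section conversion, using the standard relation
# N = sigma * luminosity * efficiency.  All numbers below are invented.
def cross_section(n_events, integrated_luminosity, efficiency):
    """Cross-section in the inverse units of the luminosity
    (e.g. barns, if the luminosity is in inverse barns)."""
    return n_events / (integrated_luminosity * efficiency)

# Say the cuts left 2000 events, from a run with integrated
# luminosity of a million inverse barns and 40% detection efficiency:
print(cross_section(2000, 1.0e6, 0.4))  # 0.005 barns
```

The efficiency in the denominator is why the careful sorting-out matters: if you don’t know how many real events your cuts threw away, you can’t convert the surviving count into a likelihood.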

This process is relatively simple for reactions like pion production, and you end up with gobs of data. The more complicated the reaction, and the more exotic the particles, the less likely you are to find things, and the more data you have to sift to find what you’re looking for. Particle physicists have to be very clever programmers and sharp statisticians to tease meaningful results out of the vast masses of data they collect.

In the case of the pentaquark, the actual reaction is very complex, and is represented by the Feynman diagram in the other figure on the Ohio page. (Feynman diagrams probably deserve an entry of their own. Some other time.) Summarized in words, a high-energy photon strikes a neutron in a deuterium nucleus, creating a bunch of quarks out of thick vacuum, which arrange themselves into the pentaquark particle and a negative kaon. The pentaquark falls apart into a neutron and a positive kaon, while the negative kaon bangs into the proton on its way out of the nucleus, and all four particles move off to the detectors.

To find these events, they look for all four particles being produced in a single reaction: a proton, a neutron, and a pair of kaons. Plotting those events gives you the graph mentioned above. The narrow feature indicates the real reaction of interest– cases where the photon energy was just right to produce the required quarks, and they happened to pair off in the right way. The broader peak is just background noise– a collection of events where the proton, neutron, and kaons were produced by some other process.

To find this peak, they sifted through the entire Jefferson Lab dataset– running to a few billion collisions. Out of all that data, they found thirty collisions where a pentaquark was produced. That gives you some idea just how rare this particle is, and explains why it took thirty-odd years to find it.