# Thread: scientists delusions and 5 sigma

1. Established Member
Join Date
Jan 2010
Location
Wisconsin USA
Posts
2,639

## scientists delusions and 5 sigma

I was reading the final article, https://phys.org/news/2018-07-higgs-boson-quarks.html which I think is awesome, but they have this thing in there which states its significance was 4.9 sigma. Are scientists ignorant or delusional? Is it not common knowledge that 5 sigma events can occur frequently, but not be significant, if one collects enough data points. Isn't there a better way than just saying 5 sigma?

2. Member
Join Date
May 2018
Posts
19
Originally Posted by Copernicus
I was reading the final article, https://phys.org/news/2018-07-higgs-boson-quarks.html which I think is awesome, but they have this thing in there which states its significance was 4.9 sigma. Are scientists ignorant or delusional? Is it not common knowledge that 5 sigma events can occur frequently, but not be significant, if one collects enough data points. Isn't there a better way than just saying 5 sigma?
It's hard to tell from the relatively brief description, but my impression is that the significance is that of the entire data set, not a single observation.

This is a very typical way to do an experiment. There is an hypothesis, that some quantity is equal to an hypothesised value. Then an experiment is conducted, with many observations. The value of the quantity is estimated experimentally, and is never exactly equal to the hypothesised value. But, the standard deviation of the estimate (usually called the standard error) is also estimated, and if the estimated value of the quantity is more than five standard errors from the hypothesised value, then the hypothesis is rejected. (Five standard errors is completely arbitrary, but it seems to be what is used in this field.) If each individual observation has a Gaussian distribution, and the quantity and its standard error are estimated in the usual way, then the ratio of the estimate to the standard error has a t-distribution, with a number of degrees of freedom that depends on how many observations there were. If the number of observations is large, the t-distribution will approximate a normal distribution, and a five-sigma outcome is extremely improbable.
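To put a number on "extremely improbable", here is a minimal Python sketch that converts a sigma level into a tail probability. It assumes the test statistic really is standard normal, which is the large-sample limit described above. The function names are mine, and my understanding is that particle physics results usually quote the one-sided tail, so both versions are shown.

```python
import math

def one_sided_p(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

def two_sided_p(n_sigma: float) -> float:
    """Probability of a deviation of at least n_sigma in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))

# Tail probabilities for a few sigma levels, including the 4.9 from the article.
for k in (1, 2, 3, 4.9, 5):
    print(f"{k} sigma: one-sided p = {one_sided_p(k):.3g}, "
          f"two-sided p = {two_sided_p(k):.3g}")
```

At five sigma the one-sided probability comes out at roughly 3 in 10 million, which is where the often-quoted "about 1 in 3.5 million" figure comes from.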

If they ran 100 million experiments, and found that one or two of them were highly unusual, and drew some conclusion from that, that would be ignorant or delusional. I seriously doubt that any scientist with the slightest bit of training would do this - the five sigma outcome appears to refer to the likelihood of the entire data sample, not of a single cherry-picked observation.

I don't see anything ignorant or delusional about what was done here.

3. Neither ignorant nor delusional, I'd say. Pretty standard way of expressing the results of a hypothesis test, in physics. You can translate to a "p" value via the normal distribution.
Can you give an example of a "5 sigma event" that can "occur frequently, but not be significant, if one collects enough data points"? I'm not sure what you're thinking of.

Grant Hutchison

4. The article actually gives a fair amount of detail about the statistical analysis. Frankly, to label the work "ignorant or delusional" is incorrect and insulting.

5. The recurring use of "sigma" when reporting hypothesis tests in physics always makes me feel slightly uneasy, since it seems to imply that a z-test is always being carried out. In my specialty, we used a lot of non-normal sampling distributions (which we'd distil down into a "p" value when reporting results), but I've never delved deeply enough into the stats in common use in physics - maybe they do always involve Gaussian distributions.

Grant Hutchison

6. Order of Kilopi
Join Date
Mar 2010
Location
United Kingdom
Posts
6,941
Originally Posted by grant hutchison
The recurring use of "sigma" when reporting hypothesis tests in physics always makes me feel slightly uneasy, since it seems to imply that a z-test is always being carried out. In my specialty, we used a lot of non-normal sampling distributions (which we'd distil down into a "p" value when reporting results), but I've never delved deeply enough into the stats in common use in physics - maybe they do always involve Gaussian distributions.

Grant Hutchison
It is a recurring issue - people assume you are in the realm of central limits. That is not always true unless you have designed your hypothesis test correctly. That said, for particle physics the statistics are very well understood (they are basically the only way to interpret the data), so it is perhaps the worst possible area in which to take physicists to task for basic statistics errors.

7. Established Member
Join Date
Jan 2010
Location
Wisconsin USA
Posts
2,639
First, I said I liked the article. I wasn't accusing these guys in general of anything. I'm just saying if someone makes 350 million measurements, we expect to get 100, 5 sigma, events that are totally meaningless. I don't know how many measurements they are making at CERN. Tommaso Dorigo went through a long explanation of the weaknesses a few years ago. http://www.science20.com/quantum_dia...iterion-118228
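The arithmetic behind the "100 events" figure can be checked with a short Python sketch. It assumes 350 million independent, exactly Gaussian measurements and counts only one-sided five-sigma excursions. Those assumptions are mine, chosen because they reproduce the number quoted above, and the sketch says nothing about how a combined hypothesis test on the whole data set behaves.

```python
import math

def one_sided_p(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

n_measurements = 350_000_000            # figure taken from the post
p_single = one_sided_p(5)               # chance that one measurement is a 5 sigma fluke
expected_flukes = n_measurements * p_single

print(f"p for one measurement: {p_single:.3g}")
print(f"expected 5 sigma flukes: {expected_flukes:.0f}")
```

Under these assumptions the expected count is about 100, so the number in the post is right for individual measurements.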

8. Originally Posted by Copernicus
First, I said I liked the article. I wasn't accusing these guys in general of anything. I'm just saying if someone makes 350 million measurements, we expect to get 100, 5 sigma, events that are totally meaningless.
Yeah, but that's very much not the five sigma they're talking about.
The five sigma quoted here is a measure of the likelihood of the observed results occurring by chance in some "null hypothesis" population of results. You take all the experimental results and compare them to the null hypothesis population, and look at how likely all the results would be if the hypothesized particle didn't actually exist. Dorigo actually explained that in the article you link to.
So the scientists are neither ignorant nor delusional - they just understand the maths they're using.

9. Order of Kilopi
Join Date
Mar 2010
Location
United Kingdom
Posts
6,941
Originally Posted by Copernicus
First, I said I liked the article. I wasn't accusing these guys in general of anything. I'm just saying if someone makes 350 million measurements, we expect to get 100, 5 sigma, events that are totally meaningless. I don't know how many measurements they are making at CERN. Tommaso Dorigo went through a long explanation of the weaknesses a few years ago. http://www.science20.com/quantum_dia...iterion-118228
But they are not looking for 5 sigma events in 350 million measurements. The five sigma thing they are looking for is essentially testing the null for "This set of measurements conforms to the background model". Once this hypothesis is shown to be likely wrong then the question becomes more subtle - "Is this something new or is my background model wrong?"

And yes, 5 sigma is arbitrary. So is 3 sigma, 7 sigma, 9 sigma. 5 has simply become the guiding rule of thumb because it is convenient. Otherwise when are you able to publish? If two people are working the same problem and one is happy publishing at 3 sigma then they will always publish before the people working to 5, 7 etc sigma. 5 is just a convenient marker post in the community, a way to bring some fairness to the game. Having this threshold isn't a magical remedy or cause for bad statistics, they'd be there no matter what threshold or other criteria you set.

10. Originally Posted by Copernicus
First, I said I liked the article. I wasn't accusing these guys in general of anything.

But you did, with the inflammatory false dichotomy of your question,
"Are scientists ignorant or delusional?" I think almost anyone would take at least some offense at being classified as either. You probably shouldn't make such comments if you want to avoid being challenged on their implications. Not making them is also a good way to comply with the forum requirement to be polite.

11. Originally Posted by Shaula
And yes, 5 sigma is arbitrary. So is 3 sigma, 7 sigma, 9 sigma. 5 has simply become the guiding rule of thumb because it is convenient. Otherwise when are you able to publish? If two people are working the same problem and one is happy publishing at 3 sigma then they will always publish before the people working to 5, 7 etc sigma. 5 is just a convenient marker post in the community, a way to bring some fairness to the game. Having this threshold isn't a magical remedy or cause for bad statistics, they'd be there no matter what threshold or other criteria you set.
Yes, even medical doctors (notoriously poor at maths) have grasped that a "p" value is only the beginning of the discussion, not the end of it. We kind of got the hang of that as a group back in the 1980s, though individual hold-outs probably remain to this day. I'd be astonished if physicists in general (notoriously good at maths) didn't likewise already have some sort of grasp on the issues Dorigo discusses.

Grant Hutchison

12. Originally Posted by Copernicus
I was reading the final article, https://phys.org/news/2018-07-higgs-boson-quarks.html which I think is awesome, but they have this thing in there which states its significance was 4.9 sigma. Are scientists ignorant or delusional? Is it not common knowledge that 5 sigma events can occur frequently, but not be significant, if one collects enough data points. Isn't there a better way than just saying 5 sigma?
Your OP here could've generated similar responses had you not even included the comment implying the authors of the article were ignorant or delusional--especially since it seems that they are not, from those responses.

13. As I've never had a statistics class, this is one of those phrases I see a lot here, and I completely lack a definition for it.

Remember when I asked about falsifiability? This is like that. A big perceived gap in my education.

I have no clue how sigma is being used here. What a p is.

Or even what a standard deviation is, much less the concept of two or more standard deviations.

If someone cares to enlighten, I will surely pay attention.

14. Sigma is the customary symbol for the standard deviation of a normally distributed population.
Normally distributed = Gaussian = a bell curve probability distribution. Such distributions (with individual values commonly being found close to the mean, and rarely out in the tails) occur frequently in nature.
The standard deviation is a measure of how wide or narrow the bell curve is, either side of the mean - small standard deviation means a narrow distribution, large standard deviation means a wide distribution.
Because there's a unique mathematical description of this probability distribution, we can calculate how likely it is for an individual value to be found more than one, or two, or five standard deviations from the mean value. Five sigmas takes you right out into the tails of the distribution, so it's very unusual to find a value that far from the mean.
Example: the IQ scale has a mean of 100 and a sigma of 15. A five-sigma high IQ is therefore above 175. (A five-sigma low IQ would be down around 25, but by the time you get that low the distribution tail is getting crowded close to zero, and the assumption of normality - ie, that the distribution is truly Gaussian in this region - fails, and we have to doubt the accuracy of the probability estimate that low on the curve.)

So that's the normal distribution.
The "p" value comes up when we use the normal distribution (or some other mathematically defined sampling distribution) to do "hypothesis testing".
Suppose I believe that eating Haribo sweets as a child lowers your IQ. I take 100 children and force-feed them Haribos for a few years, and then measure their IQ. I find the mean IQ is 85. So I ask myself, "If I took repeated samples of 100 people from the normal (mean=100, sigma=15) IQ distribution, how often would I be unlucky enough to get so many dim ones that the sample mean would be 85?" We can actually calculate the distribution of sample means in those circumstances, so we can say how likely (or unlikely) my sample is to have been drawn from kids with a normal distribution of IQs. If it turns out to be really unlikely that such a sample would turn up by chance, then I reject the "null hypothesis" (Haribos don't affect IQ, and I was just a little unlucky with my sampling) and accept the alternate hypothesis (my sample is sufficiently unlikely to have turned up by chance, I have to assume that there's a real effect from Haribos on IQ). The "p" value quoted for the result of my hypothesis test is the probability that my sample would have appeared by chance when sampling the standard population with mean IQ of 100. The lower the p value, the less likely my sample of kids are to represent a population with a normal IQ, and therefore the more likely is my hypothesis (about Haribos being related to low IQ) to be correct.
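The Haribo example can be run through in a few lines of Python. This is only a sketch: it assumes the sample mean of n children is normally distributed with standard error sigma/√n (the usual result for the distribution of sample means), and the function name is made up for illustration.

```python
import math

def sample_mean_z(sample_mean, pop_mean, pop_sigma, n):
    """How many standard errors the sample mean lies from the population mean."""
    standard_error = pop_sigma / math.sqrt(n)   # spread of means of n-person samples
    return (sample_mean - pop_mean) / standard_error

# Numbers from the example: 100 children, mean IQ 85, population mean 100, sigma 15.
z = sample_mean_z(85, 100, 15, 100)

# One-sided p value: chance of a sample mean this low or lower under the null hypothesis.
p = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"z = {z:.1f} standard errors, p = {p:.3g}")
```

A single child with an IQ of 85 is only one sigma below the mean, but a mean of 85 across 100 children is ten standard errors below it, which is why the sample as a whole is so unlikely to have come from the standard population.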

Grant Hutchison
Last edited by grant hutchison; 2018-Jul-11 at 03:52 PM.

15. I...I...

sort of almost understood that!

16. So how do you establish an initial sigma, for something like IQ for instance?

17. Originally Posted by BigDon
So how do you establish an initial sigma, for something like IQ for instance?
IQ's unusual, because the distribution is deliberately standardized - we invented it, so we determine the distribution.
For the real world, we take lots of measurements, and then calculate the mean and standard deviation.
Sigma is a measure of the spread of the data around the mean. Symbolize the mean with μ, and the value of a given measurement by x.
The measurement's distance from the mean is (x - μ).
We might think that adding up all the values of (x - μ) for all the measurements would give us a measure of the spread of the data around μ - but unfortunately the negative and positive values cancel out, and we're left with zero.
So what we do is we square (x - μ), so that all values are positive, and we sum that for all measurements.
We symbolize the summing with a capital sigma, like this: Σ(xi - μ)²
But now we need to control for the number of measurements we made, because obviously the more measurements there are, the bigger the sum is going to get. So we divide by the number of measurements, to get a mean squared deviation from the mean: Σ(xi - μ)²/N
That's good enough as a measure of spread, and it's called the variance of the dataset. But it has the disadvantage that its units are the square of whatever the measurement units are. So we take a square root, and that's the standard deviation:

sigma = √[Σ(xi - μ)²/N]

There are subtleties (aren't there always?) but that maybe gives you the idea of how it works.
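The recipe above translates almost line for line into Python. The sketch below uses a small made-up data set and checks the hand-rolled version against the standard library's `statistics.pstdev`, which divides by N in the same way. One of the subtleties, as I understand it, is that when the data are a sample from a larger population, N - 1 is commonly used in place of N (`statistics.stdev` does that).

```python
import math
import statistics

def population_sigma(values):
    """Standard deviation following the steps above: square, sum, divide by N, root."""
    mu = sum(values) / len(values)                        # the mean
    squared_deviations = [(x - mu) ** 2 for x in values]  # (x - mu) squared, always >= 0
    variance = sum(squared_deviations) / len(values)      # mean squared deviation
    return math.sqrt(variance)                            # back to the original units

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up measurements, mean 5

mu = sum(data) / len(data)
print("sum of raw deviations:", sum(x - mu for x in data))   # these cancel out
print("sigma by hand:        ", population_sigma(data))
print("statistics.pstdev:    ", statistics.pstdev(data))
```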

Grant Hutchison

18. Thank you, Dr. Grant.

The darkness just got a little lighter.

19. Member
Join Date
May 2018
Posts
19
BigDon - if you are so inclined, there is an experiment you can conduct at home.

Suppose you have a coin, and you're not convinced that it is a "fair" coin, which comes up "heads" half the time and "tails" the other half. How can you determine whether the coin is fair?

You could try to analyse the coin physically, maybe use some kind of imaging technology to see whether the metal is of uniform density, things like that. Or you could perform a statistical experiment, which is just to throw the coin n times.

If the coin is fair, then the number of heads should be approximately n/2. However, it is unlikely, even if n is very large, that the number of heads will be exactly n/2, due to the randomness of the throwing process; in fact, it is impossible if n is odd. However, if the coin is fair, the number of heads should be "close" to n/2. So how do we decide just how "close" is close enough?

Throw the coin n times, and under the assumption that the coin is fair, the "average" number of heads should be n/2, and the "standard deviation" is (√n)/2. (This problem is unusual, which has to do with the fact that there are only two possible outcomes. In this case, if the coin is fair, the standard deviation is known exactly; in the more complicated physics problem described in the article, the standard deviation must be estimated.) The distribution of the number of heads is not the normal probability distribution to which Grant Hutchison refers, but that's a very good approximation if n is large. So count the number of heads, and call it h. That's what you observed; it should have been n/2. So h - n/2 is the difference between what you got, and what you should have gotten. Divide by the standard deviation, so our "test statistic" is (h - n/2)/((√n)/2).

Let's suppose you throw the coin n = 100 times, and you get 53 heads. The test statistic is 0.6. So the result (three heads more than expected) is 0.6 standard deviations, or sigmas, away from the expected outcome of 50. From the properties of a normal distribution, a deviation of three or more heads from the expected outcome of 50 happens about 55% of the time. So this is very weak evidence of an unfair coin - even if the coin is fair, you'd get 53 or more, or 47 or fewer, about 55% of the time!

But suppose instead that you get 61 heads. The test statistic is 2.2. The probability of getting a 2.2 sigma result, just by luck, is about 2.8%. In many social sciences, this is considered good enough to conclude the coin is unfair. The conclusion might still be wrong - it's just not that likely. If the coin is fair, then 2.8% of the time, you will come to the false conclusion that it is unfair. If that's not good enough for you, you could demand an even higher sigma level before you conclude the coin is unfair. However, you might see the problem here - the more reluctant you are to make one sort of mistake (concluding the coin is unfair, when it isn't), the more likely you are to make a different mistake (failing to detect that the coin is unfair, when it is).

You can improve the situation by throwing the coin more times. If it is not costly to do so, then throw it 10,000 times instead of 100. If each experiment is costly, though, you have to think carefully about how precise you need to be.
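If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the coin test. It assumes a fair coin has an expected number of heads equal to half the throws and a standard deviation of half the square root of the number of throws (the standard binomial results), and it uses the normal approximation for the probability. The function name is invented for illustration.

```python
import math

def coin_test(heads: int, throws: int):
    """Return (test statistic in sigmas, two-sided p value) for a fair-coin test."""
    expected = throws / 2                 # mean number of heads for a fair coin
    sigma = math.sqrt(throws) / 2         # standard deviation for a fair coin
    z = (heads - expected) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # normal approximation
    return z, p_two_sided

# 53 heads in 100 throws: weak evidence.
print(coin_test(53, 100))

# The same 53% heads over 10,000 throws is a very different story.
print(coin_test(5300, 10_000))
```

The second call shows why throwing the coin more often helps: the same proportion of heads moves from 0.6 sigma at 100 throws to 6 sigma at 10,000 throws, because the expected spread grows only as the square root of the number of throws.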

The misunderstanding of the OP appears to be that the "sigma" referenced has something to do with the distribution of an individual trial; it doesn't. It has to do with the joint distribution of all of the experiments conducted.
Last edited by fullstop; 2018-Jul-12 at 08:27 AM.

20. Established Member
Join Date
Jan 2010
Location
Wisconsin USA
Posts
2,639
It looks like they are trying to subtract all the other decay sources of b quarks, from the observed b quarks, and attribute them to the decay of the Higgs boson. Which would be quite the feat. There could be many other particles hidden from view by such processes. I don't really know how possible this is. My question was isn't there a better way than saying 5 sigma. In the case for this article, apparently there are very few Higgs produced, so it would not amount to even a hundred thousand, which in that case I would think the criteria should not even be 5 sigma, but less than that. Thoughts?

21. Order of Kilopi
Join Date
Mar 2010
Location
United Kingdom
Posts
6,941
Originally Posted by Copernicus
I don't really know how possible this is.
It is pretty much how every particle from the Xi on was discovered.

Originally Posted by Copernicus
My question was isn't there a better way than saying 5 sigma. In the case for this article, apparently there are very few Higgs produced, so it would not amount to even a hundred thousand, which in that case I would think the criteria should not even be 5 sigma, but less than that. Thoughts?
For rare events in a cluttered background I'd be more inclined to increase the required threshold, not decrease it.

22. Established Member
Join Date
Jan 2010
Location
Wisconsin USA
Posts
2,639
Originally Posted by Shaula
It is pretty much how every particle from the Xi on was discovered.

For rare events in a cluttered background I'd be more inclined to increase the required threshold, not decrease it.
Interesting. I guess I never thought about it too much, except that when I would look at the curves, I wondered how they would ever find SUSY particles, if there was so much background noise.

As far as the sigma, is there a mathematical basis for increasing the sigma for a cluttered background, or would it just make people feel better? It seems there should be a calculation to figure out what the sigma should be for significance.

23. Order of Kilopi
Join Date
Mar 2010
Location
United Kingdom
Posts
6,941
Originally Posted by Copernicus
Interesting. I guess I never thought about it too much, except that when I would look at the curves, I wondered how they would ever find SUSY particles, if there was so much background noise.
It all comes down to statistics in the end - even small signals can be significant if you build a big enough data set.
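A rough way to see this is the simple counting approximation in which the significance is about the number of signal events divided by the square root of the number of background events. The Python sketch below uses that approximation with made-up event rates. It is only reasonable when the background is large and well known, and real analyses use more careful likelihood-based methods, so treat it as an illustration rather than what CERN actually does.

```python
import math

def approx_significance(signal: float, background: float) -> float:
    """Crude counting estimate: signal divided by the background fluctuation."""
    return signal / math.sqrt(background)

# Hypothetical rates: 10 signal and 1000 background events per unit of data.
SIGNAL_PER_UNIT = 10
BACKGROUND_PER_UNIT = 1000

for units in (1, 10, 100, 1000):
    z = approx_significance(SIGNAL_PER_UNIT * units, BACKGROUND_PER_UNIT * units)
    print(f"{units:5d} units of data: about {z:.2f} sigma")
```

The signal is only 1% of the background throughout, yet the estimated significance grows as the square root of the amount of data, because the signal grows in proportion to the data while the background fluctuation grows only as its square root.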

Originally Posted by Copernicus
As far as the sigma, is there a mathematical basis for increasing the sigma for a cluttered background, or would it just make people feel better? It seems there should be a calculation to figure out what the sigma should be for significance.
Sigma is directly related to confidence/significance. The reason I say I'd be tempted to increase it rather than decrease it is more to do with the quality of the background model than significance. When you have a background that produces a lot of similar-looking events, the quality of the model you use to assign them to the various processes becomes critical. Clutter-optimised background suppression or target matching algorithms tend to be very sensitive to small deviations from the model, which is why I say that in a cluttered environment I'd tend to use a higher sigma threshold. I'd avoid a lower one because of this too.

An analogous situation arose with the infamous 'cold spot' in the cosmic microwave background radiation. In that case, a background suppression algorithm was being used that had the unwanted side effect of modifying all signals to have the same form the target matching algorithm was looking for, so even a small deviation was amplified by the interaction of the two filters and appeared very significant when it probably wasn't. Effectively the background model of the target matching algorithm was wrong because it assumed that the clutter would not have the distribution that the signal would. But the background suppression algorithm had imposed that form on the clutter...

24. The forum should have some kind of "irony-indicator" on threads.

25. Order of Kilopi
Join Date
Oct 2005
Posts
26,157
Another point to bear in mind is the difference between random error and systematic error. Random errors can be expected to follow the central limit theorem, so should produce Gaussian results and motivate the "sigma" language if there are sufficiently many trials involved. But a key assumption in the central limit theorem is the independence, i.e., randomness, of the errors in the trials. Taking Grant's example of IQ tests, if one got 85 and concluded that should not be a random fluctuation, then it is, as he said, only the beginning of the conversation. Before Haribo candies should be forced to include warnings, one would want to be very sure there were not systematic errors in the study. For example, the way Grant described the study, there was not a control group of 100 kids getting placebo Haribo (which should be a thing in itself), so we must consider the systematic error that their way of establishing IQs could be tainted to give lower results. That would be a systematic error affecting all the results, and is hard to tease out in CERN experiments because what is the control group there?
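The difference between the two kinds of error shows up clearly in a small simulation. The Python sketch below is purely illustrative: the true value, the scatter and the size of the bias are all numbers I have invented. It simulates measurements with random Gaussian scatter plus a fixed systematic offset.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0        # what a perfect instrument would report
RANDOM_SIGMA = 15.0       # random scatter of a single measurement
SYSTEMATIC_OFFSET = -5.0  # hypothetical bias shared by every measurement

def mean_of_measurements(n: int) -> float:
    """Average n simulated measurements with random scatter and a common bias."""
    readings = [random.gauss(TRUE_VALUE + SYSTEMATIC_OFFSET, RANDOM_SIGMA)
                for _ in range(n)]
    return statistics.fmean(readings)

for n in (100, 10_000, 100_000):
    standard_error = RANDOM_SIGMA / n ** 0.5
    print(f"n = {n:7d}: mean = {mean_of_measurements(n):7.2f}, "
          f"standard error = {standard_error:.3f}")
```

As n grows the standard error shrinks and the mean settles down, but it settles on the biased value rather than the true one. More data makes the wrong answer look more significant, not less wrong, which is why the systematic errors have to be hunted down separately.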

26. Order of Kilopi
Join Date
Feb 2005
Posts
10,924
Originally Posted by BigDon
I...I...

sort of almost understood that!

The first time I heard about the concept was reading Astronautix's OTRAG listing, where "the CRPU was human-rated and had a confidence level higher than 6-sigma."
