Originally Posted by

**Swift**
Very frequently the +/- value is one, two, or three standard deviations, which for a normal distribution would include 68%, 95%, or 99.7% of the data, but as Grant says, you'd have to look at the reference to know if that is what they are reporting, and if it is a normal distribution.

And yes, even if those values are three standard deviations, and it is a normal distribution, there is a chance that the value could be outside those limits.

I agree completely with the above answer, although I will add a little bit to it. The "chance that the value could be outside those limits" from the second paragraph is actually given in the first paragraph: if the range covers one standard deviation, it is about 0.32; if it covers two standard deviations, about 0.05; and if it covers three standard deviations, about 0.003.

Note that if the estimate does have a normal distribution, the probability of being a large number of standard deviations off is extremely small - for example, it is about 2 in a billion for six standard deviations. A normal distribution has tails that go to zero very quickly, so the probability of being far away from the estimated value is tiny.
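Those tail probabilities are easy to check numerically. Here is a minimal Python sketch (my own illustration, not from the quoted posts) that computes the two-sided normal tail probability P(|Z| > k) = erfc(k/sqrt(2)) using only the standard library:

```python
import math

def normal_tail(k: float) -> float:
    # Probability that a standard normal variable is more than
    # k standard deviations away from the mean (two-sided).
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 6):
    print(f"{k} SD: P(outside) = {normal_tail(k):.3g}")
```

At k = 1, 2, 3 this reproduces the roughly 0.32, 0.05, and 0.003 figures above, and at k = 6 it gives about 2e-9, the "2 in a billion" mentioned in the text.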

There is a result known as Chebyshev's inequality that can be used if the estimate has a non-normal distribution - it essentially gives you the "worst case" probability that the true value is outside the estimated range. The probability of being six or ten or some other large number of standard deviations away from the estimated value will be much higher using Chebyshev's inequality than assuming a normal distribution.
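To make the comparison concrete, here is a short Python sketch (my own, assuming only that the distribution has finite variance) putting the Chebyshev bound P(|X - mean| >= k*sigma) <= 1/k^2 next to the exact normal tail probability:

```python
import math

def chebyshev_bound(k: float) -> float:
    # Worst-case tail probability for ANY distribution with finite
    # variance: P(at least k standard deviations out) <= 1/k^2.
    return 1.0 / k**2

def normal_tail(k: float) -> float:
    # The same tail probability if the distribution is exactly normal.
    return math.erfc(k / math.sqrt(2))

for k in (2, 3, 6):
    print(f"{k} SD: Chebyshev <= {chebyshev_bound(k):.4f}, "
          f"normal ~ {normal_tail(k):.3g}")
```

At six standard deviations the Chebyshev bound is about 0.028, versus about 2e-9 for a normal distribution - roughly seven orders of magnitude apart, which is why the distributional assumption matters so much for extreme tails.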

Originally Posted by

**grant hutchison**
Eccentricity is not always normally distributed (it depends on what orbits you're sampling), and I have to say that the numbers Tom has been posting scream "not normally distributed" to me, but that may be because Tom is sampling the tail of a sample.

Grant Hutchison

It is very likely that the underlying phenomenon does not have a normal distribution, but it is the distribution of the estimate, not the underlying phenomenon, that matters.

To take an illustrative example, suppose I have a six-sided die with the numbers 1 through 6, one on each side. I want to estimate the average value produced by rolling the die. (Let's say I'm not sure whether it is a "fair" die, or whether it is weighted so that some numbers are more likely than others.) Suppose it is in fact a fair die, but I don't know this and want to determine the average experimentally. So I throw the die a large number of times and record the values.

Now, since it is a fair die (even though I don't know this), the distribution of the data will be approximately uniform, which is very different from a normal distribution: the die's outcomes are discrete and bounded, with equal probability for every value, while the normal distribution is continuous and unbounded, and different ranges of values (of equal size) have different probabilities of occurring. So the outcomes produced by throwing the die do not conform well to a normal distribution at all.

But, if I estimate the average value by adding up all of the die throws and dividing by the number of throws (a sample average), the distribution of this estimate is approximately normal, even for a relatively small number of die throws. For a large number of die throws, the distribution of the estimate is extremely close to normal.

This is a consequence of the central limit theorem, which (warning: oversimplification) states that sample averages have approximately a normal distribution, regardless of the distribution of the underlying phenomenon. Given that, it is quite common to assume that a sample average has a normal distribution, whatever the distribution of the underlying phenomenon is. However, there are *some* assumptions needed for the central limit theorem to apply, so it sometimes doesn't work. Also, if the estimation method were something different from taking a sample average, then the central limit theorem would not apply.
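The die example can be simulated directly. The following Python sketch (an illustration of the central limit theorem, not anyone's actual analysis) rolls a fair die 100 times, takes the sample average, and repeats that 10,000 times to look at the distribution of the estimate itself:

```python
import random
import statistics

random.seed(0)  # for reproducibility

def sample_average(n: int) -> float:
    # One estimate: the average of n rolls of a fair six-sided die.
    return sum(random.randint(1, 6) for _ in range(n)) / n

estimates = [sample_average(100) for _ in range(10_000)]

# The individual rolls are uniform on {1, ..., 6}, but the sample
# averages cluster around the true mean of 3.5 in a roughly normal
# bell shape, with spread close to sigma/sqrt(n) = sqrt(35/12)/10.
print("mean of estimates:", statistics.mean(estimates))
print("sd of estimates:  ", statistics.stdev(estimates))
```

A histogram of `estimates` would show the familiar bell curve, even though each individual roll is anything but normally distributed.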

So the appropriateness of assuming normality does not depend only on the distribution of the underlying phenomenon (in this case, "eccentricity"), but also on the estimation method used. It's entirely possible that one person uses an estimation method for which normality is a reasonable assumption about the distribution of the estimate, while someone else uses a different estimation method for which it is not.

The other thing to keep in mind here is that the standard deviation is itself not known - it has to be estimated. The (estimated) standard deviation of an estimate is usually called its "standard error", so it might be slightly more precise to use this terminology in Swift's answer (and also in my comments on it, although I'm not going to go back and correct my terminology). So there is uncertainty not only in producing the estimate itself, but also in estimating its standard deviation.

We really can only get exact statistical results for very simple underlying distributions and simple estimation methods; in most cases, the best we can do is get so-called "asymptotic" results. That means the confidence interval is actually off somewhat, but as the dataset becomes larger, the error in the confidence interval becomes smaller. For a large enough dataset, the error in determining the confidence interval can safely be ignored, but determining what is "large enough" can sometimes be more of an art than a science.
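To show what an asymptotic confidence interval looks like in practice, here is a small Python sketch (my own example with made-up data: 200 rolls of a fair die) that estimates both the mean and its standard error from the data, then forms the usual normal-approximation 95% interval:

```python
import math
import random
import statistics

random.seed(1)

# Pretend these are measurements from some unknown process;
# here they are just 200 rolls of a fair die.
data = [random.randint(1, 6) for _ in range(200)]

n = len(data)
mean = statistics.mean(data)

# Estimated standard error of the sample mean: s / sqrt(n),
# where s is the sample standard deviation.
se = statistics.stdev(data) / math.sqrt(n)

# Asymptotic 95% confidence interval (normal approximation, +/- 1.96 SE).
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"estimate = {mean:.3f} +/- {1.96 * se:.3f}")
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

Note that both the estimate and the standard error are computed from the same data, which is exactly the extra layer of uncertainty described above; the 1.96 multiplier is itself a normal-approximation (asymptotic) choice.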

So summing up, I think a highly relevant point is this one:

Originally Posted by

**grant hutchison**
It'll be a confidence interval of some sort (which is to say, there will be a quantified probability that the true value lies outside the quoted +/- range). You need to look at the documentation to find out what it means, because different authors mean different things.

People who live in glass houses should get undressed in the dark.