Science as Falsification



Maddad
2005-Feb-12, 10:10 PM
Science is a process as opposed to a discovered fact. It starts with an observation that needs explaining. If your explanation is testable, meaning that you provide a means of disproving your explanation, then we call that explanation a hypothesis.

If we test the hypothesis and fail to disprove it, and the implications of the explanation are broad, encompassing more than just the data necessary to formulate it, then we call the hypothesis a theory. The more we try and fail to disprove the theory, and the more broadly we are able to apply it, the stronger that theory becomes. You never prove it, though; you just make it more and more solid.

In another thread at another forum, I tried to explain to another member the need to attempt to disprove a theory before you label it as solid. Although the gentleman was knowledgeable, he did not get what I was saying, and after several attempts, I gave up trying to convince him. Without aggressive failed attempts to disprove the theory, you fall victim to the very unscientific trap of seeking only confirmation. Karl Popper made a strong case for it in his 1963 paper "Science as Falsification" ( http://www.stephenjaygould.org/ctrl/popper_falsification.html ). It is one of the great papers in science because it defines for us the difference between the process of science and that of pseudo-science.

Evan
2005-Feb-12, 10:47 PM
Ummm. You have it a bit backwards. The scientific method does not seek to disprove a hypothesis. It has 5 steps.

The scientific method is as follows:

1: Either an observation that requires explanation is made or a conjecture is formed (no observation required at all)

2: A preliminary explanation that would account for the observation is developed. In the case of a conjecture then this step is satisfied. This preliminary explanation is a hypothesis.

3: A method is proposed to test the hypothesis. It is NOT required to attempt to disprove the hypothesis; it is sufficient to confirm it. The experiment(s) are performed and the results are compared with what the hypothesis predicts. If they do not agree, then go to step 2 and adjust the hypothesis if possible. Repeat this step until either agreement is found or the hypothesis must be discarded.

This is where many go wrong. The experiment(s) should be seeking data, not attempting to either prove or disprove the hypothesis. That is how experimental bias creeps in. In a proper experiment there should be no expectation of a particular outcome designed into the experiment. This is why double blind studies are performed.

If agreement between the data collected and the hypothesis is found then the hypothesis is examined to see if it is able to predict additional testable outcomes of further experiments.

4: Devise experiments to test the predictions of the hypothesis. Perform the experiment(s). If the data from these experiments do not agree with the hypothesis, then adjust the hypothesis if possible and go to step 3.


5: If the data confirm the hypothesis then it may be elevated to the status of theory. Repeated experiments that confirm the theory may eventually give it the status of natural law. Even if there are data that contradict the theory, it does not necessarily mean it is incorrect, merely incomplete.

dgruss23
2005-Feb-13, 12:42 AM
Good description. I agree with most of what you say here Evan, so I'm just going to nitpick a few things.


Ummm. You have it a bit backwards. The scientific method does not seek to disprove a hypothesis. It has 5 steps.

It's traditional to call it the "Scientific Method", but I like Carlo Lastrucci's book "The Scientific Approach" much better. The "Method" always leads to a cookbook-style presentation of what science is.



3: A method is proposed to test the hypothesis. It is NOT required to attempt to disprove the hypothesis, it is sufficient to confirm it.

This is true. Sometimes a "test" really cannot refute a hypothesis. But I think we should not stray too far from Popper's point that it must be possible to falsify a hypothesis if it is to have any value.



5: If the data confirm the hypothesis then it may be elevated to the status of theory.

Hypotheses are generally too small in scope to be elevated to the status of a theory. Generally theories come about from an attempt to explain multiple tested hypotheses and observational results.


Repeated experiments that confirm the theory may eventually give it the status of natural law.

Theories and natural laws are very distinct entities with distinct purposes. A natural law is a statement of how nature behaves that has no known violations. But natural laws do not attempt to explain the reasons for the behavior. A theory attempts to explain why nature appears to behave as indicated by our observations.

A simple example is Newton's law of gravity, which only describes the mathematical relations we call "gravity". This may be contrasted with Einstein's theory of general relativity, which attempts to explain the reason for gravity's behavior.

There is also a difference in the predictive aspects of natural laws and theories. Natural laws basically predict that nature will continue to behave as described by the natural law. We predict that a ball tossed up will come down - because that's what Newton's law of gravity describes as the behavior of gravity. Theories on the other hand should make predictions about behaviors that haven't yet been observed or tested. For example Einstein's theory predicted gravitational lensing - which has since been observed.


Even if there are data that contradict the theory, it does not necessarily mean it is incorrect, merely incomplete.

That's absolutely true. It's a challenging line to walk between saying that Theory A needs to be modified and Theory A has now been disproven. It's along that line where much of the exciting (and sometimes testy) debate in science occurs.

dgruss23
2005-Feb-13, 12:48 AM
In another thread at another forum, I tried to explain to another member the need to attempt to disprove a theory before you label it as solid.

Wait - there's another forum? :D

Evan
2005-Feb-13, 03:04 AM
Theories and natural laws are very distinct entities with distinct purposes. A natural law is a statement of how nature behaves that has no known violations.

The "laws" of thermodynamics did not spring out of nothing fully formed. It was through the application of the scientific method by Lord Kelvin, Carnot, and Clausius through their studies of heat and machines that the concept of entropy was developed. This has been the subject of innumerable experiments since and every one upholds the principles.



Hypotheses are generally too small in scope to be elevated to the status of a theory

I offer the hypothesis (explanation) of the photoelectric effect by Maxwell, Einstein, and others. Richard Feynman and QED, not yet refuted. And, perhaps the best example is Murray Gell-Mann and the quark hypothesis, a conjecture based on no direct observational evidence that explains all we know about the subatomic world today.



There is also a difference in the predictive aspects of natural laws and theories. Natural laws basically predict that nature will continue to behave as described by the natural law. We predict that a ball tossed up will come down - because that's what Newton's law of gravity describes as the behavior of gravity. Theories on the other hand should make predictions about behaviors that haven't yet been observed or tested.

Splitting hairs there. They haven't been observed because we haven't looked under that corner of the rug.

archman
2005-Feb-13, 06:56 AM
I've been trained (and now teach to undergraduates) to never devise an experiment without a null hypothesis incorporated into it. The null is our "no change, no effect" hypothesis, and it's that which we seek to prove/disprove. If we disprove the null, it supports the primary hypothesis. This is the only method to "prove" the main hypothesis that we teach to students.

'Course maybe this narrow interpretation only applies with statistics. I get stuck doing a lot of statistics in my field. Bleah.

trob
2005-Feb-13, 09:07 AM
There has been a massive discussion of this here: http://www.badastronomy.com/phpBB/viewtopic.php?t=14935&start=0&postdays=0&postorder=asc&highlight=falisification

all the best
Trob :D

archman
2005-Feb-13, 10:31 AM
There has been a massive discussion of this here: http://www.badastronomy.com/phpBB/viewtopic.php?t=14935&start=0&postdays=0&postorder=asc&highlight=falisification

Ugh. 33 pages of discussion. That's... swell. :o

Evan
2005-Feb-13, 11:37 AM
Much ado about nothing. Popper's comparison of the "hard" sciences to the "soft" sciences of psychology and the interpretation of human behaviour simply does not apply to the confirmation or denial of observational evidence of how the universe works.

The idea that a theory or hypothesis should be somehow "falsifiable" is a play on semantics. It must be testable. He alludes to that but then proceeds to obscure the fact with serious bafflegab involving the field of human behaviour. Humans change the rules of how they behave when it suits them. The universe does not.


Archman,

I do not understand what you mean by the statement of "...to never devise an experiment without a null hypothesis incorporated into it".

A proper experiment investigating the operation of some facet of the universe collects data. It should contain in its design no presumptions.

When I was 14 I did an experiment for the science fair at my school. It was, as I titled it, an investigation into "The Geotropic Effect on Plants".

I grew bean plants in test tubes, both in control test tubes and in test tubes that were continuously spun at all times in a centrifuge. The spinning test tubes were flung to a 45 degree angle by the centrifugal effect, so I grew the controls in tubes oriented at the same angle.

The hypothesis was that plants would not distinguish a difference between gravity and the centrifugal effect. I did not know in advance what the outcome would be. I took care to provide diffuse light to the test plants so as to eliminate that as a confounding factor.

I was not surprised after a month when the control plants grew out of the test tube and then turned 45 degrees to grow straight up. My plants in the centrifuge grew out of the test tubes and continued to grow in a direction oriented directly straight out of the test tubes. They did not turn up as did the controls.

This was not expected or unexpected. It was enlightening. I had shown that plants cannot distinguish between gravity and the centrifugal force.

I did win first prize.


The point is that the experiment may have confirmed or denied my assumption. I took every precaution to eliminate effects that were not relevant or may have obscured the effect studied. I did not seek to either disprove the presumed effect or prove it. I was looking to see what would happen. I did have a conjecture formed in advance and the results did confirm it but that did not influence my design of the experiment. The design of the experiment was determined by the subject being studied.

I will point out that my father was a nuclear physicist and a science teacher.


How would "falsifiability" apply to such an experiment?

A Thousand Pardons
2005-Feb-13, 12:07 PM
The idea that a theory or hypothesis should be somehow "falsifiable" is a play on semantics. It must be testable.
Popper's work is vast and I hesitate to jump into a discussion of it so soon after the demise of my old friend soupdragon2, but Popper did say at least a few times in his work that falsifiability and testability were equivalent. The emphasis is different, but I think that was Popper's point, rather than just a play on words.


The hypothesis was that plants would not distinguish a difference between gravity and the centrifugal effect.



How would "falsifiability" apply to such an experiment?
The outcome could have been that the plants would distinguish a difference? Is that what you mean?

Evan
2005-Feb-13, 09:22 PM
The outcome could have been that the plants would distinguish a difference? Is that what you mean?

I don't know. What do you mean? The plants may be able to distinguish gravity from centrifugal force, or not. I did not know when I started the experiment.

Again, I say that I did not design the experiment to disprove my hypothesis. I designed it to provide data. The data confirmed my hypothesis. It could just as well have not done so. In that case I would be forced to look for an alternative hypothesis.

I still do not understand the concept of rigging an experiment to try to disprove the hypothesis. The purpose of an experiment is to collect data.

Andrew
2005-Feb-13, 09:41 PM
Again, I say that I did not design the experiment to disprove my hypothesis. I designed it to provide data. The data confirmed my hypothesis. It could just as well have not done so. In that case I would be forced to look for an alternative hypothesis.
In which case you would have falsified your hypothesis. Instead the outcome of the experiment corroborated it.

I still do not understand the concept of rigging an experiment to try to disprove the hypothesis. The purpose of an experiment is to collect data.
Say you had two competing theories of some phenomenon. They may differ in some predictions but agree in others. At most, only one can be correct. In performing an experiment where both appear to predict the same outcome and that is what is observed, what have you achieved? You can only eliminate one (or both) by attempts at falsification.

A Thousand Pardons
2005-Feb-13, 09:48 PM
Again, I say that I did not design the experiment to disprove my hypothesis. I designed it to provide data. The data confirmed my hypothesis. It could just as well have not done so. In that case I would be forced to look for an alternative hypothesis.
It's a matter of perspective, I think. You could just as well say that you did not design the experiment to prove your hypothesis. That ability of the hypothesis to be either true or false is what you call testability or what Popper calls falsifiability. If there is a hypothesis that is not falsifiable, then Popper says that it lies outside the realm of science. I think soupdragon2 gave some examples a while back, but I'd have to look for them.


I still do not understand the concept of rigging an experiment to try to disprove the hypothesis. The purpose of an experiment is to collect data.
Depends. A lot of people would disagree with that. For instance, the famous Luis Alvarez, in his arguments with geologists, would deride them as non-scientists, "stamp collectors" interested only in collecting data.

I, of course, respectfully disagree with Dr. Alvarez. :)

trob
2005-Feb-13, 10:15 PM
I still do not understand the concept of rigging an experiment to try to disprove the hypothesis. The purpose of an experiment is to collect data.

Such a perspective is completely invalid. #-o At best I would call it Baconian. http://www.science.uva.nl/~seop/entries/francis-bacon/#5
There is more to science than the simple listing of data. Even the notion of science as a constant and gradual accumulation of data is problematic and not in correspondence with scientific history. In fact, Bacon's point was exactly that the natural sciences were different from theology and abstract reasoning in that they utilise experience and experiment. Essentially this view of science is that once you have seen a limited number of some phenomenon, you may generalize. Logically speaking you move from "some" to "all". Hume was the first to point out that this is not - logically - a valid move, since it cannot be proven syllogistically. Indeed this has now become known as the problem of induction ( http://en.wikipedia.org/wiki/Problem_of_induction ), which states that no finite listing of examples can prove anything.
Since we are humans we cannot use methods that will literally take forever to perform, which is a problem since science is about proving your claims. Popper, however, solves the logical conundrum by pointing out that there is something we can do in science - something that is logically valid - and that is disprove claims. You only need one contrary example to falsify a universal claim.
Popper's eminently logical solution to the problem through falsification has a price, namely that we must wave bye-bye to positive certainty. As Popper writes in Objective Knowledge - An Evolutionary Approach (revised edition, Clarendon Press, Oxford, 1979, page 8): "For it may happen that our test statements may refute some - but not all - of the competing theories; and since we are searching for a true theory, we shall prefer those whose falsity has not been established." In other words, we must be content with knowing that we are not yet wrong...
For a more complete discussion see: http://en.wikipedia.org/wiki/Falsifiability

All the best
Trob :D

Kesh
2005-Feb-13, 10:38 PM
Say you had two competing theories of some phenomenon. They may differ in some predictions but agree in others. At most, only one can be correct. In performing an experiment where both appear to predict the same outcome and that is what is observed, what have you achieved? You can only eliminate one (or both) by attempts at falsification.

That doesn't quite sound right.

Okay, so both hypotheses predict the same results of an experiment, but differ in a few other predictions. You cannot falsify either hypothesis that way... they both work for the end results. Instead, you should be performing the experiment again (or reviewing the gathered data) to observe the other predictions for each hypothesis. If one proves true while the other does not, then you have proven one hypothesis and disproven the other.

Or, at least, you've thrown significant doubt on the second. It's still possible that revisions to the second hypothesis could cause it to not only show accurate predictions of those elements, but incorporate the other predictions from Hypothesis 1 as well.

So, it's not a matter of falsifying the predictions as much as it is finding where predictions are accurate or inaccurate.

Weird Dave
2005-Feb-13, 11:36 PM
I'd disagree that a hypothesis can only be falsified, and never proved. Take for instance the hypothesis, "The Loch Ness Monster exists," or anything similar. It can easily be proved by finding the Loch Ness Monster, but it is very difficult to falsify without combing every cubic inch of the loch (even then, the Monster could be hiding in a cave or something). Similarly with the existence of magnetic monopoles and alien civilisations.

Some hypotheses, such as, "Global warming is caused by human CO2 emissions," are even trickier. There is no one experiment (or even series of experiments) that can prove or disprove this, in the absence of a large number of identical Earths to experiment on. OK, there are simulations, but their accuracy can be debated. I suspect that eventually the balance of evidence will be either for or against the hypothesis, but with no certainty of proof or disproof.

Contrast this with other situations. It would only take one good experiment, and repeats of that experiment, to prove (or at least, establish beyond reasonable doubt) that general relativity is not accurate in at least some situations, and is thus incomplete. General relativity would not be "disproved": because it gives accurate predictions in many situations, it is still a useful theory. However, we would know that there is a better theory that is more accurate, and better describes the Universe.

Finally, some experiments aim to measure a quantity (e.g. the speed of light) rather than test a theory. In that case, the experimenters should average their results with previous experiments, weighting the different values according to their accuracy and precision. We will then get a new, more precise value for that quantity, but the old value is not "disproved". All scientific quantities come with uncertainties that estimate their accuracy, and the old value has a larger uncertainty.
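
To make that weighting idea concrete, here is a minimal Python sketch of an inverse-variance weighted mean. The measurement values and uncertainties below are invented for illustration, not real speed-of-light data:

import math

# Invented measurements of some quantity, each with a 1-sigma uncertainty.
values = [299792.40, 299792.52, 299792.46]
sigmas = [0.20, 0.10, 0.05]

# Inverse-variance weighting: more precise measurements count for more.
weights = [1.0 / s ** 2 for s in sigmas]
mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
uncertainty = 1.0 / math.sqrt(sum(weights))

print(f"combined value: {mean:.3f} +/- {uncertainty:.3f}")

The combined uncertainty is smaller than any single measurement's, which is the sense in which the new value is "more precise" without the old ones being disproved.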

archman
2005-Feb-14, 12:15 AM
How would "falsifiability" apply to such an experiment?

Evan, I will strictly limit my remarks to what I used to teach freshman biology students, taken from current college biology texts and lesson plans. Again, what our department teaches may be purist, and geared more towards quantitative and/or statistical experimentation.

Let's take your bean plant science project, and run it through the scientific method (for experimental design) step by step. I'll omit the observational phase that precedes the actual hypothesis.

A. Hypothesis. Plants would not distinguish a difference between gravity and the centrifugal effect.

It is classically assumed that in an experiment, the experimenter is seeking to elicit change (or effect) of one variable in response to another variable or variables. So...

...you need a test (independent) variable, and a response (dependent) variable. I would assume bean plant growth is the dependent variable, and centrifugal force is your independent. So now you can set up your test hypotheses.

H1: Plant Growth is affected by Centrifugal Force
Ho (null): Plant Growth is unaffected by Centrifugal Force

To support your test hypothesis (H1), it will be necessary to disprove Ho, the “no change, no effect hypothesis”.

B. Procedure. You have bean plants being spun in a centrifuge, and you have bean plants sitting in test tubes as controls. You also have the controlling variable of light intensity accounted for. That’s pretty good. You have all the essentials covered.

C. Results. The measured growth difference between your experimental and control plants will directly tell you whether or not you can support or disprove your null hypothesis. If there is no difference in growth between your experimentals and controls, then you support the null hypothesis, and thus cannot support your test hypothesis. If you do measure a discernible change, then your null is disproved, and you can support your test hypothesis. From a purist standpoint you can never come out and say that your test hypothesis is “proven”, but you can state that your experimental results “support it”.

D. Conclusion. Was the null hypothesis disproven? YES! The growth of bean plants was affected by application of centrifugal force. Therefore, we can toss out the null, and "support" the test hypothesis.

Ta Da!
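
If you wanted to put numbers on step C, a two-sample t-test is one common way to decide whether the null can be rejected. Here is a minimal Python sketch, assuming SciPy is available and using entirely made-up growth measurements:

from scipy import stats

# Made-up growth measurements (cm) for control and centrifuged bean plants.
control = [12.1, 11.8, 12.5, 12.0, 11.9]
spun    = [10.2, 10.8, 10.5, 10.1, 10.6]

# Test Ho: "plant growth is unaffected by centrifugal force".
t_stat, p_value = stats.ttest_ind(control, spun)

if p_value < 0.05:
    print(f"p = {p_value:.4f}: reject Ho, so the data support H1")
else:
    print(f"p = {p_value:.4f}: fail to reject Ho")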

These criteria are in fact what we use at our university when the regional science fairs come along every spring. We look for presence of independent and dependent variables, and an experiment that uses them properly to refute/support the hypotheses being offered. We try to ignore the flash and glam of pretty posters or current topics, and limit ourselves to proper adherence to experimental design.

Of course, my remarks are only limited to what is taught in the biological and statistical sciences, and that at the introductory college level. Our sciences college has found that most students (even the premed ones) enter college with only a hazy notion of the scientific method and experimental design, so we make it a point to educate them at the earliest opportunity. Astronomers may do something completely different; our own astronomy department is in a different college and I don’t know what they teach. However our introductory biology program is regarded as one of the best in the U.S., and the only critiques we receive come from all the students that we *cough* flunk out, not from fellow scientists. In my field I also do a great deal of natural history observation. We don’t consider that experimentation, as no variables are being manipulated. It’s simply data gathering. The data can be used to design experiments later, of course. That's usually why I at least gather data.

My own dissertation research is required to have stated hypotheses that are testable; thus they need to incorporate nulls that can be disproven. It's the most basic requirement in our research proposals. My committee professors can look directly at those and immediately determine whether or not my experimental design is sound. I have very tough professors!

dgruss23
2005-Feb-14, 12:15 AM
Theories and natural laws are very distinct entities with distinct purposes. A natural law is a statement of how nature behaves that has no known violations.

The "laws" of thermodynamics did not spring out of nothing fully formed. It was through the application of the scientific method by Lord Kelvin, Carnot, and Clausius through their studies of heat and machines that the concept of entropy was developed. This has been the subject of innumerable experiments since and every one upholds the principles.

Yes, but I need some clarification as to how that relates to my statement you're quoting.



Hypotheses are generally too small in scope to be elevated to the status of a theory


I offer the hypothesis (explanation) of the photoelectric effect by Maxwell, Einstein, and others. Richard Feynman and QED, not yet refuted. And, perhaps the best example is Murray Gell-Mann and the quark hypothesis, a conjecture based on no direct observational evidence that explains all we know about the subatomic world today.

Again I need more clarification as to what you're trying to say.



There is also a difference in the predictive aspects of natural laws and theories. Natural laws basically predict that nature will continue to behave as described by the natural law. We predict that a ball tossed up will come down - because that's what Newton's law of gravity describes as the behavior of gravity. Theories on the other hand should make predictions about behaviors that haven't yet been observed or tested.


Splitting hairs there. They haven't been observed because we haven't looked under that corner of the rug.

No - absolutely not. This is not a matter of splitting hairs. It's a matter of clarifying the fundamental difference between a natural law and a theory. The terms natural law and theory are not interchangeable words with the same meaning.

It's worth repeating - natural laws are descriptions of nature's behavior with no known exceptions - but they make no attempt to explain the reasons for nature's behavior. Natural laws may also have defining conditions under which they apply. For example, the law of conservation of mass applies to ordinary chemical reactions. However, it does not apply to nuclear reactions, which convert mass to energy. For nuclear reactions you have the law of conservation of mass-energy.

Theories bring together a variety of observations, interpretations, and natural laws and attempt to explain the reasons for the behavior of nature under a single framework. A key aspect of a good theory that natural laws lack is the ability to predict new phenomena that have not been previously conceived. As I noted before, gravitational lensing was a new prediction from relativity theory that was later confirmed with observations.

This distinction is more than a matter of looking under previously known but not yet examined parts of the rug. With theories, new rugs are found (or sought) that nobody ever thought existed before.

dgruss23
2005-Feb-14, 12:24 AM
The outcome could have been that the plants would distinguish a difference? Is that what you mean?

I don't know. What do you mean? The plants may be able to distinguish gravity from centrifugal force, or not. I did not know when I started the experiment.

Again, I say that I did not design the experiment to disprove my hypothesis. I designed it to provide data. The data confirmed my hypothesis. It could just as well have not done so. In that case I would be forced to look for an alternative hypothesis.

I still do not understand the concept of rigging an experiment to try to disprove the hypothesis. The purpose of an experiment is to collect data.

It's not about rigging the experiment in an attempt to purposefully disprove the hypothesis. The idea is that an experimental design should be such that the hypothesis could potentially be refuted. Soupdragon had talked at some length about Popper's concern that in some of the social sciences every observation could be interpreted as confirmation of the hypothesis. Popper's point was that if a hypothesis cannot be subjected to a test that could potentially refute it, then it's not worth squat (although he certainly put it more elegantly than that! :) )

dgruss23
2005-Feb-14, 12:28 AM
I'd disagree that a hypothesis can only be falsified, and never proved. Take for instance the hypothesis, "The Loch Ness Monster exists," or anything similar. It can easily be proved by finding the Loch Ness Monster, but it is very difficult to falsify without combing every cubic inch of the loch (even then, the Monster could be hiding in a cave or something). Similarly with the existence of magnetic monopoles and alien civilisations.

Some hypotheses, such as, "Global warming is caused by human CO2 emissions," are even trickier. There is no one experiment (or even series of experiments) that can prove or disprove this, in the absence of a large number of identical Earths to experiment on. OK, there are simulations, but their accuracy can be debated. I suspect that eventually the balance of evidence will be either for or against the hypothesis, but with no certainty of proof or disproof.

Contrast this with other situations. It would only take one good experiment, and repeats of that experiment, to prove (or at least, establish beyond reasonable doubt) that general relativity is not accurate in at least some situations, and is thus incomplete. General relativity would not be "disproved": because it gives accurate predictions in many situations, it is still a useful theory. However, we would know that there is a better theory that is more accurate, and better describes the Universe.

Finally, some experiments aim to measure a quantity (e.g. the speed of light) rather than test a theory. In that case, the experimenters should average their results with previous experiments, weighting the different values according to their accuracy and precision. We will then get a new, more precise value for that quantity, but the old value is not "disproved". All scientific quantities come with uncertainties that estimate their accuracy, and the old value has a larger uncertainty.

Excellent post Weird Dave! =D> There are so many nuances to the scientific approach and you've certainly captured some of them here.

Metricyard
2005-Feb-14, 02:52 AM
Contrast this with other situations. It would only take one good experiment, and repeats of that experiment, to prove (or at least, establish beyond reasonable doubt) that general relativity is not accurate in at least some situations, and is thus incomplete. General relativity would not be "disproved": because it gives accurate predictions in many situations, it is still a useful theory. However, we would know that there is a better theory that is more accurate, and better describes the Universe.

Would the anomalous acceleration of the Pioneer probes be a good example of this, or would that be classified as something different?

Bathcat
2005-Feb-14, 03:21 AM
People sometimes point out that theories should not be confused with Reality-mit-der-Kapital-R.

General Relativity models gravitation as spacetime curvature. It should not be taken to say that spacetime Really Does curve.

GR is an excellent mathematical model of gravitation. It is immensely successful and useful. But it is a model. Another model -- quantum gravitation, say (if an excellent theory of it existed) -- might model Reality just as effectively in another way.

At least so I have heard.

trob
2005-Feb-14, 06:30 AM
I'd disagree that a hypothesis can only be falsified, and never proved. Take for instance the hypothesis, "The Loch Ness Monster exists," or anything similar.

So I'm a scientist because my hypothesis that the ball is in the garden turns out to be correct... hardly. Then everybody would be a scientist. This rather falls under the realm of common sense.
Also the mere existence of the Loch Ness Monster is not a scientific question, because no inference is included. Loch Ness is interesting because it has wide-ranging implications for our understanding of evolutionary history - it would falsify it. I admit that seeing the Loch Ness Monster would be startling, but the mere experience is not science in and of itself. Choosing an example with these HIDDEN implications is a very good rhetorical ploy, however.

All the best
Trob :D

Evan
2005-Feb-14, 06:43 AM
Dang, I will consider my replies tomorrow when I have had some sleep. Excellent postulates and they will take some careful consideration.

However, I still stand by my description of the scientific method.

Weird Dave
2005-Feb-14, 11:43 AM
Contrast this with other situations. It would only take one good experiment, and repeats of that experiment, to prove (or at least, establish beyond reasonable doubt) that general relativity is not accurate in at least some situations, and is thus incomplete. General relativity would not be "disproved": because it gives accurate predictions in many situations, it is still a useful theory. However, we would know that there is a better theory that is more accurate, and better describes the Universe.

Would the anomalous acceleration of the Pioneer probes be a good example of this, or would that be classified as something different?

I'm no expert on the Pioneer anomaly, but from what I've heard I don't think it can count as a "good" experiment. The probes were never intended to test gravity, not even right at the bottom of the mission statement, so the designers would not have taken this into account. They might, for instance, have allowed a tiny gas leak out of the rear of the probe [no sniggering at the back of the class, please 8-[ ] that would not affect its normal operation but might cause the anomaly. I would class the observation of the Pioneer anomaly as the preliminary stage of a good experiment - a sign that something interesting may be going on, but not good evidence on its own.

If a spacecraft purpose-built for investigating gravity at large distances made a similar observation, then that would be good evidence. Generally, we would then want to repeat the experiment with yet another probe, designed and built by a different team. However, in this case we might not do that immediately. The results from the first probe would be used to formulate a new theory of gravity, and we could then use the new equations to predict the motions of galaxies. We may find that the new theory is accurate without needing dark matter or dark energy, and this would be very good evidence for it. We would have got our repeat experiment for free. We would probably then be justified in saying that the new theory is closer to reality than general relativity. THEN we might design a new experiment to test the new theory, based on the new predictions it makes.

Weird Dave
2005-Feb-14, 11:59 AM
I'd disagree that a hypothesis can only be falsified, and never proved. Take for instance the hypothesis, "The Loch Ness Monster exists," or anything similar.

So I'm a scientist because my hypothesis that the ball is in the garden turns out to be correct... hardly. Then everybody would be a scientist. This rather falls under the realm of common sense.

Yes, you are. Yes, they are. Yes, it does. Back in the early days, people did not know how a cannon ball flew - they assumed it went in a straight line until it was over the target, then suddenly stopped and dropped down vertically. Anyone who watched two children throw a ball to each other, and noticed that the path was curved and symmetrical, was effectively a scientist, because they were observing the world around them. This is what everyone does every day, and the fact that it is also common sense makes no difference.

In modern times, all of the easy things have been observed. We are left with the difficult things, that require complex experiments to investigate. This does not mean that a carefully made observation is not science, even though it does not follow the "formal science" method of hypothesis, experiment, result, repeat... (If someone wants to call "formal science" science, and "informal science" something else, then be my guest. Changing the words does not change my point).



Also the mere existence of the Loch Ness Monster is not a scientific question, because no inference is included. ...

I don't understand.

trob
2005-Feb-14, 04:17 PM
Since observations are about singular things and science is about general things (or universal things), a logical inference is needed to get from the specific to the general.

Thus inference: the relationship that holds between the premises and the conclusion of a logical argument, or the process of drawing a conclusion from premises ...

For a discussion of logical inference see:

http://www.philosophypages.com/lg/e01.htm

Regards
Trob :D

Evan
2005-Feb-14, 05:12 PM
The concept of a null hypothesis treads dangerously close to the logical fallacy of trying to prove a negative. A "null hypothesis" is an implicit corollary of the positive hypothesis and need not be stated.

You cannot prove a null result. I can with certainty say that I have never found a one pound gold nugget on my 12 acres of land. However, I cannot say it isn't there somewhere.

This is an essential concept in law. It is why it is necessary for the prosecution to prove that a crime was committed by the defendant. The defendant cannot prove innocence as it is logically impossible to do so. Yes, there are exceptions such as the case of providing an alibi but the law still requires proof of guilt, not proof of innocence.

I think Popper's work is complete ** and hard to read as well.

Weird Dave
2005-Feb-14, 06:08 PM
Since observations are about singular things and science is about general things (or universal things) a logical inference is needed to get from the specific to the general.

Thus infrerence: The relationship that holds between the premises and the conclusion of a logical argument, or the process of drawing a conclusion from premises ...

For a discussion of logical inference see:

http://www.philosophypages.com/lg/e01.htm

Regards
Trob :D

So, are you saying that science has to be general, and that the existence of a single object is therefore not a scientific truth even if it is true? If so, then we seem to be defining science in slightly different ways. I think you have a valid point - that extrapolating from individual observations to general theories is very dangerous. However sometimes, such as in astronomy, we don't have the luxury of many controlled repeated experiments. There's only one Universe, and we'll have to make do with it for the time being :) .

A Thousand Pardons
2005-Feb-14, 06:45 PM
The concept of a null hypothesis treads dangerously close to the logical fallacy of trying to prove a negative. A "null hypothesis" is an implicit corollary of the positive hypothesis and need not be stated.

Dangerously?? I don't think that "null hypothesis" means what you think it means. But I could be wrong.

A null hypothesis is a statistical term, indicating the hypothesis being tested. For instance, if your null hypothesis is that the mean of a particular population is 3, but your data arrive at 4--that does not mean that your experiment necessarily supports a value of 4. If that 4 is more than about two standard deviations from 3, then the null hypothesis will be rejected. But even then, there is roughly a 1 in 20 chance that the 4 is just a statistical fluke and the actual mean really is 3.

The null hypothesis is typically rejected if the probability of getting a result at least that extreme, assuming the null hypothesis is true, is 5% or less (some experiments are more stringent). Rejected on the basis of that single experiment.
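
To make that concrete, here is a minimal Python sketch of such a test using only the standard library. The sample numbers are invented, and for so few data points a t distribution would really be more appropriate than the normal approximation used here:

import math

# Invented sample for the "is the population mean really 3?" example above.
sample = [4.2, 3.9, 4.4, 3.7, 4.1, 3.8, 4.3, 3.6]
mu0 = 3.0                                  # mean claimed by the null hypothesis

n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
z = (mean - mu0) / (sd / math.sqrt(n))     # how many standard errors from 3?

# Two-sided p-value from the normal approximation: 2 * (1 - Phi(|z|)).
p = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p:.4f}, reject the null at the 5% level: {p < 0.05}")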


I think Popper's work is complete ** and hard to read as well.
that's understandable, then :)

Disinfo Agent
2005-Feb-14, 07:14 PM
I'd disagree that a hypothesis can only be falsified, and never proved. Take for instance the hypothesis, "The Loch Ness Monster exists," or anything similar. It can easily be proved by finding the Loch Ness Monster, but it is very difficult to falsify without combing every cubic inch of the loch (even then, the Monster could be hiding in a cave or something).
In fact, impossible. :)
However, I believe that Popper would not call such a conjecture a theory.


Some hypotheses, such as, "Global warming is caused by human CO2 emissions," are even trickier. There is no one experiment (or even series of experiments) that can prove or disprove this, in the absence of a large number of identical Earths to experiment on. OK, there are simulations, but their accuracy can be debated. I suspect that eventually the balance of evidence will be either for or against the hypothesis, but with no certainty of proof or disproof.
I'll give you the point about accuracy. That certainly makes judging theories fuzzier.
However, while I can imagine the evidence against CO2 emissions as a cause for global warming becoming so overwhelming that most people would discard such theory as almost certainly false, I cannot imagine most scientists ever believing that the theory has been proven true, in a logically dichotomic sense (yes/no), no matter how overwhelming the evidence gets.


Contrast this with other situations. It would only take one good experiment, and repeats of that experiment, to prove (or at least, establish beyond reasonable doubt) that general relativity is not accurate in at least some situations, and is thus incomplete. General relativity would not be "disproved": because it gives accurate predictions in many situations, it is still a useful theory.
Popper's main concern was not with whether a theory was useful, but with whether it was true and/or scientific.


Finally, some experiments aim to measure a quantity (e.g. the speed of light) rather than test a theory. In that case, the experimenters should average their results with previous experiments, weighting the different values according to their accuracy and precision. We will then get a new, more precise value for that quantity, but the old value is not "disproved". All scientific quantities come with uncertainties that estimate their accuracy, and the old value has a larger uncertainty.
Usually, that quantity is associated with some formula, so any measurement can be regarded as an indirect test of a theory.
For instance, imagine an experiment made to measure the speed of light as accurately as possible. If by any chance a value which differed considerably from 299,792.458 km/s were obtained, then that would be a serious blow to the theory of relativity (at least in its current version).

archman
2005-Feb-15, 04:04 AM
You cannot prove a null result. I can with certainty say that I have never found a one pound gold nugget on my 12 acres of land. However, I cannot say it isn't there somewhere.

You are correct Evan, you cannot "prove" a null hypothesis. Neither can one "prove" the test hypothesis. But you can certainly disprove either of them. That's the critical purpose of the null; if it is disproved, you "support" the test hypothesis. It is the only direct method of supporting the test hypothesis. A test hypothesis seeks to show change in your dependent variable in response to the independent variable. The null hypothesis seeks to show that the dependent variable is not affected by the independent variable. That's why the null is commonly referred to as the "no effect, no change" hypothesis.

A null hypothesis is one of the basic requirements necessary in experimental design. Well, maybe not all experimental designs... I won't speak for fields I am not trained on. I'll limit my remarks to the life sciences, and statistics.

Evan
2005-Feb-15, 05:04 AM
Bah. This business of disproving a null hypothesis is merely semantics. Why is it that the "soft" sciences that rely so heavily on statistical interpretations of data seem to have a need to obscure meanings with fancy-sounding but irrelevant descriptions of the process?

As I said before, the "null hypothesis" is an implicit corollary to the "alternate" hypothesis, the one you seek to prove or fail to prove. I see no reason to explicitly name it (the null). Only one hypothesis is required. The data either support it or they don't. The data may support it partially, which is when adjustments are in order. If the data fail utterly to support the hypothesis then there is no reason to say "the data have proven the null hypothesis". This is merely word play. Why not just say that the data have failed to prove the hypothesis?

archman
2005-Feb-15, 06:31 AM
Bah. This business of disproving a null hypothesis is merely semantics. Why is it that the "soft" sciences that rely so heavily on statistical interpretations of data seem to have a need to obscure meanings with fancy-sounding but irrelevant descriptions of the process?
If there are any statisticians on the board, they can answer this question for you much better than I can. It's not a complicated answer, but I always have trouble articulating it. I've been called "statistics-deficient" by a great many of my peers. Suffice it to say that these "fancy-sounding words" are anything but; they're defined quite specifically for direct application to quantitative experimental design. As far as I know they have no other application. They are among the most basic core components of experimental design, which is why we teach it to budding scientists as soon as possible. Here’s the short list.
1. Question (usually not configured for experimental testing)
2. Identification of Independent & Dependent Variables (puts Question into testable format)
3. Formulation of Test Hypothesis (incorporating the above variables)
4. Formulation of Null Hypothesis (no effect of independent variable on dependent variable)


As I said before, the "null hypothesis" is an implicit corollary to the "alternate" hypothesis, the one you seek to prove or fail to prove. I see no reason to explicitly name it (the null).
Much confusion exists on this very point. If one follows the scientific method properly, it's the test hypothesis (H1) that's the corollary of the null (Ho), not the other way around. Since the only true accepted way to support a test hypothesis is to disprove the null, obviously the null has to be looked at first. The "alternate hypothesis" is not referred to as such; it is properly termed the "test" or "experimental" hypothesis. It is the hypothesis for which you are eliciting change in your dependent variable in response to your independent variable, thus it is the hypothesis being "tested" or "experimented" on.


Only one hypothesis is required. The data either support it or they don't. The data may support it partially which is when adjustments are in order. If the data fail utterly to support the hypothesis then there is no reason to say "the data have proven the null hypothesis". This is merely word play. Why not just say that the data have failed to prove the hypothesis?
Again, in statistics you cannot support a test hypothesis without disproving a null hypothesis. There has to be a demonstrated effect on the dependent variable in response to the independent variable to refute the null. A worker may skip this "step" in his head when analyzing an experiment, but the step is still very much there.

It is perfectly acceptable in journals for authors not to explicitly talk about the null hypothesis; its presence is implied. Perhaps this is also the source of some confusion. You don't have to write a null hypothesis into the methods section of a journal article, nor do you have to discuss disproving the null to support the test hypothesis. Reviewers scrutinizing a worker's methods usually just scrutinize the test hypothesis. Null hypotheses typically follow a standard pattern of form... "blah blah blah has no effect on blah blah blah". Since everyone knows that, there's little point in reiterating it in the article, in the same manner that taxonomists do not put glossaries of their terminology into articles.


Why not just say that the data have failed to prove the hypothesis?
It’s considered bad form. Use of the word “prove” is a no-no, even in reverse context (i.e. “not prove”). I don’t read astronomical journal articles, but in the life science ones you will rarely see the “p-word” used. Instead, we use words like “support” or “bolster”, which do not imply certainty. So for your above statement, it would read something like this.

The data do not support the hypothesis

Often when we say something like this, we follow it up with something like this.


There was no effect on blah blah blah in response to blah blah blah.

Which technically, supports the null hypothesis. And if our experiment makes use of statistics, it’ll look like this.


There was no statistically significant difference between blah blah blah and blah blah blah.

And a p-value would be inserted.

A Thousand Pardons
2005-Feb-15, 09:39 AM
As I said before the "null hypothesis" is an implicit corollary to the "alternate" hypothesis, the one you seek to prove or fail to prove. I see no reason to explicity name it (the null). Only one hypothesis is required. The data either support it or they don't. The data may support it partially which is when adjustments are in order. If the data fail utterly to support the hypothesis then there is no reason to say "the data have proven the null hypothesis". This is merely word play. Why not just say that the data have failed to prove the hypothesis?
You might be right, about the word play. Could you give an example where that's used, and we can see what they are doing specifically?

Disinfo Agent
2005-Feb-15, 12:46 PM
You are correct Evan, you cannot "prove" a null hypothesis. Neither can one "prove" the test hypothesis. But you can certainly disprove either of them. That's the critical purpose of the null; if it is disproved, you "support" the test hypothesis.
I must disagree a bit with you. Since the alternative hypothesis is often the opposite of the null hypothesis (e.g. "no change" vs. "some kind of change"), you would seem to be contradicting yourself.

Hypothesis testing cannot prove or disprove a hypothesis. What it does is allow us to assess to what extent the data is consistent with the null hypothesis. In other words, it may "support" or "counter" a hypothesis, but not prove or disprove it. (*)

(*) This being said, sometimes the data is so inconsistent with the null hypothesis that it would be unreasonable to hold on to it.

Evan
2005-Feb-15, 01:57 PM
Archman,

Definition:

Alternative hypothesis (H1)
A prediction that there is a difference between the groups of data being compared. The alternative hypothesis is often the working hypothesis, or research question, in a study.

http://www.cirem.co.uk/definitions.html

Still all just semantics to me. Actually, what you say seems to "support" that.

A Thousand Pardons
2005-Feb-15, 06:02 PM
Still all just semantics to me.
I'd like to see some specific instances where they say that they disprove, or prove, the null hypothesis. As archman says, that's not the usual way.

Evan
2005-Feb-15, 06:29 PM
Hypothesis testing is equivalent to the geometrical concept of hypothesis negation. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing the hypothesis may never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, must be true.

http://www.psychstat.smsu.edu/introbook/sbk18m.htm


A hypothesis, by contrast, should be capable of being proven or justified.

and


...of the hypothesis, and thus to permit assessment of whether it is confirmed or disproved.

http://www.globaltester.com/sp5/hypothesis.html

archman
2005-Feb-16, 04:50 AM
You are correct Evan, you cannot "prove" a null hypothesis. Neither can one "prove" the test hypothesis. But you can certainly disprove either of them. That's the critical purpose of the null; if it is disproved, you "support" the test hypothesis.
I must disagree a bit with you. Since the alternative hypothesis is often the opposite of the null hypothesis (e.g. "no change" vs. "some kind of change"), you would seem to be contradicting yourself.

Hypothesis testing cannot prove or disprove a hypothesis. What it does is allow us to assess to what extent the data is consistent with the null hypothesis. In other words, it may "support" or "counter" a hypothesis, but not prove or disprove it. (*)

You may have valid points disinfo, and I've had trouble thinking about this myself. So I'll just parrot the party line. You cannot prove a hypothesis, but you can certainly "reject" one. Or to be specific, you can "reject" the null only. To me, "reject" always sounded the same as disproved, and I occasionally hear statistical folks talking about "disproving the null." But whether you're using "reject" or "disprove" as your word of choice, it is perfectly acceptable to, er, refute the null. You can't support your test hypothesis without doing this. "Rejecting the null" is one of only two possible outcomes for any experiment.

Regarding the null hypothesis as an "opposite" to the test hypothesis, this is a very common mistake. In many designs it is configured this way. But in others it is not. Here's a basic example of the latter case.

H1: plant growth shows a positive effect when fertilizer is applied
Ho: plant growth is unaffected by fertilizer application

The statistics folks call a test hypothesis that specifies a direction a one-tailed hypothesis. I hate these things; they're more complicated.

archman
2005-Feb-16, 05:32 AM
Hypothesis testing is equivalent to the geometrical concept of hypothesis negation. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing the hypothesis may never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, must be true.

http://www.psychstat.smsu.edu/introbook/sbk18m.htm


A hypothesis, by contrast, should be capable of being proven or justified.

I would have to say this statement would have a lot of experimenters up in arms. A test hypothesis is not provable, ever. It can only be supported. Here, I'll refer to a quote from one of your own links, Evan, that reflects the general consensus.


The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "Reject H0 in favour of H1" or "Do not reject H0". We never conclude "Reject H1", or even "Accept H1".

If we conclude "Do not reject H0", this does not necessarily mean that the null hypothesis is true, it only suggests that there is not sufficient evidence against H0 in favour of H1. Rejecting the null hypothesis then, suggests that the alternative hypothesis may be true.

http://www.stats.gla.ac.uk/steps/glossary/hypothesis_testing.html

As an aside, you were right about "alternate hypothesis" being a valid term. Some of my reference books list it that way also. As I said previously, I'm considered "statistics-deficient" by my peers.


and


...of the hypothesis, and thus to permit assessment of whether it is confirmed or disproved.

http://www.globaltester.com/sp5/hypothesis.html

That weblink is also not very accurate, although it does state in its disclaimer that it's just trying to throw out a "general idea" of concepts. They should proofread their material more thoroughly, however. In the cited quote snippet, I'm assuming that by "confirming" the hypothesis, they're showing it to be proved. That's wrong, you can't do that. As for disproving (rejecting) the test hypothesis, I think I stated previously somewhere that some folks do it, but it's preferred that any mention of "rejection" only be given to the null hypothesis as it applies. If the test hypothesis isn't supported, we say exactly that... it's "not supported".

Here's a link that's better, at least regarding what can and cannot be proven. I don't like the way it refers to the null as an opposite of the test hypothesis though. Won't work with one-tailed tests.
http://www.staff.city.ac.uk/r.j.gerrard/courses/d101/d101_09.htm

I can very much see now what you mean by semantics and confusing wordplay! Don't blame me, I'm just the messenger. 8-[

Evan
2005-Feb-16, 07:01 AM
Archman,

I am fully aware of why you don't like the terms "prove" or "disprove". I don't either. Most scientists never like to speak in terms of absolutes, especially statisticians.

Although, I did find one site that spoke of disproving the null hypothesis. The explanation given was the one that made the most sense. Simply put, you can never prove the test hypothesis as the next experiment you do may contradict it. But, it only takes one experiment to disprove utterly the null hypothesis. Using the example you gave if we apply too much fertilizer the plants immediately die. In that case both the H0 and the H1 are conclusively disproven. Also in that case there is no need for the H0.

archman
2005-Feb-16, 07:52 AM
I did find one site that spoke of disproving the null hypothesis. The explanation given was the one that made the most sense. Simply put, you can never prove the test hypothesis as the next experiment you do may contradict it. But, it only takes one experiment to disprove utterly the null hypothesis. Using the example you gave if we apply too much fertilizer the plants immediately die. In that case both the H0 and the H1 are conclusively disproven. Also in that case there is no need for the H0.

Wait wait! Slow down and carefully scrutinize what the null is stating.

Ho: plant growth is unaffected by fertilizer application
If the plants die due to fertilizer input, you're showing a change in the dependent variable (growth) in response to the independent variable (fertilizer application), and thus you have to reject the null hypothesis. The fertilizer still has an effect on the plant, it's just not (ha ha) a positive effect!

I'm not on such solid footing here, but typically a one-tailed (or directional) hypothesis is only attempted after passing two-tailed criteria. So with the fertilizer plant example... *thinking*


H1: plant growth is affected by fertilizer application
Ho: plant growth is unaffected by fertilizer application

So if the plants die (ugh!), that would reject (disprove, refute, kill, axe, broil) the null, and support the test (alternate) hypothesis.
Now I'll make a one-tailed test hypothesis.


H1: plant growth is negatively affected by fertilizer application.
Ho: plant growth is unaffected by fertilizer application
or maybe this is the null. I forget how these things are set up.


Ho: plant growth is not negatively affected by fertilizer application

Either way, you still end up refuting the null hypothesis, and that'll support the test hypothesis. Ain't statistics grand!
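And here's a rough Python sketch of the "fertilizer kills the plants" case, with invented numbers, showing how the two-tailed test and the two possible one-tailed tests come out (recent SciPy assumed):

import numpy as np
from scipy import stats

control    = np.array([4.1, 3.8, 4.4, 4.0, 3.9])
fertilized = np.array([1.2, 0.9, 1.5, 1.1, 0.8])   # the plants are doing very badly

# Two-tailed: Ho "no effect" vs H1 "some effect, either direction"
_, p_two = stats.ttest_ind(fertilized, control, alternative='two-sided')

# One-tailed: Ho "not negatively affected" vs H1 "negatively affected"
_, p_neg = stats.ttest_ind(fertilized, control, alternative='less')

# One-tailed in the wrong direction: H1 "positively affected"
_, p_pos = stats.ttest_ind(fertilized, control, alternative='greater')

print(p_two, p_neg, p_pos)   # the first two are tiny, the last is close to 1

The 'greater' version coming out near 1 is the formal way of saying these data give no support whatsoever to a positive effect.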

Evan
2005-Feb-16, 08:02 AM
Semantics again. I took positive to mean desirable, as opposed to negative meaning undesirable.

archman
2005-Feb-16, 08:33 AM
Semantics again. I took positive to mean desirable, as opposed to negative meaning undesirable.

?? You said that...

if we apply too much fertilizer the plants die. In that case both the Ho and H1 are conclusively disproven

That's an error in interpreting your results. It's not a question of desirable, undesirable, positive, or negative... it's a question of effecting a change in your dependent variable in response to your independent one. Your null states that fertilizer has no effect on plant growth. The experiment shows it does have an effect (death). Therefore the null is rejected, but the test hypothesis is not.

Designed properly, an experiment shouldn't reject (or accept) both the null and the test hypothesis at the same time. From my limited knowledge, such things are supposed to be impossible.

Disinfo Agent
2005-Feb-16, 12:22 PM
You may have valid points disinfo, and I've had trouble thinking about this myself. So I'll just parrot the party line. You cannot prove a hypothesis, but you can certainly "reject" one. Or to be specific, you can "reject" the null only. To me, "reject" always sounded the same as disproved, and I occasionally hear statistical folks talking about "disproving the null." But whether or not you're using "reject" or "disproof" as your word of choice, it is perfectly acceptable to, er, refute the null. You can't support your test hypothesis without doing this. "Rejecting the null" is one of only two possible outcomes for any experiment.
To me, "refute" is a synonym of "disprove".
The standard term in hypothesis testing is neither "refute" nor "disprove", but "reject" (as you said yourself). This term was carefully chosen by statisticians, to make it clear that "rejecting" a hypothesis is a decision of the researcher, not a state of nature. Nature isn't necessarily in agreement with our decision, and that's made explicit when the concepts of Type I Error and Type II Error are discussed.


Regarding the null hypothesis as an "opposite" to the test hypothesis, this is a very common mistake.
What is? I never said that the alternative hypothesis was always the opposite of the null hypothesis!


In many designs it is configured this way. But in others it is not. Here's a basic example of the latter case.

H1: plant growth shows a positive effect when fertilizer is applied
Ho: plant growth is unaffected by fertilizer application

The statistics folks call a test hypothesis that specifies a direction a one-tailed hypothesis. I hate these things; they're more complicated.
I don't think they are that much more complicated than two-sided alternatives. And they do often make more sense in practice, as in the example you gave. If some substance is a suspected "fertiliser", then in the worst possible case it should have no effect at all on plant growth. I think you'll agree that it wouldn't make much sense for a candidate fertiliser to actually reduce plant growth. (A substance that kills all plants in an experiment is no longer acting as a fertiliser. It's a herbicide.)

Moose
2005-Feb-16, 01:45 PM
Yes, you are. Yes, they are. Yes, it does. Back in the early days, people did not know how a cannon ball flew - they assumed it went in a straight line until it was over the target, then suddenly stopped and dropped down vertically.

*nearly snarfs his drink* Whaaaat?

I'm afraid I'm going to have to ask for a cite for this.

Bows and crossbows were in common use long before the first European guns were made. So were rock-throwing siege devices. I don't see why gunners would ever have thought cannon shot behaved any differently from arrows and stones. Not beyond their first few shots, anyway. Simply trying for distance would have disabused them of the "straight line" notion, assuming they'd ever had it in the first place.

trob
2005-Feb-16, 07:24 PM
In fact Aristotle was pig-headed enough to make this claim because theoretical consistency forced him to. Aristotle would rather have an empirically faulty notion of projectile movement (or Violent motion, as he called it) that followed from his intellectually satisfying notion of natural motion (where things sought their natural place in the universe - a notion that followed from teleological reasoning) than modify the theory of violent motion. Contemporaries of Aristotle pointed out the weakness of the projectile motion theory, and it was widely discussed in the Middle Ages before Galileo. A good discussion of this can be found at the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/entries/causation-medieval/


Correspondingly, both Buridan and Oresme are sceptical, not only about Aristotle's theory of projectile motion, but also about the related notions of natural place, motion, and rest ...Oresme and Buridan have, on these grounds, been described as "precursors of Galileo".

Thus the claim made above is not correct.
It is also worth remembering that the texts of Aristotle were not available (and generally unknown) in Europe till quite late (the 1200s) - they were reimported via Spain from the Middle East (the philosophy being done there was far more advanced at that time, by the way...). Thus your average intellectual monk before this was not brainwashed into admitting such absurdities. This is why Aristotle's theory of projectile motion was a bit of an embarrassment in those days .... :oops:
Aristotle was fashionable till ca. 1613... that gives us at most 400 years in which that theory of motion was constantly under attack....

All the best
Trob :D

A Thousand Pardons
2005-Feb-16, 09:44 PM
Hypothesis testing is equivalent to the geometrical concept of hypothesis negation. That is, if one wishes to prove that A (the hypothesis) is true, one first assumes that it isn't true. If it is shown that this assumption is logically impossible, then the original hypothesis is proven. In the case of hypothesis testing the hypothesis may never be proven; rather, it is decided that the model of no effects is unlikely enough that the opposite hypothesis, that of real effects, must be true.

http://www.psychstat.smsu.edu/introbook/sbk18m.htm


A hypothesis, by contrast, should be capable of being proven or justified.

I would have to say this statement would have a lot of experimenters up in arms. A test hypothesis is not provable, ever. It can only be supported. Here, I'll refer to a quote from one of your own links, Evan, that reflects the general consensus.
Or use that same link: "Note that, unlike geometry, we cannot prove the effects are real, rather we may decide the effects are real." The instances where Evan has bolded the word "proven" are apparently using it in the mathematical sense, where hypotheses can be proven--the second instance even says "proven, or justified".

Evan
2005-Feb-16, 09:51 PM
If you do a Google for "hypothesis disproven" or "disproved" there are many hits. In particular the null hypothesis since any confirmation of the H1 conclusively disproves the null.

Disinfo Agent
2005-Feb-16, 10:09 PM
...of the hypothesis, and thus to permit assessment of whether it is confirmed or disproved.
http://www.globaltester.com/sp5/hypothesis.html
From what I can tell, the author of that text is using the word "proven" in a broad sense. He is not talking about the mathematical (yes/no) kind of proof. Earlier, he writes "proven, or justified". In my opinion, this means that, to him "proven" means the same as "justified", "in the light of experimental data", to use his own phrase.


A heuristic is something that is usually a speculative formulation serving as a guide in the investigation or solution of a problem. An important point is that a heuristic assumption is one that is useful in providing direction in the solution of a problem but that is not independently justifiable or provable. A hypothesis, by contrast, should be capable of being proven or justified. A hypothesis, in order to be meaningful, has to be given operational specificity. It has to be spelled out in such a way as to permit testing, by observing certain characteristics that are produced independently of the hypothesis, and thus to permit assessment of whether it is confirmed or disproved. Heuristics, unlike hypotheses, are not evaluated by whether they are proved or disproved, but rather by their fecundity.

[...]

With what has been said so far, we can see that the purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data. Depending on the data, the null hypothesis either will or will not be rejected as a viable possibility. [...] We can also see that hypothesis testing is a method of inferential statistics. An experimenter starts with a hypothesis about a population parameter (such as the number of defects present, for example) called the null hypothesis. Data are then collected and the viability of the null hypothesis is determined in light of the data. If the data are very different from what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is rejected. If the data are not greatly at variance with what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is not rejected. Although, as said before, failure to reject the null hypothesis is not the same thing as accepting the null hypothesis.

If you do a Google for "hypothesis disproven" or "disproved" there are many hits. In particular the null hypothesis since any confirmation of the H1 conclusively disproves the null.
And how would you confirm an alternative hypothesis H1?

Edited.

Evan
2005-Feb-16, 10:58 PM
Experimental evidence.

archman
2005-Feb-17, 05:19 AM
If you do a Google for "hypothesis disproven" or "disproved" there are many hits. In particular the null hypothesis since any confirmation of the H1 conclusively disproves the null.

Man Evan, you really have it out for the null hypothesis! And watch out for many of these googled weblinks... there's a ton of them online that say screwy things. I've been filtering through them for days... you're definitely correct that many of them say you can disprove the test hypothesis. Many also state that Ho is the opposite of H1, and that you can prove H1. I would place little trust in most weblinks out there.

You keep sticking to your guns about supporting the test hypothesis directly. That's fair. It's not permitted in statistics, however. The only way to support (or confirm) the test hypothesis is to show a lack of support for the null hypothesis. There isn't another option. Experimental evidence is collected specifically to test the null... to show an effect on the dependent variable as a result of the independent variable. If experimental evidence shows such an effect, the null is rejected.

Any other experimental evidence can only be used anecdotally, unless designed into complementary test hypotheses. This is done quite frequently; many of my research topics are so worded.

There may be methods to accept the test hypothesis without referring to the null, but they're not methods I'm familiar with.

Maddad
2005-Feb-17, 06:16 AM
Evan
I'm commenting on your science experiment post. I liked the idea very much, by the way. You've got a great dad.

In your post you asked how your experiment could have been falsifiable. You could have made the premise of the experiment that gravity, and not centrifugal force, controls the direction that plants grow. That premise was falsifiable because your plants would not have grown straight out of the test tubes had it been correct. Because they did grow straight out, you disproved the hypothesis.

You still collect all your data; you just think about your approach slightly differently. With that change of perspective you disprove one possible explanation.


Popper's eminently logical solution to the problem through falsification has a price, namely that we must wave bye-bye to positive certainty.
That bothered my son quite a bit, and me none at all *L*


I'd disagree that a hypothesis can only be falsified, and never proved. Take for instance the hypothesis, "The Loch Ness Monster exists," or anything similar. It can easily be proved by finding the Loch Ness Monster, but it is very difficult to falsify without combing every cubic inch of the loch.
So change the hypothesis to a negative: Nessie doesn't exist, along with her entire branch of critters. Now any sighting falsifies the hypothesis. If you don't find her, then you're still where we all are right now. We don't know for sure, and that's reality.


Some hypotheses, such as, "Global warming is caused by human CO2 emissions," are even trickier. There is no one experiment (or even series of experiments) that can prove or disprove this.
Change your hypothesis to say that only industrial greenhouse gases cause global warming. Then look for evidence in the record going back 800,000 years. When you find cycles of global warming when there was no industry, you disprove the hypothesis that only industrial greenhouse gases cause it.

The object is to word your hypothesis so that results are able to eliminate one possible explanation. (I'm still thinking about using archman's idea of testing for a null result.) If you succeed then you have narrowed the pool of possible answers to a question.


I will strictly limit my remarks to what I used to teach freshman biology students, taken from current college biology texts and lesson plans.
Strange. The best description I ever saw of the scientific method was on page nine of my biology textbook. If I hadn't borrowed it and then returned it after the semester was over, I'd scan that page and part of the next.

I like your null result approach to experimentation. I'm still thinking about how to apply it so that I'll remember it in a week or ten years.

Maddad
2005-Feb-17, 06:53 AM
You cannot prove a null result.
That is why we try to disprove it. We accept that we cannot prove the H1 result, but we can disprove the alternative, making H1 more likely.


I think Popper's work is complete ** and hard to read as well.
I think you didn't try to understand it.


This business of disproving a null hypothesis is merely semantics.
It's logic.


if we apply too much fertilizer the plants immediately die. In that case both the H0 and the H1 are conclusively disproven.
Incorrect. We disproved H0 because plant growth was affected by fertilizer application. We did not, however, disprove H1, because we failed to account for the possibility of too much fertilizer, so we still have two possible explanations; H1 is neither proved nor disproved.


Hypothesis testing cannot prove or disprove a hypothesis.
The entire point of hypothesis testing is to disprove a hypothesis. You've got nothing without it.


*nearly snarfs his drink* Whaaaat?
*BOL* I love it!

Disinfo Agent
2005-Feb-17, 12:24 PM
Experimental evidence.

The entire point of hypothesis testing is to disprove a hypothesis. You've got nothing without it.
Could each of you give me his own definition of "proving a hypothesis", please? (I will assume that "disproving a hypothesis" is proving the opposite of that hypothesis.)

Evan
2005-Feb-17, 05:29 PM
Although the word "proven" seems to be used often (including by me), scientists are pretty reluctant to say anything is proven. There are exceptions, such as "it is proven the Earth orbits the Sun".

Consider the hypothesis that "A green plant should respond differently to different wavelengths of light".

I then proceed to test this, building on the physical principle that colored materials in general absorb certain wavelengths and reflect others.

The experimental setup is a large beaker of water with a glass funnel inverted over a carefully weighed mass of Elodea canadensis, a water plant. Temperature is controlled with an aquarium water heater. All air is drawn out of the funnel with a glass 10cc syringe at the top and the syringe sealed in place with wax. Light is provided with a full spectrum photoflood with various colored gel filters of equal density ranging across the spectrum.

The plant mass is allowed to photosynthesize for 24 hours and then the accumulated oxygen in the top of the funnel is drawn off with the syringe and measured. Repeat several times for each color and take the average.

Re-weigh the plants and do it again with a different color.

When finished graph the amount of oxygen produced with each color to see if different colors made any difference to photosynthesis.

If the graph isn't flat then we have experimental evidence supporting the hypothesis. If it is flat then we don't.

BTW, I did this experiment when I was 12 so don't rag on me about possible confounding variables. I designed the experiment myself.
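Just as a sketch of how the analysis stage of an experiment like that might be coded up today (the oxygen volumes below are invented, not your actual results):

from scipy import stats

# Average cc of oxygen collected per 24-hour run under each gel filter
oxygen = {
    'red':    [3.1, 2.9, 3.3],
    'green':  [1.0, 1.2, 0.9],
    'blue':   [2.7, 2.8, 2.6],
    'yellow': [1.8, 1.7, 1.9],
}

# One-way ANOVA: Ho says filter colour makes no difference to oxygen production.
f_stat, p_value = stats.f_oneway(*oxygen.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value rejects Ho -- the statistical version of "the graph isn't flat".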

trob
2005-Feb-17, 06:06 PM
In the earlier thread on falsificationism I made the following post, which nobody commented on at all... It was ignored even though it was of the utmost importance - namely the fact that Popper had extreme doubts about falsificationism. I REALLY hope you have some comments.....

Best Regards
Trob :D


Hi' everybody
You're really mixing it up with this Popper thing, but you seem to be getting nowhere. I think this is because your discussion is bounded by the parameters of your initial question, thus forcing you to concentrate on falsification as a methodology in itself, which will get you nowhere. Falsification is not the be-all and end-all of Popper.
As Popper himself writes in "Conjectural Knowledge": "Once I had solved the problem of induction, and realized its close connection with the problem of demarcation, interesting new problems and new solutions arose in rapid succession. First of all I soon realized that the problem of demarcation and my solution, as stated above, were a bit formal and unrealistic: empirical refutations could always be avoided. It was always possible to 'immunize' any theory against criticism. (This excellent expression which, I think, should replace my terms 'conventionalist stratagem' and 'conventionalist twist' is due to Hans Albert.) Thus I was led to the idea of methodological rules and the fundamental importance of a critical approach; that is, of an approach which avoided the policy of immunizing our theories against refutation. At the same time, I also realized the opposite: the value of a dogmatic attitude: somebody had to defend a theory against criticism, or it would succumb too easily...This led me to the view that all languages are theory impregnated; which meant, of course, a radical revision of empiricism." [Popper's italics, my underlining] (See Objective Knowledge - revised edition. Clarendon Press, Oxford. 1979. Pages 30-31.)

First of all, you will notice that Popper is clearly getting dangerously close to espousing a Kuhnian position, but rather than giving his ideas an anti-realist, relativistic twist he actually says that without a critical rationalist foundation (this also includes critical realism and correspondence truth) falsificationism is useless because of the conventionalist problem, which Popper calls immunization of theories. (An example of conventionalism can be found in my example of Euclidean vs. non-Euclidean geometry.) Thus, I do not believe that a solution to your problems can be found by discussing falsificationism alone. In fact Popper even discards the classical notion of empiricism as espoused by many in this forum - curiously also those claiming Popper as their main man. The question you ought to ask yourselves is not whether this or that theory is falsifiable, but how this or that theory fares in regard to the parameters laid out by critical rationalism. If it is a Popperian solution you want, you must begin to involve the criteria of critical rationalism also.


All the best

Disinfo Agent
2005-Feb-17, 08:05 PM
That is a relevant contribution, trob, and I think Weird Dave made a good point about accuracy that was related to what you're saying, but at the moment I think the conversation has focused on the matter of statistical hypothesis testing. :)

Maddad
2005-Feb-17, 08:35 PM
Could each of you give me his own definition of "proving a hypothesis", please?
The point uncovered in this thread is that science is not about proving a hypothesis or theory.


I was led to the idea of methodological rules and the fundamental importance of a critical approach; that is, of an approach which avoided the policy of immunizing our theories against refutation. At the same time, I also realized the opposite: the value of a dogmatic attitude: somebody had to defend a theory against criticism, or it would succumb too easily
I feel mistrustful of the Big Bang theory. I know that there's a ton of evidence for it, and still . . . I wonder sometimes if theories that the BB replaced were dismissed too easily, if we rejected them completely instead of looking for modifications. The steady state is one such. The CBR is cited as evidence for the BB, and yet I wonder if we have exhausted alternative explanations for what this CBR might be. I do not feel comfortable saying that the CBR is solid evidence for the BB because I have not seen serious attempts to explain its existence with any other mechanism. A failure to succeed in that would let me relax on my misgivings.

Disinfo Agent
2005-Feb-17, 08:40 PM
Could each of you give me his own definition of "proving a hypothesis", please?
The point uncovered in this thread is that science is not about proving a hypothesis or theory.
You did not answer my question. Why not?



I was led to the idea of methodological rules and the fundamental importance of a critical approach; that is, of an approach which avoided the policy of immunizing our theories against refutation. At the same time, I also realized the opposite: the value of a dogmatic attitude: somebody had to defend a theory against criticism, or it would succumb too easily
That's not trob on Popper, I believe. It's Popper quoted by trob.

trob
2005-Feb-17, 08:58 PM
The problem for Popper was turning all his ideas into practical science, which he did by combining statistics, Tarski's formal logic (consequence classes) and the notion of empirical content. In Objective Knowledge Popper has a chapter called Two Faces of Common Sense where he "solves" the issues. This resulted in the theory of Verisimilitude, or the theory of truth and falsity content, which solves exactly the problems that you are discussing...as well as they can be solved with the inherent weaknesses of the whole approach. The method is the application of the formal theory of probability, as he writes on page 50. I suggest reading this chapter...good stuff, but rather dense to read (But not as bad as Tarski).

All the best
Trob :D

trob
2005-Feb-17, 09:01 PM
That's not trob on Popper, I believe. It's Popper quoted by trob.

Exactly...do not put the words of greatness into my mouth...I'm not worthy LOL :D

A Thousand Pardons
2005-Feb-18, 10:12 AM
If you do a Google for "hypothesis disproven" or "disproved" there are many hits.
In mathematics, you can disprove a hypothesis.


In particular the null hypothesis since any confirmation of the H1 conclusively disproves the null.
"disproves the null"? Isn't that an instance of what you were objecting to?

trob
2005-Feb-18, 10:33 AM
In mathematics, you can disprove a hypothesis.

Even mathematics has its limits, as Penrose has shown using Gödel's theorem: http://www.fortunecity.com/emachines/e11/86/loglimit.html

trob

Evan
2005-Feb-18, 07:30 PM
I am not objecting to the null hypothesis as a concept. I don't see the need for it.

In logic to describe inversion you say "If A then not B". The corollary is "If not A then B" which is implicit in the first statement. I see no need to make it explicit.

Disinfo Agent
2005-Feb-18, 08:03 PM
I am not objecting to the null hypothesis as a concept. I don't see the need for it.
The null hypothesis is usually one of "no change", or "greatest simplicity". Wouldn't you agree that, by Occam's Razor, that should be the default attitude of a responsible researcher?

To use the example that others gave before, if you're researching whether some substance can be used as a fertiliser, but you still don't know much about it, what is the most prudent hypothesis to assume:

A) That the substance does stimulate plant growth.

or

B) That the substance has no effect whatsoever on plant growth?

As another example, allow me to point out that when you chose the hypothesis to be tested in the experiment you made when you were younger (described earlier), about whether plants were able to distinguish between gravity and centrifugal force, the hypothesis you selected was that "plants cannot tell the difference between the two". This is simpler than assuming that they can tell the difference, because then you'd have to explain how well they were able to tell the two effects apart, by which mechanism, and how much the effect of gravity on plant growth differed from the effect of centrifugal force.


You cannot prove a null result. I can with certainty say that I have never found a one pound gold nugget on my 12 acres of land. However, I cannot say it isn't there somewhere.

This is an essential concept in law. It is why it is necessary for the prosecution to prove that a crime was committed by the defendant. The defendant cannot prove innocence as it is logically impossible to do so. Yes, there are exceptions such as the case of providing an alibi but the law still requires proof of guilt, not proof of innocence.
In the language of statistical hypothesis testing, it's up to the prosecution to prove that a crime was committed by the defendant because the null hypothesis is that it wasn't ("innocent until proven guilty"). :)

However, the prosecution naturally has its eyes on the alternative hypothesis that the defendant is guilty.

And, to tie this in with Popper's ideas, the trial can be seen as an attempt by the prosecution to falsify the hypothesis of innocence.


In logic to describe inversion you say "If A then not B". The corollary is "If not A then B" which is implicit in the first statement. I see no need to make it explicit.
But the definition of "proof" you gave above is not the one used in logic and mathematics.

Edited.

A Thousand Pardons
2005-Feb-18, 08:52 PM
Even mathematics has its limits, as Penrose has shown using Gödel's theorem: http://www.fortunecity.com/emachines/e11/86/loglimit.html

Gödel himself showed that mathematics has its limits. From that link: "But various theorists, notably the mathematical physicist Roger Penrose of the University of Oxford, have argued that human cognitive activity is not based on any known deductive rules and is thus not subject to Godelian limits."

I am not objecting to the null hypothesis as a concept. I don't see the need for it.

In logic to describe inversion you say "If A then not B". The corollary is "If not A then B" which is implicit in the first statement. I see no need to make it explicit.
But saying that A is the null hypothesis is much different than saying that B is the null hypothesis.

When one tests the null hypothesis, the results must not just favor the other hypothesis, they must greatly favor the other hypothesis.

But, that raises an important question--which may be the question that you are actually concerned with--which is, how do you determine the null hypothesis?

A notorious (and still unresolved, in my mind) example is the viscosity of the earth's upper mantle (http://www.badastronomy.com/phpBB/viewtopic.php?p=15082&highlight=viscosity+mantle#15082). Early determinations arrived at one value, but later attempts using it as the null hypothesis were actually favoring a much higher value, yet it was not possible to reject the lower value!

trob
2005-Feb-18, 10:05 PM
In mathematics, you can disprove a hypothesis

If we follow Penrose's argument from Shadows of the Mind (Vintage 1995), then mathematical proof is ultimately based upon some mystical neo-platonic insight comparable to Plotinus' emanation theory (page 414): the mind emanates from physics, which emanates from mathematics, which again emanates from the mind in one circular mound of nonsense. This concept varies significantly from proofs and disproofs if you ask me. In fact I agree with Penrose's diagnosis (using Gödel) but not his solution.

I actually saw him lecture at my university - very entertaining and self-assured.

All the best, I must retire - it be late LOL

Trob :D

edited

Disinfo Agent
2005-Feb-19, 02:59 PM
In any case, a mathematical proof -- whether questionable or not -- is something different from a proof in the natural sciences.

trob
2005-Feb-19, 03:26 PM
I agree totally....and it is also subject to a degree of certainty that is not the case for natural sciences.

best regards
Trob :D

Look who's become "Bad Apprentice" yeah... =D>

Disinfo Agent
2005-Feb-19, 04:08 PM
Congratulations. :D =D>

trob
2005-Feb-19, 06:27 PM
look out disinfo agent...I'm only 777 posts behind you. (only - yeah right) :lol:
Have you thought about collecting them all for publication? :D

...back to the subject at hand.

Obviously Popper thought that the concepts of the null hypothesis and H1 were subject to theory-ladenness and the possibility of immunization. That is - any hypothesis can always be defended from any empirical onslaught. It is rather the responsibility of critical rationalism (and NOT common sense), in conjunction with empirical data, to see to it that this does not happen (i.e. irrational defence ad infinitum).
Should we not be discussing such a perspective instead of falsificationism alone, which Popper had his doubts about anyway?

All the Best
Trob :D

edited for gibberish

Disinfo Agent
2005-Feb-19, 07:35 PM
I must admit that I am not well acquainted with the extensions of Popper's ideas that you mentioned above (http://www.badastronomy.com/phpBB/viewtopic.php?p=418482#418482), so I don't think I can discuss them, personally. :oops:
Still, although his earlier formulation of the problem of induction may be a bit simplified and not an entirely realistic view of the scientific method, the general attitude of skepticism towards scientific theories that he promotes, and his contention that absolute truth (and absolute proof) can never be attained through empirical evidence, sound correct to me.

trob
2005-Feb-19, 08:42 PM
When I get the time I will try to explain his argument, but it will not be for a couple of days I am afraid. For now I suggest the following article from the Stanford Encyclopedia of Philosophy: http://www.science.uva.nl/~seop/entries/popper/#Crit



Popper's final position is that he acknowledges that it is impossible to discriminate science from non-science on the basis of the falsifiability of the scientific statements alone; he recognizes that scientific theories are predictive, and consequently prohibitive, only when taken in conjunction with auxiliary hypotheses, and he also recognizes that readjustment or modification of the latter is an integral part of scientific practice. Hence his final concern is to outline conditions which indicate when such modification is genuinely scientific, and when it is merely ad hoc. This is itself clearly a major alteration in his position, and arguably represents a substantial retraction on his part... On the other hand, the shift in Popper's own basic position is taken by some critics as an indicator that falsificationism, for all its apparent merits, fares no better in the final analysis than verificationism.

Have a nice weekend
Trob :D

archman
2005-Feb-19, 09:13 PM
You guys are all talking over my head now. Theoretical discussions on experimental design are way past me. Basic statistics is hard enough on us poor soft scientists!

bacterium-in-spaceship
2005-Feb-20, 10:06 PM
The null hypothesis is typically rejected if there is only a 5% certainty of the given outcome (some experiments are more stringent).

This is a common error. The null hypothesis is typically rejected if, given that the null hypothesis were true, there would be at most a 5% probability for the test result to be at least as extreme as the given outcome.
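A tiny Python sketch of that distinction, using an invented coin-flipping example rather than anything from this thread. Ho is "the coin is fair" and we observe 16 heads in 20 flips:

from scipy import stats

n, observed = 20, 16

# The probability of the given outcome itself, under Ho:
p_exact = stats.binom.pmf(observed, n, 0.5)      # P(exactly 16 heads | Ho)

# The p-value: the probability, under Ho, of a result at least this extreme
# (one-tailed version, for simplicity):
p_value = stats.binom.sf(observed - 1, n, 0.5)   # P(16 or more heads | Ho)

print(p_exact, p_value)   # two different numbers, and neither is P(Ho | data)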

Basically, for a long time statistics has been divided into three sects:

1) Neyman-Pearsonist frequentism
2) Fisherism
3) Bayesianism

In schools and most universities, the version of statistics taught to students is an incoherent mix of 1) and 2); this is also what people on this thread have meant by "statistics". 3) is the only sane version, though, for reasons explained e.g. here (http://omega.albany.edu:8008/JaynesBook.html).

Anyway, it turns out that you can both confirm and disconfirm theories, though in practice you can't either verify or falsify them with absolute certainty. Every time you disconfirm one theory, you confirm its rivals. Popper had a point in that you can usually get much closer to falsifying theories than to verifying them, and in that theories that leave themselves very vulnerable to disconfirmation can be confirmed much more strongly than those that don't; but Popper wasn't nearly the last word in philosophy-of-science, and physicists especially tend to worship him too much.
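To make the "disconfirming one theory confirms its rivals" point concrete, here's a toy Bayes'-rule sketch; the three hypotheses, priors and likelihoods are all invented:

# Three mutually exclusive hypotheses, with prior probabilities and the
# probability each one assigns to the data we actually observed.
priors      = {'A': 0.4, 'B': 0.4, 'C': 0.2}
likelihoods = {'A': 0.01, 'B': 0.30, 'C': 0.25}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}
print(posteriors)

# A, which gave the data almost no chance, is strongly disconfirmed;
# B and C are both pushed up -- without either one ever being "proven".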

archman
2005-Feb-21, 01:38 AM
Basically, for a long time statistics has been divided into three sects:

1) Neyman-Pearsonist frequentism
2) Fisherism
3) Bayesianism

In schools and most universities, the version of statistics taught to students is an incoherent mix of 1) and 2); this is also what people on this thread have meant by "statistics". 3) is the only sane version, though, for reasons explained e.g. here (http://omega.albany.edu:8008/JaynesBook.html).
So what you're saying is that the statistics taught by most academic statisticians is less than sane? Well, at least that means it isn't my fault!

Is there another way to access the material from your link? The chapters are in some weird file type.

trob
2005-Feb-21, 11:59 AM
In schools and most universities, the version of statistics taught to students is an incoherent mix of 1) and 2); this is also what people on this thread have meant by "statistics". 3) is the only sane version, though,

Well, I learnt statistics through, among other books, Complete Business Statistics by Aczel, which I thought was a rather good book - and it did not try to avoid Bayesian statistics. So I'm not sure your postulate of eclecticism is valid. Secondly, your claim of the sanity of Bayesianism (and thus the implied insanity of the other perspectives) has not gone unchallenged, given the intuitive elements of Bayesianism:


The general outlook of Bayesian probability, promoted by Laplace and several later authors, has been that the laws of probability apply equally to propositions of all kinds. Several attempts have been made to ground this intuitive notion in formal demonstrations. http://en.wikipedia.org/wiki/Bayesian_probability

The solutions to the subjectivist approach of Bayesianism - also that of Jaynes - can at best be described as controversial, as indeed the Wiki article goes on to say:


Advocates of logical (or objective epistemic) probability, (such as Harold Jeffreys, Richard Threlkeld Cox, and Edwin Jaynes), hope to codify techniques that would enable any two persons having the same information relevant to the truth of an uncertain proposition to independently calculate the same probability. Except for simple cases the methods proposed are controversial.

Given the fact that hardly any scientific questions are simple I have grave doubts about your claim. :-?

In regard to Jaynes' use of Gödel in Cox's theorem, I don't see how he can avoid intuitivism a la Penrose in discussing consistency. This is because even if we go outside the immediate set-logical basis for mathematics, and therefore statistics, this would open up either infinite regress or intuitivism in the end. Gödel's theorem stands, and the implication is the inherent limitation in our ability to know, i.e. fallibilism, as Popper and Peirce correctly claim - not intuitivism.

All the best
Trob :D

A Thousand Pardons
2005-Feb-21, 04:21 PM
The null hypothesis is typically rejected if there is only a 5% certainty of the given outcome (some experiments are more stringent).

This is a common error. The null hypothesis is typically rejected if, given that the null hypothesis were true, there would be at most a 5% probability for the test result to be at least as extreme as the given outcome.
Depends upon what you mean by outcome--most outcomes have to be expressed as intervals anyway.


3) is the only sane version, though, for reasons explained e.g. here (http://omega.albany.edu:8008/JaynesBook.html).
"sane?" Hey, there is a thread on the Drake equation (http://www.badastronomy.com/phpBB/viewtopic.php?t=19187&start=0) that might be of interest.

Disinfo Agent
2005-Feb-21, 06:20 PM
Basically, for a long time statistics has been divided into three sects:

1) Neyman-Pearsonist frequentism
2) Fisherism
3) Bayesianism

In schools and most universities, the version of statistics taught to students is an incoherent mix of 1) and 2);
I wasn't aware of this. Most introductory textbooks don't contain more than a passing reference to Fisherian inference.

And, as trob says, Bayesian methods have been increasingly applied to a variety of fields of research in recent years.

bacterium-in-spaceship
2005-Feb-21, 09:43 PM
Is there another way to access the material from your link? The chapters are in some weird file type.

You mean DJVU? I've never heard of it either, but below it on the page there are links to everything in .ps format. If it's the .ps format itself that's bothering you (probably not), google "ghostview".

bacterium-in-spaceship
2005-Feb-21, 09:54 PM
(I'll get to trob's post later, probably, but I'm skipping it for now.)


Depends upon what you mean by outcome--most outcomes have to be expressed as intervals anyway.

Not as the kind of intervals whose probability is measured by p-values. If some statistic comes out as, say, 3.4, and it's a one-tailed test, then the p-value is the probability under the null hypothesis that the result would have been in the interval (3.4, infinity); that's not really what you want to know, and it's certainly very different from a confidence interval.
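A minimal sketch of that point, assuming for illustration that the test statistic is standard normal under the null:

from scipy import stats

z = 3.4
p_value = stats.norm.sf(z)   # P(Z >= 3.4 | Ho): the interval (3.4, infinity)
print(p_value)               # roughly 0.0003

# This is not P(Ho | data), and it is not the probability of the observed
# result itself; it is the probability, under Ho, of the observed result
# or anything more extreme.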


"sane?" Hey, there is a thread on the Drake equation (http://www.badastronomy.com/phpBB/viewtopic.php?t=19187&start=0) that might be of interest.

I don't really see the relevance.

bacterium-in-spaceship
2005-Feb-21, 10:02 PM
I wasn't aware of this. Most introductory textbooks don't contain more than a passing reference to Fisherian inference.

p-values are purely Fisherian, and they're certainly used a lot in applied statistics. I've also read that the idea that "you can't accept the null hypothesis, you can only fail to reject it" is true in Fisherism but not in Neyman-Pearsonism. IIRC the following paper goes into the details:

http://www.isds.duke.edu/~berger/papers/02-01.html


And, as trob says, Bayesian methods have been increasingly applied to a variety of fields of research in recent years.

Right, but no one seems to bother telling the students; at most, they'll present "Bayesian methods" as yet another tool in the toolbox. If it weren't for the web, I'd never have found out that some statisticians (for very good reasons) think orthodox statistics is an incoherent mess.

bacterium-in-spaceship
2005-Feb-21, 10:46 PM
Secondly, your claim of the sanity of Bayesianism (and thus the implied insanity of the other perspectives) has not gone unchallenged, given the intuitive elements of Bayesianism:


The general outlook of Bayesian probability, promoted by Laplace and several later authors, has been that the laws of probability apply equally to propositions of all kinds. Several attempts have been made to ground this intuitive notion in formal demonstrations. http://en.wikipedia.org/wiki/Bayesian_probability

Firstly, even if there were something wrong with Bayesianism, that wouldn't make the other types of statistics any more sane. Their insanities speak for themselves independently of the validity of Bayesianism. The following paper looks like a good introduction to them:

http://research.microsoft.com/~minka/papers/pathologies.html

Besides, have you ever tried to understand what a 95% confidence interval really is? I think most people imagine some sort of probability distribution for whatever the confidence interval is trying to measure, and imagine 95% of the mass of this probability distribution to be inside the confidence interval. That's not how it works, though; but contemplating this for too long isn't healthy.
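Here's a sketch of what the 95% does refer to: the long-run behaviour of the interval-building procedure, not a probability statement about any single interval. The setup (normal data, known sigma, samples of size 20) is invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n = 10.0, 2.0, 20
half_width = 1.96 * sigma / np.sqrt(n)   # half-width of the 95% z-interval

trials, covered = 10000, 0
for _ in range(trials):
    sample_mean = rng.normal(true_mean, sigma / np.sqrt(n))
    if sample_mean - half_width <= true_mean <= sample_mean + half_width:
        covered += 1

print(covered / trials)   # close to 0.95 -- a property of the procedure,
                          # not of any one interval it produces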

Secondly, if Bayes has intuitive elements and more formal attempts at justification of these intuitions, this is only a problem to the extent that there's something wrong with the formal justifications (which seem quite strong to me).



Except for simple cases the methods proposed are controversial.

Given the fact that hardly any scientific questions are simple I have grave doubts about your claim. :-?

I don't necessarily agree with (Jaynes's version of) objective Bayes. The more complicated ideas in that direction may be controversial, but this isn't true of the methods of Bayesianism in general; once you accept the philosophical starting assumptions, and given any (maybe controversial) choice of priors, the results follow as unambiguous mathematical theorems. No further arbitrary decisions are necessary.

The problem with subjective Bayesianism is that you have to choose a subjective prior; this subjectivity is there in all the versions of statistics, only frequentists hide it better. (One place to look is in the arbitrary choice of significance level. Just because 5% is traditional doesn't make it objective; there's no mathematical reason behind choosing 5% instead of, say, 3%.)
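As a sketch of the prior-choice issue, here's an invented beta-binomial example (7 successes in 10 trials) worked out under three different priors:

from scipy import stats

successes, trials = 7, 10

for a, b, label in [(1.0, 1.0, 'uniform prior'),
                    (0.5, 0.5, 'Jeffreys prior'),
                    (10.0, 10.0, 'sceptical prior centred on 0.5')]:
    # Beta prior + binomial data gives a Beta posterior
    posterior = stats.beta(a + successes, b + trials - successes)
    print(label, posterior.mean(), posterior.interval(0.95))

# Different priors, different posteriors -- the subjectivity is out in the open,
# just as the choice of a 5% significance level is a subjective convention.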


In regard to Jaynes' use of Gödel in Cox's theorem

I don't remember Gödel's theorem ever being important in Jaynes's arguments; I'll probably look this up later.

Disinfo Agent
2005-Feb-22, 11:14 AM
p-values are purely Fisherian, and they're certainly used a lot in applied statistics.
Could you explain that statement in more detail?



And, as trob says, Bayesian methods have been increasingly applied to a variety of fields of research in recent years.
Right, but no one seems to bother telling the students [...]
No one bothers to tell the students what?


[...] at most, they'll present "Bayesian methods" as yet another tool in the toolbox.
I don't think there's anything wrong with that. Bayesian methods are another tool in the toolbox, just as frequentist methods are another tool in the toolbox.


If it weren't for the web, I'd never have found out that some statisticians (for very good reasons) think orthodox statistics is an incoherent mess.
Well, I learned about Bayesian inference at a university. And, even without universities, there are literally thousands of books about Bayesian inference around today...

I'll take a look at your link later.

bacterium-in-spaceship
2005-Feb-22, 05:52 PM
Could you explain that statement in more detail?

The paper I linked talks about it a bit, but not much. I think I've read something else that explains it better than I can; I may look it up later.


I don't think there's anything wrong with that. Bayesian methods are another tool in the toolbox, just as frequentist methods are another tool in the toolbox.

In principle (!), every problem that can be solved with frequentist methods can be solved with Bayes. At most, frequentist methods are flawed but convenient (computationally cheap, whatever) substitutes for Bayes. This isn't how they're presented, though; Bayesian methods are usually presented as yet another ritual within the frequentist framework, instead of an alternative (and better) way of thinking about statistics.


Well, I learned about Bayesian inference at a university.

Maybe I was over-generalizing from too limited information.

trob
2005-Feb-23, 08:49 AM
bacterium-in-spaceship cited http://research.microsoft.com/~minka/papers/pathologies.html :



Since these heuristics are not consistent with Bayes' rule, they are also not consistent with the axioms of common sense from which Bayes' rule is derived.

In my experience it is rather a question of uncommon sense.... :lol: (I had to get that one off my chest :D )
Back to business: Gödel's theorem was developed to show that an axiomatic basis, as a solution to the foundational questions in mathematics, was - even in principle - impossible. Gödel showed this, which resulted in what are called the two incompleteness theorems: 1) In any consistent formalization of mathematics that is sufficiently strong to axiomatize the natural numbers -- that is, sufficiently strong to define the operations that collectively define the natural numbers -- one can construct a true (!) statement that can be neither proved nor disproved within that system itself. 2) No consistent system can be used to prove its own consistency.

Thus if Bayesianism is axiomatic it is subject to Gödel's theorem, which means that it must either rest upon assumed and unprovable postulates or accept something external. If it claims to be its own proof, then this can be shown to be wrong.
Now if something outside the formal system is referred to, then the problem occurs again in the new system, because this is also subject to Gödel. Thus the implication is: 1) Platonic intuitivism, which is the philosophical defence of axiomatic assumption (this is inherently problematic for several reasons - the most forceful being this: http://www.science.uva.nl/~seop/entries/platonism/#5 ), 2) infinite regress, which is of course unacceptable, or 3) fallibilism: absolute, total and universal truth is impossible, because even if we had it we would not be able to prove it, following the first theorem.

Now Gödel is obviously important for Bayesianism because it points out certain limitations in Bayesianism: it must 1) claim Platonism, which is faulty, 2) accept infinite regress, which is faulty, or 3) accept that it is itself only partially proved and that other solutions must be given their place in the development of knowledge, because ultimate certainty is impossible and the partial knowledge we have is gained through the clash of ideas rather than the exclusion of ideas.

Trob :D

A Thousand Pardons
2005-Feb-23, 05:24 PM
Not as the kind of intervals whose probability is measured by p-values. If some statistic comes out as, say, 3.4, and it's a one-tailed test, then the p-value is the probability under the null hypothesis that the result would have been in the interval (3.4, infinity); that's not really what you want to know, and it's certainly very different from a confidence interval.
What is it that you really want to know?



"sane?" Hey, there is a thread on the Drake equation (http://www.badastronomy.com/phpBB/viewtopic.php?t=19187&start=0) that might be of interest.

I don't really see the relevance.
We've had lots of discussions, related to the cosmos, about the validity of Drake's equation and the statistic it produces. There's folks on both sides. Calling one side sane and the other side not sane is not productive.

If it weren't for the web, I'd never have found out that some statisticians (for very good reasons) think orthodox statistics is an incoherent mess.
I'm not sure what your other resources are, but the arguments among statisticians are well documented, not just on the web. OTOH, you can find folk on the web who claim that Einstein's work is an incoherent mess. I've looked over some of the "pathologies" that are claimed for "orthodox" statistics, and the claims don't really hold water. It's just a difference of opinion, or, worse, a lack of understanding. Rarely (never?) is it a matter of sanity.

Disinfo Agent
2005-Feb-23, 06:43 PM
p-values are purely Fisherian, and they're certainly used a lot in applied statistics. I've also read that the idea that "you can't accept the null hypothesis, you can only fail to reject it" is true in Fisherism but not in Neyman-Pearsonism. IIRC the following paper goes into the details:

http://www.isds.duke.edu/~berger/papers/02-01.html
The abstract of that article says:

Ronald Fisher advocated testing using p-values; Harold Jeffreys proposed use of objective posterior probabilities of hypotheses; and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches.
This can be a bit misleading. The concept of a p-value exists in the conventional Neyman-Pearson theory of hypothesis testing, too. It just has a different interpretation than in Fisher's framework.

bacterium-in-spaceship
2005-Feb-23, 08:56 PM
This can be a bit misleading. The concept of a p-value exists in the conventional Neyman-Pearson theory of hypothesis testing, too. It just has a different interpretation than in Fisher's framework.

Are you certain? The article claims (on page 6) that "Neyman criticized p-values for violating the Frequentist Principle".

bacterium-in-spaceship
2005-Feb-23, 09:13 PM
What is it that you really want to know?

What you want to know is the degree to which the data support one hypothesis over another, as measured by the probability that each hypothesis assigns to the outcome that actually happened.
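A toy sketch of what I mean, again with an invented coin example: compare the probability each specific hypothesis assigns to the outcome that actually happened (16 heads in 20 flips):

from scipy import stats

n, k = 20, 16
p_fair   = stats.binom.pmf(k, n, 0.5)   # likelihood under "the coin is fair"
p_biased = stats.binom.pmf(k, n, 0.8)   # likelihood under "the coin gives 80% heads"

print(p_biased / p_fair)   # the likelihood ratio: how strongly the data favour bias

# Contrast this with a p-value, which also counts outcomes that did not
# happen (17, 18, 19 and 20 heads).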


We've had lots of discussions, related to the cosmos, about the validity of Drake's equation and the statistic it produces. There's folks on both sides.

I still don't see the relevance. I know people disagree about the Drake Equation; what does that have to do with Bayesianism vs frequentism? (I know informed people disagree about that as well, but what does Drake have to do with it?)


Calling one side sane and the other side not sane is not productive.

Note that I called some kinds of statistics insane, not the people who practice or defend them. But I thought it was obvious I didn't mean that literally; I was just exaggerating a bit out of frustration.



I'm not sure what your other resources are, but the arguments among statisticians are well documented, not just on the web. OTOH, you can find folk on the web who claim that Einstein's work is an incoherent mess.

True, but the people who claim that about Einstein's work don't have any good arguments, and they tend not to be professional physicists. I'm not making any arguments from authority, and if I did, the authorities would be various Bayesian statisticians rather than random web cranks.

Nor am I claiming that these arguments are available only on the web; they're in various journal articles, too. It's just that the typical student doesn't read these articles, isn't given any idea that they exist, and isn't told about their message. But again, maybe I'm over-generalizing.


I've looked over some of the "pathologies" that are claimed for "orthodox" statistics, and the claims don't really hold water. It's just a difference of opinion, or, worse, a lack of understanding.

Well, I disagree, obviously. Could you give some examples of claimed pathologies of orthodox statistics that turn out not to hold water?

A Thousand Pardons
2005-Feb-23, 09:43 PM
What is it that you really want to know?

What you want to know is the degree to which the data support one hypothesis over another, as measured by the probability that each hypothesis assigns to the outcome that actually happened.
Often, the two hypotheses are the null hypothesis and not the null hypothesis, so it seems to me that those are both the same. When you don't have a null hypothesis, you have two competing hypotheses, and the procedures are different--you don't use a null hypothesis.

The measurement of the constant of gravity is a case in point. The values are all over the place, and the error bars for most of the measurements exclude the other experimentally determined values. Obviously, we're missing something else--that's what the statistics are telling us.

I mentioned the viscosity of the earth's mantle before. I think it is true that the tests should have been done differently, but I'm not convinced that other approaches to the problem would have helped--before the fact.



We've had lots of discussions, related to the cosmos, about the validity of Drake's equation and the statistic it produces. There's folks on both sides.

I still don't see the relevance. I know people disagree about the Drake Equation; what does that have to do with Bayesianism vs frequentism? (I know informed people disagree about that as well, but what does Drake have to do with it?)
That there are reasonable people on both sides.



Calling one side sane and the other side not sane is not productive.

Note that I called some kinds of statistics insane, not the people who practice or defend them. But I thought it was obvious I didn't mean that literally; just exagerrating a bit out of frustration.
That you meant that they were literally insane? I guess that is obvious. :) Still, that raises the question, just what did you mean?




I'm not sure what your other resources are, but the arguments among statisticians are well documented, not just on the web. OTOH, you can find folk on the web who claim that Einstein's work is an incoherent mess.

True, but the people who claim that about Einstein's work don't have any good arguments, and they tend not to be professional physicists. I'm not making any arguments from authority, and if I did, the authorities would be various Bayesian statisticians rather than random web cranks.

Nor am I claiming that these arguments are available only on the web; they're in various journal articles, too. It's just that the typical student doesn't read these articles, or is given any idea that they exist, or is told about their messages. But again, maybe I'm over-generalizing.
I would agree with that.



I've looked over some of the "pathologies" that are claimed for "orthodox" statistics, and the claims don't really hold water. It's just a difference of opinion, or, worse, a lack of understanding.

Well, I disagree, obviously. Could you give some examples of claimed pathologies of orthodox statistics that turn out not to hold water?
But you were the one claiming that there was a problem. That should be your responsibility, to back up your criticism. But I'll google a couple, and get back to this in a day or two, or sooner.

bacterium-in-spaceship
2005-Feb-23, 10:09 PM
Often, the two hypotheses are the null hypothesis and not the null hypothesis, so it seems to me that those are both the same. When you don't have a null hypothesis, you have two competing hypotheses, and the procedures are different--you don't use a null hypothesis.

I don't understand what you're saying here.

Firstly: at least in Bayesian statistics, it doesn't matter whether you call a hypothesis "the null hypothesis" or not; everything should still work the same.

Secondly: "the probability (density) that a hypothesis assigns to the actual outcome" -- which is what we want to know if we want to assess the plausibilities of different hypotheses -- is something different from "the probability that a hypothesis assigns to the set containing the actual outcome plus all outcomes that are more extreme than the actual outcome" -- which is what is measured by a p-value. Do you agree?


That there are reasonable people on both sides.

Fair enough. But sometimes reasonable people rely on weak arguments, and sometimes they simply don't know of strong arguments against their views.


Still, that raises the question, just what did you mean?

By "insane" I guess I mean "full of dubious concepts and interpretations, with no clear logical structure".


But you were the one claiming that there was a problem. That should be your responsibility, to back up your criticism.

I intended the papers I linked to do that; I think they're fairly clear. Maybe I should summarize them. I sometimes think online discussions like this are a bit pointless, when there's so much out there to read that says it all better.

Disinfo Agent
2005-Feb-24, 09:33 AM
This can be a bit misleading. The concept of a p-value exists in the conventional Neyman-Pearson theory of hypothesis testing, too. It just has a different interpretation than in Fisher's framework.
Are you certain? The article claims (on page 6) that "Neyman criticized p-values for violating the Frequentist Principle".
I am positive that p-values have a meaning in modern frequentist inference, and, from what I remember of Fisherian inference (which, I admit, isn't much), I believe I understand what they mean by a p-value in that context: a concept similar to, but ultimately different from, the frequentist one.
Unfortunately, I haven't been able to read the rest of the article. :(

Maddad
2005-Feb-25, 08:31 AM
Gödel's theorem was developed to show that an axiomatic basis, as a solution to the foundational questions in mathematics, was - even in principle - impossible. Gödel was too smart for his own britches. ;) If science is incapable of establishing truth, then we all need to go home.

Disinfo Agent
2005-Feb-25, 10:38 AM
It depends on what you mean by "establish", doesn't it? (Which takes us back to the concept of "proof"...)
Human knowledge, of any kind, is always questionable, I think we can all agree. In philosophy we can have debates about ultimate truths, but I think science is more about making a persuasive case that one is near the truth than about showing that one has reached it. :)

trob
2005-Feb-25, 11:50 AM
:lol:
Gödel was too smart for his own britches. If science is incapable of establishing truth, then we all need to go home.

Funny, but I think you get Gödel wrong. You cannot have total, unconditional and universal truth, but you can have partial and conditional truth. Gödel forces us to be fallibilists. Not much different from the implication of Popper's falsificationism.
So you can have conditional truth, and it is these conditions that we must be very wary of and keep under constant revision so as not to end up in the eternal hell of Kuhnian normal science...

All the best
Trob :D

A Thousand Pardons
2005-Feb-25, 10:10 PM
But you were the one claiming that there was a problem. That should be your responsibility, to back up your criticism.
I intended the papers I linked to do that; I think they're fairly clear. Maybe I should summarize them. I sometimes think online discussions like this are a bit pointless, when there's so much out there to read that says it all better.
I went back through the links and started with this paper (http://research.microsoft.com/~minka/papers/pathologies.html), which you linked in this post (http://www.badastronomy.com/phpBB/viewtopic.php?p=420888#420888). Minka discusses two examples of "pathologies" of orthodox statistics, and I'll concentrate on the first one, on unbiasedness.

At first, I thought you had bad luck in choosing representative examples, since it was so awful, but Minka claims it appears in Lindley's book from 1972, and Jaynes apparently commented on it in 1996.

Minka is discussing a coin which has a probability θ of producing a head when tossed. In the experiment, we toss the coin n times and get h heads as a result. Using the result, we can calculate an estimate of θ. Minka thinks it is a pathology that the unbiased estimator for θ can be either h/n or (h-1)/(n-1), depending upon how we structure the experiment. In the first case, we decide to flip the coin n times, and h is the result. In the second, we flip the coin until we get h heads, and n is the result. Minka says "Since the data set tells us both n and h, it shouldn't matter whether we assumed n or h beforehand. Asserting a proposition twice has no effect on our knowledge."

That's clearly nonsense. Just run a few million trials on a computer, and that should convince you. There's actually a third way of arriving at the data that makes it clear just how wrong Minka is: I could have deliberately placed h of those coins in a heads-up position, and the rest tails (in other words, assume both n and h beforehand)--in which case the dataset tells us nothing about θ.

And that's not all. He makes a point of saying, in the second case, that if n=1, then the estimator for θ will be 1 (to avoid dividing by zero), but then he goes on to point out that if we choose h to be 1, then the estimator gives us 0. Somehow, he thinks that is pathological behavior on the part of orthodox statistics, but he doesn't seem to translate the scenario back into practice: if we did choose h=1, then we would only flip the coin until we got a head, and then use that string of tails (and one head) to try to calculate the probability of the coin. Minka doesn't understand, or try to understand, that what he is calling a pathology of orthodox statistics is actually a pathology of the trial designer in choosing h=1--it'd be hard to think of a worse way to try to find out θ. Of course, if we're choosing h first, and n does turn out to be 1, that means that h had to have been 1. In other words, we'd be flipping the coin once, getting a head, and trying to use that single data point to calculate θ.

I'll look at Minka's second pathology later.
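
For reference, a sketch of the standard calculation behind the (h-1)/(n-1) estimator (a summary of textbook material, not something taken from Minka's paper): when the experiment stops at a pre-set number of heads h, the number of flips n is the random quantity, and the chance of needing exactly n flips is

P(n) = C(n-1, h-1) * θ^h * (1-θ)^(n-h),   n = h, h+1, ...

since the last flip must be the h-th head. Using the identity (h-1)/(n-1) * C(n-1, h-1) = C(n-2, h-2), the expected value of (h-1)/(n-1) becomes

Σ over n ≥ h of C(n-2, h-2) * θ^h * (1-θ)^(n-h) = θ * Σ over m ≥ h-1 of C(m-1, h-2) * θ^(h-1) * (1-θ)^(m-(h-1)) = θ,

because the last sum runs over the same kind of distribution with h-1 required heads and so adds up to 1 (this needs h ≥ 2). When n is fixed in advance instead, E[h/n] = nθ/n = θ trivially, which is why the two designs end up with two different unbiased estimators.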

Grey
2005-Feb-28, 06:33 PM
This has nothing to do with the thread, but since there seem to be a few statistics experts here, would any of you care to wander over to this thread (http://www.badastronomy.com/phpBB/viewtopic.php?p=425278#425278) and give me some pointers on Bayesian analysis?

bacterium-in-spaceship
2005-Feb-28, 08:58 PM
Minka is discussing a coin which has a probability θ that that coin will produce a head when tossed. In the experiment, we toss the coin n times and get h heads as a result. Using the result, we can calculate an estimate of θ. Minka thinks it is a pathology that the unbiased estimator for θ can be either h/n or (h-1)/(n-1), depending upon how we structure the experiment. In the first case, we decide to flip the coin n times, and h is the result. In the second, we flip the coin until we get h heads, and n is the result. Minka says "Since the data set tells us both n and h, it shouldn't matter whether we assumed n or h beforehand. Asserting a proposition twice has no effect on our knowledge."

That's clearly nonsense.

Heh. It seems we have a conflict of intuitions here, because it's not clearly nonsense to me.

All that happened is that someone flipped a coin N times and got heads H times. Why should our estimate of theta depend on the intent of the experimenter -- on the potential results of experiments that could have been done, but weren't?

I found a nicer exposition of pathologies in the mean time; it goes into this issue of dependence on stopping rules, among other issues. It may or may not help change your intuitions. It's chapter 37 of the following online book:

http://www.inference.phy.cam.ac.uk/mackay/itila/book.html


There's actually a third way of arriving at the data that makes it clear just how wrong Minka is: I could have deliberately placed h of those coins in a heads up position, and the rest tails (in other words, assume both n and h beforehand)--in which case the dataset tells us nothing about θ.

The difference between stopping at a predetermined N and stopping at a predetermined H is not analogous to the difference between stopping at a predetermined N and placing the coins deliberately.

The former difference is purely about events that never actually happened, but that would have happened if the experimenter carried out his plan: in both cases, your information consists of the results of N random coin flips.

The latter difference is about events that did happen. In the one case, your information consists of the results of N random coin flips, and in the other case, your information consists of how you chose to place N coins.

(I may or may not reply to other posts later; not much time now.)

Grey
2005-Feb-28, 10:25 PM
Heh. It seems we have a conflict of intuitions here, because it's not clearly nonsense to me.

All that happened is that someone flipped a coin N times and got heads H times. Why should our estimate of theta depend on the intent of the experimenter -- on the potential results of experiments that could have been done, but weren't?
It's not a question of intuition; it's just the way the math works out. It's not the intent of the experimenter; it's that doing the experiment as described introduces a selection bias. For example, if you flip the coin until you get five heads, you'll theoretically include trials like ttttthhhhh, but you'll never include hhhhhttttt. Instead, you'll record the latter as hhhhh, and thus your selection technique is more likely to include trials with a disproportionately large number of heads. If you then just use h/n, you'll determine an erroneously high probability of heads.

I was curious myself just how large an effect this would be, so I thought I'd test it out, as A Thousand Pardons suggested. I ran 100,000 trials, and for each trial I stopped when I had ten heads (simulated, of course) turn up. If I calculate using h/n, the probability for heads would appear to be 0.5253, while using (h-1)/(n-1), I find 0.5004. The former is an error of about 16 sigma, while the latter is an error of about 0.3 sigma.
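
For anyone who wants to reproduce this, here is a minimal Python sketch along the lines Grey describes (the true θ of 0.5, the ten-heads stopping rule, and the 100,000 trials are Grey's numbers; the seed is an illustrative choice):

import random

random.seed(1)                       # arbitrary, just for repeatability
theta, target_heads, trials = 0.5, 10, 100_000

sum_hn = sum_h1n1 = 0.0
for _ in range(trials):
    h = n = 0
    while h < target_heads:          # keep flipping until the tenth head appears
        n += 1
        if random.random() < theta:
            h += 1
    sum_hn += h / n
    sum_h1n1 += (h - 1) / (n - 1)

print(sum_hn / trials)               # noticeably above 0.5 (Grey reports about 0.525)
print(sum_h1n1 / trials)             # very close to 0.5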

bacterium-in-spaceship
2005-Feb-28, 11:32 PM
Hmmm. Now I'm confused: your argument seems convincing to me, but so does mine. I'll return to this when I've thought about it a bit more.

The simulation results aren't in dispute, at least; all are agreed that over different ensembles of experiments, the same estimator can be biased or unbiased (and will therefore tend toward different averages in the long run).

(Though note that a Bayesian wouldn't see averaging these estimates (for fixed H) as meaningful; what you should do instead is consider all the coinflips together. Some runs have more coinflips in them than others, which means they contain more information. These are also the ones that contain a lower fraction of heads.)

The other pathologies, such as those relating to confidence intervals, should be more clear-cut.

A Thousand Pardons
2005-Mar-01, 01:15 AM
Heh. It seems we have a conflict of intuitions here, because it's not clearly nonsense to me.
It's not a matter of intuition, as Grey shows. It's how things work.


All that happened is that someone flipped a coin N times and got heads H times. Why should our estimate of theta depend on the intent of the experimenter -- on the potential results of experiments that could have been done, but weren't?
Do you believe it now?


I found a nicer exposition of pathologies in the mean time; it goes into this issue of dependence on stopping rules, among other issues. It may or may not help change your intuitions. It's chapter 37 of the following online book:
I try not to rely upon intuition. That's where Minka went wrong.


There's actually a third way of arriving at the data that makes it clear just how wrong Minka is: I could have deliberately placed h of those coins in a heads up position, and the rest tails (in other words, assume both n and h beforehand)--in which case the dataset tells us nothing about θ.

The difference between stopping at a predetermined N and stopping at a predetermined H is not analogous to the difference between stopping at a predetermined N and placing the coins deliberately.
Sure it is. It's just that the effect of observer bias is more obvious.



The simulation results aren't in dispute, at least; all are agreed that over different ensembles of experiments, the same estimator can be biased or unbiased (and will therefore tend toward different averages in the long run).
In Minka's paper, he points out that both estimators are unbiased, but they are clearly not the same estimator. That's why he called it a pathology. As Grey showed, either one can be appropriate. I'll have to go back through the literature and see if Lindley did too, and why. And Jaynes!


(Though note that a Bayesian wouldn't see averageing these estimates (for fixed H) as meaningful; what you should do instead is consider all the coinflips together. Some runs have more coinflips in them than others, which means they contain more information. These are also the ones that contain a lower fraction of heads.)
That's nonsense again. We're talking about the worth of the two estimators--not whether or not we know the value of theta. It's a different focus--and Minka is the one who established the focus.

Grey
2005-Mar-01, 01:48 AM
Hmmm. Now I'm confused: your argument seems convincing to me, but so does mine. I'll return to this when I've thought about it a bit more.
For this particular instance, I think the key is to realize that if you just decide to run a certain number of trials, you'll get a random sampling. But if you base when you stop on the results of those trials, you're probably introducing a selection bias, and not getting a truly random sample of the possible outcomes.

bacterium-in-spaceship
2005-Mar-01, 09:07 PM
Well, I've thought about it and I agree with Lindley/Minka again, but I'm still trying to figure out how to explain why.


I try not to rely upon intuition. That's where Minka went wrong.

Have you looked at the book chapter, though? Specifically, I'm interested in what you think of the last situation that's described (at the end of 37.2, IIRC). Suppose that in a coin-flipping experiment you continue flipping until you get some fixed number of heads; and suppose that after the experiment, someone informs you that if the experiment had gone on for a few more coin flips, he would have ended it (by smashing up the coin, say). Does learning this make your estimate of the coin's theta invalid?

Donnie B.
2005-Mar-01, 09:17 PM
No No No [-X , free willy =D>

bacterium-in-spaceship
2005-Mar-01, 09:55 PM
No No No [-X , free willy =D>

See? That's what frequentist statistics does to your mind.

I have another riddle for ATP and Grey:

Dr. Fleep and Dr. Floop are conducting a statistical experiment. Fleep starts flipping a coin, intending to stop at 50 heads. After getting 35 heads out of 60, he decides he's had enough for the day, and he goes home. Floop then walks into the lab, and, mistakenly thinking Fleep meant to stop at 100 flips, finishes the experiment, getting 59 heads out of 100 total.

How should Fleep and Floop estimate the coin's theta? Are they allowed to use the information at all?

A Thousand Pardons
2005-Mar-02, 11:15 AM
Well, I've thought about it and I agree with Lindley/Minka again, but I'm still trying to figure out how to explain why.
Intuition? :)


Have you looked at the book chapter, though?

Which book do you mean?

worzel
2005-Mar-02, 02:05 PM
Just to back track a little:


I am not objecting to the null hypothesis as a concept. I don't see the need for it.

In logic to describe inversion you say "If A then not B". The corollary is "If not A then B" which is implicit in the first statement. I see no need to make it explicit.
"If A then not B" tells us nothing about "B" if "A" is false. If you meant A and B to represent the null and test hypotheses then your inversion would be true if (and only if :wink: ) the null was strictly the negation of the test, which brings me on to my question:

Is there any point in having a null hypothesis that is not strictly the negation of the test hypothesis? If we don't have a strict negation then either the two hypotheses overlap - in which case data supporting or rejecting one could equally support or reject the other, or there is a gap - in which case data rejecting the null hypothesis can not be said to support the test hypothesis.

Disinfo Agent
2005-Mar-02, 02:26 PM
Nice catch. I hadn't even noticed that instance of denial of the antecedent (http://www.intrepidsoftware.com/fallacy/deny.php).


Is there any point in having a null hypothesis that is not strictly the negation of the test hypothesis? If we don't have a strict negation then either the two hypotheses overlap - in which case data supporting or rejecting one could equally support or reject the other, or there is a gap - in which case data rejecting the null hypothesis can not be said to support the test hypothesis.
Yes, it may be a more sensible approach, given what is known about the natural constraints of the phenomenon or variable under study. See my earlier example in this post (http://www.badastronomy.com/phpBB/viewtopic.php?p=417477#417477).

P.S. And A Thousand Pardons gave a different example of a situation where the alternative hypothesis is not the opposite of the null hypothesis, here (http://www.badastronomy.com/phpBB/viewtopic.php?p=416259#416259).

Grey
2005-Mar-02, 03:27 PM
Suppose that in a coin-flipping experiment you continue flipping until you get some fixed number of heads; and suppose that after the experiment, someone informs you that if the experiment had gone on for a few more coin flips, he would have ended it (by smashing up the coin, say). Does learning this make your estimate of the coin's theta invalid?
No. You'd still base your calculations on what actually happened, not on someone's intent. In this case, the decision about when to stop flipping coins was made based on the results of some of the tosses, so you can expect a selection bias in the sample. Should I find somewhere that shows mathematically how you can derive the (h-1)/(n-1) formula in this case?


Dr. Fleep and Dr. Floop are conducting a statistical experiment. Fleep starts flipping a coin, intending to stop at 50 heads. After getting 35 heads out of 60, he decides he's had enough for the day, and he goes home. Floop then walks into the lab, and, mistakenly thinking Fleep meant to stop at 100 flips, finishes the experiment, getting 59 heads out of 100 total.

How should Fleep and Floop estimate the coin's theta? Are they allowed to use the information at all?
Again, you keep thinking that the experimenter's intent has something to do with how the results are determined, and that's not the case. How you should estimate theta based on a given data sample is based on how that sample was actually chosen. In this case, although Fleep had planned on selecting a data set based on the results, the data set actually taken was based only on the total number of trials. You'd use h/n.

worzel
2005-Mar-02, 03:34 PM
Nice catch. I hadn't even noticed that instance of denial of the antecedent (http://www.intrepidsoftware.com/fallacy/deny.php).
I was rather surprised no one picked up on it and was expecting to be told I'd completely missed his point.



Is there any point in having a null hypothesis that is not strictly the negation of the test hypothesis? If we don't have a strict negation then either the two hypotheses overlap - in which case data supporting or rejecting one could equally support or reject the other, or there is a gap - in which case data rejecting the null hypothesis can not be said to support the test hypothesis.
Yes, it may be a more sensible approach, given what is known about the natural constraints of the phenomenon or variable under study. See my earlier example in this post (http://www.badastronomy.com/phpBB/viewtopic.php?p=417477#417477).
So you're saying that because fertilizer doesn't cause plant shrinkage then the "plant growth shows a positive effect" and the "plant growth is unaffected" hypotheses cover all bases? Well that's ok I guess, but it is conceivable that something we thought was a fertilizer actually did cause a particular plant to shrink in some circumstances, so the null hypothesis should have been "plant growth doesn't show a positive effect", which amounts to the same thing if our assumption about the reputed fertilizer turns out to be true.


P.S. And A Thousand Pardons gave a different example of a situation where the alternative hypothesis is not the opposite of the null hypothesis, here (http://www.badastronomy.com/phpBB/viewtopic.php?p=416259#416259).
I didn't see an example from ATP in that post, or anywhere in this thread as far as I can remember.

Disinfo Agent
2005-Mar-02, 04:14 PM
So you're saying that because fertilizer doesn't cause plant shrinkage then the "plant growth shows a positive effect" and the "plant growth is unaffected" hypotheses cover all bases? Well that's ok I guess, but it is conceivable that something we thought was a fertilizer actually did cause a particular plant to shrink in some circumstances, so the null hypothesis should have been "plant growth doesn't show a positive effect", which amounts to the same thing if our assumption about the reputed fertilizer turns out to be true.
No reasonable candidate for a fertiliser should reduce plant growth (or kill plants) within the conditions of the experiment.

You aren't going to apply hypothesis testing procedures to a substance about which you know nothing. If you've arrived at that stage, it's because you've already gathered some data previously about the substance in a more informal way. You've spent some time observing its effect on the plants in your garden, or you know from previous studies that there's some chemical component in that substance that boosts the production of auxins (http://www.plant-hormones.info/auxins.htm), etc.

What I'm trying to say is that, in practice, you only put a product to the test if you have evidence that it is likely to be an effective fertiliser. If it had a negative effect on plant growth, you wouldn't have thought of using it as a fertiliser in the first place.



P.S. And A Thousand Pardons gave a different example of a situation where the alternative hypothesis is not the opposite of the null hypothesis, here (http://www.badastronomy.com/phpBB/viewtopic.php?p=416259#416259).
I didn't see an example from ATP in that post, or anywhere in this thread as far as I can remember.
Sorry. I think I misremembered his example.

A Thousand Pardons
2005-Mar-02, 04:18 PM
Have you looked at the book chapter, though?

Which book do you mean?
I thought at first you meant Jaynes's book, which you mention in this post (http://www.badastronomy.com/phpBB/viewtopic.php?p=420141#420141), but it doesn't have a chapter 37. I went back through the posts and found you mentioned 37.2 of this online book (http://www.inference.phy.cam.ac.uk/itprnn/book.pdf) (Information Theory, Inference, and Learning Algorithms, by David J. C. MacKay) in this post (http://www.badastronomy.com/phpBB/viewtopic.php?p=425540#425540). It took a long time to load (9 megabytes in pdf), but I think that must be the one. Section 37.2 is called "Dependence of p-values on irrelevant information" on p.474 of the pdf.

The section is pretty much concerned with the coin-tossing problem we've been talking about, and discusses whether the stopping rule should affect the statistics. The book says "At this point the audience divides in two. Half the audience intuitively feel that the stopping rule is irrelevant, and don't need any convincing that the answer to exercise 37.1 (p.463) is `the inferences about p_a do not depend on the stopping rule'. The other half, perhaps on account of a thorough training in sampling theory, intuitively feel that Dr. Bloggs's stopping rule, which stopped tossing the moment the third b appeared, may have biased the experiment somehow."

As Grey has shown, intuition has nothing to do with it. It's the way things work. The book warns "If you are in the second group, I encourage you to reflect on the situation, and hope you'll eventually come round to the view that is consistent with the likelihood principle, which is that the stopping rule is not relevant to what we have learned about p_a." Hmmm. I'll dig into the earlier chapters a little and see what I can find.

MacKay says "A Bayesian solution to this inference problem was given in sections 3.2 and 3.3 and exercise 3.15 (p.59)." Exercise 3.15 (p.71 of the pdf file) contains a quote from the Guardian, saying that someone spun a Euro and got 140 heads out of 250 trials, and a London School of Economics statistics lecturer said "It looks very suspicious to me." MacKay's book asks, "But do these data give evidence that the coin is biased rather than fair?" It's a single trial! In a newspaper! Sheesh.

worzel
2005-Mar-02, 04:47 PM
What I'm trying to say is that, in practice, you only put a product to the test if you have evidence that it is likely to be an effective fertiliser. If it had a negative effect on plant growth, you wouldn't have thought of using it as a fertiliser in the first place.

In practice, yes, but the null hypothesis you're using and the negated test hypothesis are then the same thing, if the unlikely outcome you haven't considered is assumed to be impossible.


MacKay says "A Bayesian solution to this inference problem was given in sections 3.2 and 3.3 and exercise 3.15 (p.59)." Exercise 3.15 (p.71 of the pdf file) contains a quote from the Guardian, saying that someone spun a Euro and got 140 heads out of 250 trials, and a London School of Economics statistics lecturer said "It looks very suspicious to me." MacKay's book asks, "But do these data give evidence that the coin is biased rather than fair?" It's a single trial! In a newspaper! Sheesh.
I thought we would expect the actual result to be about 16 away from 125 anyway. You know, the random walk thing.

A Thousand Pardons
2005-Mar-02, 07:26 PM
I thought we would expect the actual result to be about 16 away from 125 anyway. You know, the random walk thing.
16? are you taking the square root of 250?

worzel
2005-Mar-02, 08:12 PM
I thought we would expect the actual result to be about 16 away from 125 anyway. You know, the random walk thing.
16? are you taking the square root of 250?
Yes.

A Thousand Pardons
2005-Mar-02, 09:17 PM
I thought we would expect the actual result to be about 16 away from 125 anyway. You know, the random walk thing.
16? are you taking the square root of 250?
Yes.
That only works for a walk in two dimensions. This is a walk in one dimension.

PS: In all dimensions, the probability density curve is approximately bell-shaped. So, in this case, where each "step" is either plus one (heads) or minus one (tails), the most probable outcome is zero (number of heads equals number of tails), but the most probable distance from the mean is not zero, it is 1 (either heads or tails). In two dimensions, the most probable outcome is still the mean, but the most probable distance is the square root of N. In three dimensions, it's uh... let me get back to you.

worzel
2005-Mar-03, 09:57 AM
I thought we would expect the actual result to be about 16 away from 125 anyway. You know, the random walk thing.
16? are you taking the square root of 250?
Yes.
That only works for a walk in two dimensions. This is a walk in one dimension.
Oops :oops: Thanks for the correction.


PS: In all dimensions, the probability density curve is approx. bell shape. So, in this case, where each "step" is either plus one (heads) or minus one (tails), the most probable outcome is zero (number of heads equals number of tails) but the most probable distance from the mean is not zero, it is 1 (either heads or tails). In two dimensions, the most probable outcome is still the mean, but the most probable distance is square root of N. In three dimensions, it's uh... let me get back to you.
Regarding 1-dimensional walks, the most probable outcome and the most probable distance from the mean aren't the same as the expected distance from the mean. I had a look here (http://mathworld.wolfram.com/RandomWalk1-Dimensional.html) and here (http://mathworld.wolfram.com/Heads-Minus-TailsDistribution.html) but I don't really understand all the maths.
The first link seems to be saying that the expected distance from the mean is sqrt of (2N divided by pi), which would be 12.6 for 250 tosses, while the second says that it is n times (2n choose n) divided by 2 to the (2n-1) :-? my calculator won't do combinations that high. And for an even number of tosses wouldn't the most probable distance from the mean be 2 rather than 1?

worzel
2005-Mar-03, 11:32 AM
Thinking about the 250 coin tosses, how do we assess the likelihood of someone getting 140 heads? Is the expectation value for the distance from the mean even relevant? The expectation value for a single die roll is 3.5 but that doesn't tell me much as all rolls are equally likely.

EDIT: I should have said, the expected distance from the mean of 3.5 is 1.5 (I think), which doesn't tell me much...

Getting exactly 140 heads is pretty unlikely, but so is any particular number of heads. Someone claiming to have thrown the same number of heads and tails seems more plausible than someone claiming to have thrown all heads or all tails. While the probability of throwing exactly the same number of heads and tails is greater than that of throwing all of one, the probability of each is so low that it can't possibly tell me the likelihood that either claim is true or not.

bacterium-in-spaceship
2005-Mar-04, 02:46 PM
Suppose that in a coin-flipping experiment you continue flipping until you get some fixed number of heads; and suppose that after the experiment, someone informs you that if the experiment had gone on for a few more coin flips, he would have ended it (by smashing up the coin, say). Does learning this make your estimate of the coin's theta invalid?
No. You'd still base your calculations on what actually happened, not on someone's intent. In this case, the decision about when to stop flipping coins was made based on the results of some of the tosses, so you can expect a selection bias in the sample. Should I find somewhere that shows mathematically how you can derive the (h-1)/(n-1) formula in this case?

I don't think you can derive the (h-1)/(n-1) formula in this case, because this estimator has become biased due to the different stopping rule. That was the point of the example. (If it is unbiased, then it's purely by coincidence, and it should be easy to construct a similar example where it's biased.)

The stopping rule we're talking about here is: keep flipping until you reach either H heads, or until you reach N flips. Simulating this on a computer shouldn't be hard if you've already simulated the other example, and I think what you will find is that the estimator now has a different distribution.



Dr. Fleep and Dr. Floop are conducting a statistical experiment. Fleep starts flipping a coin, intending to stop at 50 heads. After getting 35 heads out of 60, he decides he's had enough for the day, and he goes home. Floop then walks into the lab, and, mistakenly thinking Fleep meant to stop at 100 flips, finishes the experiment, getting 59 heads out of 100 total.

How should Fleep and Floop estimate the coin's theta? Are they allowed to use the information at all?
(...) How you should estimate theta based on a given data sample is based on how that sample was actually chosen. In this case, although Fleep had planned on selecting a data set based on the results, the data set actually taken was based only on the total number of trials. You'd use h/n.

Again, h/n is now a biased estimator, because if you run this experiment a lot of times, then sometimes it will finish before Floop comes in, leading to the same kind of selection bias as before. If you don't believe me, simulate it.

Here's an even better example. Suppose you're undecided between using the fixed-H rule and the fixed-N rule. To decide between them, you start the experiment by flipping a (different) coin that you know is fair; if it lands on heads, you choose the fixed-N rule, and if it lands on tails, you choose the fixed-H rule. What is the probability distribution of the estimator that divides the total number of heads (excluding the first flip) by the total number of flips (excluding the first flip)? Or at least: do you agree that it's biased? Do you agree that you shouldn't care that it's biased, if, in reality, you happen to get heads on your first coin flip?

bacterium-in-spaceship
2005-Mar-04, 03:01 PM
I went back through the posts and found you mentioned 37.2 of this online book (http://www.inference.phy.cam.ac.uk/itprnn/book.pdf) (Information Theory, Inference, and Learning Algorithms, by David J. C. MacKay) in this post (http://www.badastronomy.com/phpBB/viewtopic.php?p=425540#425540). It took a long time to load (9 megabytes in pdf), but I think that must be the one.

Yes, that's the one; apologies for lack of clarity.


As Grey has shown, intuition has nothing to do with it. It's the way things work.

That's not what Grey has shown. What Grey has shown is that, according to the unbiasedness criterion, describing the same series of coin flips by two different estimators is correct, if the series was created with different stopping rules in mind. This was never in doubt. The real question isn't whether this is what you would conclude from the unbiasedness criterion; the question is whether the conclusion (and therefore its assumption) is pathological for other reasons. I think the other examples that I've posted here (which I've mostly taken from MacKay's book) clearly show that it is; I'm still interested in hearing your views on them (are they the same as Grey's?).

bacterium-in-spaceship
2005-Mar-04, 03:09 PM
Thinking about the 250 coin tosses, how do we assess the likelihood of someone getting 140 heads?

The probability of getting 140 heads is:

X * p^140 * q^110

where p is the probability for heads, q = (1 - p) is the probability for tails, and X is the number of combinations (the number of ways of taking 140 members out of a set of 250).
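
For the concrete case in the thread -- a fair coin, so p = q = 0.5 and X = C(250, 140) -- the number is easy to evaluate in Python:

from math import comb

p = 0.5
prob_140 = comb(250, 140) * p**140 * (1 - p)**110   # probability of exactly 140 heads in 250 fair flips (about 0.008)
prob_125 = comb(250, 125) * p**250                  # for comparison, the single most likely count, 125 heads (about 0.05)
print(prob_140, prob_125)

Both numbers are small; any individual count is improbable on its own.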


Is the expectation value for the distance from the mean even relevant? The expectation value for a single die roll is 3.5 but that doesn't tell me much as all rolls are equally likely.

Exactly! This is what Bayes does right, and orthodox statistics does wrong. The only probabilities that matter are the ones that different hypotheses assign to the actual outcome; not the probability that one hypothesis assigns to the tail area beyond the actual outcome.

A Thousand Pardons
2005-Mar-04, 03:17 PM
That's not what Grey has shown. What Grey has shown is that, according to the unbiasedness criterion, describing the same series of coin flips by two different estimators is correct, if the series was created with different stopping rules in mind. This was never in doubt.
That's not the way I read Minka. Or are you saying that Minka is wrong? Are his ideas insane then? :)

bacterium-in-spaceship
2005-Mar-04, 03:36 PM
Then I think you've misread Minka. He's not questioning that, according to the unbiasedness rule, the example is correct statistical reasoning; he's questioning the validity of the unbiasedness rule itself, because it condones situations like this one.

Grey
2005-Mar-04, 04:07 PM
I don't think you can derive the (h-1)/(n-1) formula in this case, because this estimator has become biased due to the different stopping rule. That was the point of the example. (If it is unbiased, then it's purely by coincidence, and it should be easy to construct a similar example where it's biased.)

The stopping rule we're talking about here is: keep flipping until you reach either H heads, or until you reach N flips. Simulating this on a computer shouldn't be hard if you've already simulated the other example, and I think what you will find is that the estimator now has a different distribution.
Ah, sorry, I wasn't clear what you'd meant. Yes, if your stopping rule is to stop at either H heads or N flips, then, since your rule is based on the results, you'll have a biased set. Exactly what formula you'd use to estimate theta would depend on exactly what H and N are, and would probably be moderately tricky to work out. The limits at either end would approach the formulae we've already discussed. For example, if H is 95 and N is 100, you'll almost always stop by reaching N, and you'd expect h/n to be pretty close to the correct formula for estimating theta. If, on the other hand, you choose 50 for H and 500 for N, you'll almost always stop by reaching H first, and so the proper formula will end up being pretty close to (h-1)/(n-1).
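
A sketch of that combined stopping rule, for anyone who wants to check the limiting behaviour described above (the particular values of theta, H and N below are illustrative, and the helper function is not from the thread):

import random

def run(theta, H, N, trials=100_000, seed=2):
    # Stop at H heads or N flips, whichever comes first (assumes H, N >= 2 so n > 1 at stopping).
    random.seed(seed)
    sum_hn = sum_h1n1 = 0.0
    for _ in range(trials):
        h = n = 0
        while h < H and n < N:
            n += 1
            if random.random() < theta:
                h += 1
        sum_hn += h / n
        sum_h1n1 += (h - 1) / (n - 1)
    return sum_hn / trials, sum_h1n1 / trials

print(run(0.5, H=95, N=100))   # almost always stops at N flips, so h/n comes out nearly unbiased
print(run(0.5, H=50, N=500))   # almost always stops at H heads, so (h-1)/(n-1) comes out nearly unbiased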


Again, h/n is now a biased estimator, because if you run this experiment a lot of times, then sometimes it will finish before Floop comes in, leading to the same kind of selection bias as before. If you don't believe me, simulate it.
Well, that actually depends on what happens when Floop comes in. Suppose Fleep had surprisingly reached 50 heads in his 60 tosses. If he makes a note that the experiment is finished as he heads out for the day, which would make Floop not continue, then this would indeed be a biased selection method, as you say. But what if Floop doesn't know that Fleep considered the experiment finished? If Floop comes in and finishes the set out to 100, whether Fleep had stopped because he reached 50 or just because he was tired and wanted to go home, and Floop doesn't consider the results Fleep has gotten so far either way in making the decision to continue, then the resulting set will always be 100 tosses, and it will be a random sample.


Here's an even better example. Suppose you're undecided between using the fixed-H rule and the fixed-N rule. To decide between them, you start the experiment by flipping a (different) coin that you know is fair; if it lands on heads, you choose the fixed-N rule, and if it lands on tails, you choose the fixed-H rule. What is the probability distribution of the estimator that divides the total number of heads (excluding the first flip) by the total number of flips (excluding the first flip)? Or at least: do you agree that it's biased? Do you agree that you shouldn't care that it's biased, if, in reality, you happen to get heads on your first coin flip?
I do agree that it's biased, and I'll be lazy and not try to work out the details, since I'm not sure they matter for this discussion. :) As for what to do with the results, you'd have a choice in deciding how you were going to work things out. You could either acknowledge that the total experiment is biased and work out a method of estimating theta from all of the data, including in your calculations the effect that the initial toss might have on how the data set is created. But you could also decide ahead of time that, if the first toss comes up heads, you use the fixed-N rule and use h/n to estimate theta, and if the first toss comes up tails, you use the fixed-H rule and use (h-1)/(n-1) to estimate theta. This second method is probably simpler, since you don't have to work out the proper formula to estimate theta, but either method should give the same results overall, at least if you repeated the whole experiment many times.

bacterium-in-spaceship
2005-Mar-04, 04:26 PM
Right; you've given the correct answers as to when the estimators are unbiased, I think. But don't you agree this shows how bizarre the unbiasedness principle is?

According to the unbiasedness principle, in that last example, your estimates will depend on what sort of experiment you think you would have performed if the first, fair coin had landed differently. You could choose to let your choice of estimator depend on that first coin toss, but you don't have to; always taking some weighted average of the two estimators (independent of the first toss outcome) should still give you an unbiased estimator.

(Let's assume all these experiments are performed only once, by the way; we're interested in what we can conclude from these specific results.)

In the case where you stop at either a fixed H or a fixed N, whichever comes first, someone can determine the stopping rule after the experiment has been completed, by walking in and (truthfully) announcing: "I would have stopped the experiment if it was still going on, but it isn't, so I guess I'll leave you alone now." This person may only have made the decision (to walk in and see whether the experiment is still going on, and if so, to stop it) after the entire experiment was concluded. (Though he can't let this decision depend on the experiment outcome, of course; he has to make it not knowing whether the experiment is still going on.)

In experiments with stopping rules similar to the fixed-H rule, would you say that, to avoid poisoning of his data, the experimenter has to sit around in his lab for some time afterward just to check whether he would have been interrupted if the experiment had still been running?

Grey
2005-Mar-04, 05:19 PM
Right; you've given the correct answers as to when the estimators are unbiased, I think. But don't you agree this shows how bizarre the unbiasedness principle is?
But if we failed to take a selection bias into account when analyzing data, we'd clearly get the wrong results. So that's certainly not a good option.


(Let's assume all these experiments are performed only once, by the way; we're interested in finding out what we can conclude from these specific results.)
Actually, I think that this is part of the problem. Since this situation is a statistical one, it really only makes sense to expect the results to be accurate in the limit of an ensemble of trials. If I toss a coin ten times, and get eight heads, my best estimate for theta has to be 0.8. But is it really an unfair coin, or did I just get lucky (or unlucky, depending on your perspective)?

So if we're talking about statistical analysis of a coin tossing experiment, we really have to envision a large ensemble of identical experiments, at least in principle, that might take place. If there really is someone poised to shut down the experiment at a certain time, then in some of these "possible worlds", they really do come in and stop the experiment before the desired number of heads is reached. If the way the sample is gathered can vary randomly, then for some of the "possible worlds" it will be gathered one way, and for some it will be gathered another way. If we can eliminate some of these "possible worlds" by looking at the circumstances of the actual experiment that we performed, we might indeed be able to use that to improve our estimate of theta. But that's using additional information, not just the intended selection rule and the results obtained.

Think about this. Suppose you're handed the raw results from one of these trials where the method of stopping is determined by coin toss at the outset. But you aren't told which way the first coin toss actually fell. Further, suppose that the number of either tosses to stop at or heads to stop at is also random, so a cursory glance at the data won't give the method away. What's the best method to estimate theta?

Disinfo Agent
2005-Mar-04, 05:29 PM
Right; you've given the correct answers as to when the estimators are unbiased, I think. But don't you agree this shows how bizarre the unbiasedness principle is?
What unbiasedness principle? I've never heard of such a principle. What is it?

Grey
2005-Mar-04, 05:33 PM
What unbiasedness principle? I've never heard of such a principle. What is it?
I think he merely meant the principle that, if there is some selection bias inherent in the way you gather data, that has an effect on the methods you'll need to use to analyze that data properly, as with the h/n or (h-1)/(n-1) situation. If that's not what he meant, well, then my answer probably doesn't make any sense. :)

Disinfo Agent
2005-Mar-04, 05:53 PM
Well, perhaps my question is too Socratic. Obviously, unbiasedness is a desirable property in an estimator. However, it seemed to me that the conversation had more to do with principles of inference than principles of estimation. I know of a few principles of inference usually mentioned when the Bayesian approach is being compared to the frequentist approach, such as the likelihood principle (http://www.answers.com/topic/likelihood-principle), but I'd never heard of unbiasedness as a principle of statistical inference. It seems questionable whether the application of such a principle is the best interpretation of the examples being discussed.

A Thousand Pardons
2005-Mar-04, 07:14 PM
Then I think you've misread Minka. He's not questioning that, according to the unbiasedness rule, the example is correct statistical reasoning; he's questioning the validity of the unbiasedness rule itself, because it condones situations like this one.
Here's a quote I gave previously, from Minka's article: "Since the data set tells us both n and h, it shouldn't matter whether we assumed n or h beforehand."

I don't think I'm misreading Minka. He's just wrong.

bacterium-in-spaceship
2005-Mar-04, 08:23 PM
But if we failed to take a selection bias into account when analyzing data, we'd clearly get the wrong results. So that's certainly not a good option.

Bayesian statistics somehow manages: it leads (if I'm reading my sources correctly, and if they're telling the truth, which seems plausible) to a posterior probability distribution that does not depend on what stopping rule was used, unless the stopping point depends explicitly on the parameters instead of just on the outcomes of the experiment.

It's not clear to me at all that "bias", in the technical sense that the expectation of an estimator over all possible experiment outcomes is different from the true parameter value, is always a bad thing.



Actually, I think that this is part of the problem. Since this situation is a statistical one, it really only makes sense to expect the results to be accurate in the limit of an ensemble of trials.

On the contrary: since this situation is a statistical one, that means that we have to draw uncertain conclusions from less-than-complete information. It's quite normal for an experiment to be performed only once ("experiment" here meaning "series of coin tosses" rather than "coin toss"). Any valid set of statistical methods should be able to draw conclusions from limited real-world data, instead of only from an infinite set of fictional worlds.


If I toss a coin ten times, and get eight heads, my best estimate for theta has to be 0.8.

Why?


So if we're talking about statistical analysis of a coin tossing experiment, we really have to envision a large ensemble of identical experiments, at least in principle, that might take place. If there really is someone poised to shut down the experiment at a certain time, then in some of these "possible worlds", they really do come in and stop the experiment before the desired number of heads is reached.

So if someone told you after the experiment that he would have stopped you if it had gone on for longer, then you would really adjust your estimates? That seems bizarre to me.

bacterium-in-spaceship
2005-Mar-04, 08:24 PM
Right; you've given the correct answers as to when the estimators are unbiased, I think. But don't you agree this shows how bizarre the unbiasedness principle is?
What unbiasedness principle? I've never heard of such a principle. What is it?

I used "unbiasedness principle" as a shorthand for the notion that we should always use unbiased estimators where possible.

bacterium-in-spaceship
2005-Mar-04, 08:30 PM
Then I think you've misread Minka. He's not questioning that, according to the unbiasedness rule, the example is correct statistical reasoning; he's questioning the validity of the unbiasedness rule itself, because it condones situations like this one.
Here's a quote I gave previously, from Minka's article: "Since the data set tells us both n and h, it shouldn't matter whether we assumed n or h beforehand."

I don't think I'm misreading Minka. He's just wrong.

By "shouldn't matter", it looks to me like he means "wouldn't matter in any good theory of inference", not "doesn't matter if we follow the principle of using unbiased estimators".

A Thousand Pardons
2005-Mar-04, 08:51 PM
Then I think you've misread Minka. He's not questioning that, according to the unbiasedness rule, the example is correct statistical reasoning; he's questioning the validity of the unbiasedness rule itself, because it condones situations like this one.
Here's a quote I gave previously, from Minka's article: "Since the data set tells us both n and h, it shouldn't matter whether we assumed n or h beforehand."

I don't think I'm misreading Minka. He's just wrong.

By "shouldn't matter", it looks to me like he means "wouldn't matter in any good theory of inference", not "doesn't matter if we follow the principle of using unbiased estimators".
Wouldn't matter in any good theory of inference, in what way? Does The Good Theory of Inference just not draw inferences?

My main objection is to the description of one or the other methods as "insane" or even "good" or "bad". Within the proper context, both seem to work fine, to me.

bacterium-in-spaceship
2005-Mar-04, 08:59 PM
Wouldn't matter in any good theory of inference, in what way? Does The Good Theory of Inference just not draw inferences?

It does draw inferences, but in such a way that it doesn't matter whether the coin flips were generated by stopping at an N that was known in advance or an H that was known in advance.


My main objection is to the description of one or the other methods as "insane" or even "good" or "bad". Within the proper context, both seem to work fine, to me.

But they contradict each other; not only do they give you different answers to the same question, they often give you different kinds of answers (probability distributions vs estimates).

If the conclusions we should draw from an experiment can depend on choices made after it was finished, doesn't that at least seem a little strange to you?

A Thousand Pardons
2005-Mar-04, 09:39 PM
Wouldn't matter in any good theory of inference, in what way? Does The Good Theory of Inference just not draw inferences?

It does draw inferences, but in such a way that it doesn't matter whether the coin flips were generated by stopping at an N that was known in advance or an H that was known in advance.
What is the inference, in your opinion, then? In the example we are talking about.



My main objection is to the description of one or the other methods as "insane" or even "good" or "bad". Within the proper context, both seem to work fine, to me.

But they contradict each other; not only do they give you different answers to the same question, they often give you different kinds of answers (probability distributions vs estimates).
In the example we are talking about, they are not the same question, clearly. They have different answers. :)


If the conclusions we should draw from an experiment can depend on choices made after it was finished, doesn't that at least seem a little strange to you?
After? In the example we are talking about, the choices were made before.

bacterium-in-spaceship
2005-Mar-04, 09:40 PM
Here's a nice way to look at it. Consider a fixed-N researcher and a fixed-H researcher investigating the same coin, and getting the same results.

At the start of the experiment, each has the same information about the coin.

From the first coin flip, both researchers gain the same new information, as this result doesn't depend in any way on the stopping rule used.

From the second coin flip, both researchers again gain the same new information, for the same reason.

Ditto for the third, ..., Nth coin flip. The Nth coin flip turns out to be the Hth heads.

Then, both researchers stop, one because he's reached N coin flips and the other because he's reached H heads. They gain no information from the fact that they stop; at least no information that they didn't already gain from the coin flips themselves. After all, given the series of coin flips that has happened, both were certain to stop, independent of any properties of the coin.

So at each individual step, both researchers gain the same information. At what point should their judgments on the parameter value start to differ?

A Thousand Pardons
2005-Mar-04, 09:49 PM
And?


What is the inference, in your opinion, then? In the example we are talking about.

bacterium-in-spaceship
2005-Mar-04, 09:50 PM
What is the inference, in your opinion, then? In the example we are talking about.

The Bayesian result should be a probability distribution for the coin's theta that depends only on your prior probability distribution, on H, and on N. I could derive or look up the formula, but either would take some time, and I doubt the exact formula is going to be very interesting.
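
For concreteness, here is what that looks like under the simplest choice of prior, a uniform distribution on [0, 1] (that choice is an illustrative assumption, not necessarily the one bacterium-in-spaceship has in mind). Under either stopping rule the likelihood is proportional to θ^H * (1-θ)^(N-H); the stopping rule only contributes a combinatorial factor that doesn't involve θ, so it cancels when the posterior is normalized. The posterior is then proportional to θ^H * (1-θ)^(N-H), a Beta(H+1, N-H+1) distribution with mean (H+1)/(N+2) and mode H/N. A quick check in Python with the combined Fleep/Floop tally from earlier:

H, N = 59, 100                 # Floop's final count from the example above
post_mean = (H + 1) / (N + 2)  # mean of the Beta(H+1, N-H+1) posterior under a uniform prior
post_mode = H / N              # its mode, which coincides with the plain h/n estimate
print(post_mean, post_mode)    # about 0.588 and 0.59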


In the example we are talking about, they are not the same question, clearly. They have different answers. :)

Sorry, I was unclear. I meant Bayesian statistics gives you a different answer than orthodox statistics, even when the question really is exactly the same (e.g. "what should be our estimate for theta in a fixed-N experiment?").


After? In the example we are talking about, the choices were made before.

I meant the example I discussed with Grey; see e.g. my post about fourteen posts up. :)

A Thousand Pardons
2005-Mar-04, 10:14 PM
What is the inference, in your opinion, then? In the example we are talking about.

The Bayesian result should be a probability distribution for the coin's theta that depends only on your prior probability distribution, on H, and on N. I could derive or look up the formula, but either would take some time, and I doubt the exact formula is going to be very interesting.
Let's see what you got. :)

worzel
2005-Mar-05, 12:59 AM
Thinking about the 250 coin tosses, how do we assess the likelihood of someone getting 140 heads?

The probability of getting 140 heads is:

X * p^140 * q^110

where p is the probability for heads, q = (1 - p) is the probability for tails, and X is the number of combinations (the number of ways of taking 140 members out of a set of 250).
I think you missed my point. Obviously any particular combination of heads and tails is going to be unlikely. But if I told you that I got 125 heads and 125 tails would you doubt me because X * p^125 * q^125 is very small? You would have to doubt whatever result I got using that logic.

Earlier in the thread it was said that a statistician had doubts about the 140 heads and 110 tails report. I'm wondering if and how we can quantify that doubt. I thought the expected distance from the mean gave us some idea, but now I'm not so sure.
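
One orthodox way to put a number on that doubt -- offered purely as an illustration, not as a verdict on the Bayesian-versus-frequentist argument above -- is the probability, assuming a fair coin, of a head count at least as far from 125 as the one reported:

from math import comb

n, k_obs, p = 250, 140, 0.5
seq_prob = p**n                                                  # each individual sequence of 250 fair flips has probability 2^-250

tail = sum(comb(n, k) for k in range(k_obs, n + 1)) * seq_prob   # P(140 or more heads)
two_sided = 2 * tail                                             # by symmetry, also counts 110 or fewer heads
print(two_sided)                                                 # roughly 0.07

Whether that tail probability is the right quantity to look at is, of course, exactly what the rest of this thread is arguing about.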

worzel
2005-Mar-05, 01:03 AM
ATP! I was looking forward to your comments on my post (http://www.badastronomy.com/phpBB/viewtopic.php?t=19641&start=100#427187) :(

A Thousand Pardons
2005-Mar-05, 02:06 PM
ATP! I was looking forward to your comments on my post (http://www.badastronomy.com/phpBB/viewtopic.php?t=19641&start=100#427187)
I was a little confused. :)

BTW, we probably should not be using passthrough links to images at mathworld. I doubt they'd mind if you saved them to your own server, and used them from there, though.


Regarding 1 dimensional walks, the most probable outcome and the most probable distance from the mean aren't the same as the expected distance from the mean. I had a look here (http://mathworld.wolfram.com/RandomWalk1-Dimensional.html) and here (http://mathworld.wolfram.com/Heads-Minus-TailsDistribution.html) but I don't really understand all the maths.
The first link seems to be saying that the expected distance from the mean is sqrt of (2N divided by pi) which would be 12.6 for 250 tosses, while the second says that it is n times (2n choose n) divided by 2 to the (2n-1) :-?

Actually, that first one says it is (N-1)!!/(N-2)!!, but sqrt(2N/pi) is an approximation to it for large N. (N-1)!!/(N-2)!! is equal to n*(2n choose n)/2^(2n-1) (notice that N=2n).


My calculator won't do combinations that high. And for an even number of tosses wouldn't the most probable distance from the mean be 2 rather than 1?
Yep--because a distance of 1 is impossible for an even number of tosses. But it's 1 if the number of tosses is odd. :)
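
Aside: both the exact expression and the sqrt(2N/pi) approximation are easy to check numerically, taking "distance from the mean" to be |heads - tails| after N fair tosses. A sketch using only the standard library:

# Sketch only: exact expected |heads - tails| vs. the closed forms quoted above.
from math import comb, pi, sqrt

N = 250
n = N // 2

# Exact expectation: heads - tails equals 2k - N when k heads come up.
exact = sum(abs(2 * k - N) * comb(N, k) for k in range(N + 1)) / 2 ** N

closed_form = n * comb(2 * n, n) / 2 ** (2 * n - 1)   # equals (N-1)!!/(N-2)!! for even N
approx = sqrt(2 * N / pi)

print(exact, closed_form, approx)        # all come out near 12.6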

worzel
2005-Mar-05, 03:34 PM
BTW, we probably should not be using passthrough links to images at mathworld. I doubt they'd mind if you saved them to your own server, and used them from there, though.
Ok, I've just done that.

Actually, that first one says it is (N-1)!!/(N-2)!!, but sqrt(2N/pi) is an approximation to it for large N.
Ah, that makes sense.


And for an even number of tosses wouldn't the most probable distance from the mean be 2 rather than 1?
Yep--because a distance of 1 is impossible for an even number of tosses. But it's 1 if the number of tosses is odd. :)
So 1 or 2 is the most probable distance from the mean, depending on the number of tosses, but what does this expectation value mean? Can we use it to assess the likelihood of the 140H - 110T result?
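
Aside on that last question: one way to see what the expectation value buys you is a quick simulation, counting how often a fair coin produces a heads-minus-tails gap at least as large as the reported 140 - 110 = 30. A rough Monte Carlo sketch, standard library only:

# Sketch only: how often do 250 fair tosses give |heads - tails| >= 30?
import random

N, TRIALS, observed_gap = 250, 100_000, 30
hits = 0
for _ in range(TRIALS):
    heads = sum(random.getrandbits(1) for _ in range(N))
    if abs(2 * heads - N) >= observed_gap:
        hits += 1

print(hits / TRIALS)                     # typically lands somewhere around 0.06-0.07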

Grey
2005-Mar-07, 03:41 PM
Bayesian statistics somehow manages: it leads (if I'm reading my sources correctly, and if they're telling the truth, which seems plausible) to a posterior probability distribution that does not depend on what stopping rule was used, unless the stopping point depends explicitly on the parameters instead of just on the outcomes of the experiment.
Like A Thousand Pardons, I'd certainly be interested to see how these sorts of issues would be handled using Bayesian techniques, but I'll withhold judgment until I've actually seen the details.


It's not clear to me at all that "bias", in the technical sense that the expectation of an estimator over all possible experiment outcomes is different from the true parameter value, is always a bad thing.
Let me make sure that I'm understanding you correctly. You're saying that we shouldn't always be concerned if our estimate of some value isn't as close to the true value as we can manage, given the information available? I don't think I can agree with that statement. Why wouldn't we use whatever techniques give us the best results?


On the contrary: since this situation is a statistical one, that means that we have to draw uncertain conclusions from less-than-complete information. It's quite normal for an experiment to be performed only once ("experiment" here meaning "series of coin tosses" rather than "coin toss"). Any valid set of statistical methods should be able to draw conclusions from limited real-world data, instead of only from an infinite set of fictional worlds.
We can certainly analyze a single trial, but our analysis is based on considering what would happen in a large set of similar trials, and assuming any given experiment is more or less typical. If that's not the case, I don't think it's valid to analyze the situation statistically.


So if someone told you after the experiment that he would have stopped you if it had gone on for longer, then you would really adjust your estimates? That seems bizarre to me.
No, but that's because this added information doesn't just change the stopping rule, as you've implied. While it's true that apparently the stopping rule wasn't what I thought it was, I also know that this alternate stopping rule was not invoked. That is, although there might have been an arbitrary stopping point, I can rule out a number of possible resulting data sets by the fact that I didn't reach that stopping point. That's just utilizing all the information available to me.


Here's a nice way to look at it. Consider a fixed-N researcher and a fixed-H researcher investigating the same coin, and getting the same results.

...Then, both researchers stop, one because he's reached N coin flips and the other because he's reached H heads...
Part of the problem here is that you've set up a carefully constructed situation where both get the same results in the coin tosses (this part isn't a problem) and they both reach their predetermined stopping point at the same moment (this is a problem). As I showed by simulation, using these two stopping rules, the two researchers generally won't stop at the same number of tosses, and the ratio of heads to the total tosses will, on average, be different for the two experiments. Doesn't it make sense to take this simple fact into account when analyzing the results?
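
Aside, a sketch of the argument usually given for the stopping-rule claim (not necessarily the derivation bacterium-in-spaceship has in mind): for the same data, H heads in N tosses, the fixed-N model gives likelihood C(N,H) * theta^H * (1-theta)^(N-H), while the fixed-H model gives C(N-1,H-1) * theta^H * (1-theta)^(N-H). The two differ only by a factor that does not involve theta, so once each is normalized against a prior, the Bayesian posteriors are identical. A quick numerical check in Python:

# Sketch only: binomial vs. negative-binomial likelihoods normalize to the
# same posterior over a grid of theta values (flat prior on the grid).
from math import comb

N, H = 20, 5                             # example values
thetas = [i / 100 for i in range(1, 100)]

binom_lik = [comb(N, H) * t ** H * (1 - t) ** (N - H) for t in thetas]
negbin_lik = [comb(N - 1, H - 1) * t ** H * (1 - t) ** (N - H) for t in thetas]

post_fixed_N = [x / sum(binom_lik) for x in binom_lik]
post_fixed_H = [x / sum(negbin_lik) for x in negbin_lik]

print(max(abs(a - b) for a, b in zip(post_fixed_N, post_fixed_H)))   # zero up to rounding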

A Thousand Pardons
2005-Mar-07, 04:15 PM
Part of the problem here is that you've set up a carefully constructed situation where both get the same results in the coin tosses (this part isn't a problem) and they both reach their predetermined stopping point at the same moment (this is a problem). As I showed by simulation, using these two stopping rules, the two researchers generally won't stop at the same number of tosses, and the ratio of heads to the total tosses will, on average, be different for the two experiments. Doesn't it make sense to take this simple fact into account when analyzing the results?
Both "researchers" (N and H) arriving at the same stopping point has a probability of ... well, lets say N=20, H=5, then it's 3%. So, not often. :)