
On the optimal burden of proof

All claims should be considered false until proven otherwise, because lies can be invented much faster than they can be refuted. In other words, the maker of a claim has the burden of providing high-quality scientific proof, for example by referencing previous research on the subject. Strangely enough, some people seem to believe marketing, political spin and conspiracy theories even after such claims have been proven false. One can only wish that everyone bore the consequences of their choices (so that karma would work).
Considering all claims false until proven otherwise runs into a logical problem: a claim and its opposite claim cannot be simultaneously false. The priority for falsity should be given to actively made claims, e.g. someone saying that a product or a policy works, or that there is a conspiracy behind an accident. Especially suspect are claims that benefit their maker if people believe them. A higher probability of falsity should also be attached to positive claims, e.g. that something has an effect in whatever direction (as opposed to no effect) or that an event is due to non-obvious causes, not chance. The lack of an effect should be the null hypothesis. Similarly, ignorance and carelessness, not malice, should be the default explanation for bad events.
Sometimes two opposing claims are actively made and belief in them benefits their makers, e.g. in politics or when competing products are marketed. This is the hardest case to find the truth in, but a partial and probabilistic solution is possible. Until rigorous proof is found, one should keep an open mind. Keeping an open mind creates a vulnerability to manipulation: after some claim is proven false, its proponents often try to defend it by asking its opponents to keep an open mind, i.e. ignore evidence. In such cases, the mind should be closed to the claim until its proponents provide enough counter-evidence for a neutral view to be reasonable again.
To find which opposing claim is true, the first test is logic. If a claim is logically inconsistent with itself, then it is false by syntactic reasoning alone. A broader test is whether the claim is consistent with other claims of the same person. For example, Vladimir Putin said that there were no Russian soldiers in Crimea, but a month later gave medals to some Russian soldiers, citing their successful operation in Crimea. At least one of the claims must be false, because either there were Russian soldiers in Crimea or not. The way people try to weasel out of such self-contradictions is to say that the two claims referred to different time periods, definitions or circumstances, i.e. to change the interpretation of the words. A difficulty for the truth-seeker is that sometimes such a change in interpretation is a legitimate clarification. Tongues do slip. Nonetheless, a contradiction is probabilistic evidence for lying.
The second test for falsity is objective evidence. If there is a street fight and the two sides accuse each other of starting it, then sometimes a security camera video can refute one of the contradicting claims. What evidence is objective is, sadly, subject to interpretation. Videos can be photoshopped, though that is difficult and time-consuming. The objectivity of the evidence is strongly positively correlated with the scientific rigour of its collection process. “Hard” evidence is a signal of the truth, but a probabilistic signal. In this world, most signals are probabilistic.
The third test of falsity is the testimony of neutral observers, preferably several of them, because people misperceive and misremember even with the best intentions. The neutrality of observers is again up for debate and interpretation. In some cases, an observer is a statistics-gathering organisation. Just like objective evidence, testimony and statistics are probabilistic signals.
The fourth test of falsity is the testimony of interested parties, to which the above caveats apply even more strongly.
Integrating conflicting evidence should use Bayes’ rule, because it keeps probabilities consistent. Consistency helps glean information about one aspect of the question from data on other aspects. Background knowledge should be combined with the evidence, for example by ruling out physical impossibilities. If a camera shows a car disappearing behind a corner and immediately reappearing, moving in the opposite direction, then physics says that the original car couldn’t have changed direction so fast. The appearing car must be a different one. Knowledge of human interactions and psychology is part of the background information, e.g. if smaller, weaker and outnumbered people rarely attack the stronger and more numerous, then this provides probabilistic info about who started a fight. Legal theory incorporates background knowledge of human nature to get information about the crime – human nature suggests motives. Asking: “Who benefits?” has a long history in law.
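A rough numerical illustration of this integration, with all probabilities invented for the example: a prior from background knowledge about who starts fights, updated by one piece of witness testimony. A minimal Python sketch:

# Who started a street fight? All numbers are made up for illustration.
# Hypotheses: the stronger, more numerous side started it, or the weaker side did.
prior = {"strong_started": 0.8, "weak_started": 0.2}  # background knowledge: the weak rarely attack first

# Evidence: a witness says the weaker side attacked first. Witnesses misperceive sometimes,
# so the testimony is a probabilistic signal with assumed error rates.
likelihood = {"strong_started": 0.3,  # P(witness says "weak attacked" | strong actually started)
              "weak_started": 0.9}    # P(witness says "weak attacked" | weak actually started)

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalised = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalised.values())
posterior = {h: p / total for h, p in unnormalised.items()}
print(posterior)  # strong_started: about 0.57, weak_started: about 0.43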

On simple answers

Bayes’ rule exercise: is a simple or a complicated answer to a complicated problem more likely to be correct?

Depends on the conditional probabilities: if simple questions are more likely to have simple answers and complicated questions complicated ones, then a complicated answer is more likely to be correct for a complicated problem.
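A minimal numeric sketch of this Bayes’ rule exercise, with the conditional probabilities invented for illustration:

# Invented numbers: answer complexity and question difficulty are positively correlated.
p_complex_answer = 0.5                 # prior: half of all correct answers are complex
p_complicated_q_given_complex_a = 0.7  # assumed correlation
p_complicated_q_given_simple_a = 0.3

p_complicated_q = (p_complicated_q_given_complex_a * p_complex_answer
                   + p_complicated_q_given_simple_a * (1 - p_complex_answer))
p_complex_a_given_complicated_q = (p_complicated_q_given_complex_a * p_complex_answer
                                   / p_complicated_q)
print(p_complex_a_given_complicated_q)  # 0.7 > 0.5: conditioning on a complicated question
                                        # raises the probability that the correct answer is complex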

It seems reasonable that the complexity of the answer is correlated with the difficulty of the problem. But this is an empirical question.

If difficult problems are likely to have complex answers, then this is an argument against slogans and ideologies. These seek to give a catchy one-liner as the answer to many problems in society. No need to think – ideology has the solution. Depending on your political leaning, poverty may be due to laziness or exploitation. The foreign policy “solution” is bombing for some, eternal appeasement for others.

The probabilistic preference for complex answers in complicated situations seems to contradict Occam’s razor (among answers equally good at explaining the facts, the simplest answer should be chosen). There is no actual conflict with the above Bayesian exercise. There, the expectation of a complex answer applies to complicated questions, while a symmetric anticipation of a simple answer holds for simple problems. The answers compared are not equally good, because one fits the structure of the question better than the other.

Which ideology is more likely to be wrong?

Exercise in Bayes’ rule: is an ideology more likely to be wrong if it appeals relatively more to poor people than the rich?

More manipulable folks are more likely to lose their money, so less likely to be rich. Stupid people have a lower probability of making money. By Bayes, the rich are on average less manipulable and more intelligent than the poor.

Less manipulable people are less likely to find an ideology built on fallacies appealing. By Bayes, an ideology relatively more appealing to the stupid and credulous is more likely to be wrong. Due to such people being poor with a higher probability, an ideology embraced more by the poor than the rich is more likely to be fallacious.
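A numeric sketch of the first Bayes step, with invented probabilities:

# Invented numbers: if the manipulable are less likely to end up rich, then by
# Bayes' rule the rich are less likely to be manipulable than the poor.
p_manipulable = 0.5           # prior share of manipulable people
p_rich_given_manip = 0.2      # assumed: manipulable people lose their money more often
p_rich_given_not_manip = 0.4

p_rich = p_rich_given_manip * p_manipulable + p_rich_given_not_manip * (1 - p_manipulable)
p_manip_given_rich = p_rich_given_manip * p_manipulable / p_rich
p_manip_given_poor = (1 - p_rich_given_manip) * p_manipulable / (1 - p_rich)
print(p_manip_given_rich, p_manip_given_poor)  # about 0.33 vs 0.57: the poor are more
                                               # likely to be manipulable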

Another exercise: is an ideology more likely to be wrong if academics like it relatively more than non-academics?

Smarter people are more likely to become academics, so by Bayes’ rule, academics are more likely to be smart. Intelligent people have a relatively higher probability of liking a correct ideology, so by Bayes, an ideology appealing to the intelligent is more likely to be correct. An ideology liked by academics is correct with a higher probability.
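A sketch of both Bayes steps, again with invented numbers:

# Invented numbers for the two steps: (1) academics are more likely to be smart,
# (2) an ideology liked by a group with more smart people is more likely to be correct.
p_smart = 0.5
p_acad_given_smart, p_acad_given_not = 0.3, 0.1   # assumed selection into academia
p_smart_given_acad = (p_acad_given_smart * p_smart
                      / (p_acad_given_smart * p_smart + p_acad_given_not * (1 - p_smart)))
p_smart_given_non = ((1 - p_acad_given_smart) * p_smart
                     / ((1 - p_acad_given_smart) * p_smart + (1 - p_acad_given_not) * (1 - p_smart)))
print(p_smart_given_acad, p_smart_given_non)      # 0.75 vs about 0.44

# Assumed tastes: smart people like a correct ideology more often than a wrong one.
p_like = {("correct", True): 0.6, ("wrong", True): 0.3,    # smart
          ("correct", False): 0.4, ("wrong", False): 0.5}  # not smart

def p_correct_given_liked_by(p_smart_in_group, prior_correct=0.5):
    # Probability that the ideology is correct, given that a random group member likes it.
    like_if_correct = (p_smart_in_group * p_like[("correct", True)]
                       + (1 - p_smart_in_group) * p_like[("correct", False)])
    like_if_wrong = (p_smart_in_group * p_like[("wrong", True)]
                     + (1 - p_smart_in_group) * p_like[("wrong", False)])
    return (prior_correct * like_if_correct
            / (prior_correct * like_if_correct + (1 - prior_correct) * like_if_wrong))

print(p_correct_given_liked_by(p_smart_given_acad))  # about 0.61
print(p_correct_given_liked_by(p_smart_given_non))   # about 0.54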

A random world as an argument against fanaticism

Theoretical physicists may debate whether the universe is random or not, but for practical purposes it is, because any sufficiently complicated deterministic system looks random to someone who does not fully understand it. This is the example from Lipman (1991) “How to decide how to decide…”: even when a complicated deterministic function is written down, its output still looks random to a person who cannot calculate it.
If the world is random, we should not put probability one on any event. Nothing is certain, so any fanatical belief that some claim is certainly true is almost certainly wrong. This applies to religion, ideology, personal memories and also things right before your eyes. The eyes can deceive, as evidenced by the numerous visual illusions invented and published in the past. If you see your friend, is that really the same person? How detailed a memory of your friend’s face do you have? Makeup can alter appearance quite radically (http://www.mtv.com/news/1963507/woman-celebrity-makeup-transformation/).
This way lies paranoia, but actually in a random world, a tiny amount of paranoia about everything is appropriate. A large amount of paranoia, say putting probability more than 1% on conspiracy theories, is probably a wrong belief.
How to know whether something is true, then? A famous quote points the way: “Everything is possible, but not everything is likely”. Use logic and statistics, apply Bayes’ rule. Statistics may be wrong, but they are much less likely to be wrong than rumours. A source that was right in the past is more likely to be right at present than a previously inaccurate source. Science does not know everything, but this is not a reason to believe charlatans.
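A toy illustration of the last point – updating trust in a source from its track record – with the reliability numbers invented:

# A source is either "reliable" (right 90% of the time) or "unreliable" (right 50%),
# with a 50-50 prior. Bayes' rule turns its past record into trust in its next claim.
def p_next_claim_right(past_right, past_wrong, prior_reliable=0.5):
    like_reliable = 0.9 ** past_right * 0.1 ** past_wrong
    like_unreliable = 0.5 ** (past_right + past_wrong)
    p_reliable = (prior_reliable * like_reliable
                  / (prior_reliable * like_reliable + (1 - prior_reliable) * like_unreliable))
    return p_reliable * 0.9 + (1 - p_reliable) * 0.5  # probability the next claim is right

print(p_next_claim_right(5, 0))  # about 0.88: a source right five times in a row earns trust
print(p_next_claim_right(1, 4))  # about 0.50: a mostly wrong source earns almost none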

Evaluating the truth and the experts simultaneously

When evaluating an artwork, the guilt of a suspect or the quality of theoretical research, the usual procedure is to gather the opinions of a number of people and take some weighted average of these. There is no objective measure of the truth or the quality of the work. What weights should be assigned to different people’s opinions? Who should be counted an expert or knowledgeable witness?
A circular problem appears: the accurate witnesses are those who are close to the truth, and the truth is close to the average claim of the accurate witnesses. This can be modelled as a set of signals with unknown precision. Suppose the signals are normally distributed with mean equal to the truth (witnesses unbiased, just have poor memories). If the precisions were known, then these could be used as weights in the weighted average of the witness opinions, which would be an unbiased estimate of the truth with minimal variance. If the truth were known, then the distance of the opinion of a witness from it would measure the accuracy of that witness. But both precisions and the truth are unknown.
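To make the known-precision benchmark concrete, a short Python sketch with invented opinions and precisions:

import numpy as np

# With normal signals centred on the truth, weighting each witness by precision
# (inverse variance) gives the minimum-variance unbiased estimate of the truth.
opinions = np.array([10.0, 12.0, 30.0])   # what each witness reports (invented)
precisions = np.array([4.0, 4.0, 0.1])    # 1/variance; the third witness is vague
estimate = np.sum(precisions * opinions) / np.sum(precisions)
print(estimate)  # about 11.2: the imprecise outlier barely moves the estimate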
Simultaneously determining the precisions of the signals and the estimate of the truth may have many solutions. If there are two witnesses with different claims, we could assign the first witness infinite precision and the second finite, and estimate the truth to equal the opinion of the first witness. The truth is derived from the witnesses and the precisions are derived from the truth, so this is consistent. The same applies with witnesses switched.
A better solution takes a broader view and simultaneously estimates witness precisions and the truth. These form a vector of random variables. Put a prior probability distribution on this vector and use Bayes’ rule to update this distribution in response to the signals (the witness opinions).
The solution of course depends on the chosen prior. If the prior assumes one witness to be infinitely precise and the others only finitely precise, then the updating rule keeps those precisions and estimates the truth to equal the opinion of the infinitely precise witness. Assuming some prior seems unavoidable. At least the Bayesian framing makes clear why the multiple solutions arise.
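A rough sketch of such a joint update on a discrete grid, with the opinions, the candidate precisions and the priors all invented, shows how the prior drives the answer:

import numpy as np
from scipy.stats import norm

# The truth and the witness precisions are estimated together by Bayes' rule.
opinions = np.array([10.0, 12.0, 30.0])
truth_grid = np.linspace(0.0, 40.0, 401)            # flat prior over candidate truths
precision_profiles = [np.array(p) for p in
                      [(0.1, 0.1, 0.1), (1.0, 1.0, 1.0), (10.0, 10.0, 0.1), (0.1, 0.1, 10.0)]]

def posterior_mean_truth(prior_over_profiles):
    weights, weighted_truths = 0.0, 0.0
    for prior_p, prec in zip(prior_over_profiles, precision_profiles):
        # Likelihood of the three opinions at each candidate truth, given this precision profile
        like = np.prod(norm.pdf(opinions[:, None], loc=truth_grid[None, :],
                                scale=1.0 / np.sqrt(prec)[:, None]), axis=0)
        weights += prior_p * like.sum()
        weighted_truths += prior_p * (like * truth_grid).sum()
    return weighted_truths / weights

print(posterior_mean_truth([0.25, 0.25, 0.25, 0.25]))  # about 17: a uniform prior ends up
                                                       # favouring "all witnesses imprecise",
                                                       # so roughly the plain average
print(posterior_mean_truth([0.0, 0.0, 0.0, 1.0]))      # about 30: a dogmatic prior that the
                                                       # third witness is precise pins the
                                                       # estimate to that witness's opinion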

Retaking exams alters their informativeness

If only those who fail are allowed to retake an exam and it is not reported whether a grade comes from the first exam or a retake, then those who fail get an advantage: their final grade is the maximum of two attempts, while everyone else gets only one attempt.
A simple example has two types of exam takers, H and L, in equal proportions in the population. The type may reflect talent or preparation for the exam. There are three grades: A, B and C, with C counting as a fail. The probabilities for each type to receive a given grade from any single attempt of the exam are, for H, Pr(A|H)=0.3, Pr(B|H)=0.6, Pr(C|H)=0.1 and, for L, Pr(A|L)=0.2, Pr(B|L)=0.1, Pr(C|L)=0.7. The H type is more likely to get better grades, but there is noise in the grade.
After the retake, the probabilities for H to end up with each grade are Pr*(A|H)=0.3+0.1·0.3=0.33, Pr*(B|H)=0.6+0.1·0.6=0.66 and Pr*(C|H)=0.1·0.1=0.01: the final grade is the first-attempt grade, unless that was a C, in which case it is whatever the retake yields. For L, Pr*(A|L)=0.2+0.7·0.2=0.34, Pr*(B|L)=0.1+0.7·0.1=0.17 and Pr*(C|L)=0.7·0.7=0.49. So the L type ends up with an A grade more frequently than H, because L retakes the exam 70% of the time, as opposed to H’s 10%.
If the observers of the grades are rational, they will infer by Bayes’ rule Pr(H|A)=33/67, Pr(H|B)=66/83 and Pr(H|C)=1/50.
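The calculation can be checked in a few lines of Python:

# Grade distribution after one allowed retake of a failed (C) exam, and the
# observer's posterior Pr(H | grade) by Bayes' rule, reproducing the numbers above.
first_try = {"H": {"A": 0.3, "B": 0.6, "C": 0.1},
             "L": {"A": 0.2, "B": 0.1, "C": 0.7}}

def after_retake(p):
    # Only a C is retaken; the final grade is then whatever the retake yields.
    return {"A": p["A"] + p["C"] * p["A"],
            "B": p["B"] + p["C"] * p["B"],
            "C": p["C"] * p["C"]}

final = {t: after_retake(p) for t, p in first_try.items()}
print(final)  # H: A 0.33, B 0.66, C 0.01;  L: A 0.34, B 0.17, C 0.49

# Equal shares of H and L in the population, so by Bayes' rule:
for grade in ["A", "B", "C"]:
    pr_h = 0.5 * final["H"][grade] / (0.5 * final["H"][grade] + 0.5 * final["L"][grade])
    print(grade, round(pr_h, 3))  # A: 0.493 (=33/67), B: 0.795 (=66/83), C: 0.02 (=1/50)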
It is probably to counter this advantage for retakers that some universities in the UK discount grades obtained from retaking exams (http://www.telegraph.co.uk/education/universityeducation/10236397/University-bias-against-A-level-resit-pupils.html). At the University of Queensland, those who fail a course can take a supplementary exam, but the resulting grade is distinguished on the transcript from a grade obtained on the first try. Also, the maximum grade possible from a supplementary exam is one step above failure – the three highest grades cannot be obtained.