Tag Archives: Bayes’ rule

A random world as an argument against fanaticism

Theoretical physicists may debate whether the universe is random, but for practical purposes it is, because any sufficiently complicated deterministic system looks random to someone who does not fully understand it. This is the example in Lipman (1991), “How to decide how to decide…”: the output of a complicated deterministic function, even one written down in full, still looks random to a person who cannot calculate that output.
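As a toy illustration of the same point (this is not Lipman’s example, just a standard chaotic map): the rule below is a one-line deterministic formula, yet its output looks random to anyone who does not sit down and iterate it.

```python
# Toy illustration: a short deterministic rule whose output looks random.
# The logistic map x -> r*x*(1-x) is fully determined by x0 and r.

def logistic_sequence(x0=0.123456, r=3.99, n=20):
    """Return n iterates of the logistic map starting from x0."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(round(x, 3))
    return out

print(logistic_sequence())
# Every number follows exactly from x0 and r, but without doing the arithmetic
# the sequence is hard to tell apart from noise.
```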
If the world is random, we should not put probability one on any event. Nothing is certain, so any fanatical belief that some claim is certainly true is almost certainly wrong. This applies to religion, ideology, personal memories and even things right before your eyes. The eyes can deceive, as the many visual illusions invented and published over the years demonstrate. If you see your friend, is that really the same person? How detailed is your memory of your friend’s face? Makeup can alter appearance quite radically (http://www.mtv.com/news/1963507/woman-celebrity-makeup-transformation/).
This way lies paranoia, but in a random world a tiny amount of paranoia about everything is in fact appropriate. A large amount of paranoia, say putting a probability of more than 1% on conspiracy theories, is probably a mistaken belief.
How, then, to know whether something is true? A famous quote, “Everything is possible, but not everything is likely”, points the way. Use logic and statistics, and apply Bayes’ rule. Statistics may be wrong, but they are much less likely to be wrong than rumours. A source that was right in the past is more likely to be right now than a previously inaccurate source. Science does not know everything, but that is no reason to believe charlatans.
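As a minimal sketch of this kind of updating (the probabilities are assumptions chosen for illustration, not anything from the post): treat the source as either reliable or unreliable, and let Bayes’ rule raise the probability of reliability each time the source turns out to be right.

```python
# Minimal sketch of updating trust in a source with Bayes' rule.
# The probabilities below are assumptions chosen for illustration.

def update_reliability(prior, p_right_if_reliable=0.9, p_right_if_unreliable=0.5):
    """Posterior probability that the source is reliable, given it was right once."""
    numerator = prior * p_right_if_reliable
    denominator = numerator + (1 - prior) * p_right_if_unreliable
    return numerator / denominator

belief = 0.5  # start undecided about the source
for report in range(5):  # the source turns out to be right five times in a row
    belief = update_reliability(belief)
    print(f"after correct report {report + 1}: P(reliable) = {belief:.3f}")
```

A previously accurate source accumulates a higher reliability estimate, which is the formal version of trusting statistics over rumours.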

Evaluating the truth and the experts simultaneously

When evaluating an artwork, the guilt of a suspect or the quality of theoretical research, the usual procedure is to gather the opinions of a number of people and take some weighted average of them. There is no objective measure of the truth or of the quality of the work. What weights should be assigned to different people’s opinions? Who should be counted as an expert or a knowledgeable witness?
A circular problem appears: the accurate witnesses are those whose claims are close to the truth, and the truth is close to the average claim of the accurate witnesses. This can be modelled as a set of signals with unknown precisions. Suppose the signals are normally distributed with mean equal to the truth (the witnesses are unbiased, they just have imperfect memories). If the precisions were known, they could be used as weights in a weighted average of the witness opinions, which would be the unbiased estimate of the truth with minimal variance. If the truth were known, the distance of a witness’s opinion from it would measure that witness’s accuracy. But both the precisions and the truth are unknown.
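As a sketch of the known-precision case (all numbers below are made up for illustration): the estimate is simply the precision-weighted mean of the opinions.

```python
# Sketch of the known-precision case: the precision-weighted mean of unbiased
# normal signals is the minimum-variance unbiased estimate of the truth.
# The opinions and precisions are illustrative numbers, not from the post.

opinions   = [9.5, 10.4, 12.0]   # witness claims x_i ~ Normal(truth, 1/tau_i)
precisions = [4.0, 1.0, 0.25]    # tau_i = 1/variance_i, assumed known here

estimate = sum(t * x for t, x in zip(precisions, opinions)) / sum(precisions)
print(f"precision-weighted estimate of the truth: {estimate:.3f}")
# A more precise witness pulls the estimate more strongly toward his or her
# opinion; the circular problem is that these weights are themselves unknown.
```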
Simultaneously determining the precisions of the signals and the estimate of the truth may have many solutions. If there are two witnesses with different claims, we could assign the first witness infinite precision and the second finite, and estimate the truth to equal the opinion of the first witness. The truth is derived from the witnesses and the precisions are derived from the truth, so this is consistent. The same applies with witnesses switched.
A better solution takes a broader view and simultaneously estimates witness precisions and the truth. These form a vector of random variables. Put a prior probability distribution on this vector and use Bayes’ rule to update this distribution in response to the signals (the witness opinions).
The solution of course depends on the chosen prior. If one witness is assumed infinitely precise and the others finitely precise, then the updating rule keeps those precisions and estimates the truth to equal the opinion of the infinitely precise witness. The assumption of a prior seems unavoidable, but at least it makes clear why the multiple solutions arise.
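A brute-force sketch of the joint update (the grids, priors and opinions below are all assumptions made for illustration): put a discrete prior over the truth and over each witness’s precision, multiply by the normal likelihood of the observed opinions, and normalise.

```python
# Brute-force Bayesian update of (truth, witness precisions) on a small grid.
# Uniform priors over the grids; all numbers are illustrative assumptions.
import itertools
import math

opinions = [9.5, 10.4, 12.0]                    # observed witness claims
truth_grid = [8 + 0.1 * k for k in range(41)]   # candidate truths 8.0 .. 12.0
precision_grid = [0.25, 1.0, 4.0]               # candidate precisions per witness

def likelihood(x, truth, tau):
    """Normal density of opinion x around the truth with precision tau."""
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (x - truth) ** 2)

posterior, total = {}, 0.0
for truth in truth_grid:
    for taus in itertools.product(precision_grid, repeat=len(opinions)):
        weight = 1.0  # uniform prior over the grid
        for x, tau in zip(opinions, taus):
            weight *= likelihood(x, truth, tau)
        posterior[(truth, taus)] = weight
        total += weight

# Posterior mean of the truth, integrating out the precisions.
post_mean = sum(truth * w for (truth, _), w in posterior.items()) / total
print(f"posterior mean of the truth: {post_mean:.3f}")
```

With a non-degenerate prior such as this uniform one, no witness is assumed infinitely precise, and the estimate is a compromise between the opinions rather than equal to any single one of them.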

Retaking exams alters their informativeness

If only those who fail are allowed to retake an exam, and it is not reported whether a grade comes from the first sitting or a retake, then those who fail get an advantage: their final grade is effectively the better of two attempts, while everyone else gets only one attempt.
A simple example has two types of exam takers, H and L, in equal proportions in the population. The type may reflect talent or preparation for the exam. There are three grades: A, B and C, with C the failing grade. The probabilities of each type receiving each grade on any given attempt are, for H, Pr(A|H)=0.3, Pr(B|H)=0.6, Pr(C|H)=0.1, and for L, Pr(A|L)=0.2, Pr(B|L)=0.1, Pr(C|L)=0.7. The H type is more likely to get the better grades, but the grade is noisy.
After a retake is allowed for those who fail, the probabilities of ending up with each grade are, for H, Pr*(A|H)=0.33, Pr*(B|H)=0.66 and Pr*(C|H)=0.01, and for L, Pr*(A|L)=0.34, Pr*(B|L)=0.17 and Pr*(C|L)=0.49 (each post-retake probability of a passing grade is the first-attempt probability plus the probability of failing times the probability of that grade on the second attempt). So the L type ends up with an A grade more frequently than H, because L retakes the exam 70% of the time as opposed to H’s 10%.
If the observers of the grades are rational, they infer by Bayes’ rule that Pr(H|A)=33/67≈0.49, Pr(H|B)=66/83≈0.80 and Pr(H|C)=1/50=0.02, using the equal proportions of the two types in the population.
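Both steps, the post-retake grade distributions and the observers’ posteriors, can be checked with a short calculation; the sketch below assumes C is the failing grade and that exactly one retake is allowed.

```python
# Sketch of the retake example: post-retake grade distributions and the
# observers' posterior Pr(type | grade), assuming C is the failing grade
# and exactly one retake is allowed after a C.

first_attempt = {
    "H": {"A": 0.3, "B": 0.6, "C": 0.1},
    "L": {"A": 0.2, "B": 0.1, "C": 0.7},
}
population = {"H": 0.5, "L": 0.5}  # equal proportions of the two types

# Post-retake distribution: a passing grade is kept, a C leads to one more draw.
after_retake = {}
for t, probs in first_attempt.items():
    fail = probs["C"]
    after_retake[t] = {
        "A": probs["A"] + fail * probs["A"],
        "B": probs["B"] + fail * probs["B"],
        "C": fail * probs["C"],
    }
print("post-retake distributions:", after_retake)

# Observers' posterior by Bayes' rule, not knowing who retook the exam.
for grade in ("A", "B", "C"):
    joint_H = population["H"] * after_retake["H"][grade]
    total = sum(population[t] * after_retake[t][grade] for t in population)
    print(f"Pr(H | {grade}) = {joint_H / total:.3f}")
```

The output reproduces the numbers above; in particular Pr(H|A) is just under one half, so an A grade is actually slightly weaker evidence of the H type than seeing no grade at all.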
It is probably to counter this advantage of retakers that some universities in the UK discount grades obtained from retaking exams (http://www.telegraph.co.uk/education/universityeducation/10236397/University-bias-against-A-level-resit-pupils.html). At the University of Queensland, those who fail a course can take a supplementary exam, but the resulting grade is distinguished on the transcript from a grade obtained on the first try. Also, the maximum grade possible from a supplementary exam is one step above failure – the three highest grades cannot be obtained.