
Clinical trials of other drugs in other species to predict a drug’s effect in humans

Suppose we want to know whether a drug is safe or effective for humans, but do not have data on what it does in humans, only on its effects in mice, rats, rhesus macaques and chimpanzees. In general, we can predict the effect of the drug on humans better with the animal data than without it. Information on “nearby” realisations of a random variable (the effect of the drug) helps predict the realisation we are interested in. The method should weight nearby observations more heavily than observations further away. For example, if the drug has a positive effect in animals, then the method predicts a positive effect in humans, and the larger the effect in animals, the greater the predicted effect in humans.

A limitation of weighting is that it does not take into account the slope of the effect when moving from further observations to nearer. For example, a very large effect of the drug in mice and rats but a small effect in macaques and chimpanzees predicts the same effect in humans as a small effect in rodents and a large one in monkeys and apes, if the weighted average effect across animals is the same in both cases. However, intuitively, the first case should have a smaller predicted effect in humans than the second, because moving to animals more similar to humans, the effect becomes smaller in the first case but larger in the second. The idea is similar to the derivative term of a proportional-integral-derivative (PID) controller in engineering.
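As a toy illustration of why the slope matters, the Python sketch below compares two hypothetical drugs with identical average effects across animals but opposite slopes when species are ordered from most to least genetically distant from humans. All numbers, the species ordering and the slope coefficient 0.5 are illustrative assumptions, not estimates:

```python
# Hypothetical effect sizes in four species, ordered from most to least
# genetically distant from humans: mouse, rat, macaque, chimpanzee.
case_a = [0.9, 0.8, 0.2, 0.1]  # large effect in rodents, small in primates
case_b = [0.1, 0.2, 0.8, 0.9]  # small effect in rodents, large in primates

def average(effects):
    return sum(effects) / len(effects)

def slope(effects):
    """Average change in the effect when moving one species closer to humans."""
    diffs = [later - earlier for earlier, later in zip(effects, effects[1:])]
    return sum(diffs) / len(diffs)

# Both cases have the same average effect (0.5), so a pure weighted average
# predicts the same human effect for both.  The slope distinguishes them:
# negative in case A (effect shrinks approaching humans), positive in case B.
prediction_a = average(case_a) + 0.5 * slope(case_a)  # below 0.5
prediction_b = average(case_b) + 0.5 * slope(case_b)  # above 0.5
```

The slope term plays the role of the derivative correction: it lowers the human prediction when the effect fades in species closer to humans and raises it when the effect grows.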

The slope of the effect of the drug is extra information that increases the predictive power of the method if the assumption that the similarity of effects decreases in genetic distance holds. Of course, if this assumption fails in the data, then imposing it may result in bias.

Assumptions may be imposed on the method using constrained estimation. One constraint is the monotonicity of the effect in some measure of distance between observations. The method may allow for varying weights by adding interaction terms (e.g., the effect of the drug times genetic similarity). The interaction terms unfortunately require more data to estimate.

External information about the slope of the effect helps justify the constraints and reduces the need for adding interaction terms, thus decreasing the data requirement. An example of such extra information is whether the effects of other drugs that have been tested in these animals as well as humans were monotone in genetic distance. Using information about these other drugs imposes the assumption that the slopes of the effects of different drugs are similar. The similarity of the slopes should intuitively depend on the chemical similarity of the drugs, with more distant drugs having more different profiles of effects across animals.

The similarity of species in terms of the effects drugs have on them need not correspond to genetic similarity or the closeness of any other observable characteristic of these organisms, although the two measures often align. The similarity of interest is how similar the effects of the drug are across these species. Estimating this similarity based on the similarity of other drugs across these animals may also be done by a weighted regression, perhaps with constraints or added interaction terms. More power for the estimation may be obtained from simultaneous estimation of the drug-effect-similarity of the species and the effect of the drug in humans. An analogy is demand and supply estimation in industrial organisation where observations about each side of the market give information about the other side. Another analogy is duality in mathematics, in this case between the drug-effect-similarity of the species and the given drug’s similarity of effects across these species.

The similarity of drugs in terms of their effects on each species need not correspond to chemical similarity, although it often does. The similarity of interest for the drugs is how similar their effects are in humans, and also in other species.

The inputs into the joint estimation of drug similarity, species similarity and the effect of the given drug in humans are the genetic similarity of the species, the chemical similarity of the drugs and the effects for all drug-species pairs that have been tested. In the matrix where the rows are the drugs and the columns the species, we are interested in filling in the cell in the row “drug of interest” and the column “human”. The values in all the other cells are informative about this cell. In other words, there is a benefit from filling in these other cells of the matrix.

Given the duality of drugs and species in the drug effect matrix, there is information to be gained from running clinical trials of chemically similar human-use-approved drugs in species in which the drug of interest has been tested but the chemically similar ones have not. The information is directly about the drug-effect-similarity of these species to humans, which indirectly helps predict the effect of the drug of interest in humans from the effects of it in other species. In summary, testing other drugs in other species is informative about what a given drug does in humans. Adapting methods from supply and demand estimation, or otherwise combining all the data in a principled theoretical framework, may increase the information gain from these other clinical trials.

Extending the reasoning, each (species, drug) pair has some unknown similarity to the (human, drug of interest) pair. A weighted method to predict the effect in the (human, drug of interest) pair may gain power from constraints that the similarity of different (species, drug) pairs increases in the genetic closeness of the species and the chemical closeness of the drugs.

Define Y_{sd} as the effect of drug d in species s. Define X_{si} as the observable characteristic (gene) i of species s. Define X_{dj} as the observable characteristic (chemical property) j of drug d. The simplest method is to regress Y_{sd} on all the X_{si} and X_{dj} and use the coefficients to predict the Y_{sd} of the (human, drug of interest) pair. If there are many characteristics i and j and few observations Y_{sd}, then variable selection or regularisation is needed. Constraints may be imposed, like X_{si}=X_i for all s and X_{dj}=X_j for all d.
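A minimal sketch of this simplest method in Python, on synthetic data, with ridge regularisation standing in for the variable selection or regularisation mentioned above. Treating species 0 as "human" and drug 0 as "the drug of interest" is an arbitrary labelling, and all coefficients and noise levels are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 species x 4 drugs, each species described by 3 characteristics
# X_{si} (genes) and each drug by 3 characteristics X_{dj} (chemistry).
n_species, n_drugs, n_si, n_dj = 5, 4, 3, 3
Xs = rng.normal(size=(n_species, n_si))
Xd = rng.normal(size=(n_drugs, n_dj))

rows, y = [], []
for s in range(n_species):
    for d in range(n_drugs):
        if s == 0 and d == 0:
            continue  # (human, drug of interest): the cell we want to fill in
        rows.append(np.concatenate([Xs[s], Xd[d]]))
        # Synthetic "true" effect plus noise, for illustration only.
        y.append(Xs[s] @ np.array([1.0, -0.5, 0.2])
                 + Xd[d] @ np.array([0.8, 0.1, -0.3])
                 + rng.normal(scale=0.1))
X = np.array(rows)
y = np.array(y)

# Ridge-regularised least squares; regularisation matters when the
# characteristics i and j outnumber the observed (species, drug) pairs.
lam = 0.1
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

x_target = np.concatenate([Xs[0], Xd[0]])  # human, drug of interest
prediction = x_target @ beta
```

Replacing the ridge penalty with an L1 penalty would perform the variable selection mentioned in the text.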

Fused LASSO (least absolute shrinkage and selection operator), clustered LASSO and prior LASSO seem related to the above method.

Dilution effect explained by signalling

Signalling confidence in one’s arguments explains the dilution effect in marketing and persuasion. The dilution effect is that the audience averages the strength of a persuader’s arguments instead of adding the strengths. More arguments in favour of a position should intuitively increase the confidence in the correctness of this position, but empirically, adding weak arguments reduces people’s belief, which is why drug advertisements on US late-night TV list mild side effects in addition to serious ones. The target audience of these ads worries less about side effects when the ad mentions more slight problems with the drug, although additional side effects, whether weak or strong, should make the drug worse.

A persuader who believes her first argument to be strong enough to convince everyone does not waste valuable time adding other arguments. Listeners evaluate arguments partly by the confidence they believe the speaker has in these claims. This is rational Bayesian updating because a speaker’s conviction in the correctness of what she says is positively correlated with the actual validity of the claims.

A countervailing effect is that a speaker with many arguments has spent significant time studying the issue, so knows more precisely what the correct action is. If the listeners believe the bias of the persuader to be small or against the action that the arguments favour, then the audience should rationally believe a better-informed speaker more.

An effect in the same direction as dilution is that a speaker with many arguments in favour of a choice strongly prefers the listeners to choose it, i.e. is more biased. Then the listeners should respond less to the persuader’s effort. In the limit, when the speaker’s only goal is for the audience to comply, at any time cost of persuasion, the listeners should ignore the speaker, because a constant signal carries no information.

Modelling

Start with the standard model of signalling by information provision and then add countersignalling.

The listeners choose either to do what the persuader wants or not. The persuader receives a benefit B if the listeners comply, otherwise receives zero.

The persuader always presents her first argument, otherwise reveals that she has no arguments, which ends the game with the listeners not doing what the persuader wants. The persuader chooses whether to spend time at cost c, with 0 < c < B, to present her second argument, which may be strong or weak. The persuader knows the strength of the second argument but the listeners only have the common prior belief that the probability of a strong second argument is p0. If the second argument is strong, then the persuader is confident, otherwise not.

If the persuader does not present the second argument, then the listeners receive an exogenous private signal in {1,0} about the persuader’s confidence, e.g. via her subconscious body language. The probabilities of the signals are Pr(1|confident) = Pr(0|not) = q > 1/2. If the persuader presents the second argument, then the listeners learn the confidence with certainty and can ignore any signals about it. Denote by p1 the updated probability that the audience puts on the second argument being strong.

If the speaker presents a strong second argument, then p1 = 1; if the speaker presents a weak argument, then p1 = 0. If the speaker presents no second argument, then after signal 1 the audience updates their belief to p1(1) = p0*q/(p0*q + (1-p0)*(1-q)) > p0, and after signal 0 to p1(0) = p0*(1-q)/(p0*(1-q) + (1-p0)*q) < p0.
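This updating is a direct application of Bayes’ rule; a minimal Python check with the illustrative values p0 = 0.4 and q = 0.8:

```python
def posterior(p0, q, signal):
    """Belief that the second argument is strong, after an exogenous signal
    in {1, 0} with accuracy Pr(1|confident) = Pr(0|not) = q > 1/2."""
    if signal == 1:
        return p0 * q / (p0 * q + (1 - p0) * (1 - q))
    return p0 * (1 - q) / (p0 * (1 - q) + (1 - p0) * q)

p0, q = 0.4, 0.8
p1_after_1 = posterior(p0, q, 1)  # rises above p0 (about 0.727)
p1_after_0 = posterior(p0, q, 0)  # falls below p0 (about 0.143)
```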

The listeners prefer to comply (take action a = 1) when the second argument of the persuader is strong, otherwise prefer not to do what the persuader wants (action a = 0). At the prior belief p0, the listeners prefer not to comply; assume the updated belief p1(1) is high enough that they comply after signal 1. Therefore a persuader with a strong second argument chooses max{B*1 - c, q*B*1 + (1-q)*B*0} and presents the argument iff (1-q)*B > c. A persuader with a weak argument chooses max{B*0 - c, (1-q)*B*1 + q*B*0}, i.e. always chooses not to present the argument. If a confident persuader chooses not to present the argument, then the listeners use the exogenous signal, otherwise they use the choice of presentation to infer the type of the persuader.
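The presentation decision can be checked numerically. The sketch below assumes, as the payoff comparison in the text does, that the listeners comply after signal 1 and not after signal 0:

```python
def presents_second_argument(strong, B, c, q):
    """Persuader's choice: present the second argument (revealing its
    strength) or rely on the noisy exogenous signal.  Assumes the listeners
    comply after signal 1 and do not comply after signal 0."""
    if strong:
        present = B - c          # listeners learn p1 = 1 and comply
        withhold = q * B         # comply only after signal 1 (probability q)
    else:
        present = -c             # listeners learn p1 = 0, never comply
        withhold = (1 - q) * B   # comply after a mistaken signal 1
    return present > withhold

# With B = 1, c = 0.1, q = 0.8: the strong type presents iff (1-q)*B > c,
# here 0.2 > 0.1; the weak type never presents.
```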

One extension is that presenting the argument still leaves some doubt about its strength.

Another extension has many argument strength levels, so each type of persuader sometimes presents the second argument, sometimes not.

In this standard model, if the second argument is presented, then always by the confident type. As is intuitive, the second argument increases the belief of the listeners that the persuader is right. Adding countersignalling partly reverses the intuition – a very confident type of the persuader knows that the first argument already reveals her great confidence, so the listeners do what the very confident persuader wants. The very confident type never presents the second argument, so if the confident type chooses to present it, then the extra argument reduces the belief of the audience in the correctness of the persuader. However, compared to the least confident type who also never presents the second argument, the confident type’s second argument increases the belief of the listeners.

If top people have families and hobbies, then success is not about productivity

Assume:

1 Productivity is continuous and weakly increasing in talent and effort.

2 The sum of efforts allocated to all activities is bounded, and this bound is similar across people.

3 Families and hobbies take some effort, thus less is left for work. (For this assumption to hold, it may be necessary to focus on families with children in which the partner is working in a different field. Otherwise, a stay-at-home partner may take care of the cooking and cleaning, freeing up time for the working spouse to allocate to work. A partner in the same field of work may provide a collaboration synergy. In both cases, the productivity of the top person in question may increase.)

4 The talent distribution is similar for people with and without families or hobbies. This assumption would be violated if for example talented people are much better at finding a partner and starting a family.

Under these assumptions, reasonably rational people would be more productive without families or hobbies. If success is mostly determined by productivity, then people without families should be more successful on average. In other words, most top people in any endeavour would not have families or hobbies that take time away from work.

In short, if responsibilities and distractions cause lower productivity, and productivity causes success, then success is negatively correlated with such distractions. Therefore, if successful people have families with a frequency similar to or greater than that of the general population, then success is not driven by productivity.

One counterargument is that people first become successful and then start families. In order for this to explain the similar fractions of singles among top and bottom achievers, the rate of family formation after success must be much greater than among the unsuccessful, because catching up from a late start requires a higher rate of increase.

Another explanation is irrationality of a specific form – one which reduces the productivity of high effort significantly below that of medium effort. Then single people with lots of time for work would produce less through their high effort than those with families and hobbies via their medium effort. Productivity per hour naturally falls with increasing hours, but the issue here is total output (the hours times the per-hour productivity). An extra work hour has to contribute negatively to success to explain the lack of family-success correlation. One mechanism for a negative effect of hours on output is burnout of workaholics. For this explanation, people have to be irrational enough to keep working even when their total output falls as a result.

If the above explanations seem unlikely but the assumptions reasonable in a given field of human endeavour, then reaching the top and staying there is mostly not about productivity (talent and effort) in this field. For example, in academic research.

A related empirical test of whether success in a given field is caused by productivity is to check whether people from countries or groups that score highly on corruption indices disproportionately succeed in this field, either conditional on entering the field or unconditionally. In academia, in fields where convincing others is more important than the objective correctness of one’s results, people from more nepotist cultures should have an advantage. The same applies to journals – the general interest ones care relatively more about a good story, the field journals more about correctness. Do people from more corrupt countries publish relatively more in general interest journals, given their total publications? Of course, conditional on their observable characteristics like the current country of employment.

Another related test for meritocracy in academia or the R&D industry is whether coauthored publications and patents are divided by the number of coauthors in their influence on salaries and promotions. If there is an established ranking of institutions or job titles, then do those at higher ranks have more quality-weighted coauthor-divided articles and patents? The quality-weighting is the difficult part, because usually there is no independent measure of quality (unaffected by the dependent variable, be it promotions, salary, publication venue).

Putting your money where your mouth is in policy debates

Climate change deniers should put their money where their mouth is by buying property in low-lying coastal areas or investing in drought-prone farmland. Symmetrically, those who believe the Earth is warming as a result of pollution should short sell climate-vulnerable assets. Then everyone eventually receives the financial consequences of their decisions and claimed beliefs. The sincere would be happy to bet on their beliefs, anticipating positive profit. Of course, the beliefs have to be somewhat dogmatic or the individuals in question risk-loving, otherwise the no-agreeing-to-disagree theorem would preclude speculative trade (opposite bets on a common event).

Governments tend to compensate people for widespread damage from natural disasters, because distributing aid is politically popular and there is strong lobbying for this free insurance. This insulates climate change deniers against the downside risk of buying flood- or wildfire-prone property. To prevent the cost of the damages from being passed to the taxpayers, the deniers should be required to buy insurance against disaster risk, or to sign contracts with (representatives of) the rest of society agreeing to transfer to others the amount of any government compensation they receive after flood, drought or wildfire. Similarly, those who short sell assets that lose value under a warming climate (or buy property that appreciates, like Arctic ports, under-ice mining and drilling rights) should not be compensated for the lost profit if the warming does not take place.

In general, forcing people to put their money where their mouth is would avoid wasting time on long useless debates (e.g. do high taxes reduce economic growth, does a high minimum wage raise unemployment, do tough punishments deter crime). Approximately rational people would doubt the sincerity of anyone who is not willing to bet on her or his beliefs, so one’s credibility would be tied to one’s skin in the game: a stake in the claim signals sincerity. Currently, it costs pundits almost nothing to make various claims in the media – past wrong statements are quickly forgotten, not impacting the reputation for accuracy much. 

The bets on beliefs need to be legally enforceable, so have to be made on objectively measurable events, such as the value of a publicly traded asset. By contrast, it is difficult to verify whether government funding for the arts benefits culture, or whether free public education is good for civil society, therefore bets on such claims would lead to legal battles. The lack of enforceability would reduce the penalty for making false statements, thus would not deter lying or shorten debates much.

An additional benefit from betting on (claimed) beliefs is to provide insurance to those harmed by the actions driven by these beliefs. For example, climate change deniers claim small harm from air pollution. Their purchases of property that will be damaged by a warming world allows climate change believers to short sell such assets. If the Earth then warms, then the deniers lose money and the believers gain at their expense. This at least partially compensates the believers for the damage caused by the actions of the deniers.

Why rational agents may react negatively to honesty

Emotional people may of course dislike an honest person, just because his truthful opinion hurt their feelings. In contrast, rational agents’ payoff cannot decrease when they get additional information, so they always benefit from honest feedback. However, rational decision makers may still adjust their attitude to be more negative towards a person making truthful, informative statements. The reason is Bayesian updating about two dimensions: the honesty of the person and how much the person cares about the audience’s feelings. Both dimensions of belief positively affect attitude towards the person. His truthful statements increase rational listeners’ belief about his honesty, but may reduce belief in his tactfulness, which may shift rational agents’ opinions strongly enough in the negative direction to outweigh the benefit from honesty.
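The two-dimensional updating can be made concrete with a small numerical sketch. The four persuader types, the likelihoods of a blunt truthful comment, and the weights in the attitude function are all illustrative assumptions:

```python
# Four speaker types (honest?, caring?), uniform prior.  Likelihood of
# observing a blunt, truthful negative comment: honest types state it more
# often, caring types soften or withhold it.  Numbers are illustrative.
prior = {("honest", "caring"): 0.25, ("honest", "blunt"): 0.25,
         ("liar", "caring"): 0.25, ("liar", "blunt"): 0.25}
likelihood = {("honest", "caring"): 0.2, ("honest", "blunt"): 0.8,
              ("liar", "caring"): 0.05, ("liar", "blunt"): 0.3}

total = sum(prior[t] * likelihood[t] for t in prior)
posterior = {t: prior[t] * likelihood[t] / total for t in prior}

p_honest = sum(p for (h, _), p in posterior.items() if h == "honest")
p_caring = sum(p for (_, c), p in posterior.items() if c == "caring")

# Attitude as a weighted sum of the two beliefs.  When the audience weights
# caring heavily, attitude falls even though belief in honesty rises.
attitude_before = 0.3 * 0.5 + 0.7 * 0.5
attitude_after = 0.3 * p_honest + 0.7 * p_caring
```

Here the blunt truth raises the belief in honesty (from 0.5 to about 0.74) but lowers the belief in caring (to about 0.19), and with these weights the overall attitude worsens.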

The relative effect of information about how much the person cares, compared to news about his honesty, is greater when the latter is relatively more certain. In the limit, if the audience is completely convinced that the person is honest (or certain of his dishonesty), then the belief about his honesty stays constant no matter what he does, and only the belief about tact moves. Then telling an unpleasant truth unambiguously worsens the audience’s attitude. Thus if a reasonably rational listener accuses a speaker of “brutal honesty” or tactlessness, then it signals that the listener is relatively convinced either that the speaker is a liar or that he is a trustworthy type. Therefore an accusation of tactlessness may be taken as an insult or a compliment, depending on one’s belief about the accuser’s belief about one’s honesty.

If tact takes effort, and the cost of this effort is lower for those who care about the audience’s emotions, then pleasant comments are an informative signal (in the Spence signalling sense) that the speaker cares about the feelings of others. In that case the inference that brutal honesty implies an uncaring nature is correct.

On the other hand, if the utility of rational agents only depends on the information content of statements, not directly on their positive or negative emotional tone, then the rational agents should not care about the tact of the speaker. In this case, there is neither a direct reason for the speaker to avoid unpleasant truths (out of altruism towards the audience), nor an indirect benefit from signalling tactfulness. Attitudes would only depend on one dimension of belief: the one about honesty. Then truthfulness cannot have a negative effect.

Higher order beliefs may still cause honesty to be interpreted negatively even when rational agents’ utility does not depend on the emotional content of statements. The rational listeners may believe that the speaker believes that the audience’s feelings would be hurt by negative comments (for example, the speaker puts positive probability on irrational listeners, or on their utility directly depending on the tone of the statements they hear), in which case tactless truthtelling still signals not caring about others’ emotions.

On the optimal burden of proof

All claims should be considered false until proven otherwise, because lies can be invented much faster than refuted. In other words, the maker of a claim has the burden of providing high-quality scientific proof, for example by referencing previous research on the subject. Strangely enough, some people seem to believe marketing, political spin and conspiracy theories even after such claims have been proven false. It remains to wish that everyone received the consequences of their choices (so that karma works).
Considering all claims false until proven otherwise runs into a logical problem: a claim and its opposite claim cannot be simultaneously false. The priority for falsity should be given to actively made claims, e.g. someone saying that a product or a policy works, or that there is a conspiracy behind an accident. Especially suspect are claims that benefit their maker if people believe them. A higher probability of falsity should also be attached to positive claims, e.g. that something has an effect in whatever direction (as opposed to no effect) or that an event is due to non-obvious causes, not chance. The lack of an effect should be the null hypothesis. Similarly, ignorance and carelessness, not malice, should be the default explanation for bad events.
Sometimes two opposing claims are actively made and belief in them benefits their makers, e.g. in politics or when competing products are marketed. This is the hardest case to find the truth in, but a partial and probabilistic solution is possible. Until rigorous proof is found, one should keep an open mind. Keeping an open mind creates a vulnerability to manipulation: after some claim is proven false, its proponents often try to defend it by asking its opponents to keep an open mind, i.e. ignore evidence. In such cases, the mind should be closed to the claim until its proponents provide enough counter-evidence for a neutral view to be reasonable again.
To find which opposing claim is true, the first test is logic. If a claim is logically inconsistent with itself, then it is false by syntactic reasoning alone. A broader test is whether the claim is consistent with other claims of the same person. For example, Vladimir Putin said that there were no Russian soldiers in Crimea, but a month later gave medals to some Russian soldiers, citing their successful operation in Crimea. At least one of the claims must be false, because either there were Russian soldiers in Crimea or not. The way people try to weasel out of such self-contradictions is to say that the two claims referred to different time periods, definitions or circumstances. In other words, change the interpretation of words. A difficulty for the truth-seeker is that sometimes such a change in interpretation is a legitimate clarification. Tongues do slip. Nonetheless, a contradiction is probabilistic evidence for lying.
The second test for falsity is objective evidence. If there is a streetfight and the two sides accuse each other of starting it, then sometimes a security camera video can refute one of the contradicting claims. What evidence is objective is, sadly, subject to interpretation. Videos can be photoshopped, though it is difficult and time-consuming. The objectivity of the evidence is strongly positively correlated with the scientific rigour of its collection process. “Hard” evidence is a signal of the truth, but a probabilistic signal. In this world, most signals are probabilistic.
The third test of falsity is the testimony of neutral observers, preferably several of them, because people misperceive and misremember even under the best intentions. The neutrality of observers is again up for debate and interpretation. In some cases, an observer is a statistics-gathering organisation. Just like objective evidence, testimony and statistics are probabilistic signals.
The fourth test of falsity is the testimony of interested parties, to which the above caveats apply even more strongly.
Integrating conflicting evidence should use Bayes’ rule, because it keeps probabilities consistent. Consistency helps glean information about one aspect of the question from data on other aspects. Background knowledge should be combined with the evidence, for example by ruling out physical impossibilities. If a camera shows a car disappearing behind a corner and immediately reappearing, moving in the opposite direction, then physics says that the original car couldn’t have changed direction so fast. The appearing car must be a different one. Knowledge of human interactions and psychology is part of the background information, e.g. if smaller, weaker and outnumbered people rarely attack the stronger and more numerous, then this provides probabilistic info about who started a fight. Legal theory incorporates background knowledge of human nature to get information about the crime – human nature suggests motives. Asking: “Who benefits?” has a long history in law.
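A minimal sketch of such evidence integration via Bayes’ rule, folding a background-knowledge prior together with two probabilistic signals. The numbers are illustrative, and the signals are assumed conditionally independent given the truth of the claim:

```python
def update(prior, p_signal_if_true, p_signal_if_false):
    """One application of Bayes' rule for a binary claim."""
    num = prior * p_signal_if_true
    return num / (num + (1 - prior) * p_signal_if_false)

# Claim: "the smaller, outnumbered person started the fight."
# Background knowledge of human nature sets a low prior; each piece of
# evidence is then folded in, assuming conditional independence.
p = 0.2                    # weaker parties rarely attack the stronger
p = update(p, 0.3, 0.8)    # a neutral witness blames the larger group
p = update(p, 0.1, 0.6)    # camera footage points the same way
```

Each signal that is more likely under the claim’s falsity pushes the belief down; the final probability here is well under the prior.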

On simple answers

Bayes’ rule exercise: is a simple or a complicated answer to a complicated problem more likely to be correct?

Depends on the conditional probabilities: if simple questions are more likely to have simple answers and complex questions complicated, then a complicated answer is more likely to be correct for a complicated problem.
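The exercise can be written out in a few lines of Python. The numbers are illustrative assumptions: for a complicated question, the correct answer is simple with probability 0.3, while wrong answers are simple or complicated with equal probability:

```python
def posterior_true(prior, p_form_if_true, p_form_if_false):
    """Bayes' rule: belief that a candidate answer is correct, after
    observing its form (simple or complicated)."""
    num = prior * p_form_if_true
    return num / (num + (1 - prior) * p_form_if_false)

# For a complicated question, compare two otherwise equally plausible
# candidate answers, one simple and one complicated.
p_simple = posterior_true(0.5, 0.3, 0.5)       # about 0.375
p_complicated = posterior_true(0.5, 0.7, 0.5)  # about 0.583
```

Under these assumptions the complicated candidate answer is the more likely one to be correct, which is the claim in the text.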

It seems reasonable that the complexity of the answer is correlated with the difficulty of the problem. But this is an empirical question.

If difficult problems are likely to have complex answers, then this is an argument against slogans and ideologies. These seek to give a catchy one-liner as the answer to many problems in society. No need to think – ideology has the solution. Depending on your political leaning, poverty may be due to laziness or exploitation. The foreign policy “solution” is bombing for some, eternal appeasement for others.

The probabilistic preference for complex answers in complicated situations seems to contradict Occam’s razor (among answers equally good at explaining the facts, the simplest answer should be chosen). There is no actual conflict with the above Bayesian exercise. There, the expectation of a complex answer applies to complicated questions, while a symmetric anticipation of a simple answer holds for simple problems. The answers compared are not equally good, because one fits the structure of the question better than the other.

Which ideology is more likely to be wrong?

Exercise in Bayes’ rule: is an ideology more likely to be wrong if it appeals relatively more to poor people than the rich?

More manipulable folks are more likely to lose their money, so less likely to be rich. Stupid people have a lower probability of making money. By Bayes, the rich are on average less manipulable and more intelligent than the poor.

Less manipulable people are less likely to find an ideology built on fallacies appealing. By Bayes, an ideology relatively more appealing to the stupid and credulous is more likely to be wrong. Due to such people being poor with a higher probability, an ideology embraced more by the poor than the rich is more likely to be fallacious.
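The first Bayes step of this chain (the rich are on average less manipulable) can be checked with illustrative numbers: 30% of people are easily manipulable, manipulable people end up poor with probability 0.8, others with probability 0.4:

```python
# Illustrative assumptions, not data.
p_manip = 0.3
p_poor_if_manip, p_poor_if_not = 0.8, 0.4

# Bayes' rule: what fraction of the poor (and of the rich) is manipulable?
p_poor = p_manip * p_poor_if_manip + (1 - p_manip) * p_poor_if_not
p_manip_if_poor = p_manip * p_poor_if_manip / p_poor
p_manip_if_rich = p_manip * (1 - p_poor_if_manip) / (1 - p_poor)
```

With these numbers about 46% of the poor but only 12.5% of the rich are manipulable, so an ideology appealing relatively more to the poor appeals relatively more to the manipulable, and the second Bayes step in the text then ties this to a higher probability of the ideology being wrong.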

Another exercise: is an ideology more likely to be wrong if academics like it relatively more than non-academics?

Smarter people are more likely to become academics, so by Bayes’ rule, academics are more likely to be smart. Intelligent people have a relatively higher probability of liking a correct ideology, so by Bayes, an ideology appealing to the intelligent is more likely to be correct. An ideology liked by academics is correct with a higher probability.

A random world as an argument against fanaticism

Theoretical physicists may debate whether the universe is random or not, but for practical purposes it is, because any sufficiently complicated deterministic system looks random to someone who does not fully understand it. This is the example from Lipman (1991) “How to decide how to decide…”: even when a complicated deterministic function is written down, its output still looks random to a person who cannot calculate it.
If the world is random, we should not put probability one on any event. Nothing is certain, so any fanatical belief that some claim is certainly true is almost certainly wrong. This applies to religion, ideology, personal memories and also things right before your eyes. The eyes can deceive, as evidenced by the numerous visual illusions invented and published in the past. If you see your friend, is that really the same person? How detailed a memory of your friend’s face do you have? Makeup can alter appearance quite radically (http://www.mtv.com/news/1963507/woman-celebrity-makeup-transformation/).
This way lies paranoia, but actually in a random world, a tiny amount of paranoia about everything is appropriate. A large amount of paranoia, say putting probability more than 1% on conspiracy theories, is probably a wrong belief.
How to know whether something is true then? A famous quote: “Everything is possible, but not everything is likely” points the way. Use logic and statistics, apply Bayes’ rule. Statistics may be wrong, but they are much less likely to be wrong than rumours. A source that was right in the past is more likely to be right at present than a previously inaccurate source. Science does not know everything, but this is not a reason to believe charlatans.

Evaluating the truth and the experts simultaneously

When evaluating an artwork, the guilt of a suspect or the quality of theoretical research, the usual procedure is to gather the opinions of a number of people and take some weighted average of these. There is no objective measure of the truth or the quality of the work. What weights should be assigned to different people’s opinions? Who should be counted an expert or knowledgeable witness?
A circular problem appears: the accurate witnesses are those who are close to the truth, and the truth is close to the average claim of the accurate witnesses. This can be modelled as a set of signals with unknown precision. Suppose the signals are normally distributed with mean equal to the truth (witnesses unbiased, just have poor memories). If the precisions were known, then these could be used as weights in the weighted average of the witness opinions, which would be an unbiased estimate of the truth with minimal variance. If the truth were known, then the distance of the opinion of a witness from it would measure the accuracy of that witness. But both precisions and the truth are unknown.
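The known-precision benchmark is the precision-weighted mean. A minimal sketch with made-up opinions and precisions (tau_i = 1/sigma_i^2):

```python
# With known precisions, the minimum-variance unbiased combination of
# unbiased witness opinions is the precision-weighted mean.
# Opinions and precisions below are made up for illustration.
opinions = [10.0, 12.0, 11.0]
precisions = [4.0, 1.0, 2.0]   # the first witness has the sharpest memory

estimate = sum(t * x for t, x in zip(precisions, opinions)) / sum(precisions)
# Here the estimate is pulled towards the most precise witness's opinion.
```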
Simultaneously determining the precisions of the signals and the estimate of the truth may have many solutions. If there are two witnesses with different claims, we could assign the first witness infinite precision and the second finite, and estimate the truth to equal the opinion of the first witness. The truth is derived from the witnesses and the precisions are derived from the truth, so this is consistent. The same applies with witnesses switched.
A better solution takes a broader view and simultaneously estimates witness precisions and the truth. These form a vector of random variables. Put a prior probability distribution on this vector and use Bayes’ rule to update this distribution in response to the signals (the witness opinions).
The solution of course depends on the chosen prior. If one witness is assumed infinitely precise and the others finitely, then the updating rule keeps the infinite and finite precisions and estimates the truth to equal the opinion of the infinitely precise witness. The assumption of the prior seems unavoidable. At least it makes clear why the multiple solutions arise.
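One concrete way to carry out the simultaneous estimation is to alternate between the two steps, with a Gamma(a, b) prior on each witness's precision acting as the regulariser that rules out the degenerate infinite-precision solutions. This is a sketch of the idea, not an exact posterior computation, and the data and prior parameters are illustrative:

```python
# Alternate between estimating the truth (precision-weighted mean) and the
# precisions (posterior-mean update of a Gamma(a, b) prior on precision,
# treating each witness's deviation from the truth as one observation).
opinions = [10.0, 12.0, 11.0, 10.4]
a, b = 1.0, 0.5   # prior pseudo-counts; cap any precision at (a + 0.5) / b

precisions = [1.0] * len(opinions)
truth = sum(opinions) / len(opinions)
for _ in range(100):
    truth = sum(t * x for t, x in zip(precisions, opinions)) / sum(precisions)
    precisions = [(a + 0.5) / (b + 0.5 * (x - truth) ** 2) for x in opinions]
```

The prior keeps every precision finite, so no single witness can dominate completely; the witness furthest from the consensus (here the one claiming 12) ends up with the lowest estimated precision.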