Bayesian updating of higher-order joint probabilities

Bayes’ rule uses a signal and the assumed joint probability distribution of signals and events to estimate the probability of an event of interest. Call this event a first-order event and the signal a first-order signal. Which joint probability distribution is the correct one is a second-order event, so second-order events are first-order probability distributions over first-order events and signals. The second-order signal consists of a first-order event and a first-order signal.

If a particular first-order joint probability distribution puts higher probability on the co-occurrence of this first-order event and signal than other first-order joint distributions do, then observing this event and signal increases the probability of this particular distribution. The size of the increase is found by applying Bayes’ rule to update second-order events using second-order signals, which requires assuming a joint probability distribution of second-order signals and events. This second-order distribution is over first-order joint distributions and first-order signal-event pairs.

The third-order distribution is over second-order distributions and signal-event pairs. A second-order signal-event pair is a third-order signal. A second-order distribution is a third-order event.

A joint distribution of any order n may be decomposed into a marginal distribution over events and a conditional distribution of signals given events, where both the signals and the events are of the same order n. The conditional distribution of any order n>=2 is known by definition, because the n-order event is the joint probability distribution of (n-1)-order signals and events, thus the joint probability of a (n-1)-order signal-event pair (i.e., the n-order signal) given the n-order event (i.e., the (n-1)-order distribution) is the one listed in the (n-1)-order distribution.
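The second-order update described above can be sketched in a few lines. This is only an illustration under made-up assumptions: the two candidate first-order joint distributions, their labels ("informative", "uninformative"), the signal and event names and the 50-50 second-order prior are all hypothetical. The likelihood of an observed signal-event pair given a candidate is simply the entry listed in that candidate, which is the "known by definition" conditional.

```python
# Hypothetical candidate first-order joint distributions over (signal, event)
# pairs. All numbers are invented for illustration.
candidates = {
    "informative": {("s1", "e1"): 0.4, ("s1", "e2"): 0.1,
                    ("s2", "e1"): 0.1, ("s2", "e2"): 0.4},
    "uninformative": {("s1", "e1"): 0.25, ("s1", "e2"): 0.25,
                      ("s2", "e1"): 0.25, ("s2", "e2"): 0.25},
}


def update_second_order(prior, candidates, pair):
    """Apply Bayes' rule over candidate joint distributions after one pair."""
    # The likelihood of the pair given a candidate is the candidate's own entry.
    unnormalised = {name: prior[name] * candidates[name][pair] for name in prior}
    total = sum(unnormalised.values())
    return {name: weight / total for name, weight in unnormalised.items()}


# Assumed second-order prior: both candidates equally likely.
prior = {"informative": 0.5, "uninformative": 0.5}
posterior = update_second_order(prior, candidates, ("s1", "e1"))
print(posterior)
```

Observing the pair (s1, e1) raises the probability of the candidate that puts more mass on that pair. Repeating the call with the posterior as the new prior processes a sequence of pairs.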

The marginal distribution over events is an assumption above, but may be formulated as a new event of interest to be learned. The new signal in this case is the occurrence of the original event (not the marginal distribution). The empirical frequencies of the original events are a sufficient statistic for a sequence of new signals. To apply Bayes’ rule, a joint distribution over signals and the distributions of events needs to be assumed. The joint distribution itself may be learned from among many, over which there is a second-order joint distribution. Extending the Bayesian updating to higher orders proceeds as above. The joint distribution may again be decomposed into a conditional over signals and a marginal over events. The conditional is known by definition for all orders, now including the first, because the probability of a signal is the probability of occurrence of an original event, which is given by the marginal distribution (the new event) over the original events.

Returning to the discussion of learning the joint distributions, only the first-order events affect decisions, so only the marginal distribution over first-order events matters directly. The joint distributions of higher orders and the first-order conditional distribution only matter through their influence on updating the first-order marginal distribution.

The marginal of order n is the distribution over the (n-1)-order joint distributions. After reducing compound lotteries, the marginal of order n is the average of the (n-1)-order joint distributions. This average is itself a (n-1)-order joint distribution, which may be split into an (n-1)-order marginal and conditional, where if n-1>=2, the conditional is known. If the conditional is known, then the marginal may be again reduced as a compound lottery. Thus the hierarchy of marginal distributions of all orders collapses to the first-order joint distribution. This takes us back to the start – learning the joint distribution. The discussion above about learning a (second-order) marginal distribution (the first-order joint distribution) also applies. The empirical frequencies of signal-event pairs are the signals. Applying Bayes’ rule with some prior over joint distributions constitutes regularisation of the empirical frequencies to prevent overfitting to limited data.
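One way to see the regularisation is a symmetric Dirichlet prior over the joint distribution, which is my choice for illustration and not the only possible prior. Its posterior mean adds a pseudo-count to every signal-event pair before normalising, so pairs never seen in a small sample do not get probability zero. The function name, the pseudo-count of 1 and the counts below are hypothetical.

```python
def regularised_frequencies(counts, pseudo_count=1.0):
    """Posterior mean of the joint distribution under a symmetric Dirichlet prior."""
    total = sum(counts.values()) + pseudo_count * len(counts)
    return {pair: (count + pseudo_count) / total for pair, count in counts.items()}


# Invented sample of four observed signal-event pairs.
counts = {("s1", "e1"): 3, ("s1", "e2"): 0, ("s2", "e1"): 0, ("s2", "e2"): 1}
estimate = regularised_frequencies(counts)
print(estimate)
```

The raw empirical frequency of (s1, e1) is 3/4, while the regularised estimate is pulled towards the uniform distribution. A larger pseudo-count expresses a stronger belief that a small sample is unrepresentative.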

Regularisation is itself learned from previous learning tasks, specifically the risk of overfitting in similar learning tasks, i.e. how non-representative a limited data set generally is. Learning regularisation in turn requires a prior belief over the joint distributions of samples and population averages. Applying regularisation learned from past tasks to the current one uses a prior belief over how similar different learning tasks are.

How to learn whether an information source is accurate

Two sources may be used to check each other over time. One of these sources may be your own senses, which show whether the event that the other source predicted occurred or not. The observation of an event is really another signal about the event. It is a noisy signal because your own eyes may lie (optical illusions, deepfakes).

First, one source sends a signal about the event, then the second source sends its own signal. You will never know whether the event actually occurred, but the second source is the aggregate of all the future information you receive about the event, so it may be very accurate. The second source may send many signals in sequence about the event, yielding more information about the first source over time. Then the process repeats for a second event, a third, etc. This is how belief about the trustworthiness of a source is built.

You cannot learn the true accuracy of a source, because the truth is unavailable to your senses, so you cannot compare a source’s signals to the truth. You can only learn the consistency of different sources of sensory information. Knowing the correlation between various sensory sources is both necessary and sufficient for decision making, because your objective function (utility or payoff) is your perception of successfully achieving your goals. If your senses are deceived so you believe you have achieved what you sought, but actually have not, then you get the feeling of success, but if your senses are deceived to tell you you have failed, then you do not feel success even if you actually succeeded. The problem with deception arises purely from the positive correlation between the deceit and the perception of deceit. If deceit increases the probability that you later perceive you have been deceived and are unhappy about that perception, then deceit may reduce your overall utility despite initially increasing it temporarily. If you never suspect the deception, then your happiness is as if the deception was the truth.

Your senses send signals to your brain. We can interpret these signals as information about which hypothetical state of the world has occurred – we posit that there exists a world which may be in different states with various probabilities and that there is a correlation between the signals and these states. Based on the information, you update the probabilities of the states and choose a course of action. Actions result in probability distributions over different future sensations, which may be modelled as a different sensation in each state of the world, which have probabilities attached. (Later we may remove the states of the world from the model and talk about a function from past perceptions and actions into future perceptions. The past is only accessible through memory. Memory is a current perception, so we may also remove time from the model.)

You prefer some future sensations to others. These need not be sensory pleasures. These could be perceptions of having improved the world through great toil. You would prefer to choose an action that results in preferable sensations in the future. Which action this is depends on the state of the world.

To estimate the best action (the one yielding the most preferred sensations), you use past sensory signals. The interpretation of these signals depends on the assumed or learned correlation between the signals and the state. The assumption may be instinctive from birth. The learning is really about how sensations at a point in time are correlated with the combination of sensations and actions before that point. An assumption that the correlation is stable over time enables you to use past correlation to predict future correlation. This assumption in turn may be instinctive or learned.

The events most people are interested in distinguishing are of the form “action A results in the most preferred sensations”, “action B causes the most preferred sensations”, “action A yields the least preferred sensations”. Any event that is useful to know is of a similar form by Blackwell’s theorem: information is useful if and only if it changes decisions.

The usefulness of a signal source depends on how consistent the signals it gives about the action-sensation links (events) are with your future perceptions. These future perceptions are the signals from the second source – your senses – against which the first source is checked. The signals of the second source have the form “memory of action A and a preferred sensation at present”. Optimal learning about the usefulness of the first source uses Bayes’ rule and a prior probability distribution on the correlations between the first source and the second. The events of interest in this case are the levels of correlation. A signal about these levels is whether the first source gave a signal that coincided with later sensory information.

If the first source recommended a “best action” that later yielded a preferred sensation, then this increases the probability of high positive correlation between the first source and the second on average. If the recommended action was followed by a negative sensation, then this raises the probability of a negative correlation between the sources. Any known correlation is useful information, because it helps predict the utility consequences of actions.
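A minimal sketch of this learning, assuming the "level of correlation" is summarised by a single number: the probability that the first source's recommendation coincides with the later sensation. I also assume a Beta prior over that probability, which keeps the update to simple counting. The function names, the uniform Beta(1, 1) prior and the sequence of outcomes are all hypothetical.

```python
def update_agreement_belief(alpha, beta, agreed):
    """Update a Beta(alpha, beta) belief about how often the sources agree."""
    return (alpha + 1, beta) if agreed else (alpha, beta + 1)


def mean_agreement(alpha, beta):
    """Posterior mean probability that the first source matches later sensations."""
    return alpha / (alpha + beta)


# Assumed uniform prior, then four invented checks of source 1 against the senses.
alpha, beta = 1.0, 1.0
for agreed in [True, True, False, True]:
    alpha, beta = update_agreement_belief(alpha, beta, agreed)

print(mean_agreement(alpha, beta))
```

A posterior mean well above one half corresponds to positive correlation between the sources, and one well below one half to negative correlation. Both are useful, as noted above, because either helps predict the consequences of actions.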

Counterfactuals should be mentioned as a side note. Even if an action A resulted in a preferred sensation, a different action B might have led to an even better sensation in the counterfactual universe where B was chosen instead. Of course, B might equally well have led to a worse sensation. Counterfactuals require a model to evaluate – what the output would have been after a different input depends on the assumed causal chain from inputs to outputs.

Whether two sources are separate or copies is also a learnable event.

Exaggerating vs hiding emotions

In some cultures, it was a matter of honour not to show emotions. Native American warriors famously had stony visages. Victorian aristocracy prided themselves on a stiff upper lip and unflappable manner. Winston Churchill describes in his memoirs how the boarding school culture, enforced by physical violence, was to show no fear. In other cultures, emotions are exaggerated. Teenagers in North America from 1990 to the present are usually portrayed as drama queens, as are arts people. Everything is either fabulous or horrible to them, with no so-so experiences. I have witnessed the correctness of this portrayal in the case of teenagers. Jane Austen’s “Northanger Abbey” depicts Georgian-era teenagers as exaggerating their emotions similarly to their modern-day counterparts.

In the attention economy, exaggerating emotions is profitable to get and keep viewers. Traditional and social media portray situations as more extreme than they really are in order to attract eyeballs and clicks. Teenagers may have a similar motivation – to get noticed by their peers. Providing drama is an effective way. The notice of others may help attract sex partners or a circle of followers. People notice the strong emotions of others for evolutionary reasons, because radical action is more likely to follow a display of strong emotion than neutral communication. Radical action by others requires a quick accurate response to keep one’s health and wealth or take advantage of the radical actor.

A child with an injury or illness may pretend to suffer more than they actually do, in order to get more care and resources from parents, especially compared to siblings. This is similar to the begging competition among bird chicks.

Exaggerating both praise and emotional punishment motivates others to do one’s bidding. Incentives are created by the difference in the consequences of different actions, so exaggerating this difference strengthens incentives, unless others see through the pretending. Teenagers may exaggerate their outward happiness and anger at what the parents do, in order to force the parents to comply with the teenager’s wishes.

On the other hand, in a zero-sum game, providing information to the other player cannot increase one’s own payoff and usually reduces it. Emotions are information about the preferences and plans of the one who shows these. In an antagonistic situation, such as negotiations or war between competing tribes, a poker face is an information security measure.

In short, creating drama is an emotional blackmail method targeting those with aligned interests. An emotionless front hides both weaknesses and strengths from those with opposed interests, so they cannot target the weakness or prepare for the precise strength.

Whether teenagers display or hide emotion is thus informative about whether they believe the surrounding people to be friends or enemies. A testable prediction is that bullied children suppress emotion and pretend not to care about anything, especially compared to a brain scan showing they actually care and especially when they are primed to recall the bullies. Another testable prediction is that popular or spoiled children exaggerate their emotions, especially around familiar people and when they believe a reward or punishment is imminent.

Signalling the precision of one’s information with emphatic claims

Chats both online and in person seem to consist of confident claims which are either extreme absolute statements (“vaccines don’t work at all”, “you will never catch a cold if you take this supplement”, “artificial sweeteners cause cancer”) or profess no knowledge (“damned if I know”, “we will never know the truth”), sometimes blaming the lack of knowledge on external forces (“of course they don’t tell us the real reason”, “the security services are keeping those studies secret, of course”, “big business is hiding the truth”). Moderate statements that something may or may not be true, especially off the center of all-possibilities-equal, and expressions of personal uncertainty (“I have not studied this enough to form an opinion”, “I have not thought this through”) are almost absent. Other than in research and official reports, I seldom encounter statements of the form “these are the arguments in this direction and those are the arguments in that direction. This direction is somewhat stronger.” or “the balance of the evidence suggests x” or “x seems more likely than not-x”. In opinion pieces in various forms of media, the author may give arguments for both sides, but in that case, concludes something like “we cannot rule out this and we cannot rule out that”, “prediction is difficult, especially now in a rapidly changing world”, “anything may happen”. The conclusion of the opinion piece does not recommend a moderate course of action supported by the balance of moderate-quality evidence.

The same person confidently claims knowledge of an extreme statement on one topic and professes certainty of no knowledge at all on another. What could be the goal of making both extreme and no-knowledge statements confidently? If the person wanted to pretend to be well-informed, then confidence helps with that, but claiming no knowledge would be counterproductive. Blaming the lack of knowledge on external forces and claiming that the truth is unknowable or will never be discovered helps excuse one’s lack of knowledge. The person can then pretend to be informed to the best extent possible (a constrained maximum of knowledge) or at least know more than others (a relative maximum).

Extreme statements suggest to an approximately Bayesian audience that the claimer has received many precise signals in the direction of the extreme statement and as a result has updated the belief far from the average prior belief in society. Confident statements also suggest many precise signals to Bayesians. The audience does not need to be Bayesian to form these interpretations – updating in some way towards the signal is sufficient, as is a behavioural belief that confidence or extreme claims demonstrate the quality of the claimer’s information. A precisely estimated zero, such as confidently saying both x and not-x are equally likely, also signals good information. So does being confident that the truth is unknowable.

Being perceived as having precise information helps influence others. If people believe that the claimer is well-informed and has interests more aligned than opposed to theirs, then it is rational to follow the claimer’s recommendation. Having influence is generally profitable. This explains the lack of moderate-confidence statements and claims of personal but not collective uncertainty.

A question that remains is why confident moderate statements are almost absent. Why not claim with certainty that 60% of the time, the drug works and 40% of the time, it doesn’t? Or confidently state that a third of the wage gap/racial bias/country development is explained by discrimination, a third by statistical discrimination or measurement error and a third by unknown factors that need further research? Confidence should still suggest precise information no matter what the statement is about.

Of course, if fools are confident and researchers honestly state their uncertainty, then the certainty of a statement shows the foolishness of the speaker. If confidence makes the audience believe the speaker is well-informed, then either the audience is irrational in a particular way or believes that the speaker’s confidence is correlated with the precision of the information in the particular dimension being talked about. If the audience has a long history of communication with the speaker, then they may have experience that the speaker is generally truthful, acts similarly across situations and expresses the correct level of confidence on unemotional topics. The audience may fail to notice when the speaker becomes a spreader of conspiracies or becomes emotionally involved in a topic and therefore is trying to persuade, not inform. If the audience is still relatively confident in the speaker’s honesty, then the speaker sways them more by confidence and extreme positions than by admitting uncertainty or a moderate viewpoint.

The communication described above may be modelled as the claimer conveying three-dimensional information with two two-dimensional signals. One dimension of the information is the extent to which the statement is true, for example how beneficial a drug is or how harmful an additive. A second dimension is how uncertain the truth value of the statement is – whether the drug helps exactly 55% of patients or may help anywhere between 20 and 90%, between which all percentages are equally likely. A third dimension is the minimal attainable level of uncertainty – how much the truth is knowable in this question. This is related to whether some agency is actively hiding the truth or researchers have determined it and are trying to educate the population about it. The second and third dimensions are correlated. The lower the lowest possible uncertainty, the more certain the truth value of the statement can be. It cannot be more certain than the laws of physics allow.

The two dimensions of one signal (the message of the claimer) are the extent to which the statement is true and how certain the claimer is of the truth value. Confidence emphasises that the claimer is certain about the truth value, regardless of whether this value is true or false. The claim itself is the first dimension of the signal. The reason the third dimension of the information is not part of the first signal is that the claim that the truth is unknowable is itself a second claim about the world, i.e. a second two-dimensional signal saying how much some agency is hiding or publicising the truth and how certain the speaker is of the direction and extent of the agency’s activity.

Opinion expressers in (social) media usually choose an extreme value for both dimensions of both signals. They claim some statement about the world is either the ultimate truth or completely false or unknowable and exactly in the middle, not a moderate distance to one side. In the second dimension of both signals, the opinionated people express complete certainty. If the first signal says the statement is true or false, then the second signal is not sent and is not needed, because if there is complete certainty of the truth value of the statement, then the statement must be perfectly knowable. If the first signal says the statement is fifty-fifty (the speaker does not know whether true or false), then in the second signal, the speaker claims that the truth is absolutely not knowable. This excuses the speaker’s claimed lack of knowledge as due to an objective impossibility, instead of the speaker’s limited data and understanding.

A “chicken paper” example

The Nobel prize winner Ed Prescott introduced the term “chicken paper” to describe a certain kind of economics research article to the audience at ANU in a public lecture. For background, a macroeconomics paper commonly models the economy as a game (in the game theory sense) between households, sometimes adding the government, firms or banks as additional players. A chicken paper relies on three assumptions: 1) households like chicken, 2) households cannot produce chicken, 3) the government can provide chicken. Prescott’s point was to criticize papers that prove that the intervention of the government in the economy improves welfare. For some papers, such criticism on the grounds of “assuming the result” is justified, for some, not. This applies more broadly than just in macroeconomics.

One example that I think fits Prescott’s description is Woodford (2021, forthcoming in the American Economic Review), pages 10-11: “We suppose that units are unable to credibly promise to repay, except to the extent that the government allows them to issue debt up to a certain limit, the repayment of which is guaranteed by the government. (We assume also that the government is able to force borrowers to repay these guaranteed debts, rather than bearing any losses itself.)” The “units” that Woodford refers to are households, which are also the only producers of goods in the model. Such combined producer-consumers are called yeoman farmers and are a reasonable simplification for modelling purposes.

The inefficiency that the government solves in Woodford (2021) is the one discussed in Hirshleifer (1971) section V (page 568) that public information destroys mutually beneficial trading and insurance opportunities. In Woodford (2021), a negative shock to exactly one industry out of N in the economy occurs and becomes public at time 0 before trade opens. Thus the industries cannot trade contingent claims to insure against this shock. They are informed of the shock before trade. However, the government can make a transfer at time 0 to the shock-affected industry and tax it back later from all industries.

If the government also has to start its subsidizing and taxing after trade opens, it can still provide “retrospective insurance” as Woodford calls it by taxes and subsidies. Market-based “insurance” would also work: the affected industry borrows against the collateral of the government subsidy that is anticipated to arrive in the same period.

P-value cannot be less than 1/1024 in ten binary choices

Baez-Mendoza et al (2021) claim that for rhesus macaques choosing which of two others to reward in each trial, “the difference in the other’s reputation based on past interactions (i.e., how likely they were to reciprocate over the past 20 trials) had a significant effect on the animal’s choices [odds ratio (OR) = 1.54, t = 9.2, P = 3.5 × 10^-20; fig. S2C]”.

In 20 trials, there are ten chances to reciprocate if I understand the meaning of reciprocation in the study (monkey x gives a reward to the monkey who gave x a reward in the last trial). Depending on interpretation, there are 6-10 chances to react to reciprocation. Six if three trials are required for each reaction: the trial in which a monkey acts, the trial in which another monkey reciprocates and the trial in which a monkey reacts to the reciprocation. Ten if the reaction can coincide with the initial act of the next action-reciprocation pair.

Under the null hypothesis that the monkey allocates rewards randomly, the probability of giving the reward to the monkey who previously reciprocated the most 10 times out of 10 is 1/1024. The p-value is the probability, under the null hypothesis, of observing an effect at least as extreme as the one actually observed. So the p-value cannot be smaller than about 0.001 for a 20-trial session, which offers at most 10 chances to react to reciprocation. The p-value cannot be 3.5*10^-20 as Baez-Mendoza et al (2021) claim. Their supplementary material does not offer an explanation of how this p-value was calculated.

Interpreting reciprocation or trials differently so that 20 trials offer 20 chances to reciprocate, the minimal p-value is 1/1048576, approximately 10^-6, again far from 3.5*10^-20.

A possible explanation is the sentence “The group performed an average of 105 ± 8.7 (mean ± SEM) trials per session for a total of 22 sessions.” If the monkey has a chance to react to past reciprocation in a third of the roughly 105*22 = 2310 trials, then the p-value can indeed be of the order 10^-20. It would be interesting to know how the authors divide the trials into the reputation-building and reaction blocks.
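The arithmetic behind these bounds is easy to check. The sketch below assumes, as in the argument above, a one-sided test in which each choice is a fair coin flip under the null hypothesis and every choice goes in the predicted direction. The function names are mine, not from the paper.

```python
def min_p_value(chances):
    """Smallest one-sided p-value from `chances` fair binary choices,
    attained when every choice goes in the predicted direction."""
    return 0.5 ** chances


def chances_needed(p_value):
    """Fewest fair binary choices whose best-case p-value is at most `p_value`."""
    n = 0
    while 0.5 ** n > p_value:
        n += 1
    return n


print(min_p_value(10))          # equals 1/1024
print(min_p_value(20))          # equals 1/1048576
print(chances_needed(3.5e-20))  # number of choices needed for the reported p-value
```

Under these assumptions at least 65 binary choices are needed before a p-value as small as 3.5*10^-20 becomes possible, which is far more than one 20-trial window offers but far fewer than the total number of trials in the study.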

Symmetry of matter seems impossible

I am not a physicist, so the following may be my misunderstanding. Symmetry seems theoretically impossible, except at one instant. If there was a perfectly symmetric piece of matter (after rotating or reflecting it around some axis, the set of locations of its atoms would be the same as before, just a possibly different atom in each location), then in the next instant of time, its atoms would move to unpredictable locations by the Heisenberg uncertainty principle (the location and momentum of a particle cannot be simultaneously determined). This is because the locations of the atoms would be known by symmetry in the first instant, thus their momenta unknown.

Symmetry may not provide complete information about the locations of the atoms, but constrains their possible locations. Such an upper bound on the uncertainty about locations puts a lower bound on the uncertainty about momenta. Momentum uncertainty creates location uncertainty in the next instant.
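In symbols, and with the same caveat that I am not a physicist, the argument can be written as follows. The second relation is an illustration that assumes a free particle of mass m whose position and momentum are initially uncorrelated. Atoms in a solid are bound rather than free, so it shows only the direction of the effect, not its size.

```latex
\sigma_x \, \sigma_p \ge \frac{\hbar}{2}
\quad\Longrightarrow\quad
\sigma_p \ge \frac{\hbar}{2\,\sigma_x},
\qquad
\sigma_x(t)^2 = \sigma_x(0)^2 + \left(\frac{\sigma_p \, t}{m}\right)^2 .
```

Here \sigma_x and \sigma_p are the standard deviations of position and momentum. A smaller \sigma_x(0) forces a larger \sigma_p, which makes the position spread \sigma_x(t) grow faster.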

Symmetry is probably an approximation: rotating or reflecting a piece of matter, its atoms are in locations close to the previous locations of its atoms. Again, an upper bound on the location uncertainty about the atoms should put a lower bound on the momentum uncertainty. If the atoms move in uncertain directions, then the approximate location symmetry would be lost at some point in time, both in the future and the past.

Less inspiring people in universities than in early school

A student claimed that fewer inspiring people are found in universities than in early school. Empirical checks of this would be interesting and would need a measure of inspiringness. A theoretical explanation is a tradeoff between multiple dimensions: subject matter competence, integrity, reliability, communication skills, being inspiring, etc. The tradeoff is on both the demand and the supply side. An inspiring competent person has many career options (CEO, politician, entrepreneur) besides academia, so fewer such people end up supplying their labour to the education sector.

On the demand side, a university has to prioritise dimensions on which to rank candidates and hire, given its salary budget and capacity constraints on how many job positions it has. Weighting competence more leaves less emphasis on inspiringness. Competing universities may prioritise different dimensions (be horizontally differentiated), in which case on average each institution gets candidates who have more of its preferred dimension and less of other dimensions.

As a side note, what an organisation says its priorities are may differ from its actual priorities, which are evidenced by behaviour, e.g., who it hires. It may say it values teaching with passion, but hire based on research success instead.

A constraint is a special case of a tradeoff. Suppose that given the minimum required competence, an employer wants to hire the most inspiring person. The higher this level of competence (teaching PhD courses vs kindergarten), the fewer people satisfy the constraint. At a high enough level of the constraint, there may be insufficient candidates in the world to fill all the vacant jobs. Some employers cannot fill the position, others will have just one candidate. Maximising inspiringness over an empty set, or a set of one, is unlikely to yield very inspiring people.

It may be inherently simpler to inspire with easier material, in which case even with equally inspiring people throughout all levels of education, the later stages will seem less inspiring.

Larger leaps through theory may be required as a subject gets more advanced, leaving less scope for inspiring anecdotes and real-life examples. The ivory tower is often accused of being out of touch with common experience. Parting with everyday life is partly inevitable for developing any specialised skill, otherwise the skill would be an everyday one, not specialised.

If inspiring people requires manipulating them, and more educated individuals resist manipulation better, then inspiring people gets more difficult with each level of education. Each stage of study selects on average the more intelligent graduates of the previous stage, so if smarter people are harder to manipulate, then those with higher levels of education are harder to inspire. On the other hand, if academics are naive and out of touch with the ways of the world, then they may be easier to manipulate and inspire than schoolchildren.

People accumulate interests and responsibilities in their first half of life. The more hobbies and duties, the less scope for adopting a goal proposed by some charismatic person, i.e., getting inspired by them. Later in life, many goals may have been achieved and people may have settled down for a comfortable existence. They are then less inclined to believe the need to follow a course that an inspiring person claims is a way of reaching their goals.

Receive-only mode for phones to save power

Airplane mode cuts off all communication, or all except wifi, which is undesirable. Receive-only mode would allow receipt of texts and recorded messages, save power and prevent detection of the phone by radio frequency methods (not by a metal detector). If the phone is stationary, then there is no need for it to send periodic keep-alive or hand-off signals to the cell phone tower. The phone’s accelerometer and GPS receiver can detect with reasonable accuracy whether it stays in the same cell tower’s range. Only when the phone moves a large enough distance will sending hand-off or check-in signals become necessary.

Location can also be detected using the radio receiver of the phone (which every phone has for calls and texts) if multiple cellphone towers are in range – just triangulate. A saved map of tower coverage areas in the phone helps position the phone and detect when the phone moves to a different tower’s area.

A software modification should be enough to create a receive-only mode: turn off sending (supplying power to the antenna) but keep receiving (measure and record the voltage and current in the antenna). Add optional deactivation of the receive-only mode based on the accelerometer and GPS detecting the phone moving out of range of the current cell tower.
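The deactivation rule could look roughly like the sketch below. It is speculation, not a real phone API: the function name, the flat x-y coordinates in metres and the fixed distance threshold are all assumptions. A real implementation would sit in the modem firmware or operating system and would use the stored tower coverage map.

```python
import math


def should_transmit(last_checkin_xy, current_xy, threshold_m):
    """Return True when the phone has moved far enough since its last check-in
    that a hand-off or check-in signal may be needed."""
    dx = current_xy[0] - last_checkin_xy[0]
    dy = current_xy[1] - last_checkin_xy[1]
    # Straight-line displacement estimated from GPS or accelerometer data.
    return math.hypot(dx, dy) >= threshold_m


# Hypothetical positions in metres relative to the last check-in point.
print(should_transmit((0.0, 0.0), (30.0, 40.0), threshold_m=100.0))
print(should_transmit((0.0, 0.0), (300.0, 400.0), threshold_m=100.0))
```

While the function returns False, the phone stays in receive-only mode. When it returns True, the phone briefly enables the transmitter, checks in with the tower and records the new position.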

Recumbent bicycle bunny hop in theory

I have not tried this, so it is just speculation. There are many claims online that a recumbent bike cannot be bunny hopped. However, lifting the front wheel should be possible while sitting on the bike, because lifting the front caster of an office chair is possible without touching the floor. Lean forward, then slam your torso back against the backrest – careful that you don’t tip over backward. Your legs may be lifted or the feet may rest on top of the “spider” at the bottom of the chair.

On a recumbent, a further boost comes from suddenly pedalling hard in low gear, which accelerates the rear wheel forward and under, rotating the front wheel up around the pivot of the rear wheel.

Lifting the rear wheel of a recumbent should be possible while seated, because popping your butt off the floor when sitting with straight legs is possible without using your leg muscles. Put your fists on the floor slightly behind and to the side of your hips. Bend your elbows, then suddenly straighten them, pushing explosively against the floor. Your butt and your fists lift a few inches. Keep your legs locked straight. Very strong people can do this with legs lifted (in boat pose: body in V-shape with only the butt and fists touching the floor).

Because lifting each wheel is possible and the movements do not directly oppose each other, a recumbent should be bunnyhoppable. Lift first the front and then the rear wheel.