Tag Archives: science

How to learn whether an information source is accurate

Two sources may be used to check each other over time. One of these sources may be your own senses, which show whether the event that the other source predicted occurred or not. The observation of an event is really another signal about the event. It is a noisy signal because your own eyes may lie (optical illusions, deepfakes).

First, one source sends a signal about the event, then the second source sends one. You will never know whether the event actually occurred, but the second source is the aggregate of all the future information you receive about the event, so it may be very accurate. The second source may send many signals in sequence about the event, yielding more information about the first source over time. Then the process repeats for a second event, a third, etc. This is how belief about the trustworthiness of a source is built.

You cannot learn the true accuracy of a source, because the truth is unavailable to your senses, so you cannot compare a source’s signals to the truth. You can only learn the consistency of different sources of sensory information. Knowing the correlation between various sensory sources is both necessary and sufficient for decision making, because your objective function (utility or payoff) is your perception of successfully achieving your goals. If your senses are deceived so that you believe you have achieved what you sought, but actually have not, then you get the feeling of success. If your senses are deceived into telling you that you have failed, then you do not feel success even if you actually succeeded. The problem with deception arises purely from the positive correlation between the deceit and the perception of deceit. If deceit increases the probability that you later perceive you have been deceived and are unhappy about that perception, then deceit may reduce your overall utility despite temporarily increasing it at first. If you never suspect the deception, then your happiness is as if the deception were the truth.

Your senses send signals to your brain. We can interpret these signals as information about which hypothetical state of the world has occurred – we posit that there exists a world which may be in different states with various probabilities and that there is a correlation between the signals and these states. Based on the information, you update the probabilities of the states and choose a course of action. Actions result in probability distributions over different future sensations, which may be modelled as a different sensation in each state of the world, which have probabilities attached. (Later we may remove the states of the world from the model and talk about a function from past perceptions and actions into future perceptions. The past is only accessible through memory. Memory is a current perception, so we may also remove time from the model.)

You prefer some future sensations to others. These need not be sensory pleasures. These could be perceptions of having improved the world through great toil. You would prefer to choose an action that results in preferable sensations in the future. Which action this is depends on the state of the world.

To estimate the best action (the one yielding the most preferred sensations), you use past sensory signals. The interpretation of these signals depends on the assumed or learned correlation between the signals and the state. The assumption may be instinctive from birth. The learning is really about how sensations at a point in time are correlated with the combination of sensations and actions before that point. An assumption that the correlation is stable over time enables you to use past correlation to predict future correlation. This assumption in turn may be instinctive or learned.

The events most people are interested in distinguishing are of the form “action A results in the most preferred sensations”, “action B causes the most preferred sensations”, “action A yields the least preferred sensations”. Any event that is useful to know is of a similar form, in the spirit of Blackwell’s theorem on comparing information sources: information is useful only if it can change decisions.

The usefulness of a signal source depends on how consistent the signals it gives about the action-sensation links (events) are with your future perceptions. These future perceptions are the signals from the second source – your senses – against which the first source is checked. The signals of the second source have the form “memory of action A and a preferred sensation at present”. Optimal learning about the usefulness of the first source uses Bayes’ rule and a prior probability distribution on the correlations between the first source and the second. The events of interest in this case are the levels of correlation. A signal about these levels is whether the first source gave a signal that coincided with later sensory information.

If the first source recommended a “best action” that later yielded a preferred sensation, then this increases the probability of high positive correlation between the first source and the second on average. If the recommended action was followed by a negative sensation, then this raises the probability of a negative correlation between the sources. Any known correlation is useful information, because it helps predict the utility consequences of actions.
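The updating described above can be sketched in a few lines of Python. This is only a minimal illustration under my own simplifying assumptions: the "level of correlation" is reduced to a single probability that the first source's recommendation is later confirmed by your senses, and the prior on that probability is a uniform Beta(1, 1). The class name `AgreementBelief` and the example sequence of outcomes are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgreementBelief:
    """Beta(a, b) belief about the probability that a recommendation from
    the first source is later confirmed by the second source (your senses).
    The uniform Beta(1, 1) prior is an assumption of this sketch."""
    a: float = 1.0  # prior pseudo-count of confirmed recommendations
    b: float = 1.0  # prior pseudo-count of contradicted recommendations

    def update(self, confirmed: bool) -> None:
        # Bayes' rule for a Beta prior and a yes/no observation
        # reduces to adding one to the matching pseudo-count.
        if confirmed:
            self.a += 1
        else:
            self.b += 1

    def mean(self) -> float:
        # Posterior mean probability of agreement between the sources.
        return self.a / (self.a + self.b)


belief = AgreementBelief()
# Hypothetical history: four confirmed recommendations, one contradicted.
for confirmed in [True, True, False, True, True]:
    belief.update(confirmed)
print(belief.mean())  # 5/7, about 0.71
```

A posterior mean far below one half is as informative as one far above it: a source that is reliably wrong can be used by doing the opposite of what it recommends.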

Counterfactuals should be mentioned as a side note. Even if an action A resulted in a preferred sensation, a different action B might have led to an even better sensation in the counterfactual universe where B was chosen instead. Of course, B might equally well have led to a worse sensation. Counterfactuals require a model to evaluate – what the output would have been after a different input depends on the assumed causal chain from inputs to outputs.

Whether two sources are separate or copies is also a learnable event.

P-value cannot be less than 1/1024 in ten binary choices

Baez-Mendoza et al. (2021) claim that for rhesus macaques choosing which of two others to reward in each trial, “the difference in the other’s reputation based on past interactions (i.e., how likely they were to reciprocate over the past 20 trials) had a significant effect on the animal’s choices [odds ratio (OR) = 1.54, t = 9.2, P = 3.5 × 10^-20; fig. S2C]”.

In 20 trials, there are ten chances to reciprocate if I understand the meaning of reciprocation in the study (monkey x gives a reward to the monkey who gave x a reward in the last trial). Depending on interpretation, there are 6-10 chances to react to reciprocation. Six if three trials are required for each reaction: the trial in which a monkey acts, the trial in which another monkey reciprocates and the trial in which a monkey reacts to the reciprocation. Ten if the reaction can coincide with the initial act of the next action-reciprocation pair.

Under the null hypothesis that the monkey allocates rewards randomly, the probability of giving the reward to the monkey who previously reciprocated the most 10 times out of 10 is 1/1024. The p-value is the probability, computed under the null hypothesis, of observing an effect at least as extreme as the one actually observed. So the p-value cannot be smaller than about 0.001 for a 20-trial session, which offers at most 10 chances to react to reciprocation. The p-value cannot be 3.5 × 10^-20 as Baez-Mendoza et al. (2021) claim. Their supplementary material does not offer an explanation of how this p-value was calculated.

Interpreting reciprocation or trials differently so that 20 trials offer 20 chances to reciprocate, the minimal p-value is 1/1048576, approximately 10^-6, again far from 3.5*10^-20.

A possible explanation is the sentence “The group performed an average of 105 ± 8.7 (mean ± SEM) trials per session for a total of 22 sessions.” If the monkey has a chance to react to past reciprocation in a third of the roughly 105*22 = 2310 trials, then the p-value can indeed be of the order 10^-20. It would be interesting to know how the authors divide the trials into the reputation-building and reaction blocks.
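The arithmetic behind these bounds is easy to check. The sketch below assumes, as in the text, a null hypothesis of a fair coin flip in each binary choice and a one-sided test. The function names are mine.

```python
import math


def min_p_value(n_choices: int) -> float:
    """Smallest possible one-sided p-value when each of n_choices binary
    choices is a fair coin flip under the null hypothesis."""
    return 0.5 ** n_choices


def choices_needed(p: float) -> int:
    """Fewest binary choices for which a p-value as small as p is possible."""
    return math.ceil(math.log2(1.0 / p))


print(min_p_value(10))          # 1/1024, about 0.001
print(min_p_value(20))          # 1/1048576, about 1e-6
print(choices_needed(3.5e-20))  # 65
```

So the reported p-value requires at least 65 binary choices, more than a single 20-trial window offers but far fewer than the total number of trials across all sessions.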

Symmetry of matter seems impossible

I am not a physicist, so the following may be my misunderstanding. Symmetry seems theoretically impossible, except at one instant. If there were a perfectly symmetric piece of matter (after rotating or reflecting it around some axis, the set of locations of its atoms would be the same as before, just a possibly different atom in each location), then in the next instant of time, its atoms would move to unpredictable locations by the Heisenberg uncertainty principle (the location and momentum of a particle cannot both be determined precisely at the same time). This is because the locations of the atoms would be known by symmetry in the first instant, thus their momenta unknown.

Symmetry may not provide complete information about the locations of the atoms, but constrains their possible locations. Such an upper bound on the uncertainty about locations puts a lower bound on the uncertainty about momenta. Momentum uncertainty creates location uncertainty in the next instant.
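To get a feel for the size of this bound, here is a back-of-the-envelope sketch using the standard form of the uncertainty relation, Δx·Δp ≥ ħ/2. The choice of a carbon-12 atom and a location uncertainty of 0.1 nanometres is my own illustrative assumption, not derived from anything above.

```python
HBAR = 1.054571817e-34            # reduced Planck constant, J*s
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def min_velocity_uncertainty(position_uncertainty_m: float, mass_kg: float) -> float:
    """Lower bound on velocity uncertainty implied by dx * dp >= hbar / 2,
    using dp = mass * dv (non-relativistic)."""
    return HBAR / (2 * mass_kg * position_uncertainty_m)


# Assumed example: a carbon-12 atom whose location is known to within 0.1 nm.
carbon_mass = 12 * ATOMIC_MASS_UNIT
dv = min_velocity_uncertainty(1e-10, carbon_mass)
print(dv)  # roughly 26 metres per second
```

The tighter the symmetry pins down the locations, the larger this minimum velocity uncertainty becomes, since the bound scales inversely with the location uncertainty.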

Symmetry is probably an approximation: rotating or reflecting a piece of matter, its atoms are in locations close to the previous locations of its atoms. Again, an upper bound on the location uncertainty about the atoms should put a lower bound on the momentum uncertainty. If the atoms move in uncertain directions, then the approximate location symmetry would be lost at some point in time, both in the future and the past.

Animal experiments on whether pose and expression control mood

Amy Cuddy promoted power poses which she claimed boosted confidence and success. Replication of her results failed (the effects were not found in other psychology studies), then succeeded again, so the debate continues. Similarly, adopting a smiling expression is claimed to make people happier. Measuring the psychological effects of posture and expression is complicated in humans, for example due to experimenter demand effects. Animals are simpler and cheaper to experiment with, but I did not find any animal experiments on power poses on Google Scholar on 28.03.2021.

The idea of the experiment is to move the animal into a confident or scared pose and measure the resulting behaviour, stress hormones, dominance hormones, maybe scan the brain. Potentially mood-affecting poses differ between animals, but are well-known for common pets. Lifting a dog’s tail up its back is a confident pose. Moving the tail side to side or putting the chest close to the ground and butt up in a “play-with-me bow” is happy, excited. Putting the dog’s tail between the legs is scared. Moving the dog’s gums back to bare its teeth is angry. Arching a cat’s back is angry. Curling the cat up and half-closing its eyes is contented.

The main problem is that the animal may resist being moved into these poses or get stressed by the unfamiliar treatment. A period of habituation training is needed, but if the pose has an effect, then part of this effect is realised during the habituation. In this case, the measured effect size is attenuated, i.e. the pre- and post-treatment mood and behaviour look similar.

A similar experiment in people is to have a person or a robot move the limbs of the participants into power poses instead of asking them to assume the pose. The excuse or distraction from the true purpose of the experiment may be light physical exercise, physical therapy or massage. This includes a facial massage, which may stretch the face into a smile or compress it into a frown. The usual questionnaires and measurements may be administered after moving the body or face into these poses or expressions.

Moon phase and sleep correlation is not quite a sine wave

Casiraghi et al. (2021) in Science Advances (DOI: 10.1126/sciadv.abe0465) show that human sleep duration and onset depend on the phase of the moon. Their interpretation is that light availability during the night caused humans to adapt their sleep over evolutionary time. Casiraghi et al. fit a sine curve to both sleep duration and onset as functions of the day in the monthly lunar cycle, but their Figure 1 A, B for the full sample and the blue and orange curves for the rural groups in Figure 1 C, D show a statistically significant deviation from a sine function. Instead of same-sized symmetric peaks and troughs, sleep duration has two peaks with a small trough between, then a large sharp trough which falls more steeply than it rises, then two peaks again. Sleep onset has a vertically reflected version of this pattern. These features are statistically significant, based on the confidence bands Casiraghi and coauthors have drawn in Figure 1.

The significant departure of sleep patterns from a sine wave calls into question the interpretation that light availability over evolutionary time caused these patterns. What fits the interpretation of Casiraghi et al. is that sleep duration is shortest right before full moon, but what does not fit is that the duration is longest right after full and new moons, but shorter during a waning crescent moon between these.

It would better summarise the data to use the first four terms of a Fourier series instead of just the first term. There seems little danger of overfitting, given N=69 and t>60.
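A minimal sketch of such a fit follows. It is not the authors' method and does not use their data: the 30-point series is synthetic, and I assume one observation per day at equal spacing over a full 30-day cycle, in which case the least-squares Fourier coefficients are simple projections. The function names are mine.

```python
import math


def fourier_coefficients(y, n_terms):
    """Mean and the first n_terms (cosine, sine) coefficient pairs for one
    full cycle sampled at equal steps (requires n_terms < len(y) / 2)."""
    n = len(y)
    mean = sum(y) / n
    coeffs = []
    for k in range(1, n_terms + 1):
        a = 2 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(y))
        b = 2 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(y))
        coeffs.append((a, b))
    return mean, coeffs


def evaluate(mean, coeffs, t, n):
    """Value of the truncated Fourier series at day t of an n-day cycle."""
    return mean + sum(
        a * math.cos(2 * math.pi * k * t / n) + b * math.sin(2 * math.pi * k * t / n)
        for k, (a, b) in enumerate(coeffs, start=1)
    )


def rss(y, mean, coeffs):
    """Residual sum of squares of the fitted series."""
    n = len(y)
    return sum((v - evaluate(mean, coeffs, t, n)) ** 2 for t, v in enumerate(y))


# Synthetic series (NOT the paper's data): a 30-day wave plus a 15-day wave.
days = 30
y = [7.0 + 0.3 * math.cos(2 * math.pi * t / days)
     + 0.2 * math.sin(2 * math.pi * 2 * t / days) for t in range(days)]

mean1, c1 = fourier_coefficients(y, 1)
mean4, c4 = fourier_coefficients(y, 4)
print(rss(y, mean1, c1))  # one sine term leaves the 15-day wave unexplained
print(rss(y, mean4, c4))  # four terms capture it, residual is essentially zero
```

With real, unevenly sampled data the coefficients would have to come from an ordinary least-squares regression on the sine and cosine terms instead of these projections.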

A questionable choice of the authors is to plot the sleep duration and onset of only the 35 best-fitting participants in Figure 2. A more honest choice yielding the same number of plots would pick every other participant in the ranking from the best fit to the worst.

In the section Materials and Methods, Casiraghi et al. fitted both a 15-day and a 30-day cycle to test for the effect of the Moon’s gravitational pull on sleep. The 15-day component was weaker in urban communities than in rural ones, but any effect of gravity should be the same in both. By contrast, the effect of moonlight should be weaker in urban communities, but the urban community data (Figure 1 C, D green curve) fits a simple sine curve better than the rural data does. It seems strange that sleep in urban communities would correlate more strongly with the amount of moonlight, as Figure 1 shows.

Leader turnover due to organisation performance is underestimated

Berry and Fowler (2021) “Leadership or luck? Randomization inference for leader effects in politics, business, and sports” in Science Advances propose a method they call RIFLE for testing the null hypothesis that leaders have no effect on organisation performance. The method is robust to serial correlation in outcomes and leaders, but not to endogenous leader turnover, as Berry and Fowler honestly point out. The endogeneity is that the organisation’s performance influences the probability that the leader is replaced (economic growth causes voters to keep a politician in office, losing games causes a team to replace its coach).

To test whether such endogeneity is a significant problem for their results, Berry and Fowler regress the turnover probability on various measures of organisational performance. They find small effects, but this underestimates the endogeneity problem, because Berry and Fowler use linear regression, forcing the effect of performance on turnover to be monotone and linear.

If leader turnover is increased by both success (get a better job elsewhere if the organisation performs well, so quit voluntarily) and failure (fired for the organisation’s bad performance), then the relationship between turnover and performance is U-shaped. Average leaders keep their jobs, bad and good ones transition elsewhere. This is related to the Peter Principle that an employee is promoted to her or his level of incompetence. A linear regression finds a near-zero effect of performance on turnover in this case even if the true effect is large. How close the regression coefficient is to zero depends on how symmetric the effects of good and bad performance on leader transition are, not how large these effects are.
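A small numerical sketch of this point: the turnover probabilities below are invented for illustration (they are not from Berry and Fowler), and the regression is run on the expected turnover probability at each performance level instead of on simulated leaders.

```python
def ols_slope(xs, ys):
    """Slope coefficient from a simple linear regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Performance on an even grid from -1 (worst) to 1 (best).
performance = [i / 50 - 1 for i in range(101)]


def turnover(x, fired, poached):
    """Hypothetical turnover probability: a 0.1 baseline plus a quadratic
    effect whose strength differs for bad (fired) and good (poached) results."""
    return 0.1 + (fired if x < 0 else poached) * x * x


symmetric = [turnover(x, 0.8, 0.8) for x in performance]   # U-shaped
asymmetric = [turnover(x, 0.8, 0.1) for x in performance]  # mostly firing

print(ols_slope(performance, symmetric))   # essentially zero
print(ols_slope(performance, asymmetric))  # clearly negative
```

In both cases the worst performers are replaced with probability 0.9, so the effect of bad performance on turnover is equally large. Only the symmetry differs, and that alone decides whether the linear regression detects anything.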

The problem for the RIFLE method of Berry and Fowler is that the small apparent effect of organisation performance on leader turnover from OLS regression misses the endogeneity in leader transitions. Such endogeneity biases RIFLE, as Berry and Fowler admit in their paper.

The endogeneity may explain why Berry and Fowler find stronger leader effects in sports (coaches in various US sports) than in business (CEOs) and politics (mayors, governors, heads of government). A sports coach may experience more asymmetry in the transition probabilities for good and bad performance than a politician. For example, if the teams fire coaches after bad performance much more frequently than poach coaches from well-performing competing teams, then the effect of performance on turnover is close to monotone: bad performance causes firing. OLS discovers this monotone effect. On the other hand, if politicians move with equal likelihood after exceptionally good and bad performance of the administrative units they lead, then linear regression finds no effect of performance on turnover. This misses the bias in RIFLE, which without the bias might show a large leader effect in politics also.

The unreasonably large effect of governors on crime (the governor effect explains 18-20% of the variation in both property and violent crime) and the difference between the zero effect of mayors on crime and the large effect of governors that Berry and Fowler find make me suspect something is wrong with that particular analysis in their paper. In a checks-and-balances system, the governor should not have that large an influence on the state’s crime. A mayor works more closely with the local police, so would be expected to have more influence on crime.

Diffraction grating of parallel electron beams

Diffraction gratings with narrow bars and bar spacing are useful for separating short-wavelength electromagnetic radiation (x-rays, gamma rays) into a spectrum, but the narrow bars and gaps are difficult to manufacture. The bars are also fragile and thus need a backing material, which may absorb some of the radiation, leaving less of it to be studied. Instead of manufacturing the grating out of a solid material composed of neutral atoms, an alternative may be to use many parallel electron beams. Electromagnetic waves do scatter off electrons, thus the grating of parallel electron beams should have a similar effect to a solid grating of molecules. My physics knowledge is limited, so this idea may not work for many reasons.

Electron beams can be made with a diameter of a few nanometres, and can be bent with magnets. Thus the grating could be made from a single beam if powerful enough magnets bend it back on itself. Alternatively, many parallel beams could be generated from multiple sources.

The negatively charged electrons repel each other, so the beams tend to bend away from each other. To compensate for this, the beam sources could target the beams to a common focus and let the repulsion forces bend the beams outward. There would exist a point at which the converging and then diverging beams are parallel. The region near that point could be used as the grating. The converging beams should start out sufficiently close to parallel that they would not collide before bending outward again.

Proton or ion beams are also a possibility, but protons and ions have a larger diameter than electrons, which tends to create a coarser grating. Also, electron beam technology is more widespread and mature (cathode ray tubes were used in old televisions), thus easier to use off the shelf.

Training programs should be hands-on and use the scientific method

The current education and training programs (first aid, fire warden, online systems) in universities just take the form of people sitting in a room passively watching a video or listening to a talk. A better way would be to interactively involve the trainees, because active learning makes people understand faster and remember longer. Hands-on exercises in first aid or firefighting are also more interesting and useful.

At a minimum, the knowledge of the trainees should be tested, in as realistic a way as possible (using hands-on practical exercises). The test should use the scientific method to avoid bias: the examiner should be unconnected to the training provider. The trainer should not know the specific questions of the exam in advance (to prevent “teaching to the test”), only the general required knowledge. Such independent examination permits assessing the quality of the training in addition to the knowledge of the trainees. Double-blind testing is easiest if the goal of the training (the knowledge hoped for) is well defined (procedures, checklists, facts, mathematical solutions).

One problem is how to motivate the trainees to make an effort in the test. For example, in university lectures and tutorials, students do not try to solve the exercises, despite this being a requirement. Instead, they wait for the answers to be posted. One way to incentivise effort is to create competition by publicly revealing the test results.

Blind testing of bicycle fitting

Claims that getting a professional bike fit significantly improves riding comfort and speed and reduces overuse injuries seem suspicious – how can a centimetre here or there make such a large difference? A very wrong fit (e.g. an adult using a children’s bike) of course creates big problems, but most people can adjust their bike to a reasonable fit based on a few online suggestions.

Determining the actual benefit of a bike fit requires a randomised trial: have professionals determine the bike fit for a large enough sample of riders, then measure and record the objective parameters of the fit (centimetres of seatpost out of the seat tube, handlebar height from the ground, pedal crank length, etc). Then randomly change the fit by a few centimetres or leave it unchanged, without the cyclist knowing, and let the rider test the bike. Record the speed, ask the rider to rate the comfort, fatigue, etc. Repeat for several random changes in fit. Statistically test whether the average speed, comfort rating and other outcome variables across the sample of riders are better with the actual fit or with small random changes. To eliminate the placebo effect, blind testing is important – the cyclists should not know whether and how the fit has been changed.
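One way to run the statistical test at the end of such a trial is an exact paired randomisation (sign-flip) test, sketched below. This is one possible choice of test, not the only one, and the comfort-rating differences are invented for illustration.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact one-sided paired randomisation test. Under the null hypothesis
    that the professional fit is no better than a perturbed fit, each
    rider's difference is equally likely to be positive or negative, so
    every pattern of sign flips is equally probable."""
    observed = sum(differences)
    at_least_as_large = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        if sum(s * d for s, d in zip(signs, differences)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / total


# Hypothetical per-rider differences in comfort rating:
# professional fit minus randomly perturbed fit.
diffs = [0.5, -0.2, 0.8, 0.1, -0.4, 0.6, 0.3, -0.1]
print(sign_flip_p_value(diffs))
```

The enumeration covers 2^n sign patterns, so it is only practical for a few dozen riders at most. For larger samples, a random subset of sign patterns gives an approximate p-value.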

Another approach is to have each rider test a large sample of different bike fits, find the best one empirically, record its objective parameters and then have a sample of professional fitters (who should not know what empirical fit was found) choose the best fit. Test statistically whether the professionals choose the same fit as the cyclist.

A simpler trial that does not quite answer the question of interest checks the consistency of different bike fitters. The same person with the same bike in the same initial configuration goes to various fitters and asks them to choose a fit. After each fitting, the objective sizing of the bike is recorded and then the bike is returned to the initial configuration before the next fit. The test is whether all fitters choose approximately the same parameters. Inconsistency implies that most fitters cannot figure out the objectively best fit, but consistency does not imply that the consensus of the fitters is the optimal sizing. They could all be wrong the same way – consistency is insufficient to answer the question of interest.

Committing to an experimental design without revealing it

Pre-registering an experiment in a public registry of clinical trials keeps the experimenters honest (avoids ex post modifications of hypotheses to fit the data and “cherry-picking” the data by removing “outliers”), but unfortunately reveals information to competing research groups. This is an especially relevant concern in commercial R&D.

The same verifiability of honesty could be achieved without revealing scientific details by initially publicly distributing an encrypted description of the experiment, and after finishing the research, publishing the encryption key. Ex post, everyone can check that the specified experimental design was followed and all variables reported (no p-hacking). Ex ante, competitors do not know the trial details, so cannot copy it or infer the research direction.
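A closely related and simpler variant publishes a cryptographic hash of the design instead of an encrypted copy. The sketch below uses that hash-commitment technique, which is my substitution and not the encryption scheme described above: the published digest plays the role of the encrypted description, and the later release of the design together with a random nonce plays the role of publishing the key. The function names and the example design text are made up.

```python
import hashlib
import secrets


def commit(design: str) -> tuple[str, bytes]:
    """Return (digest, nonce). Publish the digest before the experiment;
    keep the nonce and the design private until the research is finished.
    The random nonce stops competitors from guessing short designs and
    checking their guesses against the digest."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + design.encode("utf-8")).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, design: str) -> bool:
    """Anyone can check ex post that the revealed design matches the
    digest that was published ex ante."""
    return hashlib.sha256(nonce + design.encode("utf-8")).hexdigest() == digest


# Hypothetical design text.
design = "Primary outcome: X at 12 weeks. N = 200. Analysis: two-sided t-test."
digest, nonce = commit(design)
print(verify(digest, nonce, design))                 # True
print(verify(digest, nonce, design + " (revised)"))  # False
```

The digest reveals nothing useful about the design beforehand, and any later modification of the design changes the hash, so the check fails.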