Tag Archives: science

Animal experiments on whether pose and expression control mood

Amy Cuddy promoted power poses, which she claimed boost confidence and success. Replication of her results failed (the effects were not found in other psychology studies), then succeeded again, so the debate continues. Similarly, adopting a smiling expression is claimed to make people happier. Measuring the psychological effects of posture and expression in humans is complicated, for example by experimenter demand effects. Animals are simpler and cheaper to experiment with, but I did not find any animal experiments on power poses on Google Scholar on 28 March 2021.

The idea of the experiment is to move the animal into a confident or scared pose and measure the resulting behaviour, stress hormones and dominance hormones, and maybe scan the brain. Potentially mood-affecting poses differ between animals, but are well known for common pets. Lifting a dog’s tail up its back is a confident pose. Moving the tail from side to side, or putting the chest close to the ground and the butt up in a “play-with-me bow”, is a happy, excited pose. Putting the dog’s tail between the legs is a scared pose. Pulling the dog’s gums back to bare its teeth is an angry expression. Arching a cat’s back is an angry pose. Curling the cat up and half-closing its eyes is a contented one.

The main problem is that the animal may resist being moved into these poses or get stressed by the unfamiliar treatment. A period of habituation training is needed, but if the pose has an effect, then part of this effect is realised during the habituation. In this case, the measured effect size is attenuated, i.e. the pre- and post-treatment mood and behaviour look similar.

A similar experiment in people would have a person or a robot move the participants’ limbs into power poses, instead of asking them to assume the pose. The excuse or distraction from the true purpose of the experiment may be light physical exercise, physical therapy or massage. This includes a facial massage, which may stretch the face into a smile or compress it into a frown. The usual questionnaires and measurements may be administered after moving the body or face into these poses or expressions.

Moon phase and sleep correlation is not quite a sine wave

Casiraghi et al. (2021) in Science Advances (DOI: 10.1126/sciadv.abe0465) show that human sleep duration and onset depend on the phase of the moon. Their interpretation is that light availability during the night caused humans to adapt their sleep over evolutionary time. Casiraghi et al. fit a sine curve to both sleep duration and onset as functions of the day in the monthly lunar cycle, but their Figure 1 A, B for the full sample, and the blue and orange curves for the rural groups in Figure 1 C, D, show a statistically significant deviation from a sine function. Instead of same-sized symmetric peaks and troughs, sleep duration has two peaks with a small trough between them, then a large sharp trough which falls more steeply than it rises, then two peaks again. Sleep onset has a vertically reflected version of this pattern. These features are statistically significant, based on the confidence bands Casiraghi and coauthors have drawn in Figure 1.

The significant departure of sleep patterns from a sine wave calls into question the interpretation that light availability over evolutionary time caused these patterns. What fits the interpretation of Casiraghi et al. is that sleep duration is shortest right before the full moon; what does not fit is that duration is longest right after both the full and the new moon, but shorter during the waning crescent in between.

It would better summarise the data to use the first four terms of a Fourier series instead of just the first term. There seems little danger of overfitting, given N=69 and t>60.
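A minimal sketch of such a fit on synthetic data (all numbers made up for illustration), using ordinary least squares on the first four harmonics of a 29.53-day lunar cycle:

```python
import numpy as np

def fit_fourier(day, y, period=29.53, n_harmonics=4):
    """Least-squares fit of a constant plus the first n_harmonics
    Fourier terms to data y observed on the given days."""
    omega = 2 * np.pi / period
    cols = [np.ones_like(day, dtype=float)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * omega * day))
        cols.append(np.sin(k * omega * day))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef  # coefficients and fitted values

# Synthetic sleep-duration series whose shape is not a plain sine:
rng = np.random.default_rng(0)
day = np.arange(120)
omega = 2 * np.pi / 29.53
true = 7.5 - 0.3 * np.cos(omega * day) - 0.2 * np.cos(2 * omega * day)
y = true + rng.normal(0, 0.1, day.size)

coef1, fit1 = fit_fourier(day, y, n_harmonics=1)  # plain sine fit
coef4, fit4 = fit_fourier(day, y, n_harmonics=4)  # four harmonics
```

Comparing the residual sums of squares of the one-harmonic and four-harmonic fits quantifies how much of the non-sinusoidal structure the plain sine misses.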

A questionable choice of the authors is to plot the sleep duration and onset of only the 35 best-fitting participants in Figure 2. A more honest choice yielding the same number of plots would pick every other participant in the ranking from the best fit to the worst.

In the Materials and Methods section, Casiraghi et al. fitted both a 15-day and a 30-day cycle to test for the effect of the Moon’s gravitational pull on sleep. The 15-day component was weaker in urban communities than in rural ones, but any effect of gravity should be the same in both. By contrast, the effect of moonlight should be weaker in urban communities, yet the urban community data (Figure 1 C, D green curve) fits a simple sine curve better than the rural data. It seems strange that sleep in urban communities would correlate more strongly with the amount of moonlight, as Figure 1 shows.

Leader turnover due to organisation performance is underestimated

Berry and Fowler (2021) “Leadership or luck? Randomization inference for leader effects in politics, business, and sports” in Science Advances propose a method they call RIFLE for testing the null hypothesis that leaders have no effect on organisation performance. The method is robust to serial correlation in outcomes and leaders, but not to endogenous leader turnover, as Berry and Fowler honestly point out. The endogeneity is that the organisation’s performance influences the probability that the leader is replaced (economic growth causes voters to keep a politician in office, losing games causes a team to replace its coach).

To test whether such endogeneity is a significant problem for their results, Berry and Fowler regress the turnover probability on various measures of organisational performance. They find small effects, but this underestimates the endogeneity problem, because Berry and Fowler use linear regression, forcing the effect of performance on turnover to be monotone and linear.

If leader turnover is increased by both success (get a better job elsewhere if the organisation performs well, so quit voluntarily) and failure (fired for the organisation’s bad performance), then the relationship between turnover and performance is U-shaped. Average leaders keep their jobs, bad and good ones transition elsewhere. This is related to the Peter Principle that an employee is promoted to her or his level of incompetence. A linear regression finds a near-zero effect of performance on turnover in this case even if the true effect is large. How close the regression coefficient is to zero depends on how symmetric the effects of good and bad performance on leader transition are, not how large these effects are.
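A quick simulation (with made-up probabilities) illustrates how OLS can report a near-zero coefficient when the true performance–turnover relationship is a symmetric U:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
performance = rng.normal(0, 1, n)  # standardised organisation performance
# Hypothetical U-shaped true relationship: both very good and very bad
# performance raise the leader's transition probability.
p_turnover = 0.1 + 0.2 * performance**2 / (1 + performance**2)
turnover = rng.random(n) < p_turnover

# OLS slope of turnover on performance (closed form for one regressor):
slope = np.cov(performance, turnover)[0, 1] / np.var(performance)
# With a symmetric U, the slope is near zero despite the large true effect.
```

Regressing turnover on performance squared, or allowing a piecewise relationship, would reveal the effect that the linear specification hides.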

The problem for the RIFLE method of Berry and Fowler is that the small apparent effect of organisation performance on leader turnover from OLS regression misses the endogeneity in leader transitions. Such endogeneity biases RIFLE, as Berry and Fowler admit in their paper.

The endogeneity may explain why Berry and Fowler find stronger leader effects in sports (coaches in various US sports) than in business (CEOs) and politics (mayors, governors, heads of government). A sports coach may experience more asymmetry in the transition probabilities for good and bad performance than a politician. For example, if teams fire coaches after bad performance much more frequently than they poach coaches from well-performing competitors, then the effect of performance on turnover is close to monotone: bad performance causes firing. OLS discovers this monotone effect. On the other hand, if politicians move with equal likelihood after exceptionally good and exceptionally bad performance of the administrative units they lead, then linear regression finds no effect of performance on turnover. OLS then misses the endogeneity that biases RIFLE; without this bias, RIFLE might show a large leader effect in politics as well.

The unreasonably large effect of governors on crime (the governor effect explains 18-20% of the variation in both property and violent crime), and the difference between the zero effect of mayors on crime and the large effect of governors, make me suspect something is wrong with that particular analysis in Berry and Fowler’s paper. In a checks-and-balances system, the governor should not have that large an influence on the state’s crime. A mayor works more closely with the local police, so would be expected to have more influence on crime.

Diffraction grating of parallel electron beams

Diffraction gratings with narrow bars and bar spacing are useful for separating short-wavelength electromagnetic radiation (x-rays, gamma rays) into a spectrum, but the narrow bars and gaps are difficult to manufacture. The bars are also fragile and thus need a backing material, which may absorb some of the radiation, leaving less of it to be studied. Instead of manufacturing the grating out of a solid material composed of neutral atoms, an alternative may be to use many parallel electron beams. Electromagnetic waves do scatter off electrons, thus the grating of parallel electron beams should have a similar effect to a solid grating of molecules. My physics knowledge is limited, so this idea may not work for many reasons.

Electron beams can be made a few nanometres in diameter, and can be bent with magnets. Thus the grating could be made from a single beam, if powerful enough magnets bend it back on itself, or from many parallel beams generated by multiple sources.

The negatively charged electrons repel each other, so the beams tend to bend away from each other. To compensate for this, the beam sources could target the beams to a common focus and let the repulsion forces bend the beams outward. There would exist a point at which the converging and then diverging beams are parallel. The region near that point could be used as the grating. The converging beams should start out sufficiently close to parallel that they would not collide before bending outward again.

Proton or ion beams are also a possibility, but protons and ions have a larger diameter than electrons, which tends to create a coarser grating. Also, electron beam technology is more widespread and mature (cathode ray tubes were used in old televisions), thus easier to use off the shelf.

Training programs should be hands-on and use the scientific method

The current education and training programs in universities (first aid, fire warden, online systems) just take the form of people sitting in a room, passively watching a video or listening to a talk. A better way would be to involve the trainees interactively, because active learning makes people understand faster and remember longer. Hands-on exercises in first aid or firefighting are also more interesting and useful.

At a minimum, the knowledge of the trainees should be tested, in as realistic a way as possible (using hands-on practical exercises). The test should use the scientific method to avoid bias: the examiner should be unconnected to the training provider. The trainer should not know the specific questions of the exam in advance (to prevent “teaching to the test”), only the general required knowledge. Such independent examination permits assessing the quality of the training in addition to the knowledge of the trainees. Double-blind testing is easiest if the goal of the training (the knowledge hoped for) is well defined (procedures, checklists, facts, mathematical solutions).

One problem is how to motivate the trainees to make an effort in the test. For example, in university lectures and tutorials, students do not try to solve the exercises, despite this being a requirement. Instead, they wait for the answers to be posted. One way to incentivise effort is to create competition by publicly revealing the test results.

Blind testing of bicycle fitting

Claims that getting a professional bike fit significantly improves riding comfort and speed and reduces overuse injuries seem suspicious – how can a centimetre here or there make such a large difference? A very wrong fit (e.g. an adult using a children’s bike) of course creates big problems, but most people can adjust their bike to a reasonable fit based on a few online suggestions.

To determine the actual benefit of a bike fit requires a randomised trial: have professionals determine the bike fit for a large enough sample of riders, measure and record the objective parameters of the fit (centimetres of seatpost out of the seat tube, handlebar height from the ground, pedal crank length, etc). Then randomly change the fit by a few centimetres or leave it unchanged, without the cyclist knowing, and let the rider test the bike. Record the speed, ask the rider to rate the comfort, fatigue, etc. Repeat for several random changes in fit. Statistically test whether the average speed, comfort rating and other outcome variables across the sample of riders are better with the actual fit or with small random changes. To eliminate the placebo effect, blind testing is important – the cyclists should not know whether and how the fit has been changed.
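The final statistical test could be a paired comparison across riders, since each rider is measured under both the fitted and the perturbed setup. A sketch with made-up speeds:

```python
import numpy as np

def paired_t(actual, perturbed):
    """Paired t-statistic for an outcome (e.g. speed) measured on the
    same riders under the professional fit and a perturbed fit."""
    d = np.asarray(actual) - np.asarray(perturbed)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

rng = np.random.default_rng(2)
base = rng.normal(30.0, 2.0, 20)  # each rider's baseline speed, km/h
with_fit = base + 1.0             # suppose the professional fit adds 1 km/h
with_random_change = base + rng.normal(0.0, 0.3, 20)
t = paired_t(with_fit, with_random_change)
# Compare t against the t-distribution with n - 1 degrees of freedom.
```

With several random changes per rider, one would instead average each rider’s outcomes within each condition before the paired comparison.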

Another approach is to have each rider test a large sample of different bike fits, find the best one empirically, record its objective parameters and then have a sample of professional fitters (who should not know what empirical fit was found) choose the best fit. Test statistically whether the professionals choose the same fit as the cyclist.

A simpler trial that does not quite answer the question of interest checks the consistency of different bike fitters. The same person with the same bike in the same initial configuration goes to various fitters and asks them to choose a fit. After each fitting, the objective sizing of the bike is recorded and then the bike is returned to the initial configuration before the next fit. The test is whether all fitters choose approximately the same parameters. Inconsistency implies that most fitters cannot figure out the objectively best fit, but consistency does not imply that the consensus of the fitters is the optimal sizing. They could all be wrong the same way – consistency is insufficient to answer the question of interest.

Committing to an experimental design without revealing it

Pre-registering an experiment in a public registry of clinical trials keeps the experimenters honest (avoids ex post modifications of hypotheses to fit the data and “cherry-picking” the data by removing “outliers”), but unfortunately reveals information to competing research groups. This is an especially relevant concern in commercial R&D.

The same verifiability of honesty could be achieved without revealing scientific details by initially publicly distributing an encrypted description of the experiment, and after finishing the research, publishing the encryption key. Ex post, everyone can check that the specified experimental design was followed and all variables reported (no p-hacking). Ex ante, competitors do not know the trial details, so cannot copy it or infer the research direction.
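One simple implementation is a hash commitment rather than literal encryption: publish the hash of the design text plus a secret random nonce, then reveal both after the trial. A sketch (the design string is a made-up example):

```python
import hashlib
import secrets

def commit(design: str) -> tuple[str, str]:
    """Return (digest to publish now, nonce to keep secret).
    The random nonce stops competitors from guessing-and-hashing
    a small set of plausible designs."""
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256((nonce + design).encode()).hexdigest()
    return digest, nonce

def verify(design: str, nonce: str, digest: str) -> bool:
    """After the reveal, anyone can check the design against the commitment."""
    return hashlib.sha256((nonce + design).encode()).hexdigest() == digest

design = "two-arm RCT, N=200, primary outcome: sleep onset"  # made-up example
digest, nonce = commit(design)
```

Any ex post change to the design string makes verification fail, so hypotheses cannot be quietly modified after the data comes in.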

Blind testing of clothes

Inspired by blind taste testing, manufacturers’ claims about clothes could be tested by subjects blinded to what they are wearing. The test would work as follows. People put clothes on by feel with their eyes closed or in a pitch dark room and wear other clothes on top of the item to be tested. Thus the subjects cannot see what they are wearing. They then rate the comfort, warmth, weight, softness and other physical aspects of the garment. This would help consumers select the most practical clothing and keep advertising somewhat more honest than heretofore. For example, many socks are advertised as warm, but based on my experience, many of them do not live up to the hype. I would be willing to pay a small amount for data about past wearers’ experience. Online reviews are notoriously emotional and biased.

Some aspects of clothes can also be measured objectively – warmth is one of these, measured by heat flow through the garment per unit of area. Such data is unfortunately rarely reported. The physical measurements to conduct on clothes require some thought, to make these correspond to the wearing experience. For example, if clothes are thicker in some parts, then their insulation should be measured in multiple places. Some parts of the garment may usually be worn with more layers under or over it than others, which may affect the required warmth of different areas of the clothing item differently. Sweat may change the insulation properties dramatically, e.g. for cotton. Windproofness matters for whether windchill can be felt. All this needs taking into account when converting physical measurements to how the clothes feel.

Keeping an open mind and intellectual honesty

„Keep an open mind” is often used as an argument against science, or to justify ignoring evidence more broadly. Let’s distinguish two cases of keeping an open mind: before vs after the evidence comes in. It is good to keep an open mind before data is obtained – no hypothesis is ruled out. In reality, all possibilities have positive probability, no matter how great the amount and quality of information, so one should not dogmatically rule out anything even given the best evidence. However, for practical purposes a small enough probability is the same as zero. Decisions have to be made constantly (choosing not to decide is also a decision), so after enough scientific information is available, it is optimal to make up one’s mind, instead of keeping it open.
Intellectually honest people who want to keep an open mind after obtaining evidence would commit to it from the start: publicly say that no matter what the data shows in the future, they will ignore it and keep an open mind. Similarly, the intellectually honest who plan to make up their mind would also commit, in this case to a policy along the lines of „if the evidence says A, then do this, but if the evidence says B, then that”. The latter policy resembles (parts of) the scientific method.
The anti-science or just intellectually dishonest way of “keeping an open mind” is to do this if and only if the evidence disagrees with one’s prior views. In other words, favourable data is accepted, but unfavourable data ignored, justifying the ignoring with the open-mind excuse. In debates, the side that runs out of arguments and is about to lose is usually the one that recommends an open mind, and only at that late stage of the debate and conditional on its own weak position. Similarly, “agreeing to disagree” is mostly recommended intellectually dishonestly by the losing side of an argument, to attempt to leave the outcome uncertain. This is an almost logically contradictory use of “agreeing to disagree”, because it is mathematically proven that rational agents putting positive probability on the same events cannot agree to disagree – if their posterior beliefs are common knowledge, then these must coincide.

Seasonings may reduce the variety of diet

Animals may evolve a preference for a varied diet in order to get the many nutrients they need. A test of this on mice would be whether their preference for different grains is negatively autocorrelated, i.e. they are less likely to choose a food if they have eaten more of it recently.

Variety is perceived mainly through taste, so the mechanism via which the preference for a varied diet probably operates is that consuming a substance repeatedly makes its taste less pleasant for the next meal. Spices and other flavourings can make the same food seem different, so may interfere with variety-seeking, essentially by deceiving the taste. A test of this on mice would flavour the same grain differently and check whether this attenuates the negative autocorrelation of consumption, both when other grains are available and when not.
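The proposed test amounts to estimating the lag-1 autocorrelation of an indicator for choosing a given grain. A minimal sketch with a made-up choice sequence:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series, e.g. 1 if the mouse chose
    grain A at that meal and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return (x[:-1] @ x[1:]) / (x @ x)

# Hypothetical variety-seeking mouse: it tends to switch grains,
# so consecutive choices of the same grain are rare.
choices = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
r = lag1_autocorr(choices)
# Negative r is consistent with variety-seeking.
```

If differently flavoured versions of the same grain attenuate this negative autocorrelation, that would support the deception-of-taste mechanism.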

If seasonings reduce variety-seeking, then access to spices may lead people to consume a more monotonous diet, which may be less healthy. A test of this hypothesis is whether increased access to flavourings leads to more obesity, especially among those constrained to eat similar foods over time. The constraint may be poverty (only a few cheap foods are affordable) or physical access (living in a remote, unpopulated area).

A preference for variety explains why monotonous diets, such as Atkins, may help lose weight: eating similar food repeatedly gets boring, so the dieter eats less.