Tag Archives: abstract musings

Economic and political cycles interlinked

Suppose the government’s policy determines the state of the economy with a lag that equals one term of the government. Also assume that voters re-elect the incumbent in a good economy, but choose the challenger in a bad economy. This voting pattern is empirically realistic and may be caused by voters not understanding the lag between the policy and the economy. Suppose there are two political parties: the good and the bad. The policy the good party enacts when in power puts the economy in a good state during the next term of government. The bad party’s policy creates a recession in the next term.

If the economy starts out doing well and the good party is initially in power, then the good party remains in power forever, because during each of its terms in government, its policy makes the economy do well in the following term, so voters re-elect it.

If the economy starts out in a recession with the good party in power, then the second government is the bad party. The economy does well during the second government’s term due to the policy of the good party in the first term. Then voters re-elect the bad party, but the economy does badly in the third term due to the bad party’s previous policy. The fourth government is then again the good party, with the economy in a recession. This situation is the same as during the first government, so cycles occur. The length of a cycle is three terms. In the first term, the good party is in power, with the other two terms governed by the bad party. The economy is in recession in the first and third terms and booming in the second.

If the initial government is the bad party, with the economy in recession, then the three-term cycle again occurs, starting from the third term described above. Specifically, voters choose the good party next, but the economy does badly again because of the bad party’s current policy. Then voters change back to the bad party, but the economy booms due to the policy the good party enacted when it was in power. Re-election of the bad party is followed by a recession, which is the same state of affairs as initially.

If the government starts out bad and the economy does well, then again the three-term cycle repeats: the next government is bad, with the economy in recession. After that, the good party rules, but the economy still does badly. Then again the bad party comes to power and benefits from the economic growth caused by the good party’s previous policy.

In the cyclical cases, the bad party is in power two-thirds of the time and the economy is in recession also two-thirds of the time. Recessions overlap with the bad party in only one-third of government terms.
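
To make the dynamics concrete, here is a minimal Python sketch of the model. The update rule encodes exactly the assumptions above; the party and economy labels, the function name and the number of simulated terms are arbitrary.

# Minimal sketch of the model: the economy next term is good if and only if
# the current incumbent is the good party, and voters keep the incumbent if
# and only if the current economy is good.

def simulate(incumbent, economy, terms=9):
    # incumbent: 'good' or 'bad' party; economy: 'boom' or 'recession'
    history = []
    for _ in range(terms):
        history.append((incumbent, economy))
        next_economy = 'boom' if incumbent == 'good' else 'recession'
        if economy == 'boom':
            next_incumbent = incumbent    # re-elect the incumbent in a good economy
        else:
            next_incumbent = 'bad' if incumbent == 'good' else 'good'    # choose the challenger
        incumbent, economy = next_incumbent, next_economy
    return history

# Start with the good party in power and a recession: the three-term cycle appears.
for term in simulate('good', 'recession'):
    print(term)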

Of course, reality is more complicated than the simple model described above – there are random shocks to the economy, policy lags are not exactly equal to one term of the government, the length of time a party stays in power is random, one party’s policy may be better in one situation but worse in another.

Social welfare functions derived from revealed preference

The social welfare functions used in policy evaluation typically put more weight on poorer people, justifying redistribution from the rich to the poor. The reasoning is that the marginal benefit of a unit of money is greater for the poor than the rich. However, people with a greater marginal value of money are more motivated to earn and save, other things equal, so more likely to become rich. In this case, the rich have on average a higher marginal benefit of money than the poor, or a lower marginal cost of accumulating it. If the justification for redistribution is an interpersonal utility comparison, then revealed preference suggests a greater Pareto weight for richer people, thus redistribution in the opposite direction to the usual.

If the marginal utility of money decreases in wealth or income, then people earn until the marginal benefit equals the marginal cost, so the comparison between the rich and the poor depends on their marginal cost of earning, evaluated at their current wealth and income. The cost and benefit of earning may both be higher or lower for richer people. In a one-shot model, whoever has a greater benefit should receive redistributive transfers to maximise a utilitarian welfare criterion. Dynamic indirect effects sometimes reverse this conclusion, because incentives for future work are reduced by taxation.
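
As a minimal numerical sketch of this one-shot logic, assume for illustration that both people have square-root utility of wealth; then the person with less wealth has the higher marginal utility of money, and a small transfer towards her raises the utilitarian sum:

# One-shot utilitarian comparison with an assumed square-root utility for both
# people: a small transfer to the person with the higher marginal utility of
# money raises the sum of utilities.

import math

def u(w):
    return math.sqrt(w)

def marginal_u(w, eps=1e-6):
    return (u(w + eps) - u(w)) / eps

w_rich, w_poor, transfer = 100.0, 25.0, 1.0

print(marginal_u(w_poor) > marginal_u(w_rich))    # True: the poorer person values money more at the margin
before = u(w_rich) + u(w_poor)
after = u(w_rich - transfer) + u(w_poor + transfer)
print(after > before)                             # True: the transfer to the poorer person raises total utility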

Those with a high marginal utility of money are more motivated to convince the public that their marginal utility is high and that they should receive a subsidy. The incentive to lobby is the difference between a benefit (the marginal utility of the transfer) and a cost (the opportunity cost of lobbying time), which together determine whether the poor or the rich lobby more for redistributive transfers. The marginal cost of an hour of persuasion equals the person’s hourly wage, so depends on whether her income is derived mostly from capital or from labour. For example, both rentiers and low-wage workers have a low opportunity cost of time, so optimally lobby more than high-wage workers. If lobbying influences policy (which is empirically plausible), then the tax system resulting from the persuasion competition burdens high-wage workers most heavily and leaves loopholes and low rates for capital income and low wages. This seems to be the case in most countries.

A tax system based on lobbying is inefficient, because it is not the people with the greatest benefit that receive the subsidies (which equal the value of government services minus the taxes), but those with the largest difference between the benefit and the lobbying cost. However, the resulting taxation is constrained efficient under the restriction that the social planner cannot condition policy on people’s marginal costs of lobbying.

Seasonings may reduce the variety of diet

Animals may evolve a preference for a varied diet in order to get the many nutrients they need. A test of this on mice would be whether their preference for different grains is negatively autocorrelated, i.e. they are less likely to choose a food if they have eaten more of it recently.
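
A rough sketch of how the test could be scored, with a made-up choice sequence standing in for real data:

# Code each meal as 1 if the mouse chose grain A and 0 otherwise, then check
# whether the lag-1 autocorrelation of the choice series is negative, i.e.
# the mouse tends to switch away from what it ate last.

def lag1_autocorrelation(series):
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)) / n
    return cov / var

# Made-up choice sequence in which the mouse mostly alternates between grains.
choices = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(lag1_autocorrelation(choices))    # negative, consistent with variety-seeking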

Variety is perceived mainly through taste, so the preference for a varied diet probably operates through the mechanism that consuming a substance repeatedly makes its taste less pleasant for the next meal. Spices and other flavourings can make the same food seem different, so may interfere with variety-seeking, essentially by deceiving the sense of taste. A test of this on mice would flavour the same grain differently and check whether this attenuates the negative autocorrelation of consumption, both when other grains are available and when not.

If seasonings reduce variety-seeking, then access to spices may lead people to consume a more monotonous diet, which may be less healthy. A test of this hypothesis is whether increased access to flavourings leads to more obesity, especially among those constrained to eat similar foods over time. The constraint may be poverty (only a few cheap foods are affordable) or physical access (living in a remote, unpopulated area).

A preference for variety explains why monotonous diets, such as Atkins, may help with losing weight: eating similar food repeatedly gets boring, so the dieter eats less.

Compatibility with colleagues is like interoperability

Getting along with colleagues is like the compatibility of programs, tools or machine parts – an individually very good component may be useless if it does not fit with the rest of the machine. A potentially very productive worker who does not work well with others in the company does not contribute much.

The mismatch between an individual and a firm may be horizontal (different cultures, all similarly good) or vertical (bad vs good quality or productivity). Horizontal compatibility with colleagues includes personal appearance – wearing a shirt with a left-wing slogan may be fine in a left-wing company, but offend people in a right-wing one, and vice versa. When colleagues take offence, the strong emotions distract them from work, so a slogan on a shirt may reduce their productivity.

Vertical fitting in includes personal hygiene, because bad breath or body odour distracts others from work. Similarly, loud phone conversations or other noise are disruptive everywhere.

Laplace’s principle of indifference makes history useless

Model the universe in discrete time with only one variable, which can take values 0 and 1. The history of the universe up to time t is a vector of length t consisting of zeroes and ones. A deterministic universe is a fixed sequence. A random universe is like drawing the next value (0 or 1) according to some probability distribution every period, where the probabilities can be arbitrary and depend in arbitrary ways on the past history.

The prior distribution over deterministic universes is a distribution over sequences of zeroes and ones. The prior determines which sets are generic. I will assume the prior with the maximum entropy, which is uniform (all paths of the universe are equally likely). This follows from Laplace’s principle of indifference, because there is no information about the distribution over universes that would make one universe more likely than another. The set of infinite sequences of zeroes and ones is bijective with the interval [0,1], so a uniform distribution on it makes sense.

After observing the history up to time t, one can reject all paths of the universe that would have led to a different history. For a uniform prior, any history is equally likely to be followed by 0 or 1. The prediction of the next value of the variable is the same after every history, so knowing the history is useless for decision-making.

Many other priors besides uniform on all sequences yield the same result. For example, uniform restricted to the support consisting of sequences that are eventually constant. There is a countable set of such sequences, so the prior is improper uniform. A uniform distribution restricted to sequences that are eventually periodic, or that in the limit have equal frequency of 1 and 0 also works.

Having more variables, more values of these variables or making time continuous does not change the result. A random universe can be modelled as deterministic with extra variables. These extras can for example be the probability of drawing 1 next period after a given history.

Predicting the probability distribution of the next value of the variable is easy, because the probability of 1 is always one-half. Knowing the history is no help for this either.
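
A small Monte Carlo sketch of this claim: approximate the uniform prior by independent fair coin flips, keep only the simulated paths consistent with an arbitrary observed history, and check that the estimated probability of the next value being 1 stays near one half (the history and the path length below are made up):

# Approximate the uniform prior over length-T paths by drawing every value as
# an independent fair coin flip, keep only the paths consistent with the
# observed history, and estimate the probability that the next value is 1.

import random

random.seed(0)
T = 12
history = [1, 0, 1, 1, 0, 1, 0, 0]    # an arbitrary observed history

matches = 0
ones_next = 0
for _ in range(200000):
    path = [random.randint(0, 1) for _ in range(T)]
    if path[:len(history)] == history:
        matches += 1
        ones_next += path[len(history)]

print(ones_next / matches)    # close to 0.5, whatever the history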

Statistics with a single history

Only one history is observable to a person – the one that actually happened. Counterfactuals are speculation about what would have happened if choices or some other element of the past history had differed. Only one history is observable to humanity as a whole, to all thinking beings in the universe as a whole, etc. This raises the question of how to do statistics with a single history.

The history is chopped into small pieces, which are assumed similar to each other and to future pieces of history. All conclusions require assumptions. In the case of statistics, the main assumption is “what happened in the past, will continue to happen in the future.” The “what” that is happening can be complicated – a long chaotic pattern can be repeated. Before discussing the patterns of history, it should be specified what they consist of.

The history observable to a brain consists of the sensory inputs and memory. Nothing else is accessible. This is pointed out by the “brain in a jar” thought experiment. Memory is partly past sensory inputs, but may also depend on spontaneous changes in the brain. Machinery can translate previously unobservable aspects of the world into accessible sensory inputs, for example convert infrared and ultraviolet light into visible wavelengths. Formally, history is a function from time to vectors of sensory inputs.

The brain has a built-in ability to classify sensory inputs by type – visual, auditory, etc. This is why the inputs form a vector. For a given sense, there is a built-in “similarity function” that enables comparing inputs from the same sense at different times.

Inputs distinguished by one person, perhaps with the help of machinery, may look identical to another person. The interpretation is that there are underlying physical quantities that must differ by more than the “just noticeable difference” to be perceived as different. The brain can access physical quantities only through the senses, so whether there is a “real world” cannot be determined, only assumed. If most people’s perceptions agree about something, and machinery also agrees (e.g. a measuring tape does not confirm a visual illusion), then this “something” is called real and physical. The history accessible to humanity as a whole is a function from time to the concatenation of everyone’s sensory input vectors.

The similarity functions of people can also be aggregated, compared to machinery and the result interpreted as a physical quantity taking “similar” values at different times.

A set of finite sequences of vectors of sensory inputs is what I call a pattern of history. For example, a pattern can be a single sequence or everything but a given sequence. Patterns may repeat, due to the indistinguishability of physical quantities close to each other. The finer the distinctions one can make, the fewer the instances with the same perception. In the limit of perfect discrimination of all variable values, history is unlikely to ever repeat. In the limit of no perception at all, history is one long repetition of nothing happening. The similarity of patterns is defined based on the similarity function in the brain.

Repeated similar patterns together with assumptions enable learning and prediction. If AB is always followed by C, then learning is easy. Statistics are needed when this is not the case. If half the past instances of AB are followed by C, half by D, then one way to interpret this is by constructing a state space with a probability distribution on it. For example, one may assume the existence of an unperceived variable that can take values c,d and assume that ABc leads deterministically to ABC and ABd to ABD. The past instances of AB can be interpreted as split into equal numbers of ABc and ABd. The prediction after observing AB is equal probabilities of C and D. This is a frequentist setup.
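
A minimal sketch of this frequentist counting, with a made-up history string:

# Count what followed each past occurrence of the pattern AB and predict the
# next symbol from the relative frequencies; this is equivalent to splitting
# the AB instances into hidden states ABc -> ABC and ABd -> ABD.

from collections import Counter

history = "ABCXABDXABCXABD"    # made-up past history
pattern = "AB"

followers = Counter(
    history[i + len(pattern)]
    for i in range(len(history) - len(pattern))
    if history[i:i + len(pattern)] == pattern
)
total = sum(followers.values())
prediction = {symbol: count / total for symbol, count in followers.items()}
print(prediction)    # {'C': 0.5, 'D': 0.5}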

A Bayesian interpretation puts a prior probability distribution on histories and updates it based on the observations. The prior may put probability one on a single future history after each past one. Such a deterministic prediction is easily falsified – one observation contrary to it suffices. Usually, many future histories are assumed to have positive probability. Updating requires conditional probabilities of future histories given the past. The histories that repeat past patterns are usually given higher probability than others. Such a conditional probability system embodies the assumption “what happened in the past, will continue to happen in the future.”
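
A matching Bayesian sketch: put a prior over a handful of candidate complete histories (invented here for illustration, with more weight on those that repeat past patterns), discard the candidates contradicted by the observed past, and read the prediction off the renormalised posterior:

# The prior is a distribution over complete histories. Observing the past
# rules out the inconsistent ones; renormalising the surviving weights gives
# the posterior, from which the next symbol is predicted.

observed = "ABCAB"    # the past history so far

# Invented candidate complete histories, with more prior weight on those that
# repeat the past pattern (AB followed by C).
prior = {
    "ABCABC": 0.5,
    "ABCABD": 0.3,
    "ABCABA": 0.1,
    "ABDABD": 0.1,
}

posterior = {h: p for h, p in prior.items() if h.startswith(observed)}
norm = sum(posterior.values())
posterior = {h: p / norm for h, p in posterior.items()}

# Predicted probability that the next symbol is C.
print(sum(p for h, p in posterior.items() if h[len(observed)] == "C"))    # about 0.56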

There is a tradeoff between the length of a pattern and the number of times it has repeated. Longer patterns permit prediction further into the future, but fewer repetitions mean more uncertainty. Much research in statistics has gone into finding the optimal pattern length given the data. A long pattern contains many shorter ones, with potentially different predictions. Combining information from different pattern lengths is also a research area. Again, assumptions determine which pattern length and combination is optimal. Assumptions can be tested, but only under other assumptions.
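
A small sketch of the tradeoff, again with a made-up history: for each candidate pattern length, count how often the most recent pattern of that length has occurred before and what followed it. Longer patterns match the present more specifically, but have fewer past repetitions to learn from:

# For each candidate pattern length k, find how many times the most recent
# length-k pattern has occurred in the past and what followed it: longer
# patterns are more specific but have fewer repetitions behind them.

history = "ABCABDABCABCABDABC"    # made-up past history

for k in range(1, 6):
    context = history[-k:]
    followers = [
        history[i + k]
        for i in range(len(history) - k)
        if history[i:i + k] == context
    ]
    print(k, context, len(followers), followers)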

Causality is also a mental construct. It is based on past repetitions of an AB-like pattern, without occurrence of BA or CB-like patterns.

The perception of time is created by sensory inputs and memory, e.g. seeing light and darkness alternate, feeling sleepy or alert due to the circadian rhythm and remembering that this has happened before. History is thus a mental construct. It relies on the assumptions that time exists, there is a past in which things happened and current recall is correlated with what actually happened. The preceding discussion should be restated without assuming time exists.

Bayesian vs frequentist statistics – how to decide?

Which predicts better, Bayesian or frequentist statistics? This is an empirical question. To find out, should we compare their predictions to the data using Bayesian or frequentist statistics? What if Bayesian statistics says frequentist is better and frequentist says Bayesian is better (Liar’s paradox)? To find the best method for measuring the quality of the predictions, should we use Bayesianism or frequentism? And to find the best method to find the best method for comparing predictions to data? How to decide how to decide how to decide, as in Lipman (1991)?

Health insurance insures not health, but wealth

If health insurance really insured health, it would offer a small constant health loss in exchange for reducing the probability of a big health loss. For example, it would offer a constant low-level headache but take away the chance of a heart attack.

In reality, health insurance constantly takes away a small amount of money and (hopefully) in return removes the risk of a big monetary loss from healthcare costs in the case of a serious health problem. This is wealth insurance, not health insurance.