Tag Archives: abstract musings

Privacy reduces cooperation, may be countered by free speech

Cooperation relies on reputation. For example, fraud in online markets is deterred by the threat of bad reviews, which reduce future trading with the defector. Data protection, specifically the “right to be forgotten”, allows those with a bad reputation to erase their records from the market provider’s database and create new accounts with a clean slate. Bayesian market participants then rationally attach a bad reputation to any new account (“guilty until proven innocent”). If new entrants are penalised, then entry and competition decrease.

One way to counter this abuse of data protection laws to escape the consequences of one’s past misdeeds is to use free speech laws. Allow market participants to comment on or rate others, protecting such comments as a civil liberty. If other traders can identify a bad actor, for example by his or her government-issued ID, then any future account by the same individual can be penalised by attaching the previous bad comments from the start.

Of course, comments could be abused to destroy competitors’ reputations, so leaving a bad comment should have a cost. For example, suppose the comments are numerical ratings; then the average rating given by a person can be subtracted from all ratings given by that person. Dividing by the standard deviation makes the ratings of those with extreme opinions comparable to the scores given by moderates. Normalising by the mean and standard deviation makes ratings relative, so pulling down someone’s reputation pushes up those of others.
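A minimal sketch of this normalisation in Python, assuming a hypothetical data layout where each rater’s scores are kept in a dictionary (the names, the layout and the zero-deviation fallback are illustrative assumptions, not from the post):

```python
import statistics

def normalise_ratings(ratings_by_rater):
    """Make each rater's scores relative: subtract that rater's mean
    and divide by their standard deviation, so harsh, lenient and
    extreme raters become comparable."""
    normalised = {}
    for rater, scores in ratings_by_rater.items():
        values = list(scores.values())
        mean = statistics.mean(values)
        # A rater who gives identical scores has zero deviation;
        # fall back to 1 to avoid division by zero.
        sd = statistics.pstdev(values) or 1.0
        normalised[rater] = {target: (score - mean) / sd
                             for target, score in scores.items()}
    return normalised

ratings = {
    "alice": {"bob": 5, "carol": 1},  # extreme rater
    "dan":   {"bob": 4, "carol": 3},  # moderate rater
}
# After normalisation both raters express the same relative opinion.
print(normalise_ratings(ratings))
```

Note that after normalisation the extreme and the moderate rater above give identical relative scores, which is exactly the comparability the text describes.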

However, if a single entity can control multiple accounts (create fake profiles or use company accounts), then he or she can exchange positive ratings between his or her own profiles and rate others badly. Without being able to distinguish new accounts from fake profiles, any rating system has to either penalise entrants or allow sock-puppet accounts to operate unchecked. Again, official ID requirements may deter multiple account creation, but privacy laws impede this deterrence. There is always the following trilemma: either some form of un-erasable web activity history is kept, or entrants are punished, or fake accounts go unpunished.

Avoiding the Bulow and Rogoff 1988 result on the impossibility of borrowing

Bulow and Rogoff 1988 NBER working paper 2623 proves that countries cannot borrow, due to their inability to credibly commit to repay, if after default they can still buy insurance. The punishment of defaulting on debt is being excluded from future borrowing. This punishment is not severe enough to motivate a country to repay, by the following argument. A country has two reasons to borrow: it is less patient than the lenders (values current consumption or investment opportunities relatively more) and it is risk-averse (either because the utility of consumption is concave, or because good investment opportunities appear randomly). Debt can be used to smooth consumption or take advantage of temporary opportunities for high-return investment: borrow when consumption would otherwise be low, pay back when relatively wealthy.

After the impatient country has run up its debt to the maximum level the creditors are willing to tolerate, the impatience motive to borrow disappears, because the lenders do not allow more consumption to be transferred from the future to the present. Only the insurance motive to borrow remains. The punishment for default is the inability to insure via debt, because in a low-consumption or valuable-investment state of affairs, no more can be borrowed. Bulow and Rogoff assume that the country can still save or buy insurance by paying in advance, so “one-sided” risk-sharing (pay back when relatively wealthy, or when investment opportunities are unavailable) is possible. This seemingly one-sided risk-sharing becomes standard two-sided risk-sharing upon default, because the country can essentially “borrow” from itself the amount that it would have spent repaying debt. This amount can be used to consume or invest in the state of the world where these activities are attractive, or to buy insurance if consumption and investment are currently unattractive. Thus full risk-sharing is achieved.

More generally, if the country can avoid the punishment that creditors impose upon default (evade trade sanctions by smuggling, use alternate lenders if current creditors exclude it), then the country has no incentive to repay, in which case lenders have no incentive to lend.

The creditors know that once the country has run up debt to the maximum level they allow, it will default. Thus rational lenders set the maximum debt to zero. In other words, borrowing is impossible.

A way around the no-borrowing theorem of Bulow and Rogoff is to change one or more assumptions. In an infinite-horizon game, Hellwig and Lorenzoni allow the country to run a Ponzi scheme on the creditors, thus effectively “borrowing from time period infinity”, which permits a positive, sometimes even infinite, level of debt.

Another assumption that could realistically be removed is that the country can buy insurance after defaulting. Restricting insurance need not be due to an explicit legal ban. The insurers are paid in advance, thus do not exclude the country out of fear of default. Instead, the country’s debt contract could allow creditors to seize the country’s financial assets abroad, specifically in creditor countries, and these assets could be defined to include insurance premiums already paid, or the payments from insurers to the country. The creditors have no effective recourse against the sovereign debtor, but they may be able to enforce claims against insurance firms outside the defaulting country.

Seizing premiums to or payments from insurers would result in negative profits to insurers or restrict the defaulter to one-sided risk-sharing, without the abovementioned possibility of making it two-sided. Seizing premiums makes insurers unwilling to insure, and seizing payments from insurers removes the country’s incentive to purchase insurance. Either way, the country’s benefit from risk-sharing after default is eliminated. This punishment would motivate loan repayment, in turn motivating lending.

M-diagram of politics

Suppose a politician claims that X is best for society. Quiz:

1. Should we infer that X is best for society?

2. Should we infer that the politician believes that X is best for society?

3. Should we infer that X is best for the politician?

4. Should we infer that X is best for the politician among policies that can be ‘sold’ as best for society?

5. Should we infer that the politician believes that X is best for the politician?

This quiz illustrates the general principle in game theory that players best-respond to their perceptions, not reality. Sometimes the perceptions may coincide with reality. Equilibrium concepts like Nash equilibrium assume that on average, players have correct beliefs.

The following diagram (an M-diagram of politics) illustrates the reasoning of the politician claiming X is best for society. In case the diagram does not load, here is its description: the top row has ‘Official goal’ and ‘Real goal’; the bottom row has ‘Best way to the official goal’, ‘Best way to the real goal that looks like a reasonable way to the official goal’ and ‘Best way to the real goal’. Arrows point in an M-shaped pattern from the bottom-row items to the top-row items. The arrow from ‘Best way to the real goal that looks like a reasonable way to the official goal’ to ‘Official goal’ is the constraint on the claims of the politician.

The correct answer to the quiz is 5.

This post is loosely translated from the original Estonian one https://www.sanderheinsalu.com/ajaveeb/?p=140

Economic and political cycles interlinked

Suppose the government’s policy determines the state of the economy with a lag that equals one term of the government. Also assume that voters re-elect the incumbent in a good economy, but choose the challenger in a bad economy. This voting pattern is empirically realistic and may be caused by voters not understanding the lag between the policy and the economy. Suppose there are two political parties: the good and the bad. The policy the good party enacts when in power puts the economy in a good state during the next term of government. The bad party’s policy creates a recession in the next term.

If the economy starts out doing well and the good party is initially in power, then the good party remains in power forever, because during each of its terms in government, it makes the economy do well the next term, so voters re-elect it the next term.

If the economy starts out in a recession with the good party in power, then the second government is the bad party. The economy does well during the second government’s term due to the policy of the good party in the first term. Then voters re-elect the bad party, but the economy does badly in the third term due to the bad party’s previous policy. The fourth government is then again the good party, with the economy in a recession. This situation is the same as during the first government, so cycles occur. The length of a cycle is three terms. In the first term, the good party is in power, with the other two terms governed by the bad party. In the first and third term, the economy is in recession, but in the second term, booming.

If the initial government is the bad party, with the economy in recession, then the three-term cycle again occurs, starting from the third term described above. Specifically, voters choose the good party next, but the economy does badly again because of the bad party’s current policy. Then voters change back to the bad party, but the economy booms due to the policy the good party enacted when it was in power. Re-election of the bad is followed by a recession, which is the same state of affairs as initially.

If the government starts out bad and the economy does well, then again the three-term cycle repeats: the next government is bad, with the economy in recession. After that, the good party rules, but the economy still does badly. Then again the bad party comes to power and benefits from the economic growth caused by the good party’s previous policy.

Overall, the bad party is in power two-thirds of the time and the economy in recession also two-thirds of the time. Recessions overlap with the bad party in only one-third of government terms.
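The three-term cycle and the two-thirds shares can be checked with a short simulation. A minimal sketch in Python, where the two rules (policy lag of one term, retrospective voting) are from the text and the function and variable names are mine:

```python
def simulate(first_party, first_economy, terms=30):
    """One term per step: the economy next term booms iff the good
    party governs this term; voters re-elect the incumbent iff the
    economy this term booms, otherwise they elect the challenger."""
    party, economy = first_party, first_economy
    history = []
    for _ in range(terms):
        history.append((party, economy))
        next_economy = "boom" if party == "good" else "recession"
        if economy != "boom":  # bad economy: switch to the challenger
            party = "bad" if party == "good" else "good"
        economy = next_economy
    return history

history = simulate("good", "recession")
bad_share = sum(p == "bad" for p, _ in history) / len(history)
recession_share = sum(e == "recession" for _, e in history) / len(history)
print(bad_share, recession_share)  # both equal 2/3
```

Starting from a good government in a recession, the simulated path repeats (good, recession), (bad, boom), (bad, recession), matching the three-term cycle described above.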

Of course, reality is more complicated than the simple model described above – there are random shocks to the economy, policy lags are not exactly equal to one term of the government, the length of time a party stays in power is random, one party’s policy may be better in one situation but worse in another.

Social welfare functions derived from revealed preference

The social welfare functions used in policy evaluation typically put more weight on poorer people, justifying redistribution from the rich to the poor. The reasoning is that the marginal benefit of a unit of money is greater for the poor than the rich. However, people with a greater marginal value of money are more motivated to earn and save, other things equal, and so are more likely to become rich. In this case, the rich have on average a higher marginal benefit of money than the poor, or a lower marginal cost of accumulating it. If the justification for redistribution is an interpersonal utility comparison, then revealed preference suggests a greater Pareto weight for richer people, thus redistribution in the opposite direction to the usual.

If the marginal utility of money decreases in wealth or income, then people earn until the marginal benefit equals the marginal cost, so the comparison between the rich and the poor depends on their marginal cost of earning, evaluated at their current wealth and income. The cost and benefit of earning may both be higher or lower for richer people. In a one-shot model, whoever has a greater benefit should receive redistributive transfers to maximise a utilitarian welfare criterion. Dynamic indirect effects sometimes reverse this conclusion, because incentives for future work are reduced by taxation.

Those with a high marginal utility of money are more motivated to convince the public that their marginal utility is high and that they should receive a subsidy. The marginal utility is the difference between a benefit and a cost, which determine whether the poor or the rich have a greater incentive to lobby for redistributive transfers. The marginal cost of an hour of persuasion equals the person’s hourly wage, so depends on whether her income is derived mostly from capital or from labour. For example, both rentiers and low-wage workers have a low opportunity cost of time, so optimally lobby more than high-wage workers. If lobbying influences policy (which is empirically plausible), then the tax system resulting from the persuasion competition burdens high-wage workers most heavily and leaves loopholes and low rates for capital income and low wages. This seems to be the case in most countries.

A tax system based on lobbying is inefficient, because it is not the people with the greatest benefit that receive the subsidies (which equal the value of government services minus the taxes), but those with the largest difference between the benefit and the lobbying cost. However, the resulting taxation is constrained efficient under the restriction that the social planner cannot condition policy on people’s marginal costs of lobbying.

Seasonings may reduce the variety of diet

Animals may evolve a preference for a varied diet in order to get the many nutrients they need. A test of this on mice would be whether their preference for different grains is negatively autocorrelated, i.e. they are less likely to choose a food if they have eaten more of it recently.
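The proposed test can be made concrete as a statistic computed on a meal sequence. A sketch in Python, where the two-food coding (wheat ‘W’ vs oats ‘O’) is a hypothetical illustration:

```python
def lag1_autocorrelation(choices, food):
    """Lag-1 autocorrelation of the indicator 'chose `food` at meal t'.
    A negative value means the animal tends not to repeat a food it
    has just eaten, i.e. it seeks variety."""
    x = [1.0 if c == food else 0.0 for c in choices]
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    covariance = sum((x[i] - mean) * (x[i + 1] - mean)
                     for i in range(n - 1)) / (n - 1)
    return covariance / variance

# A strict alternator never repeats a food, giving the most
# negative autocorrelation possible.
print(lag1_autocorrelation("WOWOWOWO", "W"))  # -1.0
```

A mouse that keeps eating the same grain for long stretches would instead produce a positive value, so the sign of this statistic distinguishes variety-seeking from habit.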

Variety is perceived mainly through taste, so the preference for a varied diet probably operates through the mechanism that consuming a substance repeatedly makes its taste less pleasant at the next meal. Spices and other flavourings can make the same food seem different, so may interfere with variety-seeking, essentially by deceiving the sense of taste. A test of this on mice would flavour the same grain differently and check whether this attenuates the negative autocorrelation of consumption, both when other grains are available and when they are not.

If seasonings reduce variety-seeking, then access to spices may lead people to consume a more monotonous diet, which may be less healthy. A test of this hypothesis is whether increased access to flavourings leads to more obesity, especially among those constrained to eat similar foods over time. The constraint may be poverty (only a few cheap foods are affordable) or physical access (living in a remote, unpopulated area).

A preference for variety explains why monotonous diets, such as Atkins, may help with weight loss: eating similar food repeatedly gets boring, so the dieter eats less.

Compatibility with colleagues is like interoperability

Interacting with colleagues is like the compatibility of programs, tools or machine parts – an individually very good component may be useless if it does not fit with the rest of the machine. A potentially very productive worker who does not work well with others in the company does not contribute much.

The mismatch between an individual and a firm may be horizontal (different cultures, all similarly good) or vertical (bad vs good quality or productivity). Horizontal compatibility with colleagues includes personal appearance – wearing a shirt with a left-wing slogan may be fine in a left-wing company, but offend people in a right-wing one, and vice versa. When colleagues take offence, the strong emotions distract them from work, so a slogan on a shirt may reduce their productivity.

Vertical fitting in includes personal hygiene, because bad breath or body odour distracts others from work. Similarly, loud phone conversations or other noise are disruptive everywhere.

Laplace’s principle of indifference makes history useless

Model the universe in discrete time with only one variable, which can take values 0 and 1. The history of the universe up to time t is a vector of length t consisting of zeroes and ones. A deterministic universe is a fixed sequence. A random universe is like drawing the next value (0 or 1) according to some probability distribution every period, where the probabilities can be arbitrary and depend in arbitrary ways on the past history.

The prior distribution over deterministic universes is a distribution over sequences of zeroes and ones. The prior determines which sets are generic. I will assume the prior with the maximum entropy, which is uniform (all paths of the universe are equally likely). This follows from Laplace’s principle of indifference, because there is no information about the distribution over universes that would make one universe more likely than another. The set of infinite sequences of zeroes and ones is bijective with the interval [0,1], so a uniform distribution on it makes sense.

After observing the history up to time t, one can reject all paths of the universe that would have led to a different history. For a uniform prior, any history is equally likely to be followed by 0 or 1. The prediction of the next value of the variable is the same after every history, so knowing the history is useless for decision-making.

Many other priors besides uniform on all sequences yield the same result. For example, uniform restricted to the support consisting of sequences that are eventually constant. There is a countable set of such sequences, so the prior is improper uniform. A uniform distribution restricted to sequences that are eventually periodic, or that in the limit have equal frequency of 1 and 0, also works.

Having more variables, more values of these variables or making time continuous does not change the result. A random universe can be modelled as deterministic with extra variables. These extras can for example be the probability of drawing 1 next period after a given history.

Predicting the probability distribution of the next value of the variable is easy, because the probability of 1 is always one-half. Knowing the history is no help for this either.
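The uselessness of history under a uniform prior can be illustrated by brute-force enumeration over a finite horizon (a sketch; the horizon of 12 periods is chosen only to keep the enumeration small):

```python
from itertools import product

def prob_next_is_one(history, horizon=12):
    """Uniform prior over all 0/1 universes of length `horizon`:
    keep the paths consistent with the observed history and count
    the fraction in which the next value is 1."""
    t = len(history)
    consistent = [path for path in product([0, 1], repeat=horizon)
                  if list(path[:t]) == history]
    return sum(path[t] for path in consistent) / len(consistent)

# Every history yields the same prediction, so observing history
# does not improve decision-making.
print(prob_next_is_one([0, 0, 0, 0]))  # 0.5
print(prob_next_is_one([1, 0, 1, 1]))  # 0.5
```

Whatever prefix is conditioned on, exactly half of the consistent paths continue with 1, which is the enumeration version of the argument above.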

Statistics with a single history

Only one history is observable to a person – the one that actually happened. Counterfactuals are speculation about what would have happened if choices or some other element of the past history had differed. Only one history is observable to humanity as a whole, to all thinking beings in the universe as a whole, etc. This raises the question of how to do statistics with a single history.

The history is chopped into small pieces, which are assumed similar to each other and to future pieces of history. All conclusions require assumptions. In the case of statistics, the main assumption is “what happened in the past, will continue to happen in the future.” The “what” that is happening can be complicated – a long chaotic pattern can be repeated. It should be specified what the patterns of history consist of before discussing them.

The history observable to a brain consists of the sensory inputs and memory. Nothing else is accessible. This is pointed out by the “brain in a jar” thought experiment. Memory is partly past sensory inputs, but may also depend on spontaneous changes in the brain. Machinery can translate previously unobservable aspects of the world into accessible sensory inputs, for example convert infrared and ultraviolet light into visible wavelengths. Formally, history is a function from time to vectors of sensory inputs.

The brain has a built-in ability to classify sensory inputs by type – visual, auditory, etc. This is why the inputs form a vector. For a given sense, there is a built-in “similarity function” that enables comparing inputs from the same sense at different times.

Inputs distinguished by one person, perhaps with the help of machinery, may look identical to another person. The interpretation is that there are underlying physical quantities that must differ by more than the “just noticeable difference” to be perceived as different. The brain can access physical quantities only through the senses, so whether there is a “real world” cannot be determined, only assumed. If most people’s perceptions agree about something, and machinery also agrees (e.g. measuring tape does not agree with visual illusions), then this “something” is called real and physical. The history accessible to humanity as a whole is a function from time to the concatenation of their sensory input vectors.

The similarity functions of people can also be aggregated, compared to machinery and the result interpreted as a physical quantity taking “similar” values at different times.

A set of finite sequences of vectors of sensory inputs is what I call a pattern of history. For example, a pattern can be a single sequence or everything but a given sequence. Patterns may repeat, due to the indistinguishability of physical quantities close to each other. The finer distinctions one can make, the fewer the instances with the same perception. In the limit of perfect discrimination of all variable values, history is unlikely to ever repeat. In the limit of no perception at all, history is one long repetition of nothing happening. The similarity of patterns is defined based on the similarity function in the brain.

Repeated similar patterns together with assumptions enable learning and prediction. If AB is always followed by C, then learning is easy. Statistics are needed when this is not the case. If half the past instances of AB are followed by C, half by D, then one way to interpret this is by constructing a state space with a probability distribution on it. For example, one may assume the existence of an unperceived variable that can take values c,d and assume that ABc leads deterministically to ABC and ABd to ABD. The past instances of AB can be interpreted as split into equal numbers of ABc and ABd. The prediction after observing AB is equal probabilities of C and D. This is a frequentist setup.
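The frequentist prediction rule described above can be sketched in a few lines of Python (the symbols and names are illustrative, not from the post):

```python
from collections import Counter

def predict_after(history, pattern):
    """Frequentist prediction: find every past occurrence of `pattern`
    in `history` and tabulate the relative frequency of the symbol
    that followed each occurrence."""
    n = len(pattern)
    followers = Counter(history[i + n]
                        for i in range(len(history) - n)
                        if history[i:i + n] == pattern)
    total = sum(followers.values())
    return {symbol: count / total for symbol, count in followers.items()}

# Half the past instances of AB were followed by C, half by D,
# so the prediction after AB is equal probabilities of C and D.
print(predict_after("ABCABDABCABD", "AB"))  # {'C': 0.5, 'D': 0.5}
```

This matches the construction in the text: the unperceived variable taking values c and d corresponds to the two follower classes counted here.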

A Bayesian interpretation puts a prior probability distribution on histories and updates it based on the observations. The prior may put probability one on a single future history after each past one. Such a deterministic prediction is easily falsified – one observation contrary to it suffices. Usually, many future histories are assumed to have positive probability. Updating requires conditional probabilities of future histories given the past. The histories that repeat past patterns are usually given higher probability than others. Such a conditional probability system embodies the assumption “what happened in the past, will continue to happen in the future.”

There is a tradeoff between the length of a pattern and the number of times it has repeated. Longer patterns permit prediction further into the future, but fewer repetitions mean more uncertainty. Much research in statistics has gone into finding the optimal pattern length given the data. A long pattern contains many shorter ones, with potentially different predictions. Combining information from different pattern lengths is also a research area. Again, assumptions determine which pattern length and combination is optimal. Assumptions can be tested, but only under other assumptions.

Causality is also a mental construct. It is based on past repetitions of an AB-like pattern, without occurrence of BA or CB-like patterns.

The perception of time is created by sensory inputs and memory, e.g. seeing light and darkness alternate, feeling sleepy or alert due to the circadian rhythm and remembering that this has happened before. History is thus a mental construct. It relies on the assumptions that time exists, there is a past in which things happened and current recall is correlated with what actually happened. The preceding discussion should be restated without assuming time exists.

 

Bayesian vs frequentist statistics – how to decide?

Which predicts better, Bayesian or frequentist statistics? This is an empirical question. To find out, should we compare their predictions to the data using Bayesian or frequentist statistics? What if Bayesian statistics says frequentist is better and frequentist says Bayesian is better (Liar’s paradox)? To find the best method for measuring the quality of the predictions, should we use Bayesianism or frequentism? And to find the best method to find the best method for comparing predictions to data? How to decide how to decide how to decide, as in Lipman (1991)?