Tag Archives: statistics

Laplace’s principle of indifference makes history useless

Model the universe in discrete time with only one variable, which can take the values 0 and 1. The history of the universe up to time t is a vector of length t consisting of zeroes and ones. A deterministic universe is a fixed sequence. In a random universe, the next value (0 or 1) is drawn each period according to some probability distribution, where the probabilities can be arbitrary and depend in arbitrary ways on the past history.
The prior distribution over deterministic universes is a distribution over sequences of zeroes and ones. The prior determines which sets are generic. I will assume the prior with the maximum entropy, which is uniform (all paths of the universe are equally likely). This follows from Laplace’s principle of indifference, because there is no information about the distribution over universes that would make one universe more likely than another. The set of infinite sequences of zeroes and ones is bijective with the interval [0,1], so a uniform distribution on it makes sense.
After observing the history up to time t, one can reject all paths of the universe that would have led to a different history. For a uniform prior, any history is equally likely to be followed by 0 or 1. The prediction of the next value of the variable is the same after every history, so knowing the history is useless for decision-making.
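To see this concretely, here is a minimal sketch in Python (the function name next_bit_probability and the 10-period horizon are my own choices): it enumerates all equally likely paths of a short universe, keeps those consistent with an observed history and computes the conditional probability that the next value is 1. Under the uniform prior the answer is 0.5 for every history.

```python
from itertools import product

def next_bit_probability(history, horizon=10):
    """Probability that the next value is 1, conditional on the observed
    history, under a uniform prior over all 0-1 paths of length `horizon`."""
    t = len(history)
    consistent = [path for path in product((0, 1), repeat=horizon)
                  if list(path[:t]) == list(history)]
    ones = sum(1 for path in consistent if path[t] == 1)
    return ones / len(consistent)

# Every history gives the same prediction, 0.5, so observing it adds nothing.
print(next_bit_probability([0, 1, 1, 0, 1]))  # 0.5
print(next_bit_probability([1, 1, 1, 1, 1]))  # 0.5
```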
Many other priors besides the uniform on all sequences yield the same result. One example is the uniform prior restricted to the support consisting of sequences that are eventually constant. There are countably many such sequences, so this uniform prior is improper. A uniform distribution restricted to sequences that are eventually periodic, or that in the limit have equal frequencies of 1 and 0, also works.
Adding more variables, allowing more values of these variables or making time continuous does not change the result. A random universe can be modelled as a deterministic one with extra variables. These extra variables can, for example, be the probabilities of drawing 1 next period after each given history.
Predicting the probability distribution of the next value of the variable is easy, because the probability of 1 is always one-half. Knowing the history is no help for this either.

Giving oneself tenure

Senior academics tell juniors that an assistant professor does not have to get tenure at his or her current university, but “in the profession”, i.e. at some university. To extend this reasoning, one does not have to get tenure at all, just guarantee one’s ability to pay one’s living costs with as little effort as possible. Government jobs are also secure – not quite tenure, but close.
Economically, tenure is guaranteed income for life (or until a mandatory retirement age) in exchange for teaching and administrative work. The income may vary somewhat, based on research and teaching success, but there is some lower bound on salary. Many nontenured academics are obsessed with getting tenure. The main reason is probably not the prestige of being called Professor, but the income security. People with families seem especially risk averse and motivated to secure their jobs.
Guaranteed income can be obtained by other means than tenure, e.g. by saving enough to live off the interest and dividends (becoming a rentier). Accumulating such savings is better than tenure, because there is no teaching and administration requirement. If one wishes, one can always teach for free. Similarly, research can be done in one’s free time. If expensive equipment is needed for the research, then one can pay a university or other institution for access to it. The payment may be in labour (becoming an unpaid research assistant). Becoming financially independent therefore means giving oneself more than tenure. Not many academics seem to have noticed this option, because they choose a wasteful consumerist lifestyle and do not plan their finances.
Given the scarcity of tenure-track jobs in many fields, choosing the highest-paying private-sector position (to accumulate savings) may be a quicker and more certain path to the economic equivalent of tenure than completing sequential postdocs. The option of an industry job seems risky to graduate students because, unlike in academia, one can get fired. However, the chance of layoffs should be compared to the chance of failing to get a second postdoc at an institution of the same or higher prestige. When one industry job ends, there are others. As in academia, moving down is easier than moving up.
To properly compare the prospects in academia and industry, one should look at the statistics, not listen to anecdotal tales of one’s acquaintances or the promises of recruiters. If one aspires to be a researcher, then one should base one’s life decisions on properly researched facts. It is surprising how many academics do not. The relevant statistics on the percentage of graduates or postdocs who get a tenure-track job or later tenure have been published for several fields (http://www.nature.com/ncb/journal/v12/n12/full/ncb1210-1123.html, http://www.education.uw.edu/cirge/wp-content/uploads/2012/11/so-you-want-to-become-a-professor.pdf, https://www.aeaweb.org/articles?id=10.1257/jep.28.3.205). The earnings in both higher education and various industries are published as part of national labour force statistics. Objective information on job security (frequency of firing) is harder to get, but administrative data from the Nordic countries has it.
Of course, earnings are not the whole story. If one has to live in an expensive city to get a high salary, then the disposable income may be lower than with a smaller salary in a cheaper location. Non-monetary aspects of the job matter, such as a hazardous or hostile work environment, the hours and the flexibility. Junior academics normally work much longer than the 40 hours per week standard in most jobs, but the highest-paid private-sector positions may require even more time and effort than academia. The hours may be more flexible in academia, other than the teaching times. The work probably carries the same low level of danger in both. There is no reason to suppose the friendliness of the colleagues to differ.
Besides higher salary, a benefit of industry jobs is that they can be started earlier in life, before the 6 years in graduate school and a few more in postdoc positions. Starting early helps with savings accumulation, due to compound interest. Some people have become financially independent in their early thirties this way (see mrmoneymustache.com).
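A rough sketch of the compounding effect, using made-up numbers (a constant saving of 20,000 per year and a 5% real return) rather than anything from labour force statistics:

```python
def final_wealth(annual_saving, annual_return, years):
    """Value of a constant annual saving compounded once per year."""
    wealth = 0.0
    for _ in range(years):
        wealth = (wealth + annual_saving) * (1 + annual_return)
    return wealth

# Hypothetical numbers: 20,000 saved per year at a 5% real return.
early = final_wealth(20_000, 0.05, 40)  # start working at 25, stop at 65
late = final_wealth(20_000, 0.05, 30)   # start at 35, after a PhD and postdocs
print(round(early), round(late), round(early / late, 2))  # ratio about 1.8
```

With these assumed numbers, saving for forty years ends up roughly 1.8 times larger than saving for thirty, although only a third more money is paid in.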
If one likes all aspects of an academic job (teaching, research and service), then it is reasonable to choose an academic career. If some aspects are not inherently rewarding, then one should consider the alternative scenario in which the hours spent on those aspects are spent on paid employment instead. The rewarding parts of the job are done in one’s free time. Does this alternative scenario yield a higher salary? The non-monetary parts of this scenario seem comparable to academia.
Tenure is becoming more difficult to get, as evidenced by the lengthening PhD duration, the increasing average number of postdocs people do before getting tenure, and the lengthening tenure clocks (9 years at Carnegie Mellon vs the standard 6). Senior academics (who have guaranteed jobs) benefit from increased competition among junior academics, because then the juniors will do more work for the seniors for less money. So the senior academics have an incentive to lure young people into academia (to work in their labs as students and postdocs), even if this is not in the young people’s interest. The seniors do not fear competition from juniors, due to the aforementioned guaranteed jobs.
Graduate student and postdoc unions are lobbying universities and governments to give them more money. This has at best a limited impact, because in the end the jobs and salaries are determined by supply and demand. If the unions want to make current students and postdocs better off, then they should discourage new students from entering academia. If they want everyone to be better off, then they should encourage research-based decision-making by everyone. I do not mean presenting isolated facts that support their political agenda (like the unions do now), but promoting the use of the full set of labour force statistics available, asking people to think about their life goals and what jobs will help achieve those goals, and developing predictive models along the lines of “if you do a PhD in this field in this university, then your probable job and income at age 30, 40, etc is…”.

Keeping journals honest about response times

Academic journals in economics commonly take 3-6 months after manuscript submission to send the first response (reject, accept or revise) to the author. The variance of this response time is large both within and across journals. Authors prefer to receive quick responses, even if these are rejections, because then the article can be submitted to the next journal sooner. The quicker an article gets published, the sooner the author can use it to get a raise, a grant or tenure. This creates an incentive for authors to preferentially submit to journals with short response times.
If more articles are submitted to a journal, then the journal has a larger pool of research to select from. If the selection is positively correlated with article quality, then a journal with a larger pool to select from publishes on average higher quality articles. Higher quality raises the prestige of a journal’s editors. So there is an incentive for a journal to claim to have short response times to attract authors. On the other hand, procrastination of the referees and the editors tends to lengthen the actual response times. Many journals publish statistics about their response times on their website, but currently nothing guarantees the journals’ honesty. There are well-known tricks (other than outright lying) to shorten the reported response time, for example considering an article submitted only when it is assigned to an editor, and counting the response time from that point. Assigning to an editor can take over two weeks in my experience.
To keep journals honest, authors who have submitted to a journal should be able to check whether their papers have been correctly included in the statistics. Some authors may be reluctant to have their name and paper title associated with a rejection from a journal. A rejection may be inferred from a paper being included in the submission statistics, but not being published after a few years. A way around this is to report the response time for each manuscript number. Each submission to a journal is already assigned a unique identifier (manuscript number), which does not contain any identifying details of the author. The submitter of a paper is informed of its manuscript number, so can check whether the response time publicly reported for that manuscript number is correct.
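A minimal sketch of such a check, assuming (hypothetically) that the journal publishes a plain CSV with one row per manuscript number and no author details; the manuscript IDs and function names below are made up:

```python
import csv
from datetime import date

# Hypothetical published log: manuscript number, submission date, first decision date.
PUBLISHED_LOG = """\
MS-2016-0412,2016-03-01,2016-07-15
MS-2016-0413,2016-03-02,2016-05-20
"""

def check_my_submission(log_text, manuscript_id, my_submission_date, my_decision_date):
    """True if the journal's published dates match the author's own records."""
    for row in csv.reader(log_text.splitlines()):
        if row and row[0] == manuscript_id:
            return (date.fromisoformat(row[1]) == my_submission_date
                    and date.fromisoformat(row[2]) == my_decision_date)
    return False  # the manuscript is missing from the published statistics

print(check_my_submission(PUBLISHED_LOG, "MS-2016-0412",
                          date(2016, 3, 1), date(2016, 7, 15)))  # True
```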
Currently, authors can make public claims about the response time they encountered (e.g. on https://www.econjobrumors.com/journals.php), but these claims are hard to check. An author wanting to harm a journal may claim a very long response time. If the authors’ reported response times are mostly truthful, then these provide information about a journal’s actual response time. Symmetrically, if the journals’ reported response times are accurate, then an author’s truthfulness can be statistically tested, with the power of the test depending on the number of articles for which the author reports the response time.
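As an illustration of such a test, suppose the journal’s published statistics imply that only 10% of first decisions are slower than some threshold, while an author reports that 3 of their 4 submissions were slower. A one-sided binomial tail probability (the numbers are hypothetical) shows how the power grows with the number of reported articles:

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical: the journal's statistics imply a 10% chance that a first
# decision is "slow"; an author reports 3 slow decisions out of 4 submissions.
print(binomial_tail(4, 3, 0.10))  # about 0.0037: one of the two reports is likely false
# With only 1 reported submission the same discrepancy is uninformative:
print(binomial_tail(1, 1, 0.10))  # 0.1
```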

Probability of finding true love

The concept of true love has been invented by poets and other exaggerators. Evolutionarily, the optimal strategy is to settle with a good enough partner, not to seek the best in the world. But suppose for the sake of argument that a person A’s true love is another person B who exists somewhere in the world. What is the probability that A meets B?

There is no a priori reason why A and B have to be from the same country, have similar wealth or political views. Isn’t that what poets would have us believe – that love knows no boundaries, blossoms in unlikely places, etc?

Given the 7 billion people in the world, what fraction of them does a given person meet per lifetime? That depends on what is meant by “meets” – seeing each other from a distance, walking past each other on the street, looking at each other, talking casually. Let’s take literally the cliché “love at first sight” and assume that meeting means looking at each other. A person looks at a different number of people per day depending on whether they live in a city or in the countryside. There is also repetition, i.e. seeing the same person multiple times. A guess at the average number of new people a person looks at per day is 100. This times 365 days times a 70-year lifespan is 2,555,000. Dividing 7 billion by this gives about 2,700, so the odds of meeting one’s true love are about one in three thousand per lifetime.
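The arithmetic, spelled out (the 100-per-day rate is the guess above):

```python
people_in_world = 7_000_000_000
new_faces_per_day = 100            # the guessed meeting rate from above
days_in_lifetime = 365 * 70        # a 70-year lifespan

people_met = new_faces_per_day * days_in_lifetime  # 2,555,000
print(people_met, round(people_in_world / people_met))  # about 1 chance in 2,700
```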

Some questionable assumptions went into this conclusion, for example that the true love could be of any gender or age and that the meeting rate is 100 per day. Restricting the set of candidates to a particular gender and age group proportionately lowers both the number of candidates met and the total number of candidates, so leaves the conclusion unchanged.

Someone on the hunt for a partner may move to a big city, sign up for dating websites and thereby raise the meeting rate (raising the number met while keeping the total number of candidates constant), which would improve the odds. On the other hand, if recognizing one’s true love takes more than looking at them, e.g. a conversation, then the meeting rate could fall to less than one per day – how many new people do you have a conversation with each day?

Some people claim to have met their true love, at least in the hearing of their current partner. The fraction claiming this is larger than would be expected based on the calculations above. There may be cognitive dissonance at work (reinterpreting the facts so that one’s past decision looks correct). Or perhaps the perfect partner is with high probability from the same ethnic and socioeconomic background and the same high school class (this is called homophily in sociology). Then love blossoms in the most likely places.

Statistics with a single history

Only one history is observable to a person – the one that actually happened. Counterfactuals are speculation about what would have happened if choices or some other element of the past history had differed. Only one history is observable to humanity as a whole, to all thinking beings in the universe as a whole, etc. This raises the question of how to do statistics with a single history.

The history is chopped into small pieces, which are assumed similar to each other and to future pieces of history. All conclusions require assumptions. In the case of statistics, the main assumption is “what happened in the past, will continue to happen in the future.” The “what” that is happening can be complicated – a long chaotic pattern can be repeated. Before discussing the patterns of history, it should be specified what they consist of.

The history observable to a brain consists of the sensory inputs and memory. Nothing else is accessible. This is pointed out by the “brain in a jar” thought experiment. Memory is partly past sensory inputs, but may also depend on spontaneous changes in the brain. Machinery can translate previously unobservable aspects of the world into accessible sensory inputs, for example convert infrared and ultraviolet light into visible wavelengths. Formally, history is a function from time to vectors of sensory inputs.

The brain has a built-in ability to classify sensory inputs by type – visual, auditory, etc. This is why the inputs form a vector. For a given sense, there is a built-in “similarity function” that enables comparing inputs from the same sense at different times.

Inputs distinguished by one person, perhaps with the help of machinery, may look identical to another person. The interpretation is that there are underlying physical quantities that must differ by more than the “just noticeable difference” to be perceived as different. The brain can access physical quantities only through the senses, so whether there is a “real world” cannot be determined, only assumed. If most people’s perceptions agree about something, and machinery also agrees (e.g. a measuring tape contradicts a visual illusion), then this “something” is called real and physical. The history accessible to humanity as a whole is a function from time to the concatenation of everyone’s sensory input vectors.

The similarity functions of people can also be aggregated, compared to machinery and the result interpreted as a physical quantity taking “similar” values at different times.

A set of finite sequences of vectors of sensory inputs is what I call a pattern of history. For example, a pattern can be a single sequence or everything but a given sequence. Patterns may repeat, due to the indistinguishability of physical quantities close to each other. The finer the distinctions one can make, the fewer the instances with the same perception. In the limit of perfect discrimination of all variable values, history is unlikely to ever repeat. In the limit of no perception at all, history is one long repetition of nothing happening. The similarity of patterns is defined based on the similarity function in the brain.

Repeated similar patterns together with assumptions enable learning and prediction. If AB is always followed by C, then learning is easy. Statistics are needed when this is not the case. If half the past instances of AB are followed by C, half by D, then one way to interpret this is by constructing a state space with a probability distribution on it. For example, one may assume the existence of an unperceived variable that can take values c,d and assume that ABc leads deterministically to ABC and ABd to ABD. The past instances of AB can be interpreted as split into equal numbers of ABc and ABd. The prediction after observing AB is equal probabilities of C and D. This is a frequentist setup.
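A minimal sketch of this frequentist prediction, treating history as a string of symbols and counting what followed each past occurrence of a pattern (the function name and the toy history are mine):

```python
from collections import Counter

def continuation_frequencies(history, pattern):
    """Relative frequencies of the symbols that followed `pattern` in the past."""
    counts = Counter(
        history[i + len(pattern)]
        for i in range(len(history) - len(pattern))
        if history[i:i + len(pattern)] == pattern
    )
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

# Half the past instances of AB were followed by C, half by D.
print(continuation_frequencies("ABCABDABCABD", "AB"))  # {'C': 0.5, 'D': 0.5}
```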

A Bayesian interpretation puts a prior probability distribution on histories and updates it based on the observations. The prior may put probability one on a single future history after each past one. Such a deterministic prediction is easily falsified – one observation contrary to it suffices. Usually, many future histories are assumed to have positive probability. Updating requires conditional probabilities of future histories given the past. The histories that repeat past patterns are usually given higher probability than others. Such a conditional probability system embodies the assumption “what happened in the past, will continue to happen in the future.”
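A corresponding Bayesian sketch, assuming for simplicity that only C or D can follow the pattern and using a uniform Beta(1, 1) prior on the probability of C, which gives Laplace’s rule of succession (the names and toy history are again illustrative):

```python
def posterior_prob_C(history, pattern, prior_C=1.0, prior_D=1.0):
    """P(next symbol is C | past continuations of `pattern`), assuming only
    C or D can follow and starting from a uniform Beta(1, 1) prior."""
    n_C = n_D = 0
    for i in range(len(history) - len(pattern)):
        if history[i:i + len(pattern)] == pattern:
            nxt = history[i + len(pattern)]
            n_C += (nxt == "C")
            n_D += (nxt == "D")
    return (prior_C + n_C) / (prior_C + n_C + prior_D + n_D)

# Two past ABC and two past ABD give (1 + 2) / (2 + 4) = 0.5.
print(posterior_prob_C("ABCABDABCABD", "AB"))
```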

There is a tradeoff between the length of a pattern and the number of times it has repeated. Longer patterns permit prediction further into the future, but fewer repetitions mean more uncertainty. Much research in statistics has gone into finding the optimal pattern length given the data. A long pattern contains many shorter ones, with potentially different predictions. Combining information from different pattern lengths is also a research area. Again, assumptions determine which pattern length and combination is optimal. Assumptions can be tested, but only under other assumptions.
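The tradeoff can be seen by counting how many past instances match the most recent pattern as its length grows; in a random binary history the counts roughly halve with each extra symbol, so longer patterns leave fewer repetitions to learn from (a small sketch with made-up data):

```python
import random

random.seed(0)
history = "".join(random.choice("01") for _ in range(200))

def past_matches(history, k):
    """Number of earlier positions where the last k symbols also occurred."""
    suffix = history[-k:]
    return sum(history[i:i + k] == suffix for i in range(len(history) - k))

for k in range(1, 9):
    print(k, past_matches(history, k))  # counts shrink as the pattern lengthens
```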

Causality is also a mental construct. It is based on past repetitions of an AB-like pattern, without occurrence of BA or CB-like patterns.

The perception of time is created by sensory inputs and memory, e.g. seeing light and darkness alternate, feeling sleepy or alert due to the circadian rhythm and remembering that this has happened before. History is thus a mental construct. It relies on the assumptions that time exists, there is a past in which things happened and current recall is correlated with what actually happened. The preceding discussion should be restated without assuming time exists.

 

Bayesian vs frequentist statistics – how to decide?

Which predicts better, Bayesian or frequentist statistics? This is an empirical question. To find out, should we compare their predictions to the data using Bayesian or frequentist statistics? What if Bayesian statistics says frequentist is better and frequentist says Bayesian is better (Liar’s paradox)? To find the best method for measuring the quality of the predictions, should we use Bayesianism or frequentism? And to find the best method to find the best method for comparing predictions to data? How to decide how to decide how to decide, as in Lipman (1991)?