Theory and data both needed for prediction

Clearly, data is required for prediction. Theory only says: “If this, then that.” It connects assumptions and conclusions. Data tells whether the assumptions are true. It allows the theory to be applied.
Theory is also required for prediction, although that is less obvious. For example, after observing a variable take the value 1 a million times, what is the prediction for its next realization? Under the theory that the variable is constant, the next value is predicted to be 1. Under the theory that a million 1s are followed by a million 0s, then a million 1s, and so on, the next value is 0 (assuming the observations began at the start of a block). The second theory may sound more complicated than the first, but prediction is concerned with correctness, not complexity. Also, the simplicity of a theory is a slippery concept; see the “grue-bleen” example in philosophy.
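To make the point concrete, here is a minimal Python sketch. The function names and the assumption that observation starts at the beginning of a block of 1s are mine, introduced only for illustration. It shows two theories that fit the same million observations equally well and still disagree about the next value.

```python
# Illustrative sketch: the names and the block length are assumptions, not a standard method.
BLOCK = 1_000_000  # block length assumed by the alternating theory


def constant_theory(t):
    """Value at index t if the variable is always 1."""
    return 1


def alternating_theory(t, block=BLOCK):
    """Value at index t if blocks of 1s and 0s alternate, starting with 1s at t = 0."""
    return 1 if (t // block) % 2 == 0 else 0


def fits(theory, data):
    """True if the theory reproduces every observed value."""
    return all(theory(t) == x for t, x in enumerate(data))


observed = [1] * BLOCK  # the value 1, observed a million times
n = len(observed)       # index of the next, not yet observed, value

both_fit = fits(constant_theory, observed) and fits(alternating_theory, observed)
next_constant = constant_theory(n)
next_alternating = alternating_theory(n)

print(both_fit, next_constant, next_alternating)  # True 1 0
```

The data alone cannot choose between the two functions, because both reproduce it exactly. Only the choice of theory determines the forecast.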
The constant sequence may sound like the more “natural” theory, but which theory is natural, and which is correct, depends on where the data comes from. For example, the data may be generated by recording every millisecond whether it is day or night, with day=1 and night=0. Then the theory that a long run of 1s is followed by a long run of 0s, and so on, is both more natural and more correct than the theory that the sequence is constant.
Sometimes the theory is so simple that it goes unnoticed, as when forecasting a constant sequence. Which is more important for prediction, theory or data? Both equally, because the lack of either makes prediction impossible. If the situation is simple, then theorists may not be necessary, but theory still is.
