A nice article on the pitfalls of statistics, published today on Knowledge@Wharton (The Use -- and Misuse -- of Statistics: How and Why Numbers Are So Easily Manipulated - http://knowledge.wharton.upenn.edu/article.cfm?articleid=1928), discusses how tricky statistics actually is. Nice, but it doesn’t go far enough.
How do we usually proceed to study a field or phenomenon where there is a lot of apparent or real heterogeneity? Well, we are trained to look for simple explanations, to infer from patterns and regularities the existence of laws (when in doubt, apply Occam’s razor), and to expect our units of analysis (whether they are cities, people, firms, ecologies or economic transactions - let’s call them agents) to conform to those laws with some individual variation. Assuming the existence of a representative agent, we also expect to be able to rank our agents according to how distant they are from that representative agent.
Now imagine what the world looked like before this type of reasoning was introduced. Unbounded variability, endless forms, capricious behaviours, permanent amazement at the diversity of natural and social phenomena. No surprise that Plato introduced the myth of the cave to try to establish some order in the messiness of reality (for Plato all earthly forms were flawed reflections of the ideal type, which didn’t exist on earth).
Then arrived Quetelet, De Moivre, Gauss, Pearson, etc., and it must have been intellectual nirvana. By using simple concepts such as averages and variances, they could explain the amazing diversity of reality. It worked everywhere, from atoms to voters, from societies to natural systems. The promise of statistics must have seemed unbounded. Quetelet thought, in a typical Platonic or pre-communist fashion, that the mean was the embodiment of the ideal form. Variance was evil and extreme variances indicated pathological behaviours. Order was in homogeneity and the mean represented the signature of the ‘right’ value. Perfection rested with the average person and consequently the role of politics was to create the average society. Many sciences followed. Substitute mean with equilibrium, throw in the invisible hand and you get today’s market fundamentalism. In today’s FT George Soros writes:
“for the past 25 years or so the financial authorities and institutions they regulate have been guided by market fundamentalism: the belief that markets tend toward equilibrium and that deviations from it occur in random manner. All the innovations – risk management, trading techniques, the alphabet soup of derivatives and synthetic financial instruments were based on that belief. The innovations remained unregulated because authorities believe markets are self-correcting”
As often happens in the history of ideas, we forget the assumptions on which theories are built. For Gaussian statistics (and linear science at large), there are basically two: independence and randomness. Now, how many instances do you know in the social sciences in which phenomena, or data points, are truly independent of each other and random? Yet take any sample of articles in the social sciences, especially in economics and management, and you will see that Gaussian statistics rules uncontested. Even worse, alternative methods and the underlying Weltanschauung (vision of the world) are actively resisted.
As Mandelbrot (the inventor of fractal geometry) puts it:
“The most diverse attempts continue to be made, to discredit in advance all evidence based on the use of doubly logarithmic graphs. But I think this method would have remained uncontroversial, were it not for the nature of the conclusion to which it leads. Unfortunately, a straight doubly logarithmic graph indicates a distribution that flies in the face of the Gaussian dogma, which long ruled uncontested. The failure of applied statisticians and social scientists to heed Zipf helps account for the striking backwardness of their fields. (Mandelbrot, 1983, 404)”
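To make the "straight doubly logarithmic graph" concrete, here is a minimal sketch (my own illustration, not Mandelbrot's procedure). It assumes a Pareto tail exponent of 1.5 and a standard normal for comparison, both chosen arbitrarily, and measures the slope of log P(X > x) against log x between consecutive points. For the Pareto tail the slope should stay constant, which is what a straight line on log-log axes means; for the Gaussian tail it should keep getting steeper.

```python
import math

def pareto_survival(x, alpha=1.5, x_min=1.0):
    """P(X > x) for a Pareto distribution; alpha=1.5 is an arbitrary choice."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

def gaussian_survival(x):
    """P(X > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def loglog_slopes(survival, xs):
    """Slopes between consecutive points of log P(X > x) plotted against log x."""
    pts = [(math.log(x), math.log(survival(x))) for x in xs]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

xs = [1.0, 2.0, 4.0, 8.0]
pareto_slopes = loglog_slopes(pareto_survival, xs)
gaussian_slopes = loglog_slopes(gaussian_survival, xs)

print("Pareto slopes:  ", [round(s, 3) for s in pareto_slopes])
print("Gaussian slopes:", [round(s, 3) for s in gaussian_slopes])
```

The constant Pareto slope equals minus the tail exponent, which is why the exponent is usually read straight off such a graph.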
With a colleague from UCLA, Bill McKelvey, I have been studying the misuse of Gaussian statistics and exploring the potential of what is known as Paretian (from Pareto, the Italian economist/sociologist) science. We find that almost anywhere you look, you find distributions that carry the unmistakable signature of Pareto distributions (also known as Zipf or power-law distributions; these are long-tailed distributions where the mean, the variance, or both are unstable or don’t exist), which scream for a non-Gaussian interpretation.
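What "unstable or don’t exist" means in practice can be sketched as follows. This is only an illustration under my own assumptions: instead of random draws it builds an idealised sample from evenly spaced quantiles (so the numbers are reproducible), and the tail exponents 1.5 and 0.8 are arbitrary choices. For the Gaussian, mean and variance should settle quickly as the sample grows; for the Pareto cases the variance (and, for exponent 0.8, the mean as well) should keep growing with sample size.

```python
import math
from statistics import NormalDist

def quantile_sample(inv_cdf, n):
    """Idealised sample: the inverse CDF at n evenly spaced mid-probabilities."""
    return [inv_cdf((i + 0.5) / n) for i in range(n)]

def pareto_inv_cdf(alpha, x_min=1.0):
    """Quantile function of a Pareto distribution with tail exponent alpha."""
    return lambda u: x_min * (1.0 - u) ** (-1.0 / alpha)

def mean_and_variance(xs):
    """Plain sample mean and (population) variance."""
    m = math.fsum(xs) / len(xs)
    v = math.fsum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

sizes = [100, 1_000, 10_000, 100_000]
cases = {
    "Gaussian": NormalDist().inv_cdf,
    "Pareto alpha=1.5": pareto_inv_cdf(1.5),  # finite mean, infinite variance
    "Pareto alpha=0.8": pareto_inv_cdf(0.8),  # infinite mean and variance
}

results = {
    name: [mean_and_variance(quantile_sample(inv_cdf, n)) for n in sizes]
    for name, inv_cdf in cases.items()
}

for name, rows in results.items():
    for n, (m, v) in zip(sizes, rows):
        print(f"{name:18s} n={n:>7d}  mean={m:10.3f}  variance={v:14.3f}")
```

With real random draws the picture is noisier but the same: a single extreme observation can dominate the sample moments, so ranking agents by their distance from the "average" stops being meaningful.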
More on this in my next blog post.
Comments (3)
A nice reminder of the constant challenge of matching tools to real-world questions, that tools are often abused, and new tools have learning and mindshare curves.
Posted by WalterRSmith | April 3, 2008 6:14 PM
What is the methodological difference between believing in the universal applicability of Gaussian statistics and believing in the date of 4004 BC for the origin of the world?
Posted by Jerome Ravetz | April 28, 2008 3:40 PM
I'm guessing that 4004 BC is a reference to the Ussher Chronology of the Bible.
Gaussian statistics is a theoretical model which has applications in the real world, to the extent that the axioms and assumptions of the model also hold true in the real world. I think this is the point Dave makes about independence and randomness. If the axioms don't hold, the predictions won't be reliable.
The Ussher Chronology was an attempt to use the Bible as a raw dataset and apply empirical methods (which were quite novel in the mid 17th century) to deduce facts about the age of the earth.
There was nothing predictive or generalisable in Ussher's work: he wasn't trying to make inferences about how God operated or about any other worlds God might choose to create.
It's also worth pointing out that the Bible is not primary data. It's a narrative that is quite selective in its use of evidence, presents findings that are already interpreted, and contains inconsistencies and omissions.
Posted by Gordon Rae | May 5, 2008 2:36 PM