The Campaign for Real Statistics (© brown box 2002)
The Campaign for Real Statistics does not exist. Yet. But if it ever does, this is what it would be about. Statistics are widely misreported, and it annoys me when people get wound up about something that was never true in the first place. Read on, and I'll tell you about:
Dreadful Data Collection
Miserably inappropriate Control Groups
Tragically erroneous implications of Cause and Effect
Utterly incorrect interpretation of results
Highly misleading media reporting
About Those Pollsters
What is the Campaign For Real Statistics?
Lies, damned lies, and statistics. If that is the case, how come statistics are used so widely? In the right hands, statistics are useful analysis tools, which can be used to assess risks or opportunities and help lead to solutions. But in the wrong hands they mislead and misinform, can cause unrest, and can be used maliciously or negligently to do exactly that.
Nearly all statistics concern people in one way or another:
How they react
How they think
What they want
What they like
What they dislike
How they are affected by events
The problems with stats come from two areas: either they are inaccurately gathered, or they are gathered in good faith and reported incorrectly.
The main factors to look for in good stats are the source of the data, and the quality of the control group.
Source of Data
How was the data gathered in the first place? A classic example is statistics on sex. The common ones are:
Age at which teenagers lose their virginity
Frequency at which adult couples have sex.
In the first one we are shown that 90% of boys and 4% of girls have lost their virginity by age 15. How anyone can fail to see the inaccuracy of these statistics astounds me. It would be reasonable to conclude from them that, out of 100 boys and 100 girls, if each of the 90 boys had sex just once, the four girls had 22.5 partners each.
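The mismatch can be checked with a few lines of Python - a sketch that takes the survey's numbers at face value, and assumes for simplicity that all the activity happened within this group of 200:

```python
# Sanity-check the claimed survey figures: 90% of boys vs 4% of girls
# "active" by age 15, in a group of 100 boys and 100 girls.
boys, girls = 100, 100
active_boys = 0.90 * boys    # 90 boys
active_girls = 0.04 * girls  # 4 girls

# If each of the 90 boys had sex exactly once, with a girl from this
# group, those encounters must be shared among only 4 girls.
partners_per_girl = active_boys / active_girls
print(partners_per_girl)  # 22.5 partners each, on average
```

The point is not the exact figure, but that the two claimed percentages cannot both describe the same population.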
The dramatic inaccuracy in this data lies in the method of collection, which was that a group of teenagers were asked at what age they lost their virginity. The boys clearly considered it an act of bravado to say 14 or 15, while the girls evidently considered a later age a more acceptable answer. The true statistic is that 90% of boys claim, when asked, to have lost their virginity by age 15.
Similarly, many surveys come out with the 'fact' that adult couples have sex three times per week. I would suggest the reason is that once one survey produced this figure, it became a well-known fact. So ask any adult and this is what they say - no-one wants to be seen to be missing out.
The truth is, there needs to be good objectivity in the source of data, and what particularly needs to be avoided is changing the results in the very act of collecting them, as in the two cases above.
Control Groups
Another area to look at is the control group. Control groups are used when the stats are meant to show a change in risk for one group - people who are exposed to something, or who undertake different activities from the rest of us. To do this, data must be gathered from two groups: the study group and the control group. The control group is there to represent 'the rest of us', so for the stats to be accurate, the quality and relevance of the control group is paramount.
An example of this is chlorine in water. Chlorine was first added to drinking water supplies around the turn of the last century. It acts as a disinfectant; within a few years it had wiped out cholera and typhoid, and there was a dramatic increase in life expectancy. Not the kind of thing you'd want to be without. In the mid nineties, a study suggested that chlorinated water could lead to an increase in liver disease, implying that the use of chlorine could be a health risk.
The big question to ask here is: who were the control group of people who do not drink chlorinated water? They would have to be outside the UK, and outside Europe come to that. As far as I can see, only in a developing country (an African one, for instance) would you find an unchlorinated-water-drinking group of any size. The relevance of such a control group to a group living in England would be virtually nil. There would be so many lifestyle differences between the two groups that it would be impossible to isolate the effect of drinking chlorinated water. Quite possibly, the much lower life expectancy of the control group would be enough to ensure that no-one lived long enough to develop liver disease.
Cause and Effect
Here's an interesting article from the BBC about causality.
An important issue here is the attempt to use statistics to establish cause and effect. Let me show some examples:
1. Tall people bang their heads more. Suggestion - the reason they bang their heads more is because they are tall. Well, that seems a fairly safe conclusion.
2. The more fire engines arrive at a fire, the greater the chance of someone dying in that fire. Suggestion - the arrival of fire engines endangers lives. I think most people would dispute this proposed cause and effect.
Those were pretty clear-cut examples, but most often it is not so clear. How about this:
3. People who live in Greece have less heart disease than those in England. Suggestion - living in Greece will reduce your risk of heart disease. Maybe. Maybe not. There are a host of differences between Greek and English people - the climate, the diet, ancestry, all sorts of things. Any one, none, or some combination of these may be the cause of the difference in heart disease risk.
4. Since that factory opened, more people have asthma problems. Conclusion - the reason people have asthma is something to do with the factory. Well, maybe, but what else has changed in the same period? Did the increase in asthma occur in other places too? All these questions need to be examined and answered before it's time to draw conclusions from the statistics.
5. This one was reported in January 2004 in the UK. Someone did some research on the usefulness of assistant teachers in primary school classes. The research compared the performance of children in classes with an assistant against that of classes with no assistant, then concluded that assistants don't make any difference. What it failed to do was ask why only some classes have an assistant - clearly they are placed in classes that are under-achieving. So the fact that those classes subsequently showed no difference in performance actually demonstrates that the assistants are very effective.
Interpretation of Results
The main issue with establishing cause and effect using statistics is the need to isolate the factor being measured from other influencing factors. For instance, it can be clearly shown that people with larger shoe sizes are better at mathematics. It's quite simple: you get a bunch of people and a list of tricky sums, check all the shoe sizes, and plot out the results. You will get a clear correlation. Does this mean that large feet equate with mathematical ability? Or that tall people, shown by several studies to be favoured in society, both have larger feet and are favoured in schooling? More likely it means that you forgot to exclude children from the study.
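You can see the effect in a minimal Python simulation. The numbers here are entirely made up for illustration: both shoe size and maths score are generated to grow with age, and nothing else links them.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical mixed group of children and adults.  Shoe size and
# maths score each depend on age plus independent noise - there is
# no direct link between feet and sums.
ages = [random.randint(7, 18) for _ in range(2000)]
shoe = [0.8 * a + random.gauss(0, 1) for a in ages]
maths = [5.0 * a + random.gauss(0, 8) for a in ages]

# Across the whole group: a strong, entirely spurious correlation.
print(round(pearson(shoe, maths), 2))

# Restrict to a single age and the correlation all but vanishes.
twelve = [i for i, a in enumerate(ages) if a == 12]
print(round(pearson([shoe[i] for i in twelve],
                    [maths[i] for i in twelve]), 2))
```

Holding age constant is exactly the "isolate the factor" step the text describes; omit it and the shoe-size effect looks real.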
The example is a simple one, but what matrix of subtle links exists in your study group, affecting the results, other than the one you are trying to measure? And what lengths have been gone to in order to avoid such effects? Many of the statistics flying around in the popular press make little attempt to iron out such inaccuracies. For instance, all of the MORI, and similar, polls include only those people who are prepared to stop and answer questions in the street. Some research has suggested that this is quite a small proportion of the population, and therefore not necessarily representative of the whole.
What About Those Pollsters?
This raises another issue. The pollsters are among the main producers of statistics. They claim that a survey of around 1000 people is enough to provide a representative sample of the whole population. The basis of this claim is that when they question a larger group, the results are much the same. However, the larger group is still just a larger group of people who answer questions in the street - people who visit town centres at lunchtime, and so on. Extrapolating these results to the whole population is surely dubious, and this may explain the miserable failure to predict the result of the 1992 general election.
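To be fair to the pollsters, the "1000 is enough" claim does have a basis in sampling theory - but only for a genuinely random sample. For a proportion near 50%, the 95% margin of error is roughly 1.96 times the standard error, which shrinks as one over the square root of the sample size. A quick sketch of the standard formula:

```python
def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n."""
    return z * (p * (1 - p) / n) ** 0.5

for n in (100, 1000, 10000):
    print(n, round(100 * margin_of_error(n), 1), "%")
# About +/-3 points at n = 1000; ten times the sample only
# narrows this to about +/-1 point.
```

Note that population size does not appear in the formula at all, which is why 1000 people can, in principle, stand for 60 million. The catch is the word "random": the formula says nothing about a sample made up only of people who stop for pollsters in town centres, which is the author's point.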
Further, all of the polls contain the underlying assumption that those who answer the poll tell the truth.
Media Reporting
The media are widely guilty of spreading hysteria (and of bringing statistics into disrepute) by wildly misinterpreting, or perhaps just plain misunderstanding, statistics. The most common form of this is reporting increased risk as actual risk.
Many reports have been published on the subject of passive smoking. One recent study was widely reported in the media, including by the BBC, as saying that the risk of lung cancer from passive smoking is now thought to be 15%, and not 24% as previously suggested.
Think about this. The risk IS 15%? That would mean 15% of the population contract lung cancer as a result of passive smoking - completely untrue, of course. What the study actually found is that the INCREASE in risk of lung cancer for passive smokers, compared to non-passive smokers, is 15%. To put this into context, you must first know the base level of risk for the control group. This is around 0.4% - 4 out of every 1000 people contracting lung cancer. For the passive smokers it is 15% greater than this: 0.46%, or 4.6 people out of 1000. In other words, a very marginal effect and definitely not worth all the fuss.
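The difference between relative and absolute risk is easy to show with the figures above (illustrative arithmetic only; the 0.4% baseline is the number quoted in the text):

```python
# Relative vs absolute risk, using the passive-smoking figures.
baseline = 4 / 1000            # absolute risk: 4 cases per 1000 people
relative_increase = 0.15       # the "15%" from the headlines

passive = baseline * (1 + relative_increase)
extra = passive - baseline

print(round(1000 * passive, 2))  # 4.6 cases per 1000 passive smokers
print(round(1000 * extra, 2))    # 0.6 extra cases per 1000
```

A 15% relative increase on a small baseline is a small absolute change; reporting the 15% as if it were the absolute risk inflates it roughly thirty-fold.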
However, in this particular set of statistics, the control group issue is very important. To compare the risks to passive smokers, the study must have found some way to isolate a group of non-passive smokers - some group of people who never breathe in anyone else's smoke. How did they do this? In fact they didn't. The base assumption is that the majority of non-smokers are also non-passive smokers. The real issue is how they identified the group of passive smokers. This was done by picking the non-smokers with the greatest exposure to cigarette smoke: non-smokers who live with 40-a-day-or-more smokers. This was done so that if there is any effect, it stands a chance of being big enough to be measurable. Even in this group the effect was very small; for the rest of us, it is probably negligible.
Compare this to the news stories: "Have you ever smelled someone else smoking on the bus? According to a new survey, you stand a 15% chance of contracting lung cancer."
What is the Campaign for Real Statistics?
I would like to see much more informed and meaningful statistics used to describe risks and chances of occurrence - ones that would help us make informed choices, rather than ones that create wild scaremongering. This needs to be done in two ways:
Firstly, check that statistics have been properly prepared: with reference to appropriate control groups, and with quality data collection that irons out spurious effects and isolates the effect being studied, so that any cause-and-effect implications are reasonable.
Secondly, make sure that the statistics are correctly understood and reported in the media, so that the right conclusion is put across.
To achieve these things, anyone wishing to promote a set of statistics should get a Campaign For Real Statistics (CRS) audit and stamp of approval before going public. The kind of thing that people would come to trust, so they might say 'yes, but what do the CRS say about that?'
If you wish to lend your support, make comment, criticise, or add your own stories, please email me.
Content of this webpage © Brown Box 2002
Back to the br0wn box