This is going to be one of those annoying posts where I tell you to first go and read something else before coming back.
Sometimes, when research results are too good to be true, people start thinking there might be something fishy going on. Jens Foerster, for instance, was called out on the excessive linearity of his effects. He’d measure some cognitive variable, then apply one experimental manipulation that should decrease the expression of this variable, and another manipulation that should increase it. Too many times, the increase and the decrease were almost exactly equal in magnitude. What the nasty data police then tested was the probability of getting such a beautifully linear effect from his small samples, even assuming the effect in the population truly is linear. And the chances of getting such clean results, so many times in a row, are virtually nil. Real data are noisy; these were too good to be true.
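The "too good to be true" logic is easy to simulate. Below is a minimal sketch – not the actual analysis used in the Foerster case; the sample size, effect sizes, and linearity threshold are all made-up assumptions – showing that even when the population effect is perfectly linear, sampling noise makes it vanishingly unlikely that every study in a series lands almost exactly on the line.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20          # assumed participants per condition (made up)
n_studies = 10  # assumed number of studies in the series (made up)
n_sims = 10_000
true_means = [-1.0, 0.0, 1.0]  # population effect: perfectly linear

too_linear = 0
for _ in range(n_sims):
    all_clean = True
    for _ in range(n_studies):
        # observed condition means under unit-variance sampling noise
        means = [rng.normal(mu, 1.0, n).mean() for mu in true_means]
        # deviation of the middle mean from the midpoint of the extremes
        dev = abs(means[1] - (means[0] + means[2]) / 2)
        if dev > 0.05:  # tolerate only a tiny departure from a straight line
            all_clean = False
            break
    if all_clean:
        too_linear += 1

print(f"Share of simulated series that look 'perfectly linear': {too_linear / n_sims}")
```

With these numbers, essentially none of the simulated ten-study series come out near-perfectly linear: honest, noisy data scatter around the line even when the underlying effect really is linear.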
Yesterday I came across a blog post that’s too good to be true in a different way. It was a post that outlined problematic research practices in such a neat manner that I wondered whether it might be satire. But it’s not. I’ll do a quick recap, but really, do go and read it.
What it describes is the winning strategy for staying in science, through the contrasting behaviour of two lab members. The protagonist is charmingly called The Turkish Woman (it must be satire!), a person who throughout the text complies with her supervisor’s instructions. As her reward, she publishes five papers. These are referenced at the bottom of the post.
What she is initially introduced to is a ‘failed study which had null results’, a rich dataset where the hypothesis was not confirmed. She is told these data were expensive and time-consuming to acquire – and that there must be something in there that can be salvaged. (It’s going to be a cautionary tale about incentivising researchers to p-hack!) Every day, they pore over the data, reanalyse them in new ways, and come up with a different set of hypotheses. (It’s going to be about hindsight bias!) This goes on until a variety of discoveries are made while ‘digging through the data’, and a set of papers gets published. (Read it!)
The other lab member is a postdoc who was not interested in being involved in this. They publish less, they leave academia, and their main role in the post is to provide a contrast to the winning strategy.
It is a post that aims to accentuate hard work, efficiency, capitalizing on opportunities, a collaborative spirit, and dedication. It ends up highlighting questionable research practices, the misrepresentation of exploratory research as confirmatory, and a lack of understanding of why null results are important.
I took a peek at three papers from this series. None of them mention that other publications have come from the same dataset. They all mention only a selection of tested variables, not all, as if only a few things were measured each time. These are linked to hypotheses that are made to look like they existed before the data were collected.
The post even outlines how to data mine. You have a hypothesis, but maybe it didn’t work. But did it work at lunch, if not at dinner? Did it work with small groups and not large groups? One can go on like this, creating new combinations of variables until a result shines through. Statistical tests are a messy business: our criteria are not stringent, the samples are small, and something is bound to come up as significant if we look hard enough. To be clear, a result is significant whether or not we went looking for it – testing the data in many different ways is not the problem. The problem is not reporting all the other variables that were collected and all the other tests that were carried out. Because if we knew this was one result out of, say, 200 tests, we would be far less likely to give it much credence. Especially compared to a situation where it’s exactly the thing that the researchers had a hypothesis about, and lo and behold, it turned out to be true!
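The arithmetic behind that "one result out of 200 tests" point can be sketched with a quick simulation – all the numbers here are made up for illustration: run 200 independent two-group comparisons on pure noise, with small samples, and count how many clear p < .05 anyway.

```python
import math
import random

random.seed(1)
n_tests = 200     # number of subgroup/variable combinations tried (made up)
n_per_group = 20  # small samples, as in the post
alpha = 0.05

def two_group_p(n):
    """Two-sided z-test comparing two groups drawn from the SAME distribution."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

hits = sum(two_group_p(n_per_group) < alpha for _ in range(n_tests))
print(f"{hits} of {n_tests} tests on pure noise came out 'significant'")
print(f"Chance of at least one false positive: {1 - (1 - alpha) ** n_tests:.4f}")
```

On average about ten of the 200 null tests pass the .05 threshold, and the chance that at least one does is all but certain. Report only the hits as if they were predicted in advance, and you have a paper; report all 200 tests, and nobody is impressed.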
Critically, when a result like that becomes part of the published record, there has to be a way of disconfirming it. Treating an experiment as a failure because the hypothesis was not confirmed means that null findings won’t get published. What we end up with in the literature is a whole lot of random noise, which nobody gets to correct.
The author of the post is a professor at Cornell, well cited and successful. The post stings because there is truth to it – the person producing such noise folded into compelling narratives might well be more likely to make it in academia than the person who would publish the null result instead. I was not the only one to think it must be satire. But it’s certainly educational. Too good to be true, such a description of the driving forces of the replication crisis laid bare, for all eyes to see.