Why is peer review so random?
8 Scientific Papers That Were Rejected Before Going on to Win a Nobel Prize
Funding Analysis: Researchers Say NIH Grant Funding Allocation Seems No Better Than Lottery
The same paper resubmitted to the same journal after several years often ends up rejected due to 'serious methodological errors'
For people whose profession revolves around making order out of seemingly random observations, scientists sure are inconsistent at judging the work of other scientists. Why? It certainly doesn't seem to be like this at all levels. For example, according to the GRE's website:
For the Analytical Writing section, each essay receives a score from two trained raters, using a six-point holistic scale. In holistic scoring, raters are trained to assign scores on the basis of the overall quality of an essay in response to the assigned task. If the two assigned scores differ by more than one point on the scale, the discrepancy is adjudicated by a third GRE reader. Otherwise, the two scores on each essay are averaged.
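Taken at face value, that procedure is just a simple decision rule. As a rough sketch only (the function below and the choice to let the third reader's score stand as final are my assumptions, not anything ETS documents), it might look like:

```python
def resolve_essay_score(score_a, score_b, third_reader_score=None):
    """Sketch of the quoted GRE rule: average two ratings on the six-point
    scale unless they differ by more than one point, in which case a third
    reader adjudicates. How the adjudicated score is combined is assumed here."""
    if abs(score_a - score_b) > 1:
        if third_reader_score is None:
            raise ValueError("scores differ by more than one point; adjudication needed")
        return third_reader_score  # assumption: the third reader's score is final
    return (score_a + score_b) / 2

# e.g. resolve_essay_score(4, 5) == 4.5, while resolve_essay_score(3, 5, 4) needs adjudication
```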
This implies that it's uncommon for two assigned scores to differ by more than one point on the scale, i.e. GRE essay raters usually agree. Similarly, as far as I know, undergraduate thesis readers, MS thesis readers and even PhD thesis readers don't usually come to diametrically opposed judgments on the piece of work. Yet once it gets to research-level material, peer reviewers no longer seem to agree. Why?
Because the space is very sparse and untrodden, and thus there are few statistics in novel areas by which to judge how good the work is?
– Captain Emacs
Aug 13 at 7:26
It's worth noting that in the case of the NIH grant allocations, the actual conclusion was that rank within the top 20% was random, but that reviewers did agree generally on which applications were in the top 20%. That's more agreement than your GRE example.
– Ian Sudbery
Aug 13 at 8:54
We hear about only the few cases where peer review went wrong. But in most cases it does what it is supposed to.
– GEdgar
Aug 13 at 12:23
Nature only introduced peer review in 1967, so the 1930s papers rejected by Nature were rejected by the editor, not as a result of peer review.
– Count Iblis
Aug 13 at 21:37
10 Answers
The biggest difference is that, up to PhD thesis level, the person doing the assessing is more of an expert than the person being assessed. In almost all these cases there is an agreed set of standard skills, techniques and knowledge that any assessor can be expected to possess and any assessee is being measured against.
This isn't so true of a PhD thesis, but in the end, once a supervisor/thesis committee has given a student the green light, almost all PhD theses are passed.
It's definitely not true higher up. In almost all cases the person being reviewed will be more of an expert in their work than anyone doing the reviewing. The only exceptions will be direct competitors, and they will be excluded. We are talking about work right at the edge of human knowledge, where different people have different knowledge and skill sets.
I'm quite surprised that the GRE scores are so consistent. It’s long been known that essay marking is pretty arbitrary (see for example Diederich 1974[1]).
Mind you, 1 mark on a 6-mark scale is about 17% – a pretty big difference. In our degree a 70 and above is a 1st class degree – the best mark there is, whereas 55 is a 2:2, a degree that won't get you an interview for most graduate jobs. Losing that much on a grant assessment will almost certainly lose you the grant.
But even to obtain this level of consistency, the graders must have been given a pretty prescriptive grading rubric. In research, no such rubric exists; there are no pre-defined criteria against which a piece of research is measured, and any attempt to lay one down would more or less defeat the whole point of research.
@Mehrdad Already you run into issues. What does "reproducibility" mean for a math theorem? What does "correctness" mean for a philosophical text? And I think everyone would be hard-pressed to give numerical grades for each of these categories given a paper. And there's no threshold above which a paper is accepted and below which the paper is refused.
– Najib Idrissi
Aug 13 at 12:27
The footnote expanding "Diederich 1974" is missing.
– David Richerby
Aug 13 at 14:20
I believe [1] is "Diederich, P. B. (1974). Measuring growth in English. Urbana, IL: National Council of Teachers of English." ( eric.ed.gov/?id=ED097702 ) Another related paper by the author, with a slightly more useful abstract: Diederich, P. B., French, J. W. and Carlton, S. T. (1961), FACTORS IN JUDGMENTS OF WRITING ABILITY. ETS Research Bulletin Series, 1961: i-93. doi:10.1002/j.2333-8504.1961.tb00286.x (which says "free access" but may just be my university)
– tolos
Aug 13 at 16:10
@mehrdad - your 2, 3, and 4 are all extremely subjective.
– Mazura
Aug 13 at 18:22
@Mehrdad those are categories of criteria, not a rubric. Whether a particular piece of work meets any of your criteria is, as said above, entirely subjective. Meanwhile a GRE-type essay will have very specific criteria. When we grade UG essays, the essay comes with a 2 or 3 page list of appropriate content + a guide that says things like: Content - 5: most taught content (some gaps allowed), plus some non-taught content; 4: most taught content, but some gaps; 3: major gaps ... Structure - 5: well-constructed and logically watertight argument, etc.
– Ian Sudbery
Aug 14 at 8:54
Good question. Hard to answer. Some thoughts:
Considering these observations, it is unrealistic to expect two review reports to be aligned. The difficult decision then falls to the associate editor, who is also a volunteer and not specialized in the author's field.
That leaves the question of why this is accepted when outside science it wouldn't be. Honestly, I don't know. Just some guesses:
Added based on comment:
- reviewers are busy scientists
- reviewers are career-wise not rewarded for conducting reviews
Good list! I would add that reviewers might not have a lot of time and rush paper review, even though they volunteered. Some people also might not be very invested in doing a thorough review, because they do not benefit directly. Both are probably not often the case, but it seems likely.
– Ian
Aug 13 at 10:42
Something new is already on its way! researchers.one
– Peaceful
Aug 13 at 15:37
Thank you, @Ian. I fully agree and added your points at the bottom of my post.
– Alice
Aug 14 at 7:28
Thank you @Peaceful. I follow you here. Science is changing. Actually, I believe this is a very interesting time, where scientists across the entire globe can and will shape the future of science.
– Alice
Aug 14 at 7:31
"Jourals do not have the funding to train and attract qualified professionals/scientists as reviewers?" I would say that there isn't demand on journals to do this and they're not going to do it until the demand forces them to.
– Dean MacGregor
Aug 15 at 17:33
With respect to the good papers being rejected problem, a factor that doesn't seem to have been mentioned yet is that the consequences of accepting a bogus paper are much worse than those of rejecting a good paper. If a good paper is rejected, it can always be resubmitted to a different journal. And if the authors first revise according to the reviewer comments, the version that ends up getting published may well be better written than the one that was rejected. All that's lost is time.
But if a bogus paper is accepted, other scientists may see it in the literature, assume its results to be valid, and build their own work upon it. This could result in significant lost time on their part, as experiments that depend on the bogus result don't work out as they should (which at least may lead to the bogus paper being retracted if the errors are bad enough). Or maybe they'll avoid researching along a line that would have worked, because the bogus paper implies it wouldn't, or worse, they'll end up with inaccurate results themselves and end up putting another paper with bad data into the literature. All of these are far worse outcomes than just needing to resubmit a paper, so false negatives are preferred to false positives when reviewing.
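One way to see why that asymmetry pushes reviewers toward rejection is a back-of-the-envelope expected-cost comparison. The numbers below are invented purely for illustration (nothing above specifies them); the point is only that when a false positive is much costlier than a false negative, rejecting a borderline paper minimises expected cost even when the paper is probably fine:

```python
# Illustrative only: hypothetical costs (in arbitrary "wasted effort" units)
# and hypothetical probabilities that a borderline paper is actually sound.
COST_FALSE_POSITIVE = 100  # bogus paper accepted: others build on bad results
COST_FALSE_NEGATIVE = 5    # good paper rejected: authors lose time resubmitting

def expected_cost(p_sound):
    """Expected cost of each decision for a paper that is sound with probability p_sound."""
    return {
        "accept": (1 - p_sound) * COST_FALSE_POSITIVE,  # risk: it was bogus
        "reject": p_sound * COST_FALSE_NEGATIVE,        # risk: it was good
    }

for p in (0.5, 0.8, 0.95):
    costs = expected_cost(p)
    best = min(costs, key=costs.get)
    print(f"P(sound)={p:.2f}: {costs} -> prefer {best}")
# Even at P(sound)=0.95, accepting costs 5.0 in expectation vs 4.75 for rejecting,
# so the asymmetry alone can tip a cautious reviewer toward rejection.
```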
This is really the best answer and makes the most sense and includes the least cynicism. Not sure why this isn't hugely upvoted since it really does explain the core challenges of the process (danger of accepting a bad paper).
– raddevus
Aug 13 at 20:30
This is definitely a good perspective and argument. However, many bogus papers currently pass the review process as well. I can point them out in my field. And good papers, or good papers with contradicting results, don't make it. I have an example where people got killed because a published paper became the standard (in medicine) and opposing papers were rejected (resulting in very angry and disappointed scientists). Personally, I believe a scientist is never relieved of the duty of judging the quality of published papers. I do not rely on the fact that a paper is published; I use my own judgment.
– Alice
Aug 14 at 7:45
In addition, there is nothing to suggest that the rejected papers were not actually flawed at the time of the original submission; it may be that only after they had been revised were they clear enough for publication. It would be rare IMO for an author not to substantially revise the writing/presentation after rejection: after all, the odds are that if you submit the same paper twice, the result will be the same twice. I'm actually surprised only 8 such papers have been found.
– ZeroTheHero
3 hours ago
This won't really answer your question, I realize, but I'd like to address your first example - rejected papers that later led to Nobel prizes.
Sometimes a piece of work is Frame Breaking and it leads to a Paradigm Shift within a field. This has happened many times in history, since at least Copernicus and Galileo. Einstein's early work on relativity was rejected by the physics/astronomy hoi oligoi as it was too different from the belief in the Aether at the time. The most prominent members of the field reject a radically new idea, and their students, who are pervasively represented, usually go along.
It has been said that revolutions in physics require the death or retirement of the most respected researchers so that the ideas of the young can get a fair hearing and come to the fore.
That is in fact an explanation of at least some of the eight papers referenced in your first link.
I don't think that many of us write paradigm changing papers, but it occasionally happens. The truly brilliant (not guilty) among us often must labor in near silence and obscurity for most of a generation. The next generation may celebrate them, or it may take even longer.
When a reviewer is faced with a truly frame-breaking paper, they, by definition, have no frame of reference in which to evaluate it. It is orthogonal to their entire way of thinking. "This must be nonsense" is the all-too-natural response.
Read, for example, the short Wikipedia biography of Ramanujan.
Thomas Kuhn's book "The Structure of Scientific Revolutions" is the classic reading on this point.
– WBT
Aug 13 at 17:15
This is the most relevant answer to the direct question - Nobel worthy papers are radical or revolutionary by nature. This naturally makes their review all the more sceptical an affair.
– J...
2 days ago
It's not just physics where revolutions require a changing of the guard. This happens in geology and chemistry as well.
– Peter Shor
2 days ago
@PeterShor, true, and not even just science.
– Buffy
2 days ago
Different tasks, different results.
All the GRE graders have to do is assign scores, but they are doing so for dozens or hundreds of essays. They receive clear guidance and examples about what score given essays should probably receive. So it's basically checking boxes to justify a small set of results.
A peer review analysis is fundamentally different since you’re asking for a much more technically difficult task. They have to evaluate if the analysis is accurate, not if it’s responsive to a prompt. There’s no set of examples to draw on either. So the focus of peer review can be very different for different reviewers who may have different sets of expertise and certainly will have their own points of view.
"There's no set of examples to draw on either" – what about all the already-published papers?
– Allure
2 days ago
To compare academic peer review to GRE grading -- that makes apples and oranges look all but identical. Let's step a little closer:
Similarly, as far as I know, undergraduate thesis readers, MS thesis readers and even PhD thesis readers don't usually come to diametrically opposed judgments on the piece of work.
That is certainly not always true and highly field dependent. In certain parts of academia it is a standard grad student horror story that Committee Member A insists that the thesis be cast in terms of Theoretical Perspective X, while Committee Member B insists that the thesis be cast in terms of Theoretical Perspective Y, where X and Y may be intellectually incompatible or sociologically incompatible: i.e., each theory has rejection of the other as a central tenet. This is more common in humanities where the nature of "theory" to the rest of the work is rather different, but it is not unheard of in the sciences either.
As a frequent committee member, I also happen to know that coming to a consensus judgment is a sociological phenomenon as well as an intellectual one -- i.e., some differences in judgment are limited only to the private discussion following the defense and other differences in judgment are never verbalized at all.
This is helpful in understanding the disparity in peer review: in peer review, the different referees are (in my experience, at least) never in direct communication with each other, and in fact may not be seeing each other's verdicts at all; as a referee, I believe that I have never been shown another referee report. In fact,
Who watches the watchmen?
There is no aspect of the academic process that makes me feel like a lone masked vigilante more than being a referee. Surely people who do GRE grading go through some lengthy training process of repeated practice evaluating, feedback on those evaluations, discussion of the larger goals, and so forth. There is nothing like this for academic referees. We get no practice, and there is very little evaluation of our work. If I turn in what is (I guess!) an unusually comprehensive report unusually quickly, I will often get a "Hey, thanks!" email from the editor. In the (thankfully rather small) number of instances where my referee reports were months overdue, I either heard nothing from the editors (I am ashamed to say that once I figured out on my own that a paper I thought I had had for a few months had actually been an entire year) or got carefully polite pleas for me to turn in the report. I have never gotten any negative feedback after the fact. Unlike GRE graders, referees are volunteers.
I find (again, in my experience and in my academic field of mathematics) that referees are almost never given instructions that amount to any more than "1) Use your best judgment. 2) We are a really good journal and want you to impose high standards." I also notice that 2) is said for journals of wildly differing quality. What does it mean to "impose high standards"? I take that directive seriously and fire my shots into the dark as carefully as I can, but....of course that is ridiculously, maximally subjective.
On the other hand, your third link is pretty alarming. It describes a systematic process of resubmitting papers that had been accepted and published by prestigious journals within the last three years to the same journal that published them. In the majority of cases, the journals did not recognize that they had published the papers before. I find that very surprising.
– Pete L. Clark
Aug 13 at 17:03
We write more and more, and the typical submission quality seems to be going down. This has various reasons, including bad incentives, in particular in China. If your salary depends directly on the papers accepted, quantity beats quality...
IMHO we are close to a tipping point now. Many expert reviewers refuse almost any reviewing request, because so many submissions are so sloppy that it's quite annoying to review them. It should be different: most submissions should be of such high quality that you enjoy reading them and can focus on the details. So more and more experts are just annoyed. They delegate more of the reviewing to students, or simply refuse. But that now means the remaining reviewers get more requests, and more bad papers. This can tip quickly, just like most ecosystems.
So the editors need to find other reviewers, and we get fewer and fewer expert reviewers. This also opens doors to scams and schemes. Multimedia Tools and Applications, for example, seems to have fallen prey to an editor and reviewer manipulation scheme.
So what's the solution? I don't know.
Contributing a point beyond other answers:
Different levels of effort going into the review lead to different outcomes.
Papers are often written such that, on a first-pass read, they come across as "pretty good", even if a more critical deep read and/or check of references would expose gaping holes, serious methodological issues, and alternative explanations for the results observed. Sometimes, an even more effortful review can find that these issues don't actually matter in the particular case applicable to that specific paper (though the author should generally add this to the paper text itself).
While reviewers are incentivized to do a good job by the general knowledge that the system depends on it, specific instances are generally not incentivized, and reviews sometimes get left to the last minute with a reviewer who's short on sleep and long on other tasks, and who doesn't put in the effort for a good review. Thus, the result could be very different from what the same reviewer would produce for the same paper at a different time. With no visibility into the factors affecting that outcome, it seems random.
To address the aspect of:
The same paper resubmitted to the same journal after several years often ends up rejected due to 'serious methodological errors'
In about one third of the papers I reviewed, I identified fundamental flaws that could not be addressed by revising the paper (you would have to write a new paper instead). Some examples just to give you a taste:
While I may have been wrong about these things, the authors never addressed my concerns, be it in a rebuttal or version of the paper published in another journal (which never happened in most of these cases) – which is something they should do even if I am wrong.
Now these issues may seem like they should have been easy to spot, but evidently they weren’t: I spotted some of these flaws only when writing up the actual review, and I witnessed (and performed) quite a few jaw droppings when discussing papers with co-reviewing colleagues¹ whom I knew to be thorough. Also, in some cases I saw reports of other referees who were otherwise exhaustive but did not spot the issues.
So, to conclude: Even fundamental flaws are difficult to spot. A given reviewer has only a comparatively small chance of spotting a given flaw in a paper. Therefore there is a considerable chance that all of the reviewers fail.
¹ Yes, that’s a thing in my field and fully accepted by the journals.
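To put that concluding point in numbers: if each reviewer independently spots a given flaw with some probability, the chance that every reviewer misses it shrinks only geometrically with the number of reviewers. The detection probabilities below are made up for illustration (nothing above gives figures), and the sketch assumes independent reviewers, which is itself optimistic:

```python
# Illustrative sketch: probability that a flaw slips past all reviewers,
# assuming each reviewer independently spots it with probability p_detect.
def prob_all_miss(p_detect, n_reviewers):
    return (1 - p_detect) ** n_reviewers

for p_detect in (0.3, 0.5, 0.7):   # hypothetical per-reviewer detection rates
    for n in (1, 2, 3):
        print(f"p_detect={p_detect}, reviewers={n}: "
              f"all miss with probability {prob_all_miss(p_detect, n):.2f}")
# e.g. even if each reviewer has a 50% chance of spotting a flaw,
# two independent reviewers both miss it 25% of the time.
```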
I am curious about the "discussing papers with co-reviewing colleagues" part, since (as I mentioned in my answer) that is absolutely not a thing in my field (mathematics). (Although in fact, in my field the most common number of referees is one.) How does that work?
– Pete L. Clark
Aug 14 at 13:37
@PeteL.Clark: Essentially, a referee is allowed to seek the input of colleagues as long as they ensure that the reviewed material is kept confidential and name the additional reviewers when submitting the review (policy example). Typically, an advisor takes an advisee onto the team.
– Wrzlprmft♦
Aug 14 at 15:24
The fundamental difference between grading GRE essays and reviewing scientific papers submitted for publication has been cogently discussed in several previous answers. The fundamental difference between reviewing grant applications and reviewing papers for publication has not.
Publications. When a paper is submitted to a journal it is usually supposed to be a finished product, or at least one finished step toward a defined goal. It is truly difficult to find reviewers who are able to assess the importance of a paper and to find every gap in reasoning or every imperfection in technique, but at least reviewers of journal articles have the results of a piece of research at hand.
A potential Nobel paper may cover material so new or so far off the beaten track that it will be especially difficult to review fairly. A paper resubmitted after several years may have been based on procedures or techniques that have been considerably refined in the meantime. So maybe they were state of the art at the time of original submission, but are now 'seriously flawed' in terms of currently available methods. So it is hardly surprising that reviewers don't score 100% on those tasks.
However, even though reviewers are unpaid, overworked volunteers, working with no specialized training or feedback on the details of reviewing, I think it is surprising how well journal reviewing works in practice.
Grants. By contrast, making judgments about research grants is an entirely different kind of activity. Some years ago (when US federal funding was available at a much higher level than it is now), I spent several years at a federal agency with a reasonably large budget for supporting basic and applied research in a variety of scientific fields. So I will try to address this part of the picture briefly. I will begin by saying that I am not at all surprised that a panel of research scientists would find the funding of NIH (or any other US government agency) to be 'no better than a lottery'.
Generally speaking, if you know exactly what you are doing, how long it will take, and how much it will cost, you're not doing research. Reviewers can often be useful in assessing a proposer's track record of success and providing a rough idea of whether the proposer is competent to undertake research in a particular area. (I should add that most program directors are well aware of the standards, biases, foibles, and strengths of the reviewers they use. I was seldom surprised by the contents of a requested review, but the few surprises were extremely valuable.) However, reviewer input is only a part of the picture.
Going beyond reviewer input, program directors in granting agencies have to take other factors into account. To some degree they must consider financial, political, and infrastructural factors. 'Political' usually means that the money was appropriated or donated specifically to support a particular scientific goal. Infrastructural concerns may center on developing technologies that are agency goals, training graduate students in fields where there are not enough researchers, whether the institution requesting the grant has the sophistication for adequate stewardship, and so on.
In the US, agencies such as NIH, NSF, DoE, EPA, various defense agencies, and various privately funded agencies may have very different goals. However clearly these agency missions and objectives may be spelled out in 'requests for proposals', they are often ignored by grant applicants, who might make a better case for their work if the appropriate connections were made clear.
In spite of these constraints on the awarding of grants, program directors strive to support nothing but the highest-quality science, and I believe they usually succeed at that. In my experience, almost all of them view themselves as scientists first and agency bureaucrats second. Often their success comes with the considerable help of reviewers, but sometimes not.
Thanks for the links, @Allure. I've been rejected soo much I'm sure that my Nobel prize is in the mail. I'd better start preparing my acceptance speech. :)
– St. Inkbug
Aug 13 at 7:08