Self-correction of science: a comparative study of negative citations and post-publication peer review

This study investigates whether negative citations in articles and comments posted on post-publication peer review platforms contribute equally to the correction of science. These 2 types of written evidence of disputes are compared by analyzing their occurrence in relation to articles that have already been retracted or corrected. We identified retracted or corrected articles in a corpus of 72,069 articles coming from the Engineering field, from 3 journals (Science, Tumor Biology, Cancer Research) and from 3 authors with many retractions to their credit (Sarkar, Schön, Voinnet). We used Scite to retrieve contradicting citations and PubPeer to retrieve the number of comments for each article, and then we considered them as traces left by scientists to contest published results. Our study shows that contradicting citations are very uncommon and that retracted or corrected articles are not more contradicted in scholarly articles than those that are neither retracted nor corrected, but they do generate more comments on PubPeer, presumably because of the possibility for contributors to remain anonymous. Moreover, post-publication peer review platforms, although external to the scientific publication process, contribute more to the correction of science than negative citations. Consequently, post-publication peer review venues, and more specifically the comments found on them, although not contributing to the scientific literature, are a mechanism for correcting science. Lastly, we introduced the idea of strengthening the role of contradicting citations to rehabilitate the clear expression of judgment in scientific papers.


Introduction
Science needs both innovation and self-correction. This has been ensured for centuries by academic controversy (disputatio), in other words, free and contradictory discussion among scientific peers. Scientists should both be sceptical of any new claim and provide statements, hypotheses and theories that can be falsified (refuted) (Popper 1959). The self-correction of science can result in the production of new knowledge, with publications that rely on previous statements, oppose them or just ignore them. It can also take a more drastic form, with the official purge of the scientific literature and the withdrawal of what is recognized as false science. Retraction notices or errata are then published to correct the scientific record.
Our study focuses on the process of challenging past works and investigates 2 mechanisms: negative citations and post-publication peer review. Negative or contradicting citations can be found in scientific writing but post-publication peer review comments do not contribute to the scientific literature as such. Nevertheless, both target and cite a preexisting work by expressing a disagreement.
We have made the methodological choice to compare them by analyzing their occurrence in relation to articles that have already been retracted or corrected, in other words, publications whose contentious nature has already been acknowledged by a correction of the scientific record. This work is therefore at the boundary of 3 research topics: retracted or corrected papers, negative in-text citations, and post-publication peer review. It will address the following research question: do negative citations and post-publication peer review comments contribute equally to the correction of science?

Retracted and corrected papers
Retractions are a way to alert the scientific community (and beyond) that flawed research has been published. The problematic pieces of work are therefore supposed to be purged from the literature or at least flagged because they are partially or fully inaccurate. They should not be used to build new research. Retractions are reserved for circumstances in which significant portions of an article are incorrect or cannot be substantiated whereas errata are published when isolated inaccuracies have been identified (Furman et al. 2012).
In both cases, the reasons why a paper is retracted or corrected are various: fabrication or falsification of data and plagiarism are commonly agreed to be scientific misconduct, but honest errors can also lead to the correction of published literature. The frontier is sometimes hard to establish but the intention to deceive is the key aspect (Bar-Ilan and Halevi 2018;Fanelli 2009). Bar-Ilan and Halevi (2018) coined the term "scientific distortion" to describe a category of articles including intentional and unintentional errors, both being considered hurdles for the advancement of science.
Although Ioannidis (2005) claims that "most research findings are false for most research designs and most fields", retracted and corrected papers are very scarce. Data from the RetractionWatch database show that the absolute number of annual retractions has grown over the past decade (from 100 annual cases before 2000 to nearly 1000 in 2014) but even if the rate doubled from 2003 to 2009, it has remained stable since 2012 and only 4 of every 10,000 papers are now retracted (Brainard 2018). Although different from the ordinary process of self-correction of science (when new knowledge updates the old), retractions and errata are also a way to correct the scholarly record by changing the status of previously published works (Dougherty 2019).
The identification of retracted and corrected papers is difficult and, despite COPE guidelines and the Crossmark initiative, publishers and bibliographic databases do not adopt any standards and are not consistent in their reporting of cases.

Contradicting or negative citations
De Solla Price (1963) discusses the phenomenon of cumulation of papers ("the way in which each paper is built on a foundation of previous papers then, in turn, is one of several points of departure for the next") and stated that "the most obvious manifestation of this scholarly bricklaying is the citation of references". A citation is, therefore, the explicit reference to another scientific piece of work within the full-text of a scholarly paper. And as Gross et al. (2002) stated, from the late twentieth century it has become a routine to conclude scientific articles with the list of references to past literature cited in the preceding text.
Even if it has been done widely for decades, merely counting citations is controversial (Cano 1989; Chubin and Moitra 1975; Kaplan 1965), partly because "citation-in-context analysis" should also be taken into account (Small 1980). There is indeed a wide variety of reasons for a researcher to cite a previous piece of work, and it is obvious that citing is not always performed in a positive or supportive manner. In the early 1960s, Garfield (1964) already discussed adding useful markers such as "critique", "data spurious" or "conclusions wrong" to describe the kind of relationship between the citing and cited documents. The same idea is taken up by Peroni and Shotton (2012), who propose the CiTO ontology (Citation Typing Ontology): it provides authors with the ability to capture their intent when citing previous works, as it allows them to add specific metadata to annotate a citation with its reasons. Table 1 shows that whereas negative citations of all types are always mentioned as one or several specific categories in citation classification schemes, the studies that rely on them consistently show that they are very uncommon, most of the time representing less than 5% of all references cited in a document. This very low rate is quite surprising since, as MacRoberts and MacRoberts (1984) stated, "criticism is the life blood of science". Catalini et al. (2015) suggest that criticisms expressed through citations could also be part of the "falsification" process defined by Popper (1959). But making a negative criticism towards another researcher requires a greater effort than making a perfunctory citation; it implies creating a context in which to justify an attack and explain why previous studies fell short (Rousseau et al. 2018; White 2001). It also means taking a risk by openly stating one's disagreement. Moreover, Catalini et al. (2015) have shown that authors were more willing to criticize researchers located further away geographically, and explain that it may be socially costly to negatively cite the work of a local colleague.
Among those studies presented in Table 1, early ones aimed at describing the motives for citations and demonstrating that they were not equal. The corpora they analysed were

Citations of retracted and corrected papers
The studies about citations of retracted papers have mainly explored temporal characteristics, focusing on the post-retraction rate of citations, and concluded that citations continue long after the retraction date, mostly because authors are unaware of the new status of the document they once read and now cite. There is variance in the percentage of negative citations these studies identify: from 0 to 15% (Gabehart 2005; Kochan and Budd 1992), and even up to 32% in Garfield and Welljams-Dorof (1990), the only study (Breuning case) not focusing on the post-retraction period. When revisiting this study, Korpela (2010) discovered that negative citations were overrepresented but also signalled an upsurge of positive citations after several years. For these authors, the continued citation of retracted articles after the publication of the retraction notice is a serious problem. But we believe that citations of these articles are problematic both before and after the retraction notice. And of course, it is even worse if the citation is supportive. The problem is indeed the dissemination of false statements and results at any time. What is more, some studies acknowledge that it is hard to determine a retraction date (Bar-Ilan and Halevi 2017; Neale et al. 2010), so comparative studies before and after retraction might not be accurate.

Post-publication peer review
Our study also investigates post-publication peer review comments as another type of trace left by scientists to contest published results. Traditional peer review is necessary to a sound process of publication and can be one of the mechanisms involved in the self-correcting nature of science. But peer review is unable to detect all problems, neither scientific flaws nor misconduct. Comments or suggestions can be posted on academic social networks (e.g. ResearchGate, ScienceOpen) or with the help of dedicated tools (e.g. Hypothes.is). Some publishers also allow commenting on each publication on their website (e.g. PLoS, F1000) but most journals do not welcome correspondence or comments that criticize their publications (Barbour and Stell 2020). That is why post-publication peer review platforms have recently developed (Dubois and Guaspare 2019; Teixeira da Silva and Bornemann-Cimenti 2017). The most famous is PubPeer, where comments can be made anonymously.
Although positive and praising comments can be made, they are in practice mainly negative or critical. Post-publication peer review venues are to be considered as tools to identify erroneous works that went through the traditional peer review process (Teixeira da Silva et al. 2017).

Methods and data
We introduce the idea of considering comments left on PubPeer as contradicting citations. Indeed, commenting on PubPeer follows the same principle as making a citation, with the consequence of creating a link between the comment and the target publication. However, what radically differentiates a comment on PubPeer from a citation in an article is that the former can be anonymous.
In order to evaluate negative citations in articles and comments on PubPeer as mechanisms for correcting science, we have built up a corpus of articles associated with retraction or correction notices. It is a way of having a material that has already been officially contested and corrected. From a methodological point of view, this saves us from having to individuate false science, and, therefore, it allows us to build a corpus of contested articles in an objective manner.
We used Scopus and the Web of Science to constitute a corpus of 72,069 articles (metadata), focusing on a particular discipline, on some particular journals and even individuals. We used 3 different criteria (discipline, journal, author) and constituted 7 datasets for comparison (Table 2):
• Discipline: we chose the Engineering field, which seems not to have been the focus of as much attention as biomedical sciences. We retrieved the ISSN of all journals classified in the Engineering subject area according to the ASJC classification scheme and made two queries to retrieve the corresponding articles from the Web of Science and Scopus databases, limiting the results to 2012-2015 publication years.
• Journal:
  • we chose 2 journals in the biomedical field, Cancer Research and Tumor Biology, which are journals with high rates of retractions according to the RetractionWatch database,
  • we chose a multidisciplinary high-impact journal, Science, because some studies hypothesized that articles in high-profile journals are more prone to retractions (Fang and Casadevall 2011; Furman et al. 2012) and also because the most prestigious journals publish the least reliable science according to Brembs (2018).
  We used their ISSN to retrieve articles from the Web of Science and Scopus databases, limiting the results to 2012-2015 publication years.
• Author: we chose 3 high-profile researchers for whom a substantial part of their work has been officially retracted or corrected: Sarkar, Schön and Voinnet. We made 3 queries and retrieved all published articles of these authors with no publication date limit.
The corpus compilation is long and painstaking because retractions and errata are uncommon. It is therefore necessary to start from a very large initial database to obtain a sufficiently representative corpus. Retractions and errata are also difficult to identify and locate online (Hesselmann et al. 2017; Poworoznek 2003; Teixeira da Silva and Bornemann-Cimenti 2017) and in bibliographic databases. That is the reason why we used both Scopus and Web of Science to identify retraction notices and cross-checked with the RetractionWatch database. To be as exhaustive as possible, we retrieved the erratum notices for the corpus we delineated, extracted the DOI or the article title from each notice title (see example), and then tagged the original articles. When the same article is associated with one or several erratum notices and is also retracted, we tagged it as retracted.
Example of an erratum notice title: "Erratum: Recruitment of RNA polymerase II by the pioneer transcription factor PHA4 (Science (2015)
Then we used data from two websites to retrieve two kinds of critiques and enrich the corpus at article level:
• Scite to retrieve the number of contradicting citations,
• PubPeer to retrieve the number of comments.
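The tagging step described above (linking each erratum or retraction notice to its original article, with retraction taking precedence over correction) can be illustrated with a minimal sketch. It is not the authors' actual procedure: the DOI pattern, the function names `extract_doi` and `tag_articles`, and the tag labels are hypothetical, and the sketch assumes the notice title embeds a DOI in the usual "10.xxxx/suffix" form.

```python
import re

# Hypothetical pattern: assumes a DOI of the usual "10.xxxx/suffix" form
# appears somewhere in the notice title.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(notice_title):
    """Return the first DOI-like string found in a notice title, or None."""
    match = DOI_PATTERN.search(notice_title)
    if match is None:
        return None
    # Drop trailing punctuation that often follows a DOI in running text.
    return match.group(0).rstrip(".,;)")


def tag_articles(article_dois, erratum_dois, retraction_dois):
    """Tag each article as 'retracted', 'corrected' or 'none'.

    An article that is both corrected and retracted is tagged 'retracted',
    following the precedence rule stated in the text.
    """
    tags = {}
    for doi in article_dois:
        if doi in retraction_dois:
            tags[doi] = "retracted"
        elif doi in erratum_dois:
            tags[doi] = "corrected"
        else:
            tags[doi] = "none"
    return tags
```

When a notice title carries no DOI, `extract_doi` returns None and the article title would have to be matched instead, which this sketch does not cover.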
Scite was launched in 2019. According to their founders, it is a platform that allows finding if a scientific article has been supported, contradicted, or mentioned by subsequent studies. Scite automatically extracts citations from papers and classifies them using deep learning models and a network of experts.
PubPeer was founded in 2012; it is a commenting website for centralized post-publication discussions, some of which have led to retractions or corrections.
We assumed that the number of contradicting citations retrieved by Scite and the number of comments posted on PubPeer are indicators of controversy and attempts to make a correction.
As we are interested in citations, we focus here on articles with at least 1 citation according to the Scite database, that is, 45,811 articles [the corpus is available for download, Bordignon (2020)].
The corpus contains 1.28% of retracted articles (see Table 2 for distribution by sub-corpus).
Scite returns 3 values (number of mentioning, contradicting, and supporting citations); we express each of them as a proportion of the total number of citations found by Scite (not the number of citations identified by Scopus or the Web of Science). PubPeer returns a number of comments; we process them as contradicting citations and thus also express them as a proportion of the number of citations identified by Scite.
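The normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalize_counts`, its parameter names and the returned keys are hypothetical, and the total number of Scite citations is passed in explicitly rather than derived from the three counts.

```python
def normalize_counts(supporting, contradicting, mentioning,
                     pubpeer_comments, scite_total):
    """Express Scite counts and PubPeer comments as proportions of scite_total.

    scite_total is the total number of citations found by Scite for the
    article (hypothetical parameter name); it must be at least 1, since the
    corpus only keeps articles with at least one Scite citation.
    """
    if scite_total < 1:
        raise ValueError("article has no citation in Scite")
    return {
        "supporting_rate": supporting / scite_total,
        "contradicting_rate": contradicting / scite_total,
        "mentioning_rate": mentioning / scite_total,
        # PubPeer comments are treated like contradicting citations and
        # therefore share the same denominator.
        "pubpeer_rate": pubpeer_comments / scite_total,
    }
```

Because PubPeer comments are not themselves counted among Scite citations, the resulting `pubpeer_rate` can exceed 1 for a heavily commented but rarely cited article.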
Our objective here is not to test Scite's accuracy and we must assume that its detection of citation polarity, even if it is perfectible, gives a trend on which we can rely in this study. We also decided to group the retracted and corrected articles under a single label.

Negative citations
Consistent with previous studies, negative citations are very uncommon in the whole corpus: on average, they account for 0.29% of the citations per article (n = 1904). 97.02% are mere mentions and only 2.71% are supporting citations (Table 3). There are slight differences according to the corpus tested. In the Engineering domain, authors seem to be accustomed to citing by mere mention, which is consistent with the rate found for the papers of Schön, whose work is close to this domain.
By focusing on retracted or corrected articles, we aimed at checking whether papers that became officially known to be contentious generate more negative citations, either before or after they were flagged. But in fact, no such trend has been detected. Figure 1 shows the average contradicting citation rate for the 7 corpora and the difference between retracted/corrected articles and the others. There is no significant emerging trend. Nevertheless, here is what we can say:
• Aside from the Voinnet corpus (and the Schön corpus as well, but there is no reference point), retracted or corrected articles are less contradicted than "normal" articles. But for both Engineering and Science corpora, the rates are too close to each other to be interpreted as different.
• Corpora from individual cases are those for which the contradicting citation rate of retracted/corrected articles is the highest; however, the profiles of these 3 cases are completely different.
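The comparison reported above amounts to averaging the per-article contradicting citation rate separately for retracted/corrected articles and for the others within each corpus. A minimal sketch follows, assuming each article is represented as a (corpus, is_flagged, contradicting_rate) tuple; this record layout and the function name `mean_rate_by_group` are hypothetical, not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def mean_rate_by_group(records):
    """Average contradicting citation rate per (corpus, is_flagged) group.

    records: iterable of (corpus, is_flagged, contradicting_rate) tuples,
    where is_flagged is True for retracted or corrected articles.
    """
    grouped = defaultdict(list)
    for corpus, is_flagged, rate in records:
        grouped[(corpus, is_flagged)].append(rate)
    # One mean per corpus and per status (flagged vs. not flagged).
    return {key: mean(values) for key, values in grouped.items()}
```

A corpus with no unflagged article simply yields no (corpus, False) entry, so the comparison has no reference point for it.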
On the whole, these results show that negative citations are not more frequent towards retracted or corrected articles. Generally speaking, as they are too under-used by the authors, they do not have the impact they could have in the process of correcting science. Incidentally, the total absence of citations of papers that are supposed to be cited (because they are closely related to the topic) must be at least as significant an indicator.

Comments on PubPeer
Assuming that comments on PubPeer are a form of contestation very close to a negative citation, we tested the articles of our corpora in PubPeer to check if questioning the content of a publication is more frequent outside the usual publication process, and more particularly on a dedicated platform that allows anonymous comments. Table 4 shows the percentage of PubPeer comments received in respect to the total number of citations (in Scite database) and also displays the difference between retracted/corrected articles and the others.
Our data show that, in general, corrected and retracted papers receive more comments on PubPeer than others. Furthermore, individual and highly publicized cases (Voinnet and Sarkar corpora) generate the most comments on average relative to the number of citations.

Comparison between negative citations and comments on PubPeer
Lastly, Fig. 2 shows no correlation between the rate of negative citations and comments on PubPeer and confirms PubPeer as a possible place where the debate and contestation of findings can be carried out.
Our results show that PubPeer, although external to the scientific publication process, contributes more to the correction of science than negative citations. A cursory analysis of some retraction notices also attests to this, since they often explicitly mention PubPeer as the place where the debate took place (e.g. "As raised on PubPeer, the article was found to contain images with signs of duplication and manipulation…"), thus feeding the decision to correct the record with a retraction or an official corrigendum. If negative citations are so uncommon, it is because it is difficult for an author to disagree publicly for fear of consequences. By allowing anonymity, PubPeer overcomes this limitation, facilitating criticism and stimulating debate to improve the soundness of science. In some extreme cases, the scientific record is corrected through retraction or correction, but in other contexts, negative critiques at least provide a signal to the community that "scientific distortion" has been reported. It is essential that comments remain rigorously moderated by PubPeer and that they stay "factual and verifiable", as stated in the platform guidelines. This is to avoid anonymous criticisms made with malicious intent in an attempt to weaken a competitor, as may be seen on non-scientific platforms, for example on consumer review websites (Wu et al. 2020).
Since our data show that negative citations are extremely scarce and supportive citations are hardly more frequent, this could lead us to question the interest in tagging the polarity of in-text citations, as suggested by Garfield (1964) and made possible by the CiTO ontology (Peroni and Shotton 2012).
But when we consider the growing popularity of PubPeer and the motivation of contributors to elaborate on the contradiction they provide, we could imagine on the contrary that journals should require a minimum ratio of contradicting citations, to more easily see the benefit of an article in relation to previous works and to more easily identify "scientific distortions". This idea is in line with the Leiden Manifesto (Hicks et al. 2015), which decries that performance evaluation is now led by the data rather than by judgment. It is also in line with the proposition of Griesemer (2020) to "reimagine a role for judgment in the face of the data-driven metrics". Contradicting citations could be seen as a first step or a mere contribution to the rehabilitation of judgment. Further work is needed to explore the implications of that suggestion so that it will not fuel new opportunities to game the metrics.

Conclusion and future work
Our study is consistent with previous works and shows that contradicting citations are very uncommon: 0.29% of citations in the whole corpus we tested. Retracted or corrected articles are not more contradicted in scholarly articles than those that are neither retracted nor corrected, but they do generate more comments on PubPeer, presumably because of the possibility for contributors to remain anonymous. Consequently, post-publication peer review venues, and more specifically the comments found on them, although not contributing to the scientific literature, are a mechanism for correcting science. Future work, including surveys involving authors, may confirm that it is the fear of expressing their disagreement formally that motivates researchers to contradict their peers anonymously on PubPeer.
Our study deals with a twofold scarce material (retracted/corrected articles and contradicting citations); it is thus also necessary to confirm our results with other corpora and to take into consideration the reason for the retraction or erratum. Indeed, plagiarism will not trigger criticism like data manipulation or scientific mistakes would, and errata sometimes refer to errors that do not affect the soundness of science (e.g., an erratum published for an error in the bibliography). We therefore plan to use the classification of reasons developed by RetractionWatch in their database and to carry out automatic analysis of the texts of the retraction and erratum notices. Lastly, our study is dependent on the accuracy of Scite, a recent tool whose reliability was not the subject of our study but which will have to be measured to go further. In a future project, we aim to re-evaluate Scite results and also to explore the possibility of further refining its classification by proposing nuances in contradiction. The text-mining of the comments on PubPeer will also enable us to better understand the expression of contradiction and to fuel future tools to detect it.
We have also cautiously introduced the idea of strengthening the role of contradicting citations to rehabilitate the clear expression of judgment in scientific papers. We will explore the implications of that suggestion in further studies so that it will not be used as a new way to game the metrics. In the meantime, the tools that allow leaving a comment directly at the article level, or those that alert readers that an article has received comments on any platform, are undoubtedly a means of increasing the expression of contradiction on the one hand and the openness to criticism on the other.