Content moderation is vital on preprint servers like arXiv.org. Today, more than ever, moderators are under immense pressure from the rising number of AI-generated “slop” papers, and sociologists have shown how moderators function as gatekeepers who scan and sort submissions.1 But how did the practice of moderating preprint content begin?
When preprints were distributed by the authors themselves, moderation was inherent to the practice of preprint communication. But once the distribution of preprints became the purview of libraries in the late 1950s and early 1960s, the practices for controlling the preprint literature had to be made more explicit. Since libraries did not perform peer review or otherwise judge the academic quality of papers, their practices consisted of collecting, sorting, and registering incoming papers. As the influx of preprints grew and researchers’ information demands increased, library staff improved their handling of new accessions and devised innovative methods for processing papers, essentially prefiguring the content moderation later practiced on preprint servers.

“No attempt would be made to select the papers according to the scientific value of the work presented therein.”2 This is how the CERN library characterized its work of controlling the preprint literature. As demand for preprints rose, the library at the Geneva laboratory came to see itself as an important gatekeeper of the preprint literature, without, however, claiming the editorial authority of journal editors. Instead, the 1965 CERN Library Staff Manual stresses relevance: “The usefulness of a special library depends in large measure on its selectivity.” At the same time, it warns that a “heterogeneous mass of vaguely related documentation can choke or crowd out the relevant and important items.”
Physicists helped libraries make their selections, and library staff were inventive in developing new methods to select and classify content. At CERN, the library introduced a pragmatic categorization system to manage the constant stream of preprints. Papers were assigned simple subject categories from high-energy physics, such as “theoretical particle physics,” “high energy experimental physics,” “experimental techniques,” “detectors,” or “accelerators,” purely so that the library’s list of preprints could be sorted hierarchically.3 Libraries also employed “scientific information officers,” who scanned incoming papers and helped categorize them.4 This was a distinct role within the organization, usually filled by people trained in both physics and librarianship who had retired from active research but, thanks to their academic expertise, remained in touch with recent developments in the field. As with content moderation today, the CERN library recognized early on that such selection carried “certain dangers”: the manual admits that “the rejection of ‘border’ material is inevitably somewhat arbitrary.” No ‘objective’ criteria can determine what counts as useful information.

1. Reyes-Galindo, L. Automating the Horae: Boundary-Work in the Age of Computers. Soc. Stud. Sci. 46(4), 586–606 (2016).
2. See Roth, P. H. Formalizing informal communication: an archaeology of the pre-web preprint infrastructure at CERN. Minerva (2026).
3. Ibid.
4. Roth, P. H. How libraries classified physics preprints before arXiv and set the stage for distinguishing insiders from outsiders. Nat. Rev. Phys. 8, 188–189 (2026).