Preprint Libraries and the History of Information Infrastructure in Physics

Preprint content moderation avant la lettre?

Content moderation is vital on preprint servers like arXiv.org. Today, more than ever, moderators are under immense pressure from the rising number of AI slop papers, and sociologists have shown how moderators function as gatekeepers who scan and sort submissions.1 But how did the practice of preprint content moderation start?

As long as preprints were distributed by authors themselves, moderation was inherent to the practice of preprint communication. But as soon as the distribution of preprints became the purview of libraries in the late 1950s and early 1960s, the practices of controlling the preprint literature had to be made more explicit. However, since libraries did not perform peer review or any other form of judging the academic quality of papers, their practices consisted of collecting, sorting, and registering incoming papers. As the influx of preprints to the library grew and researchers’ information demands increased, library staff improved their handling of new accessions and devised innovative methods for managing papers, essentially prefiguring the later content moderation on preprint servers.

Two female library staff sorting papers on top of the card catalog at the CERN Central Library.
Library staff cataloging preprints at the CERN Central Library in 1968 (c) CERN

“No attempt would be made to select the papers according to the scientific value of the work presented therein.”2 – this was how the library at CERN characterized its work in controlling the preprint literature. With the rising demand for preprints, the library at the Geneva laboratory began to see itself as an important gatekeeper of the preprint literature, though without claiming for itself the editorial role of journal editors. Instead, the 1965 CERN Library Staff Manual states: “The usefulness of a special library depends in large measure on its selectivity.” At the same time, it warns that a “heterogeneous mass of vaguely related documentation can choke or crowd out the relevant and important items.”

Physicists aided libraries in making their selections, and library staff were innovative in developing new methods to select and classify content. At the CERN library, staff introduced a pragmatic categorization system to manage the constant stream of preprints. Papers were given simple subject categories from high-energy physics, such as “theoretical particle physics,” “high energy experimental physics,” “experimental techniques,” “detectors,” or “accelerators,” purely so that the list of preprints could be sorted in hierarchical order.3 Libraries also employed “scientific information officers,” who scanned incoming papers and helped categorize them.4 The scientific information officer was a distinct role in the organization, usually filled by people trained in both physics and librarianship who had retired from active research but, thanks to their academic expertise, remained in touch with recent developments in the field. As with content moderation today, the CERN library recognized early on that such selection introduced “certain dangers”: the manual admits that “the rejection of ‘border’ material is inevitably somewhat arbitrary.” No ‘objective’ criteria can determine what counts as useful information.

Classified advertisement for the position of a Scientific Information Officer at CERN.
Job ad by the CERN Scientific Information Service in the September 1967 issue of New Scientist.
  1. Reyes-Galindo, L. Automating the Horae: Boundary-Work in the Age of Computers. Soc. Stud. Sci. 46(4), 586–606 (2016).
  2. See Roth, P. H. Formalizing informal communication: an archaeology of the pre-web preprint infrastructure at CERN. Minerva (2026).
  3. Ibid.
  4. Roth, P. H. How libraries classified physics preprints before arXiv and set the stage for distinguishing insiders from outsiders. Nat. Rev. Phys. 8, 188–189 (2026).
