Proofreading a Technical Text



Distributed Proofreaders recently made Alfred Russel Wallace’s two-volume book The Geographical Distribution of Animals (1876) available for free download from Project Gutenberg (Volume I and Volume II).

Wallace and fellow naturalist Charles Darwin were not only colleagues in their research but also jointly introduced seminal ideas about how animal species develop, the basis of what is now generally known as evolution.

Scientific or technical works like Geographical Distribution can present special challenges to the Distributed Proofreaders volunteers who work on them. This post explores some of those challenges.

The Distributed Proofreaders Process

Distributed Proofreaders volunteers acquire scanned images of public domain books either from online sources like The Internet Archive or by scanning the books manually. The scanned images for Geographical Distribution came from The Internet Archive.

The scanned page images are run through optical character recognition (OCR) software to turn them into editable text. Sometimes the resulting text contains what we call “scannos”: misinterpretations of the image by the OCR software, such as a speck on the image rendered as a period, or the word “I” rendered as the numeral 1. Under the guidance of a Project Manager, volunteers then proofread and format the text, a page at a time, in several rounds. The Distributed Proofreaders process enables many volunteers to work on the same book at the same time. Another volunteer (the post-processor) assembles the pages into a complete e-book which, after final checks for errors, is posted to Project Gutenberg.

During the proofreading phase, many problems can be resolved easily. For example, a scanno, such as “carnage” for “carriage,” is simply corrected to match what appears in the original page image. Not all problems are small ones, though. The proofreader who encounters a more difficult problem, such as one of those discussed below, is required only to leave a note about it for future volunteers working on the text. Some proofreaders choose to go further and search reference materials, such as dictionaries, and ask for help in the project’s discussion forum or one of the specialised forums at Distributed Proofreaders.
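A scanno like “carnage” is a real word, so an ordinary spell-check will not catch it. One way to illustrate the idea of hunting for such words is a simple watchlist search. The sketch below is only an illustration: the function name and the watchlist contents are assumptions made for this example, not a description of the tools Distributed Proofreaders actually uses.

```python
import re

def flag_possible_scannos(text, watchlist):
    """Return (line_number, word) pairs for words found on the watchlist.

    `watchlist` is a hypothetical set of lower-case strings that OCR
    software often produces by mistake, e.g. {"carnage", "1"}.
    """
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # [^\W_]+ matches runs of letters and digits, including æ and œ
        for word in re.findall(r"[^\W_]+", line):
            if word.lower() in watchlist:
                hits.append((line_number, word))
    return hits

sample = "The carnage rolled on.\n1 saw it pass."
print(flag_possible_scannos(sample, {"carnage", "1"}))
```

A hit is only a prompt to look at the page image; the word may well be correct in context.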

While many projects at Distributed Proofreaders are straightforward, others present challenges: poor printing, which results in poor scan quality and therefore errors in the raw text; antiquated language found in older texts; and numerous or large tables of data. In every case, the object is to determine the author’s true intention and reflect that in the final product.

Proofreading Geographical Distribution

From May to October 2016, Distributed Proofreaders volunteers worked on the first volume, resolving (or attempting to resolve) several thorny issues and communicating with each other and the Project Manager in the Project Discussion.

This text had good quality scans, with very few typographical, spelling or grammar issues. The challenges lay in the fact that it was a deeply technical work with specialised biological terminology. Here are some of the interactions volunteers had in the Project Discussion.

Differentiating between æ and œ ligatures

With the clear scans, it was generally easy to distinguish between æ and œ ligatures. But the original printer apparently had some trouble doing so when working from the author’s manuscript. Misreading of the ligatures led to subsequent mistakes that were easily perpetuated in the rest of the work, even by such a scrupulous authority as Wallace. Of course, in extenuation, the Internet age has made it much easier to check doubtful cases than it was in Wallace’s day.

One volunteer’s research could not determine whether Cænyra was a typo for the more likely Cœnyra. My researches led to the Bulletin of the British Museum (Natural History), Entomology Supplement 9 (1967), where Francis Hemming states that Cænyra is “an incorrect subsequent spelling of Cœnyra.”

Both Turacœna and Turacæna occur twice in Volume I, but Turacœna is not italicised on two pages, which makes it much easier to identify. Turacœna also appears twice, and Turacæna not at all, in Volume II. Volume II includes the index, and Wallace states in the errata for Volume II that misspellings have been corrected in the Index. These facts make Turacœna the people’s choice.
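Counting how often each spelling occurs, as was done by hand for Turacœna, is the kind of tally a short script can produce. The following is a minimal sketch, assuming the proofread text is available as a plain Python string; the function name is made up for this example.

```python
import re
from collections import defaultdict

def ligature_variants(text):
    """Group words that differ only in æ versus œ and count each spelling."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in re.findall(r"\w+", text):
        if re.search("[æœ]", word):
            # Replace either ligature with a placeholder so that
            # Turacœna and Turacæna share the same key.
            key = re.sub("[æœ]", "*", word)
            counts[key][word] += 1
    # Keep only the keys that really occur in more than one spelling
    return {key: dict(forms) for key, forms in counts.items() if len(forms) > 1}

sample = "Turacœna and Turacæna; Turacœna again. Carabidæ."
print(ligature_variants(sample))
```

The counts only show which form is more frequent. Deciding which form is right still takes the kind of research described above.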

Typographic or spelling errors

A very rare typographic error in Geographical Distribution is Wallace’s reference to “the living three-handed armadillos” for three-banded armadillos.

There is a reference, with a clear connection to kingfishers, to the genus Halycon. Exhaustive, in-depth research (even using dead-tree books on my shelf) suggested that it is a long-standing error which had been perpetuated. The genus, in my humble informed opinion, should be Halcyon (as Wallace has it in the second volume, as well as several times in the first volume). In other words, a rare typo.

When a typesetter sets the letter n upside down, it prints as a u, as in Otiorhynchus vs. Otiorhyuchus. Which is correct? I go for the inverted-n theory, as rhynchus comes from the Greek for nose, beak or snout, and rhyuchus is not a sensible construction. This is where familiarity with Latin or Greek roots saves the day.

But if one sees the word drougo, knowing about drongo, or finds the word scink, which is usually spelled skink nowadays, there is considerable doubt. Is it an older spelling, or a typo? And why does Wallace write Ethiopian everywhere except for the one occurrence of Ethiopean?
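The inverted-letter theory can be tested mechanically: swap each n for a u (and each u for an n) one at a time, and see whether the result is a name already known from elsewhere. The sketch below assumes a list of known names is available; both that list and the function name are hypothetical.

```python
def inverted_letter_candidates(word, known_words):
    """Return known words reachable from `word` by flipping one n or u."""
    swap = {"n": "u", "u": "n"}
    candidates = set()
    for i, ch in enumerate(word):
        if ch in swap:
            # Flip exactly one letter, leaving the rest untouched
            candidates.add(word[:i] + swap[ch] + word[i + 1:])
    return sorted(candidates & set(known_words))

known = {"Otiorhynchus", "drongo"}
print(inverted_letter_candidates("Otiorhyuchus", known))
print(inverted_letter_candidates("drougo", known))
```

This only proposes candidates. It cannot tell a typo from an older spelling such as scink, so the final call still rests with a human reader.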

Sometimes the puzzle is intractable without a true subject specialist’s advice. For instance, is it Ptilornis or Ptilorhis or even Ptiloris? Ptilorhis appears to be a late misspelling; but a Ptiloris exists; and Ptilornis ends with the root of ornithology.

Dealing with typos is, of course, the real elephant (Loxodonta africana or Elephas maximus) in the room. There are two kinds of taxonomists: lumpers and splitters. The splitters at one time had about a dozen elephant species; nowadays the lumpers are in the ascendance, and we have only two. Just in case you wondered.

One of the volunteers documented a few variations in spelling or typography: honey-sucker and honeysucker; king-fisher (in the index) and kingfisher (everywhere else); wood-pecker once, elsewhere woodpecker; aerial or aërial… The list goes on. It is for the post-processor ultimately to make the final decision about standardising such variations, but sharp-eyed proofreaders can help by leaving notes about their observations.
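A list of hyphenation variants like the one above can be gathered automatically by comparing words with their hyphens removed. This is a minimal sketch under the assumption that the text is a plain string; the function name is invented for the example, and it does not handle diacritic variants such as aerial and aërial.

```python
import re
from collections import defaultdict

def hyphenation_variants(text):
    """Find words that appear both with and without internal hyphens."""
    groups = defaultdict(set)
    # Match runs of letters optionally joined by single hyphens
    for word in re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text):
        groups[word.replace("-", "").lower()].add(word.lower())
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}

sample = "The honey-sucker and the honeysucker; a king-fisher, a kingfisher, a woodpecker."
print(hyphenation_variants(sample))
```

The output is a starting point for a note to the post-processor, who decides whether to standardise.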

Scientific nomenclature

The system of naming organisms with a genus name followed by a species name is universal, if complicated. The names themselves were never completely stable, and some tough investigations had to be undertaken to decide which version (where the volumes had more than one) was to be accepted.

A Distributed Proofreaders volunteer agonises: “How do you feel about Wallace’s occasional habit … to start species name with a capital letter? For me, it seems [to] violate everything I’ve learned about scientific names.… Have the rules regarding capitals been different, earlier?”

Wikipedia has an interesting article about binomial nomenclature, with links to more information. It appears that for animals, the rule was changed to make species’ names start with a lower-case letter, a change that only happened many years later for plants.

Nowadays the rule is explicit and rigid — the genus starts with a capital and the species with a lower-case letter. In the old days there were many different rules at different times, so in the case of this project, we must follow Wallace’s usage.
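The modern capitalisation rule is simple enough to express as a check. The sketch below classifies a two-word name by its capitalisation only; the function name and the labels it returns are assumptions for this example. In keeping with the decision to follow Wallace, it reports the style and changes nothing.

```python
def classify_binomial(name):
    """Label a two-word name by the capitalisation of its species part."""
    parts = name.split()
    if len(parts) != 2 or not all(part.isalpha() for part in parts):
        return "not a binomial"
    genus, species = parts
    # The genus must be one capital letter followed by lower-case letters
    if not (genus[0].isupper() and genus[1:].islower()):
        return "not a binomial"
    if species.islower():
        return "modern style"
    if species[0].isupper() and species[1:].islower():
        return "capitalised species name"
    return "not a binomial"

print(classify_binomial("Loxodonta africana"))
print(classify_binomial("Elephas Maximus"))
```

Any two capitalised words will pass as a “capitalised species name”, so the check is only useful on names already known to be binomials.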

Hyphenating biological names

I had to leave a general note about end-of-line hyphens splitting biological names. “Whenever I find one I check the name; but in any case, these are extremely rarely hyphenated, so please don’t put the hyphens back in unless you are absolutely certain!”
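The advice in that note amounts to a default: when a word is split at the end of a line, rejoin it without the hyphen unless there is good reason to keep it. A minimal sketch of that default follows, assuming the page text is a plain string; the function name and the `keep_hyphen` list are hypothetical, and this is not the procedure Distributed Proofreaders prescribes.

```python
import re

def rejoin_line_end_hyphens(text, keep_hyphen=frozenset()):
    """Rejoin words split across lines, dropping the hyphen by default.

    `keep_hyphen` holds lower-case words known to be genuinely
    hyphenated, e.g. {"pre-eminent"}.
    """
    def join(match):
        first, second = match.group(1), match.group(2)
        hyphenated = f"{first}-{second}"
        joined = hyphenated if hyphenated.lower() in keep_hyphen else first + second
        # Move the second half up and start the next line after it
        return joined + "\n"

    return re.sub(r"(\w+)-\n(\w+)[ \t]*", join, text)

print(rejoin_line_end_hyphens("the genus Otio-\nrhynchus is large"))
print(rejoin_line_end_hyphens("a pre-\neminent form", {"pre-eminent"}))
```

Biological names would simply never appear in `keep_hyphen` unless a proofreader had confirmed the hyphen against a reliable source.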

Rewards of Distributed Proofreading

Understanding historical context

Working with old and unusual material which might be otherwise unobtainable frequently supplies a context for current ideas. One example is Wallace’s puzzlement about the strange and sometimes anomalous animal habitats he found. I can’t help thinking how delighted he would have been to hear about continental drift, explained by plate tectonics, the theory which the South African geologist Alexander du Toit put on a solid footing after Alfred Wegener first floated the idea in 1912, decades after Geographical Distribution was published. This quote from Wallace illustrates my meaning perfectly:

Should we ever arrive at a fair knowledge of the physical changes that have resulted in the present condition, we shall almost certainly find that many of the differences and anomalies of their existing fauna and flora will be accounted for.

Understanding the author’s character

Wallace, like many naturalists, collected insects, including beetles. As he explained:

[These] families comprise the extensive series of ground beetles (Carabidæ) containing about 9,000 species, and the Longicorns, which are nearly as numerous and surpass them in variety of form and colour as well as in beauty. The Cetoniidæ and Buprestidæ are among the largest and most brilliant of beetles; the Lucanidæ are pre-eminent for remarkable form, and the Cicindelidæ for elegance; and all the families are especial favourites with entomologists, so that the whole earth has been ransacked to procure fresh species.

Results deduced from a study of these will, therefore, fairly represent the phenomena of distribution of Coleoptera, and, as they are very varied in their habits, perhaps of insects in general.

I am reminded of J.B.S. Haldane, a British scientific polymath of the early 20th century. According to a story told in several versions, a theologian asked him whether anything could be concluded about the Creator from the study of natural history, and he replied: “an inordinate fondness for beetles.”

Making texts accessible to all

Apart from the new things we Distributed Proofreaders volunteers learn every day from working on public domain projects, we have the great satisfaction of “preserving history one page at a time” and introducing new readers to the rewards of great old books like this one.

This post was contributed by Bess Richfield, a Distributed Proofreaders volunteer.

