Proofreading a Technical Text

April 1, 2019


Introduction

Distributed Proofreaders recently made Alfred Russel Wallace’s two-volume book The Geographical Distribution of Animals (1876) available for free download from Project Gutenberg (Volume I and Volume II).

Wallace and fellow naturalist Charles Darwin were not only colleagues in their researches, but also jointly originated seminal ideas about the development of animal species, resulting in what is now generally known as the theory of evolution.

Scientific or technical works like Geographical Distribution can present special challenges to the Distributed Proofreaders volunteers who work on them. This post explores some of those challenges.

The Distributed Proofreaders Process

Distributed Proofreaders volunteers acquire scanned images of public domain books either from online sources like The Internet Archive or by scanning the books manually. The scanned images for Geographical Distribution came from The Internet Archive.

The scanned page images are run through optical character recognition (OCR) software to turn them into editable text. Sometimes the resulting text contains what we call “scannos”: misinterpretations of the image by the OCR software, such as a speck on the image rendered as a period, or the word “I” rendered as a numeral 1. Under the guidance of a Project Manager, volunteers proofread the text for errors and format it, a page at a time, in several rounds. The Distributed Proofreaders process enables many volunteers to work on the same book at the same time. Another volunteer (the post-processor) assembles the final product into a complete e-book which, after final checks for errors, is then posted to Project Gutenberg.

During the proofreading phase, many problems can be resolved easily. For example, a scanno, such as “carnage” for “carriage,” is simply corrected to match what appears in the original page image. Not all problems are small ones, though. The proofreader who encounters a more difficult problem, such as one of those discussed below, is required only to leave a note about it for future volunteers working on the text. Some proofreaders choose to go further and search reference materials, such as dictionaries, and ask for help in the project’s discussion forum or one of the specialised forums at Distributed Proofreaders.

While many projects at Distributed Proofreaders are straightforward, others present challenges like poor printing, resulting in poor scan quality and therefore errors in the raw text; antiquated language found in older texts; many or large tables of data, etc. The object is to determine the author’s true intention and reflect that in the final product.

Proofreading Geographical Distribution

From May to October, 2016, Distributed Proofreaders volunteers worked on the first volume, resolving (or attempting to resolve) several thorny issues, communicating with each other and the Project Manager in the Project Discussion.

This text had good quality scans, with very few typographical, spelling or grammar issues. The challenges lay in the fact that it was a deeply technical work with specialised biological terminology. Here are some of the interactions volunteers had in the Project Discussion.

Differentiating between æ and œ ligatures

With the clear scans, it was generally easy to distinguish between æ and œ ligatures. But the original printer apparently had some trouble doing so when working from the author’s manuscript. Misreading of the ligatures led to subsequent mistakes that were easily perpetuated in the rest of the work, even by such a scrupulous authority as Wallace. Of course, in extenuation, the Internet age has made it much easier to check doubtful cases than it was in Wallace’s day.

One volunteer’s research could not determine whether Cænyra was a typo for the more likely Cœnyra. My researches led to the Bulletin of the British Museum (Natural History), Entomology Supplement 9 (1967), where Francis Hemming states that Cænyra is “an incorrect subsequent spelling of Cœnyra.”

Both Turacœna and Turacæna occur twice in Volume I, but Turacœna is not italicised on two pages, which makes it much easier to identify. Turacœna also appears twice, and Turacæna not at all, in Volume II. Volume II includes the index, and Wallace states in the errata for Volume II that misspellings have been corrected in the Index. These facts make Turacœna the people’s choice.

Typographic or spelling errors

A very rare typographic error in Geographical Distribution is Wallace’s reference to “the living three-handed armadillos” for three-banded armadillos.

There is a reference, with a clear connection to kingfishers, to the genus Halycon. Exhaustive, in-depth research (even using dead-tree books on my shelf) suggested that it is a long-standing error which had been perpetuated. The genus, in my humble informed opinion, should be Halcyon (as Wallace has it in the second volume, as well as several times in the first volume). In other words, a rare typo.

When a typesetter sets the letter n upside down, it turns into a confusing letter u, as in Otiorhynchus vs. Otiorhyuchus. Which is correct? I go for the theory that the u is a turned n, as rhynchus in Greek refers to nose, beak or snout, and rhyuchus is not a sensible construction. This is where familiarity with Latin or Greek roots saves the day.
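The turned-letter check described above can be roughed out in code. This is only a minimal sketch, assuming the text has already been split into a word list; the function names are hypothetical and are not part of any Distributed Proofreaders tool. It pairs up words that differ only in where u and n appear, and a human still has to decide which spelling is right.

```python
from itertools import product


def un_variants(word):
    """Return every other spelling made by swapping some u/n letters in word."""
    # Each u or n may be either letter; every other character stays fixed.
    slots = [("u", "n") if ch in "un" else (ch,) for ch in word]
    return {"".join(combo) for combo in product(*slots)} - {word}


def possible_turned_letters(words):
    """Pair up words from the text that differ only in u/n positions."""
    seen = set(words)
    pairs = set()
    for word in seen:
        for variant in un_variants(word) & seen:
            pairs.add(tuple(sorted((word, variant))))
    return sorted(pairs)


sample = ["Otiorhynchus", "Otiorhyuchus", "drongo", "drougo", "skink"]
for pair in possible_turned_letters(sample):
    print(pair)
```

The number of variants doubles with every u or n in a word, so a real tool would probably want a smarter comparison for very long names.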

But if one sees the word drougo, knowing about drongo, or finds the word scink, which is usually spelled skink nowadays, there is considerable doubt. Is it an older version, or a typo? Why does he have Ethiopian, except for the one occurrence of Ethiopean?

Sometimes the puzzle is intractable without a true subject specialist’s advice. For instance, is it Ptilornis or Ptilorhis or even Ptiloris? Ptilorhis appears to be a late misspelling; but a Ptiloris exists; and Ptilornis ends with the root of ornithology.

Dealing with typos is, of course, the real elephant (Loxodonta africana or Elephas maximus) in the room. There are two kinds of taxonomists: lumpers and splitters. The splitters at one time had about a dozen elephant species; nowadays the lumpers are in the ascendance, and we have only two. Just in case you wondered.

One of the volunteers documented a few variations in spelling or typography: honey-sucker and honeysucker; king-fisher (in the index) and kingfisher (everywhere else); wood-pecker once, elsewhere woodpecker; aerial or aërial… The list goes on. It is for the post-processor ultimately to make the final decision about standardising such variations, but sharp-eyed proofreaders can help by leaving notes about their observations.
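A list like the one above could be drafted automatically before a human reviews it. The following is a rough sketch under simple assumptions: it only looks for words that occur both with and without a hyphen (it does not catch pairs such as aerial and aërial), and the function name is made up for illustration.

```python
import re
from collections import Counter


def hyphenation_variants(text):
    """Report words that appear both hyphenated and unhyphenated in text."""
    # Match runs of letters, optionally joined by internal hyphens.
    pattern = r"[^\W\d_]+(?:-[^\W\d_]+)*"
    words = Counter(w.lower() for w in re.findall(pattern, text))
    report = {}
    for word, count in words.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in words:
                report[joined] = {word: count, joined: words[joined]}
    return report


sample = "The king-fisher dives. A kingfisher and another kingfisher; a wood-pecker, a woodpecker."
print(hyphenation_variants(sample))
```

The counts are there to help the post-processor see which form the author used most often; the decision itself stays with a person.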

Scientific nomenclature

The system of naming organisms with a genus name followed by a species name is universal, if complicated. The names themselves were never completely stable, and some tough investigations had to be undertaken to decide which version (where the volumes had more than one) was to be accepted.

A Distributed Proofreaders volunteer agonises: “How do you feel about Wallace’s occasional habit … to start species name with a capital letter? For me, it seems [to] violate everything I’ve learned about scientific names.… Have the rules regarding capitals been different, earlier?”

Wikipedia has an interesting article about binomial nomenclature, with links to more information. It appears that for animals, the rule was changed to make species’ names start with a lower-case letter, a change that only happened many years later for plants.

Nowadays the rule is explicit and rigid — the genus starts with a capital and the species with a lower-case letter. In the old days there were many different rules at different times, so in the case of this project, we must follow Wallace’s usage.

Hyphenating biological names

I had to leave a general note about end-of-line hyphens splitting biological names. “Whenever I find one I check the name; but in any case, these are extremely rarely hyphenated, so please don’t put the hyphens back in unless you are absolutely certain!”
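The habit of checking a split name against the rest of the text can be sketched as follows. This is an illustration only, with a hypothetical function name: for each word broken by an end-of-line hyphen, it counts how often the joined form and the hyphenated form occur elsewhere in the same text, which gives the proofreader some evidence before deciding.

```python
import re


def check_line_end_hyphens(text):
    """For each word split by a hyphen at a line end, count both spellings elsewhere."""
    results = []
    for match in re.finditer(r"(\w+)-\n(\w+)", text):
        joined = match.group(1) + match.group(2)
        hyphenated = match.group(1) + "-" + match.group(2)
        # Search the text with this one split occurrence cut out.
        rest = text[:match.start()] + " " + text[match.end():]
        results.append((joined, rest.count(joined), hyphenated, rest.count(hyphenated)))
    return results


sample = "the genus Otio-\nrhynchus is large; Otiorhynchus occurs widely."
print(check_line_end_hyphens(sample))
```

A count of zero for both forms means the text offers no help, and the name has to be looked up elsewhere.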

Rewards of Distributed Proofreading

Understanding historical context

Working with old and unusual material which might be otherwise unobtainable frequently supplies a context for current ideas. One example is Wallace’s puzzlement about the strange and sometimes anomalous animal habitats he found. I can’t help thinking how delighted he would have been to hear about continental drift, explained by plate tectonics, the theory which the South African geologist Alexander du Toit put on a solid footing after Alfred Wegener first floated the idea in 1912, decades after Geographical Distribution was published. This quote from Wallace illustrates my meaning perfectly:

Should we ever arrive at a fair knowledge of the physical changes that have resulted in the present condition, we shall almost certainly find that many of the differences and anomalies of their existing fauna and flora will be accounted for.

Understanding the author’s character

Wallace, like many naturalists, collected insects, including beetles. As he explained:

[These] families comprise the extensive series of ground beetles (Carabidæ) containing about 9,000 species, and the Longicorns, which are nearly as numerous and surpass them in variety of form and colour as well as in beauty. The Cetoniidæ and Buprestidæ are among the largest and most brilliant of beetles; the Lucanidæ are pre-eminent for remarkable form, and the Cicindelidæ for elegance; and all the families are especial favourites with entomologists, so that the whole earth has been ransacked to procure fresh species.

Results deduced from a study of these will, therefore, fairly represent the phenomena of distribution of Coleoptera, and, as they are very varied in their habits, perhaps of insects in general.

I am reminded of J.B.S. Haldane, who was a British scientific polymath of the early 20th century. It is variously reported that, when a theologian asked him whether anything could be concluded about the Creator from the study of natural history, his reply was “an inordinate fondness for beetles.”

Making texts accessible to all

Apart from the new things we Distributed Proofreaders volunteers learn every day from working on public domain projects, we have the great satisfaction of “preserving history one page at a time” and introducing new readers to the rewards of great old books like this one.

This post was contributed by Bess Richfield, a Distributed Proofreaders volunteer.


The Proofreading Quizzes

February 1, 2019

I am one of the thousands of volunteers at Distributed Proofreaders. We’re Distributed because we’re located in different places all over the globe and we’re Proofreaders because we read text looking for errors. We turn out-of-copyright printed books into eBooks, which have selectable/searchable text and which are also suitable for text-to-speech software, and then make those eBooks available to all, for free, via Project Gutenberg.

Once we have a scanned image of a page from a printed book, we run Optical Character Recognition (OCR) software on it to turn the image of text into actual editable text. The OCR accuracy is good, but tends to still leave many mistakes (what we call “scannos”) in the created text. We then, in multiple passes, verify the OCR’s results.

In striving towards a high quality for the finished eBooks we aim for a consistent result from all the many different volunteers. This is achieved by following a set of Proofreading Guidelines which explain what to change and how to do it.

And to help people familiarize themselves with the Guidelines, we have a set of Proofreading Quizzes and Tutorials. These act as an instructional aid for people to learn what to do and also as an ongoing refresher course, as it is strongly recommended that all volunteers redo the Proofreading Quizzes every six months or so.

The Proofreading Quizzes start with the basics and gradually introduce more and more elements, covering what to do with things found in easier books through to quite hard and challenging books. Each quiz is accompanied by a brief tutorial which explains everything one needs to know to complete the quiz.

Part of such a scanned image of a page from a printed book might look like this:

[Image: excerpt from a scanned page]

and the OCR software may have generated for it the text:

[Image: the raw OCR text for that excerpt]

We then compare that OCR-generated text with the scanned image of the printed page and correct any mistakes the OCR made, so that the text matches the image:

[Image: the corrected text]

It’s very much like those spot-the-differences types of games/puzzles. Whilst Proofreading we ignore things like italics and just verify the text has the correct characters. Layout and style issues, such as italics, are dealt with in later Formatting rounds of our process.
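For readers who like to see the spot-the-differences idea spelled out, here is a small sketch using Python’s standard difflib module. It is not how the quizzes themselves work, as far as this post says; the function name and the sample strings are invented for illustration. It compares the raw OCR text with the corrected text word by word and lists what changed.

```python
import difflib


def spot_differences(ocr_text, corrected_text):
    """List (ocr, corrected) word groups that differ between the two texts."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


# Made-up sample: a numeral 1 for the word I, and one misread word.
print(spot_differences("1 saw the carnage arrive", "I saw the carriage arrive"))
```

Of course, a proofreader does the harder half of this job: the corrected text does not exist until someone has compared the OCR output with the page image.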

The quiz process lets volunteers actually try their hands at proofreading as they work through the quizzes and tutorials. And it provides the answers online in an automated way — you don’t have to wait for feedback.

Here’s a little quiz to start you off:

  • Do you have an eye for detail?
  • Do you like those spot-the-differences games?
  • Do you like learning new things and facing new challenges?

If you have answered yes to these questions, you may enjoy being a Proofreader at Distributed Proofreaders. Try the Proofreading Quizzes and find out!

This post was contributed by FallenArchangel, a DP volunteer.


Preserving the Past … For the Future

January 1, 2019

Preserving the Past … For the Future … One Dig at a Time

Looking forward to another day at the archaeology dig. Putting on the coffee and getting breakfast. Water containers to be filled with fresh water — it’s going to be HOT today, so need to take extra. Grabbing some food to throw into my pack along with the water. A trip to the barn to check on my animals — fresh water, everyone looks good. Throwing my pack into my vehicle and away I go!

Need to dig carefully — looks like someone broke a clay pot — all in pieces — and each piece needs to be carefully extracted from the soil. The pot will be reconstructed in the lab at a future time. Notes, notes, notes, never ending — this is the important stuff — keeping track of soil changes, artifacts found, any “stains” in the soil that may be the remains of poles holding up ancient structures. Here’s some rock debris — someone chipping away on a precious piece of rock to make a projectile point, scraper, or other implement. Each piece of rock must be collected and labeled carefully. Some charcoal here — an ancient fire pit, rock-lined — need to photograph and draw a rough sketch. Wonder what they were cooking: deer? rabbit? fish? Maybe some of the potsherds from the broken clay pot can be sent out for protein analysis.

One never knows what is going to be found at a dig — but each little bit tells the story of the past and must be carefully preserved for future generations.

I’m very dirty and very tired and mosquito-eaten — but it’s been a good day and I feel great!

Preserving the Past … For the Future … One Page at a Time

That’s what I did as an archaeologist volunteer — but it’s not so very different from what I do as a Distributed Proofreaders volunteer.

Getting up in the morning and turning on the computer before doing anything else. Putting on the coffee and grabbing some breakfast. Logging into Distributed Proofreaders.

What shall be read today? Sometimes science, sometimes travel, sometimes anthropology, sometimes just choosing something different that I never even considered reading. Every book is important — the 5-page books to the 1,000-page books. The religious books — books of poems — science books — fictional books — travel books — music books — medical books — all interesting and need to be carefully proofed.

Here’s a book on engineering — wonder what sorts of things engineers were working on way back then? Another on an African tribe — a culture different from mine — thinking and doing things according to their needs and wants — wonder what they would think of Western culture? And another book on ocean biology — maybe will read this one for a while. All those Latin names of shells and sea creatures — they require a reader’s full attention. Here’s another book on submarines — somewhat technical — think I’ll read this next. Some math formulae and engineering terms — wonder how submarines have changed from past times to today?

Never know what books will be in the queue to be proofed but every one is important, each book tells a story of the past and must be meticulously proofed, formatted and preserved for future generations.

My back hurts, I need more coffee, my eyes are glazing over — but it’s been a good day and I feel great!

This post was contributed by eyecrochet, a DP volunteer.

The DP Blog wishes all its readers a very happy and healthy New Year!


A Spell of Proofing

December 1, 2017

“I have some free time. I get to proof!” Proofing (as we call proofreading at Distributed Proofreaders) is relaxing. I get into a flow where time and place disappear and I am just in the page — in the zone.

“What shall I proof today? The project I have been chipping away at, a page at a time, has moved on. Oh, this project that I’ve been dipping into appears to be stuck in the round. What’s stopping it? Ah, it’s a page with a lot of Greek on it. I don’t think I can leave that page better than I found it. I’ll leave it for someone else.” Perhaps I’ll post about it in the Greek Team forum.

“Look, here’s a book someone proofed up to the Table of Contents (ToC).” I enjoy proofing ToCs because they often hold a few missed errors. “See — that page number is 33, not 38. It’s a bit obscure, but since the next entry is for page 35, it’s likely 33.” I’ll leave a note.

33[**38]
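The reasoning behind that catch (a page number in a Table of Contents should not be larger than the one after it) can be written as a tiny check. This is just a sketch; the function name is hypothetical, and every number in the sample list other than 38 and 35 is made up.

```python
def out_of_order(page_numbers):
    """Return the positions whose page number is larger than the next entry's."""
    return [
        i
        for i in range(len(page_numbers) - 1)
        if page_numbers[i] > page_numbers[i + 1]
    ]


# An OCR reading of 38 followed by an entry for page 35 stands out.
toc_pages = [12, 27, 38, 35, 41]
print(out_of_order(toc_pages))
```

A flagged position only says that something is off at that spot; the page image decides what the number really is.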

“Ooh look, it’s one of those old-fashioned detailed ToC entries that lists out subjects covered in the chapter separated by dashes. This line starts with a dash so the dash and the word following it need to move up to the prior line. The word is followed by a dash so that needs to move up too.” I change:

porches–rocking chairs–stoops
–steps–lazy conversation–sunset

to

porches–rocking chairs–stoops–steps–lazy
conversation–sunset

“The post-processor is going to have fun with that!”
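The change shown above follows a small, repeatable rule, so it can be sketched in code. This is an illustration only, not a Distributed Proofreaders tool; the function name is hypothetical, and it assumes the dash is the same character used in the example. A dash at the start of a line moves up to the previous line, and a word that follows a dash now ending the previous line moves up with it.

```python
import re

DASH = "\u2013"  # the dash character used in the example above


def move_dashes_up(prev_line, line):
    """Shift text from line up to prev_line until no dash sits at the line break."""
    line = line.lstrip()
    while line:
        if line.startswith(DASH):
            # A leading dash belongs at the end of the previous line.
            prev_line += DASH
            line = line[1:].lstrip()
        elif prev_line.endswith(DASH):
            # A trailing dash needs the next word pulled up beside it.
            word = re.match(rf"[^\s{DASH}]+", line)
            if word is None:
                break
            prev_line += word.group()
            line = line[word.end():].lstrip()
        else:
            break
    return prev_line, line


first = DASH.join(["porches", "rocking chairs", "stoops"])
second = DASH + DASH.join(["steps", "lazy conversation", "sunset"])
print(move_dashes_up(first, second))
```

Run on the two lines from the example, the sketch produces the same pair of lines as the manual edit.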

I’m at the bottom of the page. Let me hit WordCheck (DP’s version of spellcheck). “Hunh. I didn’t notice ‘explain’ was mis-typeset ‘explarn’. I’d better exit and add a note.”

explarn[**explain]

I return to WordCheck. “Looks good.” Save and close.
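Notes written in the bracketed form shown in the examples above are easy to pull out of a page for review. Here is a minimal sketch, assuming only that a note looks like [**something] and sits directly after the text it comments on; the function name and the sample sentence are made up for illustration.

```python
import re

# Text immediately before a note, then the note body inside [** ... ].
NOTE = re.compile(r"(\S*?)\[\*\*([^\]]*)\]")


def proofer_notes(page_text):
    """Return (text before the note, note body) for every note on the page."""
    return [(m.group(1), m.group(2)) for m in NOTE.finditer(page_text)]


sample = "see page 33[**38] to explarn[**explain] it"
for before, note in proofer_notes(sample):
    print(before, "->", note)
```

The sketch makes no attempt to decide which of the two values is correct; it only gathers the notes so that the next volunteer can find them quickly.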

“I’ve wrapped up the ToC and Illustrations pages. I’m not really interested in the content of this project. What else is available?”

“Oh, I see a novel, a Western. That should have different types of errors to seek out and find.”

I open a page. “Ugh — dialect. I’ll do just this page then find something else.” But dialect means dialogue. Dialogue often means quotation marks misplaced in the text — often mis-spaced ones or ones attached to the speaker instead of the conversation. “Yep, there’s one.”

he said,” Bring that thar hoss over hyar.”

I change that to:

he said, “Bring that thar hoss over hyar.”
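A misplaced quotation mark like this one can often be caught by tracking whether a quotation is currently open. The sketch below is a rough heuristic, not a DP feature: it assumes curly quotation marks and treats each paragraph on its own, so it will also flag legitimate cases such as speech that continues over several paragraphs. The function name is hypothetical.

```python
OPEN, CLOSE = "\u201c", "\u201d"  # curly opening and closing double quotes


def unbalanced_quotes(paragraph):
    """Return (position, problem) for quote marks that do not pair up."""
    problems = []
    is_open = False
    for pos, ch in enumerate(paragraph):
        if ch == OPEN:
            if is_open:
                problems.append((pos, "opening quote inside an open quotation"))
            is_open = True
        elif ch == CLOSE:
            if not is_open:
                problems.append((pos, "closing quote with no opening quote"))
            is_open = False
    if is_open:
        problems.append((len(paragraph), "quotation never closed"))
    return problems


before = "he said,\u201d Bring that thar hoss over hyar.\u201d"
after = "he said, \u201cBring that thar hoss over hyar.\u201d"
print(unbalanced_quotes(before))
print(unbalanced_quotes(after))
```

The uncorrected line is flagged at both of its quotation marks, while the corrected line passes cleanly.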

Novels, juveniles, and Westerns often seem to have the worst typesetting: missing or misplaced quotation marks, missing periods at the ends of sentences, misspellings. They’re laced with dialect that at times makes reading and understanding the intended word difficult at best.

Speaking of reading: There’s proofing and there’s reading. It really helps to do both to find errors — but not at the same time. “Oh, this is really interesting.” “I didn’t know that.” “What happens next?” Sliding from proofing to reading can mean my eyes gloss over errors, unconsciously mentally fixing instances where a word is repeated, not noticing misplaced quotation marks, but still laser-focusing on typos, incorrect word usage and lack of continuity. Proofing to match letters and punctuation marks can mean I miss the typo because the letters match. These are all important errors to catch. Making separate reading passes and proofing passes while the page is open can help me find different kinds of errors. Muddling both into a single pass risks missing things.

“What? My free hour is up? How can that be? I just got started!”

This post was contributed by WebRover, a Distributed Proofreaders volunteer.


The Typesetters, the Proofreaders, and the Scribes

February 1, 2017

At Distributed Proofreaders, we are all volunteers. We are under no time pressure to proof a certain number of pages, lines or characters. When we check out a page, we can take our careful time to complete it.

We can choose a character-dense page of mind-numbing lists of soldiers’ names, ships’ crews, or index pages. We are free to select character-light pages of poetry, children’s tales or plays. Of course these come with their own challenges such as punctuation, dialogue with matching quotes or stage directions. We can pick technical manuals with footnotes, history with side notes, or science with Latin biology names. We can switch back and forth to chip away at a tedious book interspersed with pages from a comedy or travelogue.

Every so often though, I stop and think about the original typesetters.

They didn’t get to pick their subject material, their deadline or their quota. They worked upside-down and backwards. They didn’t get to sit in their own home in their chosen desk set-up, with armchair, large screen, laptop or other comforts. Though we find errors in the texts that they set, many books contain very few of these errors. When I pause between tedious pages, I wonder how they did it.

Beyond the paycheck, what motivated them to set type on the nth day of the nnth page of a book that consisted mostly of lists, or indices? Even for text that would be more interesting to the typesetter, the thought of them having to complete a certain number of pages in a given day to meet a printing deadline is just impressive.

I know many have jobs today that require repetitive activities. But how many of those jobs are so detail-oriented, with no automation, and leave a permanent record of how attentive you were vs. how much you were thinking about lunch? Maybe it was easier to review and go back and fix errors than I picture it to be. Maybe they got so they could set type automatically and be able to think of other things or converse.

When I’m proofing a challenging page, I sometimes think of that person who put those letters together for that page. I realize my task is so much easier. If I want I can stop after that page and hope some other proofer will do a page or two before I pick up that project again. I can stop, eat dinner, and come back tomorrow to finish the page when I’m fresh.

I imagine a man standing at a workbench with his frames of letters and numbers and punctuation at one side, picking out the type one by one, hoping that the “I” box doesn’t contain a misplaced “l” or “1.” I see him possibly thinking about how much easier life is for him than it was for the medieval scribe. The scribe was working on a page for days, weeks, even months, one hand-drawn character at a time. I see the typesetter appreciating how much improved his own life is and how much more available his work makes books to his current readers. And I smile as I see him smile.

This post was contributed by WebRover, a DP volunteer.


Proofing with Maps

August 8, 2015

While proofing for Distributed Proofreaders, I often find myself opening up a mapping application to locate rivers, towns, buildings, forts, streets, etc. that are mentioned, described, or central to a project. Sometimes it’s to figure out where they are. Sometimes it’s to try and see what’s being described.


For example, Early Western Travels, 1748-1846, Volume XXIII, describes some rock formations that the footnote identified as being in Dawson and Valley Counties, Montana. Using that information, I was able to view a photo of the rock formations. I’ve also found remote tiny towns that still exist in the American West — one even had a preserved historical district.

Florizel’s Folly (in progress at DP) led me to Brighton, England; Yellowstone’s Living Geology: Earthquakes and Mountains (also in progress) led me to Old Faithful.

I posted in the DP forums about this and found another proofreader who was using mapping software to locate parks that were mentioned in old bird books as locations of certain birds. This person was interested in whether the parks have the same birds.

Of course, I look at maps because I love maps. So starting with a specific reference point from a book, I can get lost for half an hour or more exploring, envisioning, and virtually visiting. Anywhere. And how exciting when I get a chance to visit in person a site I’ve visited before via mapping software; for example, the Pony Express Statue in Sacramento Old Town.

If you haven’t tried this before, do! You may find yourself addicted.

This post was contributed by WebRover, a DP volunteer.

