The April Baby’s Book of Tunes

May 1, 2019

This post is published in celebration of the 100th anniversary of Children’s Book Week, April 29 to May 5, 2019.

Greenaway illustration

Distributed Proofreaders volunteers have always loved preserving children’s books, from the famous to the obscure. Hot off the Press has highlighted quite a few of our outstanding e-book versions of works for young people, such as L. Frank Baum’s Oz books, popular juvenile series starring the likes of Nancy Drew and the Hardy Boys, dime-novel series like Motor Matt, and books illustrated by beloved artists like Walter Crane. But these are just a tiny fraction of the total: Distributed Proofreaders has contributed over 3,700 children’s books to Project Gutenberg.

One very recent example is The April Baby’s Book of Tunes, published in 1900. It tells the story of three little girls in Germany who are stuck indoors during an unexpected April snowstorm just before Easter. Their mother entertains them by setting a variety of well-known English nursery rhymes to music.

Though credited only as “the author of Elizabeth and her German Garden,” the author was Elizabeth von Arnim, then known as the Countess von Arnim-Schlagenthin. Born Mary Annette Beauchamp in Australia but raised in England, she married a Prussian count in 1891. They had five children (three of whom were the models for the little girls in The April Baby’s Book of Tunes), but the marriage was not a happy one. The count’s propensity for racking up debts eventually led to his being imprisoned for fraud. This in turn led to her writing her successful semi-autobiographical novel, Elizabeth and Her German Garden (1898), under the pseudonym “Elizabeth.” She wrote some 20 books, mainly novels; two of them, The Enchanted April (1922) and Mr. Skeffington (1940), were made into popular films. After the count’s death in 1910, a turbulent affair with H.G. Wells, and another unhappy marriage, this time to the 2nd Earl Russell (whom she satirized in her 1921 novel Vera), she led a peripatetic life that took her all over Europe and the United States. She died in Charleston, South Carolina, in 1941.

Perhaps the most delightful part of The April Baby’s Book of Tunes is its set of 16 charming color illustrations by the great children’s book artist Kate Greenaway. It was one of Greenaway’s last published works; she tragically died in 1901 of breast cancer at the age of 55. She was justly famous for her use of vibrant color in depicting beautiful children in beautiful surroundings, and The April Baby’s Book of Tunes is no exception.

As if that weren’t enough, the book features 10 little songs, prettily arranged for voice and piano (presumably by von Arnim herself; the composer is not credited). In the HTML version at Project Gutenberg, you can click on links to hear the music and download the notation if you wish.

The April Baby’s Book of Tunes is “Elizabeth’s” only children’s book, but it’s a lovely example of the genre. It could not fail to be, with Kate Greenaway’s entrancing illustrations.

This post was contributed by Linda Cantoni, a Distributed Proofreaders volunteer.

Celebrating 37,000 Titles

April 16, 2019

Distributed Proofreaders celebrates the 37,000th title it has posted to Project Gutenberg, French Painting of the 19th Century in the National Gallery of Art. Congratulations and thanks to all the Distributed Proofreaders volunteers who worked on it.

Among the joys for those who love both art and books are museum publications featuring the museums’ collections or exhibitions. Distributed Proofreaders’ 37,000th title, French Painting of the 19th Century in the National Gallery of Art, is an excellent example of the delights available in this form. It’s a short booklet, just 43 pages, but it’s filled with lovely color plates of 16 selections from the National Gallery of Art in Washington, D.C.

The National Gallery of Art was the brainchild of Pittsburgh banker and industrialist Andrew Mellon. He had begun collecting art in the 1890s on the advice of his friend Henry Clay Frick, whose own vast collection later became a very fine New York City museum. In 1936, Mellon, who was then embroiled in tax difficulties, approached the Roosevelt Administration with an offer to build a national art gallery, to be formed from his personal art collection and maintained by the U.S. Government with the help of a substantial financial endowment. Mellon never saw the gallery completed, however; he died in 1937, and the gallery opened in 1941.

The French Paintings booklet approaches the art chronologically, from the neoclassical work of Jacques-Louis David — famous for his portraits of Napoleon, one of which is included in the booklet — to modernists like Auguste Renoir, whose Girl with a Watering Can graces the cover. Each color plate is accompanied by a short description of the painting and its place in art history, as well as the donor’s identity (the majority were donated by New York banker Chester Dale).

Museum publications like this one were designed to make art — or history or science — more accessible to museum visitors. Distributed Proofreaders and Project Gutenberg make publications like this accessible worldwide to anyone with access to a computer, tablet, or smartphone, including people who may never see a museum in person.

This post was contributed by Linda Cantoni, a Distributed Proofreaders volunteer.

Proofreading a Technical Text

April 1, 2019



Distributed Proofreaders recently made Alfred Russel Wallace’s two-volume book The Geographical Distribution of Animals (1876) available for free download from Project Gutenberg (Volume I and Volume II).

Wallace and fellow naturalist Charles Darwin not only were colleagues in their researches, but also collaboratively originated seminal ideas about the development of animal species, resulting in what is now generally known as evolution.

Scientific or technical works like Geographical Distribution can present special challenges to the Distributed Proofreaders volunteers who work on them. This post explores some of those challenges.

The Distributed Proofreaders Process

Distributed Proofreaders volunteers acquire scanned images of public domain books either from online sources like The Internet Archive or by scanning the books manually. The scanned images for Geographical Distribution came from The Internet Archive.

The scanned page images are run through optical character recognition (OCR) software to turn them into editable text. Sometimes the resulting text contains what we call “scannos”: misinterpretations of the image by the OCR software, such as a speck on the image rendered as a period, or the word “I” rendered as a numeral 1. Under the guidance of a Project Manager, volunteers proofread the text for errors and then format it, a page at a time, in several rounds. The Distributed Proofreaders process enables many volunteers to work on the same book at the same time. Another volunteer (the post-processor) assembles the final product into a complete e-book which, after final checks for errors, is then posted to Project Gutenberg.

During the proofreading phase, many problems can be resolved easily. For example, a scanno, such as “carnage” for “carriage,” is simply corrected to match what appears in the original page image. Not all problems are small ones, though. The proofreader who encounters a more difficult problem, such as one of those discussed below, is required only to leave a note about it for future volunteers working on the text. Some proofreaders choose to go further and search reference materials, such as dictionaries, and ask for help in the project’s discussion forum or one of the specialised forums at Distributed Proofreaders.

While many projects at Distributed Proofreaders are straightforward, others present challenges like poor printing, resulting in poor scan quality and therefore errors in the raw text; antiquated language found in older texts; many or large tables of data, etc. The object is to determine the author’s true intention and reflect that in the final product.

Proofreading Geographical Distribution

From May to October 2016, Distributed Proofreaders volunteers worked on the first volume, resolving (or attempting to resolve) several thorny issues, communicating with each other and the Project Manager in the Project Discussion.

This text had good quality scans, with very few typographical, spelling or grammar issues. The challenges lay in the fact that it was a deeply technical work with specialised biological terminology. Here are some of the interactions volunteers had in the Project Discussion.

Differentiating between æ and œ ligatures

With the clear scans, it was generally easy to distinguish between æ and œ ligatures. But the original printer apparently had some trouble doing so when working from the author’s manuscript. Misreading of the ligatures led to subsequent mistakes that were easily perpetuated in the rest of the work, even by such a scrupulous authority as Wallace. Of course, in extenuation, the Internet age has made it much easier to check doubtful cases than it was in Wallace’s day.

One volunteer’s research could not determine whether Cænyra was a typo for the more likely Cœnyra. My researches led to the Bulletin of the British Museum (Natural History), Entomology Supplement 9 (1967), where Francis Hemming states that Cænyra is “an incorrect subsequent spelling of Cœnyra.”

Both Turacœna and Turacæna occur twice in Volume I, but Turacœna is not italicised on two pages, which makes it much easier to identify. Turacœna also appears twice, and Turacæna not at all, in Volume II. Volume II includes the index, and Wallace states in the errata for Volume II that misspellings have been corrected in the Index. These facts make Turacœna the people’s choice.
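A tally like the one above can also be gathered mechanically. What follows is only a minimal Python sketch, not a tool the volunteers are known to have used: it assumes the proofread text is available as a plain string, and the function name ligature_variants and the sample sentence are invented for illustration. It groups words that are identical apart from the æ/œ ligature and counts each spelling, leaving the final choice to a human.

```python
import re
from collections import defaultdict


def ligature_variants(text):
    """Group words that are identical except for æ versus œ (hypothetical helper)."""
    groups = defaultdict(lambda: defaultdict(int))
    # [^\W\d_]+ matches runs of letters, including ligatures and accented letters.
    for word in re.findall(r"[^\W\d_]+", text):
        if "æ" in word or "œ" in word:
            # Replace both ligatures with the same placeholder to build a shared key.
            key = word.replace("æ", "*").replace("œ", "*")
            groups[key][word] += 1
    # Keep only the keys under which more than one spelling occurs.
    return {key: dict(forms) for key, forms in groups.items() if len(forms) > 1}


# Invented sample text, not a quotation from Wallace.
sample = "Turacœna and Turacæna occur here; Turacœna recurs. Cœnyra appears once."
variants = ligature_variants(sample)
print(variants)
```

The counts only show which spelling is more frequent; as the Cænyra case shows, deciding which one is right may still need outside references.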

Typographic or spelling errors

A very rare typographic error in Geographical Distribution is Wallace’s reference to “the living three-handed armadillos” for three-banded armadillos.

There is a reference, with a clear connection to kingfishers, to the genus Halycon. Exhaustive, in-depth research (even using dead-tree books on my shelf) suggested that it is a long-standing error which had been perpetuated. The genus, in my humble informed opinion, should be Halcyon (as Wallace has it in the second volume, as well as several times in the first volume). In other words, a rare typo.

When a typesetter sets the letter n upside down, it prints as a confusing letter u, as in Otiorhynchus vs. Otiorhyuchus. Which is correct? I go for the n-turned-into-u theory, as rhynchus in Greek refers to nose, beak or snout, and rhyuchus is not a sensible construction. This is where familiarity with Latin or Greek roots saves the day.
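The turned-letter check can be sketched in Python as well. This is an illustration under stated assumptions, not part of the Distributed Proofreaders toolset: the function name swap_u_n_candidates is invented, and the set of known words stands in for whatever reference list a proofreader trusts. It swaps one u or n at a time and reports any result found in that list.

```python
def swap_u_n_candidates(word, known_words):
    """Return known words reachable from `word` by swapping a single u or n."""
    swap = {"u": "n", "n": "u"}
    found = []
    for i, ch in enumerate(word):
        if ch in swap:
            # Swap only the letter at position i and leave the rest untouched.
            candidate = word[:i] + swap[ch] + word[i + 1:]
            if candidate in known_words and candidate not in found:
                found.append(candidate)
    return found


# Hypothetical reference list; a real one would come from a trusted source.
known = {"Otiorhynchus", "drongo"}
print(swap_u_n_candidates("Otiorhyuchus", known))
print(swap_u_n_candidates("drougo", known))
```

A match is only a suggestion for a note to later volunteers, since an unfamiliar spelling may be an older form rather than a turned letter.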

But if one sees the word drougo, knowing about drongo, or finds the word scink, which is usually spelled skink nowadays, there is considerable doubt. Is it an older version, or a typo? Why does he have Ethiopian, except for the one occurrence of Ethiopean?

Sometimes the puzzle is intractable without a true subject specialist’s advice. For instance, is it Ptilornis or Ptilorhis or even Ptiloris? Ptilorhis appears to be a late misspelling; but a Ptiloris exists; and Ptilornis ends with the root of ornithology.

Dealing with typos is, of course, the real elephant (Loxodonta africana or Elephas maximus) in the room. There are two kinds of taxonomists: lumpers and splitters. The splitters at one time had about a dozen elephant species; nowadays the lumpers are in the ascendance, and we have only two. Just in case you wondered.

One of the volunteers documented a few variations in spelling or typography: honey-sucker and honeysucker; king-fisher (in the index) and kingfisher (everywhere else); wood-pecker once, elsewhere woodpecker; aerial or aërial… The list goes on. It is for the post-processor ultimately to make the final decision about standardising such variations, but sharp-eyed proofreaders can help by leaving notes about their observations.
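A list of such variations could be drafted automatically before anyone decides how to standardise them. The Python sketch below is an assumption-laden illustration: the function name hyphenation_variants and the sample sentence are invented, and it only covers the hyphenated versus closed-up case, not differences such as aerial and aërial.

```python
import re
from collections import Counter


def hyphenation_variants(text):
    """Pair each hyphenated word with its closed-up form when both occur."""
    # Match letter runs, optionally joined by internal hyphens, and ignore case.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)
    counts = Counter(token.lower() for token in tokens)
    pairs = {}
    for word, count in counts.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in counts:
                # Record (hyphenated count, closed-up form, closed-up count).
                pairs[word] = (count, joined, counts[joined])
    return pairs


# Invented sample text, not a quotation from Wallace.
sample = "The king-fisher and the kingfisher; a wood-pecker, then woodpecker and woodpecker."
pairs = hyphenation_variants(sample)
print(pairs)
```

The output is a set of observations for the post-processor, who still makes the final call on each pair.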

Scientific nomenclature

The system of naming organisms with a genus name followed by a species name is universal, if complicated. This was never completely stable, and some tough investigations had to be undertaken to decide which version (where the volumes had more than one) was to be accepted.

A Distributed Proofreaders volunteer agonises: “How do you feel about Wallace’s occasional habit … to start species name with a capital letter? For me, it seems [to] violate everything I’ve learned about scientific names.… Have the rules regarding capitals been different, earlier?”

Wikipedia has an interesting article about binomial nomenclature, with links to more information. It appears that for animals, the rule was changed to make species’ names start with a lower-case letter, a change that only happened many years later for plants.

Nowadays the rule is explicit and rigid — the genus starts with a capital and the species with a lower-case letter. In the old days there were many different rules at different times, so in the case of this project, we must follow Wallace’s usage.

Hyphenating biological names

I had to leave a general note about end-of-line hyphens splitting biological names. “Whenever I find one I check the name; but in any case, these are extremely rarely hyphenated, so please don’t put the hyphens back in unless you are absolutely certain!”
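Finding the names split across lines is the mechanical half of that check. As a rough Python sketch, assuming each page is available as a list of text lines (the function name rejoin_candidates and the sample lines are invented), the following lists every word broken by an end-of-line hyphen together with its closed-up form, so that a person can look each name up.

```python
import re


def rejoin_candidates(lines):
    """Return (hyphenated, joined) pairs for words split by a hyphen at a line end."""
    results = []
    for current, following in zip(lines, lines[1:]):
        # A letter run followed by a hyphen at the very end of the line.
        head = re.search(r"([^\W\d_]+)-$", current.rstrip())
        # The first letter run on the next line.
        tail = re.match(r"\s*([^\W\d_]+)", following)
        if head and tail:
            first, second = head.group(1), tail.group(1)
            results.append((first + "-" + second, first + second))
    return results


# Invented sample lines, not a quotation from Wallace.
page = ["a weevil of the genus Otio-", "rhynchus was found"]
print(rejoin_candidates(page))
```

The sketch deliberately changes nothing in the text; whether the hyphen stays or goes remains a judgment for the proofreader.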

Rewards of Distributed Proofreading

Understanding historical context

Working with old and unusual material which might be otherwise unobtainable frequently supplies a context for current ideas. One example is Wallace’s puzzlement about the strange and sometimes anomalous animal habitats he found. I can’t help thinking how delighted he would have been to hear about continental drift, explained by plate tectonics, the theory which the South African geologist Alexander du Toit put on a solid footing after Alfred Wegener first floated the idea in 1912, decades after Geographical Distribution was published. This quote from Wallace illustrates my meaning perfectly:

Should we ever arrive at a fair knowledge of the physical changes that have resulted in the present condition, we shall almost certainly find that many of the differences and anomalies of their existing fauna and flora will be accounted for.

Understanding the author’s character

Wallace, like many naturalists, collected insects, including beetles. As he explained:

[These] families comprise the extensive series of ground beetles (Carabidæ) containing about 9,000 species, and the Longicorns, which are nearly as numerous and surpass them in variety of form and colour as well as in beauty. The Cetoniidæ and Buprestidæ are among the largest and most brilliant of beetles; the Lucanidæ are pre-eminent for remarkable form, and the Cicindelidæ for elegance; and all the families are especial favourites with entomologists, so that the whole earth has been ransacked to procure fresh species.

Results deduced from a study of these will, therefore, fairly represent the phenomena of distribution of Coleoptera, and, as they are very varied in their habits, perhaps of insects in general.

I am reminded of J.B.S. Haldane, who was a British scientific polymath of the early 20th Century. It is variously reported that his reply to a theologian’s question about whether anything could be concluded about the Creator from the study of natural history was “an inordinate fondness for beetles.”

Making texts accessible to all

Apart from the new things we Distributed Proofreaders volunteers learn every day from working on public domain projects, we have the great satisfaction of “preserving history one page at a time” and introducing new readers to the rewards of great old books like this one.

This post was contributed by Bess Richfield, a Distributed Proofreaders volunteer.

Printer Credits

March 1, 2019

Colophon of the Aldine Press

Having grown up in an age of ever more sophisticated advertising, I’m amazed at the poor way printers represented their work in the printers’ credits I see in the older books we work on at Distributed Proofreaders.

Printers’ credits? I’m not sure what the proper name is. I’m speaking of the two or three lines near the beginning or end of a book where the printer’s name, address, and any other information appear.

I’m not sure of the purpose of this information appearing in a book. Perhaps it helps support a publisher’s claim that the book has been published in a particular country and is therefore protected by that country’s copyright laws. It doesn’t appear to be in every book, so it can’t have been a legal requirement to include it. Printing companies presumably had established reputations. Some may have been known for quick typesetting and turnaround, some for low prices, some for accurate technical work or quality binding. Publishers must have needed to know these reputations to select an appropriate printer for a particular job. I hope publishers weren’t depending on the two or three lines of printed identification to identify the quality of a potential printer’s work.

No, I don’t think this would have been the way printers were presenting themselves to publishing houses for future business. I expect printing companies advertised to publishers in some other way to establish business relationships. However, the printer’s credit in a book is the only way the printers are known to the reading public. In any case, I’d think they’d want to put their best foot, er, font and printing quality forward! I’d think they’d want to at least meet the quality of the rest of the book they printed.

In the early days of European printing, printers were often also the publisher, editor, bookseller, and even author or translator. The famous Aldine Press of Venice was one such printing firm, engaging giants of humanism such as Erasmus to produce translations of ancient classics imprinted with its distinctive dolphin colophon.

By the 19th Century, with specialization, the printer’s role had become separate, and the colophons belonged to the publishers. In some cases, printers may have wanted to omit a printer credit to avoid potential prosecution for printing banned or pirated books. But books containing printers’ credits that are poorly printed neither protect the printer from prosecution nor present the printer’s capabilities in a favorable light.

The printed book itself demonstrates the printer’s quality of work, but I’d expect the information naming the printing firm in the book to represent their best work. Printers’ credits first caught my eye as representing the worst example of printing in a book! I often questioned whether this blob of text had been originally printed in a book or just rubber-stamped crookedly after the fact onto the page.

As I found myself looking to see how poor the printers’ credits looked in each book, my impression was that they typically use the worst font and have the blurriest impression of anything in the book. It’s as if they were trying to make the credit look as bad as they could. Besides the rubber-stamp look, the printing impression often looks incomplete, with parts of letters missing or blurred. Blobs of ink fill in the open spots of letters. But don’t just take my word for it. Here are some examples.

Let’s start with a few that have the rubber-stamp look:

printer credit

printer credit

printer credit

Here’s one with an unusual layout:

printer credit

Here’s one squeezed at the bottom of the last index page of a book, unevenly printed on a single line:

printer credit


Finally, here are a couple of quality printer’s credits to be the exception:



Perhaps now you’ll find yourself looking to see just how poor an impression these printers make! Or maybe you’ll find the high-quality entry that proves the exception.

This post was contributed by WebRover, a DP volunteer.

The Proofreading Quizzes

February 1, 2019

I am one of the thousands of volunteers at Distributed Proofreaders. We’re Distributed because we’re located in different places all over the globe and we’re Proofreaders because we read text looking for errors. We turn out-of-copyright printed books into electronic eBooks, which have selectable/searchable text and which are also suitable for text-to-speech software, and then make those eBooks available to all, for free, via Project Gutenberg.

Once we have a scanned image of a page from a printed book, we run Optical Character Recognition (OCR) software on it to turn the image of text into actual editable text. The OCR accuracy is good, but tends to still leave many mistakes (what we call “scannos”) in the created text. We then, in multiple passes, verify the OCR’s results.

In striving towards a high quality for the finished eBooks we aim for a consistent result from all the many different volunteers. This is achieved by following a set of Proofreading Guidelines which explain what to change and how to do it.

And to help people familiarize themselves with the Guidelines, we have a set of Proofreading Quizzes and Tutorials. These act as an instructional aid for people to learn what to do and also as an ongoing refresher course, as it is strongly recommended that all volunteers redo the Proofreading Quizzes every six months or so.

The Proofreading Quizzes start with the basics and gradually introduce more and more elements, covering what to do with things found in easier books through to quite hard and challenging books. Each quiz is accompanied by a brief tutorial which explains everything one needs to know to complete the quiz.

Part of such a scanned image of a page from a printed book might look like this:


and the OCR software may have generated for it the text:


We then compare that OCR generated text with the scanned image of the printed page and correct any mistakes which the OCR made to have the text be the same as in the image:


It’s very much like those spot-the-differences types of games/puzzles. Whilst Proofreading we ignore things like italics and just verify the text has the correct characters. Layout and style issues, such as italics, are dealt with in later Formatting rounds of our process.
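The spot-the-differences comparison can be illustrated with Python’s standard difflib module. This is a sketch for illustration only, not how the quizzes or the proofreading interface are implemented; the function name spot_differences and both sample sentences are invented. It compares the OCR text with the corrected text word by word and lists the places where they disagree.

```python
import difflib


def spot_differences(ocr_text, proofed_text):
    """List (ocr_words, proofed_words) pairs where the two texts disagree."""
    ocr_words = ocr_text.split()
    proofed_words = proofed_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=proofed_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Skip the stretches where both versions agree.
        if tag != "equal":
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(proofed_words[j1:j2])))
    return changes


# Invented sample sentences containing two typical scannos.
ocr = "The carnage stopped and 1 stepped out."
proofed = "The carriage stopped and I stepped out."
differences = spot_differences(ocr, proofed)
print(differences)
```

A comparison like this only works once a corrected version exists, which is why the quizzes can give instant feedback while real pages still need human eyes.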

The quiz process lets volunteers actually try their hands at proofreading as they work through the quizzes and tutorials. And it provides the answers online in an automated way — you don’t have to wait for feedback.

Here’s a little quiz to start you off:

  • Do you have an eye for detail?
  • Do you like those spot-the-differences games?
  • Do you like learning new things and facing new challenges?

If you have answered yes to these questions, you may enjoy being a Proofreader at Distributed Proofreaders. Try the Proofreading Quizzes and find out!

This post was contributed by FallenArchangel, a DP volunteer.

Preserving the Past … For the Future

January 1, 2019

Preserving the Past … For the Future … One Dig at a Time

Looking forward to another day at the archaeology dig. Putting on the coffee and getting breakfast. Water containers to be filled with fresh water; it’s going to be HOT today, so need to take extra. Grabbing some food to throw into my pack along with the water. A trip to the barn to check on my animals: fresh water, everyone looks good. Throwing my pack into my vehicle and away I go!

Need to dig carefully: looks like someone broke a clay pot, all in pieces, and each piece needs to be carefully extracted from the soil. The pot will be reconstructed in the lab at a future time. Notes, notes, notes, never ending. This is the important stuff: keeping track of soil changes, artifacts found, any “stains” in the soil that may be the remains of poles holding up ancient structures. Here’s some rock debris: someone chipping away on a precious piece of rock to make a projectile point, scraper, or other implement. Each piece of rock must be collected and labeled carefully. Some charcoal here, an ancient fire pit, rock-lined; need to photograph and draw a rough sketch. Wonder what they were cooking: deer? rabbit? fish? Maybe some of the potsherds from the broken clay pot can be sent out for protein analysis.

One never knows what is going to be found at a dig — but each little bit tells the story of the past and must be carefully preserved for future generations.

I’m very dirty and very tired and mosquito-eaten — but it’s been a good day and I feel great!

Preserving the Past … For the Future … One Page at a Time

That’s what I did as an archaeologist volunteer — but it’s not so very different from what I do as a Distributed Proofreaders volunteer.

Getting up in the morning and turning on the computer before doing anything else. Putting on the coffee and grabbing some breakfast. Logging into Distributed Proofreaders.

What shall be read today? Sometimes science, sometimes travel, sometimes anthropology, sometimes just choosing something different that I never even considered reading. Every book is important, from the 5-page books to the 1,000-page books. The religious books, books of poems, science books, fictional books, travel books, music books, medical books: all are interesting and all need to be carefully proofed.

Here’s a book on engineering — wonder what sorts of things engineers were working on way back then? Another on an African tribe — a culture different from mine — thinking and doing things according to their needs and wants — wonder what they would think of Western culture? And another book on ocean biology — maybe will read this one for a while. All those Latin names of shells and sea creatures — they require a reader’s full attention. Here’s another book on submarines — somewhat technical — think I’ll read this next. Some math formulae and engineering terms — wonder how submarines have changed from past times to today?

Never know what books will be in the queue to be proofed, but every one is important; each book tells a story of the past and must be meticulously proofed, formatted and preserved for future generations.

My back hurts, I need more coffee, my eyes are glazing over — but it’s been a good day and I feel great!

This post was contributed by eyecrochet, a DP volunteer.

The DP Blog wishes all its readers a very happy and healthy New Year!

Crossword: Uncle Wiggily

December 1, 2018

Enjoy the holiday season with a crossword puzzle based on Uncle Wiggily’s Squirt Gun, a humorous illustrated children’s book of the early 20th Century, provided to Project Gutenberg by the volunteers at Distributed Proofreaders.

In order to solve the puzzle, first read the book — it’s easy and amusing — then decide how you want to proceed:

  • Use the interactive version. Just click on a blank square and the corresponding clue pops up. Type in the answer and click OK (or, if you’re stumped, click the Solve button). Clicking the Check Puzzle button at the bottom gives the number of errors and incomplete words, if you want to see how you’re getting on.
  • Download the printable PDF version and print out the puzzle to solve it the old-fashioned way, with your favorite writing implement. Check your solution with the PDF answer key. No peeking! (But who’s to know?)

Happy Puzzling!

This crossword was created by FallenArchangel, a Distributed Proofreaders volunteer, using the free EclipseCrossword app.

