Celebrating 37,000 Titles

April 16, 2019

Distributed Proofreaders celebrates the 37,000th title it has posted to Project Gutenberg, French Painting of the 19th Century in the National Gallery of Art. Congratulations and thanks to all the Distributed Proofreaders volunteers who worked on it.

Among the joys for those who love both art and books are museum publications featuring the museums’ collections or exhibitions. Distributed Proofreaders’ 37,000th title, French Painting of the 19th Century in the National Gallery of Art, is an excellent example of the delights available in this form. It’s a short booklet, just 43 pages, but it’s filled with lovely color plates of 16 selections from the National Gallery of Art in Washington, D.C.

The National Gallery of Art was the brainchild of Pittsburgh banker and industrialist Andrew Mellon.  He had begun collecting art in the 1890s on the advice of his friend Henry Clay Frick, whose own vast collection later became a very fine New York City museum. In 1936, Mellon — who was then embroiled in tax difficulties — approached the Roosevelt Administration with an offer to build a national art gallery, to be formed from his personal art collection and maintained by the U.S. Government with the help of a substantial financial endowment. Mellon never saw the gallery completed, however; he died in 1937, and the gallery opened in 1941.

The French Paintings booklet approaches the art chronologically, from the neoclassical work of Jacques-Louis David — famous for his portraits of Napoleon, one of which is included in the booklet — to modernists like Auguste Renoir, whose Girl with a Watering Can graces the cover. Each color plate is accompanied by a short description of the painting and its place in art history, as well as the donor’s identity (the majority were donated by New York banker Chester Dale).

Museum publications like this one were designed to make art — or history or science — more accessible to museum visitors. Distributed Proofreaders and Project Gutenberg make publications like this accessible worldwide to anyone with access to a computer, tablet, or smartphone, including people who may never see a museum in person.

This post was contributed by Linda Cantoni, a Distributed Proofreaders volunteer.


Proofreading a Technical Text

April 1, 2019

Introduction

Distributed Proofreaders recently made Alfred Russel Wallace’s two-volume book The Geographical Distribution of Animals (1876) available for free download from Project Gutenberg (Volume I and Volume II).

Wallace and fellow naturalist Charles Darwin were not only colleagues in their researches; they also arrived independently at seminal ideas about the development of animal species, first presented jointly in 1858, which led to what is now generally known as the theory of evolution.

Scientific or technical works like Geographical Distribution can present special challenges to the Distributed Proofreaders volunteers who work on them. This post explores some of those challenges.

The Distributed Proofreaders Process

Distributed Proofreaders volunteers acquire scanned images of public domain books either from online sources like The Internet Archive or by scanning the books manually. The scanned images for Geographical Distribution came from The Internet Archive.

The scanned page images are run through optical character recognition (OCR) software to turn them into editable text. Sometimes the resulting text contains what we call “scannos” — misinterpretations of the image by the OCR software, such as a speck on the image rendered as a period, or the word “I” rendered as a numeral 1. Under the guidance of a Project Manager, volunteers proofread the text for errors and then format it, a page at a time, in several rounds. The Distributed Proofreaders process enables many volunteers to work on the same book at the same time. Another volunteer (the post-processor) assembles the final product into a complete e-book which, after final checks for errors, is then posted to Project Gutenberg.
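
The two scannos just mentioned are regular enough that a short script can point a proofreader at likely candidates. Here is a minimal Python sketch. The patterns and the function name flag_scannos are illustrative assumptions, not part of the Distributed Proofreaders software, and the script only flags spots for a human to check against the page image.

```python
import re

# Hypothetical patterns for the two scannos described above.
# A numeral 1 standing alone before a lower-case word may be a misread "I".
STANDALONE_ONE = re.compile(r"(?<![\w.,])1(?=\s+[a-z])")
# A period followed by a lower-case word may be a speck on the page image.
STRAY_PERIOD = re.compile(r"\w\. (?=[a-z])")


def flag_scannos(line):
    """Return sorted (position, reason) pairs for a proofreader to check."""
    flags = []
    for match in STANDALONE_ONE.finditer(line):
        flags.append((match.start(), "numeral 1 where the word I may be meant"))
    for match in STRAY_PERIOD.finditer(line):
        # match.start() is the letter before the period, so add 1.
        flags.append((match.start() + 1, "period followed by a lower-case word"))
    return sorted(flags)


for position, reason in flag_scannos("1 think the speck. was a period"):
    print(position, reason)
```

A checker this simple will also flag perfectly good text, such as “page 1 of” or “e.g. the”, which is why the final decision stays with the person comparing the text to the image.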

During the proofreading phase, many problems can be resolved easily. For example, a scanno, such as “carnage” for “carriage,” is simply corrected to match what appears in the original page image. Not all problems are small ones, though. The proofreader who encounters a more difficult problem, such as one of those discussed below, is required only to leave a note about it for future volunteers working on the text. Some proofreaders choose to go further and search reference materials, such as dictionaries, and ask for help in the project’s discussion forum or one of the specialised forums at Distributed Proofreaders.

While many projects at Distributed Proofreaders are straightforward, others present challenges like poor printing, resulting in poor scan quality and therefore errors in the raw text; antiquated language found in older texts; many or large tables of data, etc. The object is to determine the author’s true intention and reflect that in the final product.

Proofreading Geographical Distribution

From May to October, 2016, Distributed Proofreaders volunteers worked on the first volume, resolving (or attempting to resolve) several thorny issues, communicating with each other and the Project Manager in the Project Discussion.

This text had good quality scans, with very few typographical, spelling or grammar issues. The challenges lay in the fact that it was a deeply technical work with specialised biological terminology. Here are some of the interactions volunteers had in the Project Discussion.

Differentiating between æ and œ ligatures

With the clear scans, it was generally easy to distinguish between æ and œ ligatures. But the original printer apparently had some trouble doing so when working from the author’s manuscript. Misreading of the ligatures led to subsequent mistakes that were easily perpetuated in the rest of the work, even by such a scrupulous authority as Wallace.  Of course, in extenuation, the Internet age has made it much easier to check doubtful cases than it was in Wallace’s day.

One volunteer’s research could not determine whether Cænyra was a typo for the more likely Cœnyra. My researches led to the Bulletin of the British Museum (Natural History), Entomology Supplement 9 (1967), where Francis Hemming states that Cænyra is “an incorrect subsequent spelling of Cœnyra.”

Both Turacœna and Turacæna occur twice in Volume I, but Turacœna is not italicised on two pages, which makes it much easier to identify. Turacœna also appears twice, and Turacæna not at all, in Volume II. Volume II includes the index, and Wallace states in the errata for Volume II that misspellings have been corrected in the Index.  These facts make Turacœna the people’s choice.

Typographic or spelling errors

A very rare typographic error in Geographical Distribution is Wallace’s reference to “the living three-handed armadillos” for three-banded armadillos.

There is a reference, with a clear connection to kingfishers, to the genus Halycon. Exhaustive, in-depth research (even using dead-tree books on my shelf) suggested that it is a long-standing error which had been perpetuated. The genus, in my humble informed opinion, should be Halcyon (as Wallace has it in the second volume, as well as several times in the first volume). In other words, a rare typo.

When a typesetter uses the upside-down letter n, it will turn into the confusing letter u, as in Otiorhynchus vs. Otiorhyuchus. Which is correct? I go for the confusing u with n theory, as rhynchus in Greek refers to nose, beak or snout, and rhyuchus is not a sensible construction. This is where familiarity with Latin or Greek roots saves the day.

But if one sees the word drougo, knowing about drongo, or finds the word scink, which is usually spelled skink nowadays, there is considerable doubt. Is it an older spelling, or a typo? And why does Wallace have Ethiopian everywhere except for the one occurrence of Ethiopean?

Sometimes the puzzle is intractable without a true subject specialist’s advice.  For instance, is it Ptilornis or Ptilorhis or even Ptiloris? Ptilorhis appears to be a late misspelling; but a Ptiloris exists; and Ptilornis ends with the root of ornithology.

Dealing with typos is, of course, the real elephant (Loxodonta africana or Elephas maximus) in the room. There are two kinds of taxonomists: lumpers and splitters. The splitters at one time had about a dozen elephant species; nowadays the lumpers are in the ascendance, and we have only two. Just in case you wondered.

One of the volunteers documented a few variations in spelling or typography: honey-sucker and honeysucker; king-fisher (in the index) and kingfisher (everywhere else); wood-pecker once, elsewhere woodpecker; aerial or aërial… The list goes on. It is for the post-processor ultimately to make the final decision about standardising such variations, but sharp-eyed proofreaders can help by leaving notes about their observations.
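
Variations of the honey-sucker/honeysucker kind can be listed mechanically, which may save a proofreader or post-processor some searching. The following Python sketch groups words that differ only by hyphens. The function name hyphenation_variants is hypothetical, and the sketch does not catch other kinds of variation, such as aerial versus aërial.

```python
import re
from collections import Counter, defaultdict


def hyphenation_variants(text):
    """Map each de-hyphenated word to its spellings, where more than one occurs."""
    # Runs of letters, optionally joined by single hyphens (digits are ignored).
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter(words)
    groups = defaultdict(dict)
    for word, count in counts.items():
        groups[word.replace("-", "")][word] = count
    # Keep only the words that appear in more than one form.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


sample = "The king-fisher and the kingfisher; a wood-pecker, a woodpecker, a woodpecker."
for key, forms in hyphenation_variants(sample).items():
    print(key, forms)
```

The counts are only a hint. As noted above, it is for the post-processor to decide whether to standardise, and the more frequent form is not automatically the right one.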

Scientific nomenclature

The system of naming organisms with a genus name followed by a species name is universal, if complicated. The names themselves were never completely stable, and some tough investigations had to be undertaken to decide which version (where the volumes had more than one) was to be accepted.

A Distributed Proofreaders volunteer agonises: “How do you feel about Wallace’s occasional habit … to start species name with a capital letter? For me, it seems [to] violate everything I’ve learned about scientific names.… Have the rules regarding capitals been different, earlier?”

Wikipedia has an interesting article about binomial nomenclature, with links to more information.  It appears that for animals, the rule was changed to make species’ names start with a lower-case letter, a change that only happened many years later for plants.

Nowadays the rule is explicit and rigid — the genus starts with a capital and the species with a lower-case letter. In the old days there were many different rules at different times, so in the case of this project, we must follow Wallace’s usage.

Hyphenating biological names

I had to leave a general note about end-of-line hyphens splitting biological names. “Whenever I find one I check the name; but in any case, these are extremely rarely hyphenated, so please don’t put the hyphens back in unless you are absolutely certain!”
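
The rule of thumb in that note can be sketched in a few lines of Python: rejoin a word split at the end of a line, dropping the hyphen unless the hyphenated form has been checked and is known to be correct. The function name rejoin_eol_hyphens and the keep_hyphen set are assumptions made for illustration; the set stands in for whatever names a proofreader has actually verified.

```python
def rejoin_eol_hyphens(lines, keep_hyphen=frozenset()):
    """Join words split by an end-of-line hyphen.

    The second half of the word is moved up to the first line. The hyphen
    is dropped unless the hyphenated form is listed in keep_hyphen.
    """
    result = []
    pending = None  # first half of a word split at the end of the previous line
    for line in lines:
        text = line.strip()
        if pending is not None:
            head, _, rest = text.partition(" ")
            hyphenated = pending + "-" + head
            joined = hyphenated if hyphenated in keep_hyphen else pending + head
            result[-1] = (result[-1] + " " + joined).strip()
            text = rest
            pending = None
        if text.endswith("-"):
            body, _, tail = text[:-1].rpartition(" ")
            pending = tail
            text = body
        result.append(text)
    if pending is not None:
        # The input ended mid-word, so put the fragment back unchanged.
        result[-1] = (result[-1] + " " + pending + "-").strip()
    return [line for line in result if line]


page = ["the genus Otio-", "rhynchus is a weevil", "and the wood-", "pecker is a bird"]
for line in rejoin_eol_hyphens(page, keep_hyphen={"wood-pecker"}):
    print(line)
```

Defaulting to no hyphen reflects the note's observation that biological names are extremely rarely hyphenated. Anything in doubt still deserves a check against the page image.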

Rewards of Distributed Proofreading

Understanding historical context

Working with old and unusual material which might be otherwise unobtainable frequently supplies a context for current ideas. One example is Wallace’s puzzlement about the strange and sometimes anomalous animal habitats he found. I can’t help thinking how delighted he would have been to hear about continental drift, explained by plate tectonics, the theory which the South African geologist Alexander du Toit put on a solid footing after Alfred Wegener first floated the idea in 1912, decades after Geographical Distribution was published. This quote from Wallace illustrates my meaning perfectly:

Should we ever arrive at a fair knowledge of the physical changes that have resulted in the present condition, we shall almost certainly find that many of the differences and anomalies of their existing fauna and flora will be accounted for.

Understanding the author’s character

Wallace, like many naturalists, collected insects, including beetles. As he explained:

[These] families comprise the extensive series of ground beetles (Carabidæ) containing about 9,000 species, and the Longicorns, which are nearly as numerous and surpass them in variety of form and colour as well as in beauty. The Cetoniidæ and Buprestidæ are among the largest and most brilliant of beetles; the Lucanidæ are pre-eminent for remarkable form, and the Cicindelidæ for elegance; and all the families are especial favourites with entomologists, so that the whole earth has been ransacked to procure fresh species.

Results deduced from a study of these will, therefore, fairly represent the phenomena of distribution of Coleoptera, and, as they are very varied in their habits, perhaps of insects in general.

I am reminded of J.B.S. Haldane, who was a British scientific polymath of the early 20th Century. It is variously reported that his reply to a theologian’s question as to whether anything could be concluded about the Creator from the study of natural history was “an inordinate fondness for beetles.”

Making texts accessible to all

Apart from the new things we Distributed Proofreaders volunteers learn every day from working on public domain projects, we have the great satisfaction of “preserving history one page at a time” and introducing new readers to the rewards of great old books like this one.

This post was contributed by Bess Richfield, a Distributed Proofreaders volunteer.


Printer Credits

March 1, 2019

Colophon of the Aldine Press

Having grown up in an age of ever more sophisticated advertising, I’m amazed at the poor way printers represented their work in the printers’ credits I see in the older books we work on at Distributed Proofreaders.

Printers’ credits? I’m not sure what the proper name is. I’m speaking of the two or three lines near the beginning or end of a book where the printer’s name, address, and any other information appear.

I’m not sure of the purpose of this information appearing in a book. Perhaps it helps support a publisher’s claim that the book has been published in a particular country and is therefore protected by that country’s copyright laws. It doesn’t appear to be in every book, so it can’t have been a legal requirement to include it. Printing companies presumably had established reputations. Some may have been known for quick typesetting and turnaround, some for low prices, some for accurate technical work or quality binding. Publishers must have needed to know these reputations to select an appropriate printer for a particular job. I hope publishers weren’t depending on the two or three lines of printed identification to identify the quality of a potential printer’s work.

No, I don’t think this would have been the way printers were presenting themselves to publishing houses for future business. I expect printing companies advertised to publishers in some other way to establish business relationships. However, the printer’s credit in a book is the only way the printers are known to the reading public. In any case, I’d think they’d want to put their best foot, er, font and printing quality forward! I’d think they’d want to at least meet the quality of the rest of the book they printed.

In the early days of European printing, printers were often also the publisher, editor, bookseller, and even author or translator. The famous Aldine Press of Venice was one such printing firm, engaging giants of humanism such as Erasmus to produce translations of ancient classics imprinted with its distinctive dolphin colophon.

By the 19th Century, with specialization, the printer’s role had become separate, and the colophons belonged to the publishers. In some cases, printers may have wanted to omit a printer credit to avoid potential prosecution for printing banned or pirated books. But books containing printers’ credits that are poorly printed neither protect the printer from prosecution nor present the printer’s capabilities in a favorable light.

The printed book itself demonstrates the printer’s quality of work, but I’d expect the information naming the printing firm in the book to represent their best work. Printers’ credits first caught my eye as representing the worst example of printing in a book! I often questioned whether this blob of text had been originally printed in a book or just rubber-stamped crookedly after the fact onto the page.

As I found myself looking to see how poor the printers’ credit looked in each book, my impression was that they typically use the worst font and have the blurriest impression of anything in the book. It’s as if they were trying to make the credit look as bad as they could. Besides the rubber-stamp look, the printing impression often looks incomplete, with parts of letters missing or blurred. Blobs of ink fill in the open spots of letters. But don’t just take my word for it. Here are some examples.

Let’s start with a few that have the rubber-stamp look:

[Image: printer credit]

[Image: printer credit]

[Image: printer credit]

Here’s one with an unusual layout:

[Image: printer credit]

Here’s one squeezed at the bottom of the last index page of a book, unevenly printed on a single line:

[Image: printer credit]

Finally, here are a couple of quality printer’s credits to be the exception:

[Image: printer credit]

[Image: printer credit]

Perhaps now you’ll find yourself looking to see just how poor an impression these printers make! Or maybe you’ll find the high-quality entry that proves the exception.

This post was contributed by WebRover, a DP volunteer.


The Proofreading Quizzes

February 1, 2019

I am one of the thousands of volunteers at Distributed Proofreaders. We’re Distributed because we’re located in different places all over the globe and we’re Proofreaders because we read text looking for errors. We turn out-of-copyright printed books into electronic eBooks, which have selectable/searchable text and which are also suitable for text-to-speech software, and then make those eBooks available to all, for free, via Project Gutenberg.

Once we have a scanned image of a page from a printed book, we run Optical Character Recognition (OCR) software on it to turn the image of text into actual editable text. The OCR accuracy is good, but tends to still leave many mistakes (what we call “scannos”) in the created text. We then, in multiple passes, verify the OCR’s results.

In striving towards a high quality for the finished eBooks we aim for a consistent result from all the many different volunteers. This is achieved by following a set of Proofreading Guidelines which explain what to change and how to do it.

And to help people familiarize themselves with the Guidelines, we have a set of Proofreading Quizzes and Tutorials. These act as an instructional aid for people to learn what to do and also as an ongoing refresher course, as it is strongly recommended that all volunteers redo the Proofreading Quizzes every six months or so.

The Proofreading Quizzes start with the basics and gradually introduce more and more elements, covering what to do with things found in easier books through to quite hard and challenging books. Each quiz is accompanied by a brief tutorial which explains everything one needs to know to complete the quiz.

Part of such a scanned image of a page from a printed book might look like this:

[Image: part of a scanned page]

and the OCR software may have generated for it the text:

[Image: the raw OCR text for that page]

We then compare that OCR generated text with the scanned image of the printed page and correct any mistakes which the OCR made to have the text be the same as in the image:

[Image: the corrected text]

It’s very much like those spot-the-differences types of games/puzzles. Whilst Proofreading we ignore things like italics and just verify the text has the correct characters. Layout and style issues, such as italics, are dealt with in later Formatting rounds of our process.
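
For the curious, the spot-the-differences comparison can be imitated in Python with the standard difflib module, here comparing raw OCR text with a corrected version word by word. This is only a sketch: the function name word_differences and the sample sentences are invented for illustration, and in real proofreading the comparison is made by eye against the page image, not against a ready-made answer.

```python
import difflib


def word_differences(ocr_text, corrected_text):
    """Return (ocr_words, corrected_words) pairs wherever the two texts differ."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


# Invented sample text containing two typical scannos.
raw = "The carnage arrived and 1 stepped out"
fixed = "The carriage arrived and I stepped out"
for before, after in word_differences(raw, fixed):
    print(repr(before), "->", repr(after))
```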

The quiz process lets volunteers actually try their hands at proofreading as they work through the quizzes and tutorials. And it provides the answers online in an automated way — you don’t have to wait for feedback.

Here’s a little quiz to start you off:

  • Do you pay attention to detail?
  • Do you like those spot-the-differences games?
  • Do you like learning new things and facing new challenges?

If you have answered yes to these questions, you may enjoy being a Proofreader at Distributed Proofreaders. Try the Proofreading Quizzes and find out!

This post was contributed by FallenArchangel, a DP volunteer.


Preserving the Past … For the Future

January 1, 2019

Preserving the Past … For the Future … One Dig at a Time

Looking forward to another day at the archaeology dig. Putting on the coffee and getting breakfast. Water containers to be filled with fresh water — it’s going to be HOT today, so need to take extra. Grabbing some food to throw into my pack along with the water. A trip to the barn to check on my animals — fresh water, everyone looks good. Throwing my pack into my vehicle and away I go!

Need to dig carefully — looks like someone broke a clay pot — all in pieces — and each piece needs to be carefully extracted from the soil. The pot will be reconstructed in the lab at a future time. Notes, notes, notes, never ending — this is the important stuff — keeping track of soil changes, artifacts found, any “stains” in the soil that may be the remains of poles holding up ancient structures. Here’s some rock debris — someone chipping away on a precious piece of rock to make a projectile point, scraper, or other implement. Each piece of rock must be collected and labeled carefully. Some charcoal here — an ancient fire pit, rock-lined — need to photograph and draw a rough sketch. Wonder what they were cooking: deer? rabbit? fish? Maybe some of the potsherds from the broken clay pot can be sent out for protein analysis.

One never knows what is going to be found at a dig — but each little bit tells the story of the past and must be carefully preserved for future generations.

I’m very dirty and very tired and mosquito-eaten — but it’s been a good day and I feel great!

Preserving the Past … For the Future … One Page at a Time

That’s what I did as an archaeologist volunteer — but it’s not so very different from what I do as a Distributed Proofreaders volunteer.

Getting up in the morning and turning on the computer before doing anything else. Putting on the coffee and grabbing some breakfast. Logging into Distributed Proofreaders.

What shall be read today? Sometimes science, sometimes travel, sometimes anthropology, sometimes just choosing something different that I never even considered reading. Every book is important — the 5-page books to the 1,000-page books. The religious books — books of poems — science books — fictional books — travel books — music books — medical books — all interesting and need to be carefully proofed.

Here’s a book on engineering — wonder what sorts of things engineers were working on way back then? Another on an African tribe — a culture different from mine — thinking and doing things according to their needs and wants — wonder what they would think of Western culture? And another book on ocean biology — maybe will read this one for a while. All those Latin names of shells and sea creatures — they require a reader’s full attention. Here’s another book on submarines — somewhat technical — think I’ll read this next. Some math formulae and engineering terms — wonder how submarines have changed from past times to today?

Never know what books will be in the queue to be proofed but every one is important, each book tells a story of the past and must be meticulously proofed, formatted and preserved for future generations.

My back hurts, I need more coffee, my eyes are glazing over — but it’s been a good day and I feel great!

This post was contributed by eyecrochet, a DP volunteer.

The DP Blog wishes all its readers a very happy and healthy New Year!


Crossword: Uncle Wiggily

December 1, 2018

Enjoy the holiday season with a crossword puzzle based on Uncle Wiggily’s Squirt Gun, a humorous illustrated children’s book of the early 20th Century, provided to Project Gutenberg by the volunteers at Distributed Proofreaders.

[Image: crossword grid]

In order to solve the puzzle, first read the book — it’s easy and amusing — then decide how you want to proceed:

  • Use the interactive version. Just click on a blank square and the corresponding clue pops up. Type in the answer and click OK (or, if you’re stumped, click the Solve button). Clicking the Check Puzzle button at the bottom gives the number of errors and incomplete words, if you want to see how you’re getting on.
  • Download the printable PDF version and print out the puzzle to solve it the old-fashioned way, with your favorite writing implement. Check your solution with the PDF answer key. No peeking! (But who’s to know?)

Happy Puzzling!

This crossword was created by FallenArchangel, a Distributed Proofreaders volunteer, using the free EclipseCrossword app.

Previous Crosswords

Marjorie Dean: Marvelous Manager

The Last of the Bushrangers


Edith Wharton’s French Idyll

November 1, 2018

The American writer Edith Wharton (1862-1937) didn’t fully blossom as an artist until she was in her 40s. Raised in a stodgy “Old New York” family, trapped in an unfulfilling marriage, all her life she longed for personal and literary freedom. She was a dedicated bookworm from childhood, and through her avid reading she glimpsed vistas of art and knowledge that her family and husband, wealthy and cultured though they were, could not appreciate as deeply as she did.

She had been well traveled, as most of her class and era had been, and so was quite familiar with Europe from childhood. But she wasn’t at home in her own home. As Janet Flanner put it in a 1929 New Yorker profile, “For if Boston, the city of her marriage, never forgave her for having been born in New York, her New York never forgave her for having been born in New York and writing about it.”

For a time, she found a refuge in the Berkshire Hills of Massachusetts, at The Mount, an airy villa she designed herself, inspired by the European homes she had written about in The Decoration of Houses (1897) — so different from the dark, overstuffed brownstones of her New York youth. There she wrote her first great New York novel, The House of Mirth (1905); there she relaxed, in the cozy library with its tall windows overlooking the terrace and her beloved gardens, with kindred spirits like Henry James, reading aloud and talking of books and art.

Wharton’s appreciation for France began to grow during this time. In 1906 and 1907, she and her husband embarked on the auto trips that formed the basis for A Motor-Flight through France (1908). A long-time devotee of motoring, she declares at the opening of the book, “The motor-car has restored the romance of travel.” Now, instead of the ugliness of train travel, one has “the delight of taking a town unawares, stealing on it by back ways and unchronicled paths, and surprising in it some intimate aspect of past time, some silhouette hidden for half a century or more by the ugly mask of railway embankments and the iron bulk of a huge station.” The trips covered thousands of miles across the French landscape, alighting at tiny medieval towns and big cities, chateaux and cathedrals, ruins and gardens.

Back in America, the Lenox idyll began to sour. As her literary star ascended, her marriage deteriorated. Her husband philandered. She fell passionately and fruitlessly in love with the caddish journalist Morton Fullerton (“My life was better before I knew you,” she heartbreakingly wrote him after his ardor had cooled). Her husband plundered her trust fund. She was nearly 50 in 1911 when she finally left him, The Mount, and America to begin a new idyll in France.

Edith Wharton’s garden at the Pavillon Colombe outside Paris

Wharton spent the rest of her life there, finally free to live the literary life she had long yearned for. She initially established herself in Paris, where she wrote Ethan Frome (1911) and several other novels. As she put it in her autobiography, A Backward Glance (1934), “In Paris no one could live without literature, and the fact that I was a professional writer, instead of frightening my fashionable friends, interested them.”

During the First World War, Wharton worked tirelessly to provide shelter, employment, schools, and hospitals for refugees. To raise money for Belgian refugees, she organized and edited The Book of the Homeless (1916), a compilation of donated works by noted writers, artists, and composers of the day. She visited the front, chronicling the devastation she saw in Fighting France (1915). The lovely pastoral landscape she had toured by car less than a decade earlier was now a wasteland:

The country between Marne and Meuse is one of the regions on which German fury spent itself most bestially during the abominable September days. Half way between Chalons and Sainte Menehould we came on the first evidence of the invasion: the lamentable ruins of the village of Auve. These pleasant villages of the Aisne, with their one long street, their half-timbered houses and high-roofed granaries with espaliered gable-ends, are all much of one pattern, and one can easily picture what Auve must have been as it looked out, in the blue September weather, above the ripening pears of its gardens to the crops in the valley and the large landscape beyond. Now it is a mere waste of rubble and cinders, not one threshold distinguishable from another.

Though extremely busy with her war-work, which later earned her medals from both France and Belgium, she still managed to write several novels during the war, including Summer (1917) and, directly inspired by the war, The Marne (1918).

In 1919, Wharton left the bustle of post-war Paris for a quiet home on its outskirts, the Pavillon Colombe, and later took a winter home at Hyères in the south of France. In her loving tribute to the French, French Ways and Their Meaning (1919), she observed:

The French are the most human of the human race, the most completely detached from the lingering spell of the ancient shadowy world in which trees and animals talked to each other, and began the education of the fumbling beast that was to deviate into Man. They have used their longer experience and their keener senses for the joy and enlightenment of the races still agrope for self-expression. The faults of France are the faults inherent in an old and excessively self-contained civilisation; her qualities are its qualities; and the most profitable way of trying to interpret French ways and their meaning is to see how this long inheritance may benefit a people which is still, intellectually and artistically, in search of itself.

Wharton went on to write many more books, including The Age of Innocence (1920), a novel about the New York society of her youth, for which she became the first woman to win the Pulitzer Prize for fiction. She returned to the United States only a couple of times, once to receive an honorary doctorate from Yale University in 1923 — another first for a woman. There was otherwise no need to return: She had found her true idyll. She died peacefully at the Pavillon Colombe in 1937.

This post was contributed by Linda Cantoni, a Distributed Proofreaders volunteer and a member of The Mount and the Edith Wharton Society. Wharton’s time in France is chronicled in a newly published book, Edith Wharton in France (Lenox: The Mount Press/Prospecta Press, 2018), by the late Claudine Lesage, which was the source of some of the information in this post.

