Memoirs of a Post-Processor

December 1, 2016

When I joined Distributed Proofreaders, I started with proofing and formatting, as I assume most people do. I came across some fascinating snippets that I would never have read otherwise, but found I lacked the patient attention required to do a good job. This left smooth reading, at which I knew I would be hopeless, or post-processing, so post-processing it was. It probably helped that I had been a programmer in a previous existence. I started, foolishly, with more difficult books, but benefited from some very helpful and supportive post-processing verifiers.

With a hundred or so books and counting, why do I do it? Certainly to make the books available again, which produces a useful product, and to ensure that the work of the many proofers, formatters and smooth readers is not in vain. But I do it mostly for the satisfaction of seeing a working ebook emerge from the collection of plain text and images I start with.

I do not specialise and the sheer range and variety of books available never fails to amaze. My reading of 18th and 19th Century books never got much beyond Sheridan le Fanu and Wilkie Collins, so I have come across a world of literature – biography, humour, philosophy, religion, science, poetry, history, fiction – which I did not realise even existed. Thank you, DP!

Glastonbury Abbey, from The Gate of Remembrance

One highlight is The Gate of Remembrance, subtitled, “The Story of the Psychological Experiment which Resulted in the Discovery of the Edgar Chapel at Glastonbury,” by the noted British architect, archaeologist, and psychical researcher Frederick Bligh Bond. This is a detailed account of an experiment with automatic writing over a period of years, which, the author maintains, led to the discovery of the lost chapels at Glastonbury. The author shows total confidence in the method and tackles criticism head-on in the preface. The book was written in a more credulous (or perhaps less hidebound) age; it is hard to imagine any serious architect or archaeologist even contemplating this approach today. Publishing would now be instant professional suicide. I am very sceptical, yet it seemed to work, and the sincerity of the protagonists seems beyond doubt.

A completist approach to science writing was still feasible in the late 19th Century and surprisingly popular. Now no author could possibly undertake such a venture and no publisher would consider it. Popular Scientific Recreations, by French scientist Gaston Tissandier, probably published in the 1880s, at 770 pages and 900 illustrations covering every scientific discipline known from Astronomy to Zoology, via parlour games, is an example of the genre. While it is incomplete and sometimes incorrect, it is remarkably comprehensive and up to date. Given that the author developed and flew an electric dirigible (how far has electric manned flight developed in the last 130 years!) this should not be surprising.

Another example is Outlines of Creation, by Elisha Noyce, a more modest 1858 publication of 340 pages, limiting itself to astronomy, geology and life. While unequivocally creationist in outlook, it presents the scientific evidence comprehensively and comprehensibly, and is beautifully illustrated. As the author explained in the Introduction,

The want of a general knowledge of those works of the Great Creator which are constantly spread out before us, in these days of easy acquirement, amounts almost to a sin, for it is by the study of Nature in all her varied forms and associations, that we learn to “look from Nature up to Nature’s God;” for who can look upon the works of God without a feeling of awe and admiration?

Staying positive and not dwelling on the horrors of war, vivisection, adulteration of food, primitive medicine, etc., I enjoyed the gentle humour and the depiction of genteel life in the books by American author John Habberton, Helen’s Babies and Budge and Toddie. Concerning the generally disruptive adventures of the two toddlers and perhaps marred by the author’s excessive use of baby talk, they are very light, but enlivened by excellent illustrations.

A final curio, for those who might want definitions of futtocks, dead rising, spirketing, breast hooks and many more, is A Naval Expositor by Thomas Riley Blanckley, a dictionary of naval terms from 1750.

This post was contributed by throth, a DP volunteer.

Comments That Matter!

September 1, 2016

“Thank you for working on this project.” There I was, a new member of Distributed Proofreaders, tentatively asking what I was sure was a stupid question. I was sure that the answer would be glaringly obvious in the proofreading guidelines, but that I’d totally missed it. How nice to get a gentle answer and “Thank you for working on this project.” Or “Thanks for asking.” Wow! These were comments that mattered. These comments encouraged me to come back!

So I came back. I found the forum. I posted there. Back came comments. Recognizing that I was new, people said, “Welcome to DP!” I got validation that the “diff” (i.e., change) that someone made to my edited page did not mean I’d made a mistake. Sometimes changes are made because of ambiguity. Sometimes different people interpret the same wording differently. Sometimes I understood the guidelines and the person after me did not. “Welcome to DP!” “Your questions matter.” “Thanks for asking.” These are comments that make a difference!

The managers of the projects (mostly books) that we work on create project comments. They tell us a little about the book or the author. They emphasize items in the Guidelines that we will see in the project and need to deal with. They point out things that are not in the guidelines that may cause questions and provide answers before we need to ask. They may ask us to do something a little different than the usual in this one project. From these comments we decide if this is the right project for us to work on. These are comments that matter!

In the Forums we post about the aspects of Distributed Proofreaders we care about. There’s change we want, functionality we want, Guidelines we want changed, Guidelines we want clarified, Guidelines we have different opinions on, language support we want, where we believe we need to focus efforts, where we feel we’re bogging down, what we have resources for, what we don’t. Because we care, we’re passionate. What we comment matters. How we comment matters!

Comments that welcome us. Comments that guide us. Comments that appreciate our efforts. Comments that push us to grow. Comments that help us as we each strive to leave each page better than we found it. These are comments that matter! These are comments made by volunteers who matter!

This post was contributed by WebRover, a DP volunteer.

These are a few of my favourite things

August 1, 2016

A while ago I wrote about Smooth-Reading and the variety of things I’ve read while doing it. I thought that, today, I’d mention a handful of those books in more detail. These were ones that, for some reason or another, really stood out for me.

Poisonous Mushrooms of the Genus Amanita

One was a brilliant book called Mushrooms of America, Edible and Poisonous, with absolutely beautiful pictures, like the one to the left. It told you how to tell which ones are poisonous—and how to cook the ones that aren’t. It was the first one that I felt compelled to download, just so I could look at the pictures again. I also have a weakness for recipe books, so it fed another of my guilty pleasures at the same time. (By the way, don’t try eating these ones, they’re definitely not good for you.)

Another pick of the bunch was Jacko and Jumpo Kinkytail (The Funny Monkey Boys), a collection of 31 bedtime stories for young children by Howard R. Garis, with extremely surreal endings. This is one of a series of books by the same author, many of which are now available on Project Gutenberg. One chapter ending goes:

Now next I’m going to tell you about the Kinkytails and the doll’s house—that is, if the alarm clock will stop making figures all over my paper so I can write the story, and if the coffee pot doesn’t step on the rolling pin’s toes.

Well, things must have worked out OK because the next story was, indeed, as advertised. The stories themselves contained a distinctly surreal universe. I think the children to whom these stories were read probably grew up to be imaginative and inventive individuals, or possibly horribly disturbed. It’s a bit of a coin toss as to which.

Then there was the excellent and very entertaining Stanley in Africa. The Wonderful Discoveries and Thrilling Adventures of the Great African Explorer and Other Travelers, Pioneers and Missionaries. Stanley was the man who found Dr. Livingstone, who’d been lost in the African Interior for a while, and greeted him with the immortal words “Dr. Livingstone, I presume”. It is, of course, typical of its time (1889) and there are things in it, such as the casual racism, that we cringe at nowadays. It does, however, contain some useful information on what to do if attacked by an ostrich.

Ostrich farming is a large industry in these South African States. […] The birds are innocent and stupid looking, but can attack with great ferocity, and strike very powerfully with their feet. The only safe posture under attack by them is to lie down. They then can only trample on you.

OK. I’ll try and remember that if I encounter any.

An out-and-out winner was the final two volumes of The Paston Letters, a collection of letters, wills and other documents relating to an influential Norfolk family between 1422 and 1509. Most of the letters are either written in haste, or to be delivered in haste, and they tell you not just about the things you get in history books, but also domestic issues and family quarrels. One spat occurs when a daughter of the family, Margery, gets engaged to a man without her family’s knowledge or permission and her mother refuses to have her in the house or to speak to her again—categorising her as a loose woman. The marriage goes ahead, but I suspect they’d planned a more advantageous match for her than one of their employees. She is very noticeably omitted from her mother’s will, in which everyone and their grandmothers are left something. At one point they seem very short of money, their letters are full of requests to each other for cash, and replies saying “don’t ask me, I don’t have any money.” Their wills are extraordinarily detailed—individual bequests include mattresses and specific pieces of bedding. They lived at quite a turbulent period, kings come and go, including Richard III—to this day depicted as a hunched and limping villain who killed his nephews. They seem to have come round, financially, by the time we leave them in 1509, but I have no doubt they continued to have their ups and downs.

Another favourite was a book on copyright law from 1902. I have spent much of my working life in places where copyright and intellectual property are hot topics. So the subject caught my eye and I picked it up to read. Written by a lawyer, for other lawyers, A Treatise upon the Law of Copyright in the United Kingdom and the Dominions of the Crown, and in the United States of America, by a barrister named E.J. MacGillivray, was fascinating reading, and surprisingly accessible to a lay person such as myself. It rounded up the development of the idea of copyright and all the laws that had sought to provide protection to writers and other creative artists, together with clarifications obtained during a huge number of court cases. The author’s aim was simple:

The foundations of this work were laid by my endeavours to understand what is perhaps the most complicated and obscure series of statutes in the statute book. In working from time to time at the Law of Copyright I found great want of a textbook which should be exhaustive of the case law, and at the same time contain a concise and clearly arranged epitome of the statutory provisions. This want I have tried to supply for myself in the present compilation, and it is now published in the hope that it may prove useful to others.

I think he succeeded admirably, and although the law will have changed in the ensuing century, it’s still an excellent summary of the development of copyright protection in the UK, the USA and various places that were part of the British empire.

I can’t recommend Smooth-Reading enough as a fun thing to do with your time. Right now there are dozens of books, ranging across novels for adults and children, history, drama, science, and others.

Ooohh, choices, choices. What shall I pick first?

We are time travelers

March 1, 2016

As part of Distributed Proofreaders, yes, we are time travelers. We get glimpses of time. This is not just because we work on history books written in their own time. We also get a contemporary perspective on technology, science, medicine, entertainment, vocabulary, attitudes, personalities, language, and more from a different time.

Because public domain is restricted in America to items that were published before 1923, with a few exceptions, we spend most of our proofing time in the past. We’re not hopping back and forth between something using current vocabulary and spelling vs. something older. We stay in the past. As a result when we see words like burthen, intrust, inclosure, to-day, skilful, musquitoes, &c., we begin to recognize them as accurate for the time and see them as old friends. We know that a receipt is a recipe. We learn that children are encouraged to work on projects with knives, saws, chemicals and the like (see this review of The Boy Craftsman for a good example of that). Oh yes, we are in another time.

We discover references to some event that was common knowledge at the time a book was published, and that knowledge is lost now. Several books with stories of early US western travel refer to stopping at the farm of early missionaries, the Whitmans. And then they end that portion of the tale with, “of course that was before the massacre.” Apparently any reader of the time knew about the Whitman massacre.

On any given day, volunteers at DP have the opportunity to travel to many times and many places. To-day my choices include: Nagaland in northeast India, in ancient times as viewed in the late 1800s and early 1900s; Mexico in the early 1500s; Königgrätz, Bohemia, in 1866; print shops in the US and England in 1888; England from 1066 to 1154; English towns in the 1400s; Russia from 1903 to 1912; Napoleonic War experiences during the period 1804-1814, compiled before 1815, in a book originally published in 1839 that nearly disappeared before being republished in this edition in 1902. This example demonstrates that there are the times in the books and the times of the books—when they are about vs. when they were written, published and printed. In a single volume we can go back to at least three times: the time being presented, the time it was written—which impacts word choice, perspective and attitude—and the time it was published—which impacts typesetting, font, characters (like long s) and layout.

Come and join us. Where else can you make such a trip through time!

This post was contributed by WebRover, a DP volunteer.

The life of a book at Distributed Proofreaders

January 1, 2016

This post walks through the life of a book at DP from its beginnings as a physical book to its final form as a beautiful ePub, using Uncle Wiggily’s Auto Sled by Howard Roger Garis, recently posted to Project Gutenberg (eBook number 50405), as a study.

Aside: I didn’t help with this particular book in any way, but rather selected it based on its length, language, beautiful illustrations, and wonderful example of a final ePub.


Selecting a book

The process begins when a volunteer (usually referred to as a Content Provider) finds a book they want as an eBook. They first have to get a clearance from Project Gutenberg Literary Archive Foundation (PGLAF) that the book is in the public domain, and legal to be reproduced. DP and Project Gutenberg are both in the United States and thus must adhere to US Copyright law. DP and PG sites hosted in other countries are able to work on and host books that are in the public domain in their respective countries, but aren’t in the public domain in the US.

Figuring out if a book is in the public domain can be oddly complicated — which is why we leave it to the professionals at PGLAF — but a general rule of thumb is that if it was published in the US before 1923, it’s probably in the public domain in the US.

Uncle Wiggily’s is copyright 1922, so just barely under the wire.
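
As a rough sketch only, the rule of thumb above can be written as a single comparison. The real clearance is done by PGLAF and involves far more than a year check; the function name and the `cutoff_year` parameter are made up for illustration and are not part of any DP or PG tool.

```python
def probably_public_domain_us(publication_year: int, cutoff_year: int = 1923) -> bool:
    """Rule of thumb from this post: a book published in the US before
    the cutoff year is *probably* in the US public domain.

    This is an illustration, not legal advice; the default cutoff is
    simply the year quoted in the post.
    """
    return publication_year < cutoff_year


# Uncle Wiggily's Auto Sled is copyright 1922, so it squeaks under the wire.
print(probably_public_domain_us(1922))
```

Keeping the cutoff as a parameter makes it obvious that the year is an assumption of the rule rather than a fixed fact.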

Getting the initial text

After receiving clearance, the volunteer either scans the book in or finds the page images from Google Books, The Internet Archive (usually through their OpenLibrary site), or a slew of other image providers. The images will likely need some level of cleaning to deskew or despeckle them after being scanned in. The images are then run through OCR software to get an initial, raw copy of the text.

Page images of Uncle Wiggily’s were obtained from Google Books.

Note that Google Books and The Internet Archive stop here — eBooks you download from them contain only the text obtained from OCR. PDFs contain the page images with the underlying OCR available for selection and searching. The Internet Archive provides an ePub format, but it’s of the raw OCR text — not a pleasant reading experience.

At DP, this is just the first step in the process of refining and creating an eBook.

Loading the book into DP

Once the page images and text are available, a Project Manager will take up the mantle and guide the book (referred to as a project) through DP. Note that the Project Manager may have acted as Content Provider as well, may have been asked by the Content Provider to manage the book, or may have found the project on one of DP’s internal lists of available scans ready for adoption.

Either way, the Project Manager will create a new project at DP for the book (e.g., Uncle Wiggily’s project page). They’ll fill in a slew of metadata about the project so that proofreaders will be able to find it. This includes information like the name, author, the language the book is written in, and its genre. They will then add the page images and text.

Unleash the proofreaders!

Up until now the process hasn’t been very distributed and may, in fact, have all been done by a single individual. But now that the book has been loaded and is ready for proofreading, many people can work on it at once.

The book starts out in P1, the first proofreading round. Proofreading volunteers can select any book available in this round and start proofreading pages. How they select which project to work on is completely up to them. They might browse the list of all available projects in the round or search for those matching a specific genre and/or language.

Once they find a project and click on ‘Start Proofreading,’ they are presented with an interface that shows the page image and the text. Their job is straightforward: make the text match the image and follow some basic proofreading guidelines. After they make whatever changes they think are necessary to the text, they save the page and can either get a new page from the project or stop proofreading. Other volunteers may be working on the book at the same time, each on a separate page.

After all pages have been proofread, the project is moved into two other proofreading rounds in series: P2 and P3. While any volunteer can proofread books in P1, the subsequent rounds have entrance criteria to ensure each level has ever-increasing proofreading experience and critical eyes.

The time it takes to go through the proofreading rounds can vary from minutes to years depending on the size of the book, the complexity of the pages, the quality of the initial OCR, and most importantly, how many volunteers are interested in working on it!

Uncle Wiggily’s meagre 33 pages soared through all three proofreading rounds in 4.5 hours.

Formatting: a bold move

Proofreading focuses on the page text, not how it’s formatted — that’s for the F1 and F2 formatting rounds. It’s in these rounds that all formatting happens, including things like bold, italics, and underlining, as well as marking poetry and other non-paragraph text for when the book is combined back together. These rounds are also fully distributed and, not surprisingly, there’s a set of formatting guidelines as well.

Uncle Wiggily’s completed both formatting rounds in roughly 12 hours.

Stitching them all back up again

Now that the pages have been proofread and formatted, they wait for a Post-Processor to pick them up and stick them together into their final form. The Project Manager may perform this step, or it may be someone else. The Post-Processor will do a wide range of sanity checks on the text to ensure consistency, merge hyphenated words that break across pages, and many other bits. They’ll create at least a plain-text version of the book for uploading to Project Gutenberg. Nowadays HTML versions are also very common and are further used to make ePubs for eBook readers.
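
To give a flavour of one of those checks, here is a minimal sketch of rejoining a word that was split by a hyphen at a page break. It is not the tooling Post-Processors actually use; the function name `join_pages` is hypothetical, and it makes the simplifying assumption that a page ending in a hyphen always means a split word. A real check would have to keep genuine hyphens in words like to-day.

```python
def join_pages(pages: list[str]) -> str:
    """Concatenate page texts, merging words hyphenated across a page break.

    Assumption (for illustration only): a trailing "-" at the end of a
    page marks a word that continues on the next page.
    """
    book = ""
    for page in pages:
        text = page.strip()
        if not text:
            continue  # skip blank pages
        if book.endswith("-"):
            # Drop the hyphen and glue the two halves of the word together.
            book = book[:-1] + text
        elif book:
            book += "\n" + text
        else:
            book = text
    return book


pages = ["Uncle Wiggily drove his auto-", "mobile over the snow.", "The end."]
print(join_pages(pages))
```

In practice a human still has to look at each join, which is part of why post-processing is done by a person rather than left entirely to a script.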

Books like Uncle Wiggily with illustrations require even more care. Unlike page texts that are often scanned in at a relatively low resolution in black and white, illustrations are often in color and always at a higher resolution. Post-Processors will take great care in cropping, color balancing, and doing other image processing on the illustrations before including them in the HTML and ePub versions.

Smoooooooth reading

Often, but not always, Post-Processors will submit the books to what is called the smooth reading round. This is an opportunity for people to read the book as a book, but with a careful eye to anything that looks amiss. Humans are great at noticing when things are not quite right, and what better way to do it than reading the book! If the reader spies something amiss they can let the Post-Processor know and have it corrected.

Posted to Project Gutenberg

Now that the eBook is completed, it’s posted to Project Gutenberg! Each eBook gets a unique number from Project Gutenberg which is recorded in the DP project record.

Uncle Wiggily’s Auto Sled was given number 50405 and was posted in several different formats.

Every book posted from DP includes a credit line in the text that recognizes the Project Manager and Post-Processor individually and the team at DP as a collective. If the images were sourced from another provider, they are also recognized in the credit line.

Uncle Wiggily’s credit line looks like this:

E-text prepared by David Edwards, Emmy, and the Online Distributed Proofreading Team from page images generously made available by the Google Books Library Project
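
For readers who like to see a process laid out as data, the journey described in this post can be sketched as an ordered list of stages. The stage labels and the `next_stage` helper are illustrative only and do not reflect how the DP site is implemented; the one wrinkle modelled here is that smooth reading happens often, but not always.

```python
from typing import Optional

# Stage order as described in this post; the labels are for illustration.
STAGES = [
    "P1", "P2", "P3",          # proofreading rounds
    "F1", "F2",                # formatting rounds
    "Post-Processing",
    "Smooth Reading",          # optional
    "Posted to Project Gutenberg",
]


def next_stage(current: str, smooth_read: bool = True) -> Optional[str]:
    """Return the stage that follows `current`, or None at the end."""
    i = STAGES.index(current)
    if i == len(STAGES) - 1:
        return None
    following = STAGES[i + 1]
    if following == "Smooth Reading" and not smooth_read:
        # The Post-Processor chose to skip the smooth reading round.
        return STAGES[i + 2]
    return following


print(next_stage("P3"))
print(next_stage("Post-Processing", smooth_read=False))
```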

Preserving history, one page at a time

As you can see, there are many different ways to help create an eBook as a DP volunteer. The best thing about DP is that you can do only the parts you enjoy and only as much of those parts as you enjoy.

Interested in helping a book on its journey? It’s easy to get started as a proofreader — just:

  1. Create an account at DP
  2. After you register, find a project and start proofreading!

Or you can smooth read a book without even creating an account.

Proofing with Maps

August 8, 2015

While proofing for Distributed Proofreaders, I often find myself opening up a mapping application to locate rivers, towns, buildings, forts, streets, etc. that are mentioned, described, or central to a project. Sometimes it’s to figure out where they are. Sometimes it’s to try and see what’s being described.


For example, Early Western Travels, 1748-1846, Volume XXIII, describes some rock formations that the footnote identified as being in Dawson and Valley Counties, Montana. Using that information, I was able to view a photo of the rock formations. I’ve also found remote tiny towns that still exist in the American West — one even had a preserved historical district.

Florizel’s Folly (in progress at DP) led me to Brighton, England; Yellowstone’s Living Geology: Earthquakes and Mountains (also in progress) to Old Faithful.

I posted in the DP forums about this and found another proofreader who was using mapping software to locate parks that were mentioned in old bird books as locations of certain birds. This person was interested in whether the parks have the same birds.

Of course, I look at maps because I love maps. So starting with a specific reference point from a book, I can get lost for half an hour or more exploring, envisioning, and virtually visiting. Anywhere. And how exciting when I get a chance to visit in person a site I’ve visited before via mapping software; for example, the Pony Express Statue in Sacramento Old Town.

If you haven’t tried this before, do! You may find yourself addicted.

This post was contributed by WebRover, a DP volunteer.

Sunday School Stories

April 4, 2015

Maybee’s Stepping Stones by Archie Fell is a book of Sunday school stories for each week of the year. As I read it, I experienced a wide range of emotions — love, kindness, patience, life, death, naughtiness, guilt, fear, consequences, tolerance, forgiveness, family, community, happiness, sorrow, adversity, hope, loneliness, sadness, joy….


I gasped with alarm when Dick shot himself; when Tryphosa was overcome with the fire. I wanted to cry when Dick lay in the woods unheard, when Phosy and Aunty McFane became ill, and I rejoiced when Mrs. Harte and Bill Finnegan went to the Sabbath School, and when Dan Harte resolved to overcome his addiction to alcohol. I shared the children’s frustrations as they struggled with doing the right thing, and smiled unashamedly when their good deeds worked near miracles.

The stories may be old-fashioned, and based on Christianity, but the lessons are for us all, whether we believe in a god or not, whether our deeds are in person or via social media, whether we are young or old. We can all put out a hand in comfort and together we can grow in strength no matter what our trials and tribulations.

“She had just been reading a chapter in the Bible out loud, and Aunty McFane said there was a promise for every ache she had. Isn’t it funny,” he continued, turning to Miss Marvin, “that folks just as different as can be find exactly what they want in the Bible?” — Maybee’s Stepping Stones, page 224.

Reading these stories, I couldn’t help but reminisce about when I was a little girl going to Sunday school.

Denomination meant nothing to us so the church we attended was the one within walking distance — I think it was Presbyterian. Our parents didn’t seem particularly religious, but they did make us go to Sunday school. Our father had in mind that if we weren’t christened it would be easier for us if we wanted to marry someone of strong faith in a particular church.

I never did work out my father’s beliefs. I suspect my mother was quite devout, although I did not know her to go to church, and she didn’t speak about religion much. She did go to a Catholic primary school — she had me shocked and in fits of laughter when she told me of the time she had to stand in front of an open fire with a piece of soap in her mouth because she had sworn at the nuns.

… then she tried scrubbing the inside of his mouth with soap-suds — Maybee’s Stepping Stones, page 19.

My sister only recently told me the story of her second son who, at age six, when admonished for swearing, was threatened with a similar fate of having his mouth washed out with soap. The little boy went to the bathroom, grabbed some soap, foamed it up in his mouth, and went out to his mother and said, “Now I can swear.” I think there’s quite a bit of my mother’s determined spirit in both my sister and my nephew. The same son said to my sister the other week: “Do what you want, mother, you will anyway.”

My mother also told the story of a family member who was a Major in the Salvation Army. I heard her say many times that only the good die young. And I learnt that she had a very difficult time accepting the death of a daughter before I was born.

Upon the pine coffin, the girls in Miss Cox’s class laid a wreath of beautiful hot-house flowers; but all over the lid, and inside, around the pale face and over the white robe, were fresh, fragrant pond-lilies, their subtile perfume filling the room. — Maybee’s Stepping Stones, page 149.

We had Sunday School stories, much like those told in Maybee’s Stepping Stones. We collected a stamp for each story lesson we attended. When our stamp sheet was full, we were presented with a little book.

We had our “Sunday best” clothes, and how we did love dressing up, putting on our delicate little dresses with ribbons and bows, and polishing our little shoes. Going to Sunday school was exciting and something to look forward to. It added a purpose to our lives, spiritual and social.

But she made her appearance, bright and early, Sabbath morning, comparatively quite docile, submitted to be washed, shampooed, braided, and ruffled, with a most martyr-like air, and came out from the process not so very unlike the five other girls, among whom Say seated her, with such a happy look in her own blue eyes. Just to see her sitting there more than repaid the trouble. — Maybee’s Stepping Stones, page 106.

Our Sunday school was at the back of the church in a prefabricated corrugated steel “Nissen hut” like those used for temporary accommodation during the war years. The building is still there but it is no longer a church, and the hut has been replaced by a brick addition attached to the main building.

I mentioned above it was within walking distance. Back then, there was a church nearby almost everywhere. I thought about this in recent years when a neighbour who had become almost housebound because of poor vision and declining mobility told me that one of the things she missed most was being able to walk to church. Her old church building was still there, too, at the end of the street where she had lived most of her adult life, along with the convent buildings that had been converted first to a school, and then to an art gallery, and now left to crumble. The nearest church for her was now on the other side of town. Buses don’t run on Sundays in this small community so, with few friends or family interested in taking her to church, she had only television services to comfort her.

So much inward soul searching from a little children’s book — literary merit?… Well, the stories stand up to the test of time, is all I can conclude.

This post was contributed by a DP volunteer.
