A Volunteer’s Thoughts on DP

August 1, 2017

From With a Camera in Majorca

Passing time at Distributed Proofreaders is not like working. For me it is a relaxing process that shows me many views of the world I would otherwise have missed. I say missed because I have had neither the opportunity nor the money to travel, nor to read as widely in my lifetime as I might once have wished. DP offers a vicarious experience: you can see the world through books – one day a famous classic, the next maybe a few pages from a children’s book – a little adventure every day, and the choice is wide. You can do as much or as little as you wish, and the tasks are varied and numerous. The wonderful world of books – maybe some are a little old-fashioned, but better late than never.

I have always lived in small villages near the sea, or on small boats, so computers were not a big thing with me. I only came to the connected world four years ago, rather late in my life, when I retired and the village where I live had a rural wi-fi scheme installed. Had I realized sooner that there were sites like DP, I might have become involved much earlier. I have felt involved since my very first day at DP. Like many other DPers, I found the site through downloading books from Project Gutenberg.

Proofing at DP is a relatively easy task, and working on so many different projects is like looking through a new window with every page that you do. Although formatting is a little more technical, the basics can be quickly learnt, and progress is made because everyone works as part of a large team. We contribute mutually, and one’s individual weaknesses are well covered by others’ combined strengths. The interaction between volunteers during this process makes it hard not to make friends, and so DP is a very friendly place to become attached to.

The bolder and more adventurous volunteers eventually progress to Post-Processing, putting the projects into their final form before they are posted to PG. I moved into this area quickly and now have more than 50 books at PG, ranging from children’s books to larger and more difficult projects. Along the way, I became quite proficient in image manipulation, especially of old photographs and coloured book-plates.

Recently, I started to learn Content Providing and Project Managing. These have required further skills in OCR and in preparing and guiding projects through the rounds. They have also brought me into even closer contact with other volunteers, fulfilling their requests and answering the inevitable questions as the books progress through the rounds. One of my recent efforts in this area is With a Camera in Majorca.

There are also important administrative jobs at DP held by Project Facilitators and “Squirrels” (the technical team who maintain the site and its code, among other chores). These tasks require experience that I have not yet acquired in my short time at DP.

Experienced volunteers who enjoy guiding new members can become Mentors and Post-Processing Verifiers. And for those who enjoy just reading, there is Smooth Reading, which, as its name implies, involves making sure that the book reads correctly in its final form and that there are no startling errors before it goes to PG.

I am very glad that I found DP. I am virtually housebound, and DP makes me feel useful; the opportunity to make these books freely available at PG makes it a wonderful and altruistic pastime.

Please feel free to join us. I assure you that you will be made most welcome.

This post was contributed by readbueno, a DP volunteer.


Comments That Matter!

September 1, 2016

“Thank you for working on this project.” There I was, a new member of Distributed Proofreaders, tentatively asking what I was certain was a stupid question. I was sure the answer would be glaringly obvious in the proofreading guidelines and that I’d totally missed it. How nice to get a gentle answer and “Thank you for working on this project.” Or “Thanks for asking.” Wow! These were comments that mattered. These comments encouraged me to come back!

So I came back. I found the forum. I posted there. Back came comments. Recognizing that I was new, people said, “Welcome to DP!” I got validation that the “diff” (i.e., change) that someone made to my edited page did not mean I’d made a mistake. Sometimes changes are made because of ambiguity. Sometimes different people interpret the same wording differently. Sometimes I understood the guidelines and the person after me did not. “Welcome to DP!” “Your questions matter.” “Thanks for asking.” These are comments that make a difference!

The managers of the projects (mostly books) that we work on create project comments. They tell us a little about the book or the author. They emphasize items in the Guidelines that we will see in the project and need to deal with. They point out things that are not in the Guidelines that may cause questions and provide answers before we need to ask. They may ask us to do something a little different from usual in this one project. From these comments we decide whether this is the right project for us to work on. These are comments that matter!

In the Forums we post about the aspects of Distributed Proofreaders that we care about: changes we want, functionality we want, Guidelines we want changed, Guidelines we want clarified, Guidelines we have different opinions on, language support we want, where we believe we need to focus efforts, where we feel we’re bogging down, what we have resources for, and what we don’t. Because we care, we’re passionate. What we say in our comments matters. How we say it matters!

Comments that welcome us. Comments that guide us. Comments that appreciate our efforts. Comments that push us to grow. Comments that help us as we each strive to leave each page better than we found it. These are comments that matter! These are comments made by volunteers who matter!

This post was contributed by WebRover, a DP volunteer.

The life of a book at Distributed Proofreaders

January 1, 2016

This post walks through the life of a book at DP from its beginnings as a physical book to its final form as a beautiful ePub, using Uncle Wiggily’s Auto Sled by Howard Roger Garis, recently posted to Project Gutenberg (eBook number 50405), as a case study.

Aside: I didn’t help with this particular book in any way, but rather selected it based on its length, language, beautiful illustrations, and wonderful example of a final ePub.


Selecting a book

The process begins when a volunteer (usually referred to as a Content Provider) finds a book they want as an eBook. They first have to get clearance from the Project Gutenberg Literary Archive Foundation (PGLAF) confirming that the book is in the public domain and legal to reproduce. pgdp.net and Project Gutenberg are both in the United States and thus must adhere to US copyright law. DP and PG sites hosted in other countries are able to work on and host books that are in the public domain in their respective countries, even if those books aren’t in the public domain in the US.

Figuring out if a book is in the public domain can be oddly complicated — which is why we leave it to the professionals at PGLAF — but a general rule of thumb is that if it was published in the US before 1923, it’s probably in the public domain in the US.

Uncle Wiggily’s is copyright 1922, so just barely under the wire.

Getting the initial text

After receiving clearance, the volunteer either scans the book or finds page images from Google Books, The Internet Archive (usually through its Open Library site), or a slew of other image providers. Scanned images will likely need some level of cleaning, such as deskewing or despeckling. The images are then run through OCR software to get an initial, raw copy of the text.

Page images of Uncle Wiggily’s were obtained from Google Books.

Note that Google Books and The Internet Archive stop here — eBooks you download from them contain only the text obtained from OCR. PDFs contain the page images with the underlying OCR available for selection and searching. The Internet Archive provides an ePub format, but it’s of the raw OCR text — not a pleasant reading experience.

At DP, this is just the first step in the process of refining and creating an eBook.

Loading the book into DP

Once the page images and text are available, a Project Manager will take up the mantle and guide the book (referred to as a project) through DP. Note that the Project Manager may have acted as Content Provider as well, may have been asked by the Content Provider to manage the book, or may have found the project on one of DP’s internal lists of available scans ready for adoption.

Either way, the Project Manager will create a new project at DP for the book (e.g., Uncle Wiggily’s project page). They’ll fill in a slew of metadata about the project so that proofreaders will be able to find it. This includes information like the name, author, the language the book is written in, and its genre. They will then add the page images and text.

Unleash the proofreaders!

Up until now the process hasn’t been very distributed and may, in fact, have all been done by a single individual. But now that the book has been loaded and is ready for proofreading, many people can work on it at once.

The book starts out in P1, the first proofreading round. Proofreading volunteers can select any book available in this round and start proofreading pages. How they select which project to work on is completely up to them. They might browse the list of all available projects in the round or search for those matching a specific genre and/or language.

Once they find a project and click on ‘Start Proofreading,’ they are presented with an interface that shows the page image and the text. Their job is straightforward: make the text match the image and follow some basic proofreading guidelines. After they make whatever changes they think are necessary to the text, they save the page and can either get a new page from the project or stop proofreading. Other volunteers may be working on the book at the same time, each on a separate page.

After all pages have been proofread, the project moves through two further proofreading rounds in sequence: P2 and P3. While any volunteer can proofread books in P1, the later rounds have entrance criteria so that each successive round brings more proofreading experience and more critical eyes.

The time it takes to go through the proofreading rounds can vary from minutes to years depending on the size of the book, the complexity of the pages, the quality of the initial OCR, and most importantly, how many volunteers are interested in working on it!

Uncle Wiggily’s meagre 33 pages soared through all three proofreading rounds in 4.5 hours.

Formatting: a bold move

Proofreading focuses on the page text, not how it’s formatted — that’s for the F1 and F2 formatting rounds. It’s in these rounds that all formatting happens, including things like bold, italics, and underlining, as well as marking poetry and other non-paragraph text for when the book is combined back together. These rounds are also fully distributed and, not surprisingly, there’s a set of formatting guidelines as well.

Uncle Wiggily’s completed both formatting rounds in roughly 12 hours.

Stitching them all back up again

Now that the pages have been proofread and formatted, they wait for a Post-Processor to pick them up and stick them together into their final form. The Project Manager may perform this step, or it may be someone else. The Post-Processor will do a wide range of sanity checks on the text to ensure consistency, merge hyphenated words that break across pages, and many other bits. They’ll create at least a plain-text version of the book for uploading to Project Gutenberg. Nowadays HTML versions are also very common and are further used to make ePubs for eBook readers.

Books like Uncle Wiggily’s, with illustrations, require even more care. Unlike text pages, which are often scanned at a relatively low resolution in black and white, illustrations are often in color and usually at a higher resolution. Post-Processors take great care in cropping, color balancing, and doing other image processing on the illustrations before including them in the HTML and ePub versions.

Smoooooooth reading

Often, but not always, Post-Processors will submit the books to what is called the smooth reading round. This is an opportunity for people to read the book as a book, but with a careful eye for anything that looks amiss. Humans are great at noticing when things are not quite right, and what better way to do it than reading the book! If the reader spies something amiss, they can let the Post-Processor know and have it corrected.

Posted to Project Gutenberg

Now that the eBook is completed, it’s posted to Project Gutenberg! Each eBook gets a unique number from Project Gutenberg which is recorded in the DP project record.

Uncle Wiggily’s Auto Sled was given number 50405 and was posted in several different formats.

Every book posted from DP includes a credit line in the text that recognizes the Project Manager and Post-Processor individually and the team at DP as a collective. If the images were sourced from another provider, they are also recognized in the credit line.

Uncle Wiggily’s credit line looks like this:

E-text prepared by David Edwards, Emmy, and the Online Distributed Proofreading Team (http://www.pgdp.net) from page images generously made available by the Google Books Library Project (http://books.google.com)

Preserving history, one page at a time

As you can see, there are many different ways to help create an eBook as a DP volunteer. The best thing about DP is that you can do only the parts you enjoy and only as much of those parts as you enjoy.

Interested in helping a book on its journey? It’s easy to get started as a proofreader — just:

  1. Create an account at DP
  2. After you register, find a project and start proofreading!

Or you can smooth read a book without even creating an account.

Celebrating 30,000 Titles

July 7, 2015


Distributed Proofreaders has reached another milestone: we’ve posted 30,000 unique titles to Project Gutenberg!

We’re celebrating this achievement with a collection of 30 works, the product of DP volunteers’ perseverance and hard work in making a wide range of books available online.

These titles are an excellent illustration of what DP volunteers do every day: We preserve and make available, for the delight and enlightenment of readers everywhere, a broad array of books on many subjects and in many languages. Congratulations to all who made this achievement possible!

DP on Twitter

November 7, 2014


Distributed Proofreaders is on Twitter! Now you can keep up with DP’s milestones, new blog posts, and other news. Just follow @DProofreaders to stay in the know.

