A Whimsical Tour of Distributed Proofreaders

October 1, 2022

Distributed Proofreaders is 22 years old today, and we’re celebrating our anniversary with a bit of poetry. Congratulations to all the volunteers who have helped in “preserving history one page at a time” all these years!

DP = Distributed Proofreaders
PM = Project Manager
PP = Post-Processor
PG = Project Gutenberg

Calliope by Meynier

Welcome to DP; please step right on in.
The tour I shall give is about to begin.

The goal of DP is to save the old books
Where some have sat gathered in dusty ole nooks.

We find them and scan them to bring back to life;
So you can view free on your electric device.

Now let’s take a tour to observe what goes on
And capture the secrets of change undergone.

The PMs are wise folks who manage our projects.
They keep the work tidy for steps that come up next.

The next step is proofing to check for mistakes
That optical scanners leave lying in wait.

Our Proofers have keen eyes that don’t miss a beat.
Our books are proofed three times; it’s really a feat!

The Formatters labor to put things in line
So reading is easy and feels so refined.

They use special coding to line up the text.
This work is what keeps their big brain muscles flexed.

Some covers are damaged and look really ratty,
But DP has Wizards to make them look natty.

These Wizards are clever with mysterious notions
Who wave their wands deftly in all sorts of motions.

When all of the proofing and formatting’s done,
The job of the PP has only begun.

They gather and sort and they move things about
Till it’s pretty and perfect and ready, no doubt.

But wait, one more step comes before a book’s ready;
Our Smooth Readers check with their eyes fixed and steady.

Some issues crop up that we’ve not seen before,
But there’s always a Mentor with guidance galore.

The talent in all teams just boggles the mind,
Plus count on support from a Mentor who’s kind.

Believe it or not we have Squirrels running ‘round,
But they aren’t the same type in the woods to be found.

Our Squirrels are equipped with knowledge all-round,
Technicians who keep this place all safe and sound.

When DP transforms these old writings anew,
They’re sent to PG as enjoyment for you.

I think you will probably all quite agree
What a gift these books are to be offered for free!

At DP you know you will always be welcome–
Whatever your interests; wherever you’re from.

This tour is concluded but not without wishing
That you will come back, and not just go fishing.

And, if you come back, you might volunteer;
You’ll be welcomed with sentiments truly sincere.

By Susan E., a DP volunteer, with special thanks to all the DP volunteers for their inspiration.


Teamwork

July 1, 2022

Which major league contains these teams: United States, Poland, Texas, Yorkshire, Midnight Crew, P3 Archers, Procrastination, and more? Why, Distributed Proofreaders (DP), of course, a “major league” producer of e-books for Project Gutenberg.

Teams? Yes, DP volunteers have the chance to join any of over 400 Teams and “talk” online to other DP users with similar interests in the Team forums.

Let’s look at the range of topics the Teams focus on; the experience of being a Team member; and which Teams have stayed the course and flourished over nearly twenty years.

Types of Teams

We can broadly classify Teams as either “social” or “technical.”

Social Teams bring together those DP users who live in a particular part of the world, use a certain language, or share an interest outside DP.

Technical Teams are linked to some part of the DP process, such as proofreading, formatting, smooth reading, etc.; or to a special skill that may be needed in some of the books we work on, such as processing illustrations. There’s even a Team for DP Bloggers!

How Teams Work

Anyone registered with DP may join up to six Teams. They’re listed in the Team list, which gives a link for you to join any that interest you, or even enables you to start a new Team if you think you’ve identified a topic likely to support one.

Your Team memberships are listed on your profile for other users to see.

Each Team has a page with a mission statement and a link to the Team discussion thread, which appears in the Team Talk area of the DP Forums. You can take part in active Team discussions – which you can do even if you’re not a formal member of a Team – or see if you can revive an old Team with a peppy new post.

DP Teams: Facts

  • Oldest Teams:
    The first batch of Teams, established in February 2003, included Nederlands, Northern Virginia, Michigan, Florida, Aussies, Canada, Graduate Students, Finland, and the ironic Team Non-Competitive (“for people who hate the concept of teams and all the competition that goes with it”).
  • Newest Teams:
    The Teams to have got off the ground in 2022 are Vancouver and P2+1 (for volunteers who work in the second proofreading round and want to qualify for the third round). The next most recent Team to get started was launched in 2020: Cookbook Lovers.
  • Team with most members: UK, with 855 members.
  • Team with most posts: Keep Your Chin Up (“for those who proofread despite having some physical difficulty”), with 29412 posts.
  • Teams with the wackiest names:
    • Sgt. Proofers Lonely Hearts Club Team (“For every lonely heart feeling lonely and lost in this E-niversum”).
    • Distributed Avoirdupois Team (“for people who love proofing but don’t love the pounds that mysteriously attach themselves to a figure seated at a computer”).
    • No, really, I am not goofing off (“for those who should be doing their real work instead of proofing”).

Teams Then and Now

Teams became a feature of DP in 2003, not long after DP itself was born in October 2000. Since then, 413 Teams have been launched. Naturally not all of these have remained highly active, but those that are still active provide a lively environment for discussion.

Among “social” Teams, most of the active ones have a national or language focus: Western Europe makes the running with Team Germany, en français, Team Italia, Spanish, and Nederlands especially busy. There are also a few other “social” ones that have long been well attended, including Keep Your Chin Up, Proofing with Cats, and Knitters Who Read.

Active “technical” Teams, more closely focussed on DP work, include some centered on stages in the DP process: for instance, Smoooth [sic] Readers; the new P2+1 team described above; and F2 Fanatics, for volunteers working in the second formatting round. Some popular Teams offer specialist advice and assistance to other proofers, such as Music, Illustrators, Index, Turn the Tables, We’ve Got You Covered (designing custom covers for books without an original cover image), and Ad Addicts (some of our books were published with pages of adverts that present complex design issues).

The P3 round – the third stage of proofreading in which volunteers closely check each draft e-book to ensure that it matches the original – has two Teams with posts every day. One is the P3 Archers, who target projects calling for a quick finish. Then there are the P3 Diehards, who in recent months have made remarkable progress in “pushing along those P3 projects that are languishing in the list” – Diehards now usher even the largest and most challenging projects out of P3 within just a couple of months.

So, what place do Teams have within DP now? Do people still want to interact online in this way? After all, since Teams were first thought of, the rise of social media has provided many other opportunities for online socializing. The people who started the first DP Teams back in 2003 might not have predicted the proliferation of Teams and which Teams would be active now. But Teams have evolved to occupy several niches where people do find it satisfying and useful to take part, and the example of the P3 Diehards has shown how Teams can still do a lot for DP’s productivity. DP’s Teams provide both a sense of community and mutual aid in DP’s mission to “preserve history one page at a time.”

This post was contributed by Neil M., a Distributed Proofreaders volunteer.


Creating E-Book Covers

April 1, 2022

Distributed Proofreaders volunteers work hard to make the e-books they contribute to Project Gutenberg as user-friendly as possible. One thing we do to that end is create e-book cover images that make it easy for readers to find e-books of interest to them.

The role and requirements

Book covers in the digital age have taken on a different role. Where in the past covers and dust jackets served to protect and later also advertise the book, they now mainly serve to advertise the book and make it easy to locate quickly on a computer or e-reader screen. With that changed role, the requirements for book covers have also changed.

In short, the role of a book cover in the digital age is to

  • Invite a potential reader to give it some attention.
  • Provide an easy-to-locate icon in e-book readers or computer screens, so it can be found quickly.
  • Provide a reasonably sized, readable short title and the author’s name, so people can ascertain they have selected the book they want.
  • Give some impression of the type of content to expect.

All the while considering that a digital cover is now often just the size of a postage stamp.

A short history

Historically, decorated book covers are a relatively new invention. Books started to be sold in neatly designed covers only by the end of the 19th Century, and in some countries even later. Book buyers were expected to provide their own cover and binding, as desired and fitting for their personal library. So the publisher just sold the book as a bound stack of pages with a nondescript paper cover. That is why old libraries often look very uniform, with all those similarly and often richly bound and decorated volumes. (Our 34,000th title contributed to Project Gutenberg was a manual of artistic bookbinding published in 1878.)

Since books are stored in bookcases or cabinets with only their spine visible, the publisher needed only to put identifying information, such as the title, on the spine. The cover could remain boringly neutral, or, as with some ancient bibles, heavily decorated, but there was no need to put a title on them.

Fortunately, many of the originals we work from at Project Gutenberg are late 19th- and early 20th-Century titles, which often do have nice book covers. However, even when the book we are digitizing does have a cover, it is the part of a book (after the spine) that is most likely to suffer from wear and tear, stained with ink and coffee, mutilated by repeated unprofessional repairs, and defaced by libraries that like to put stickers with shelf locations and bar-codes on it. Covers are also the part most exposed to sunlight, and so end up discolored.

Even then, such covers were designed to be attractive when placed in the book shop’s window, on a table, or when pulled from a shelf by a prospective buyer, so the requirements for large-size titles and author names are quite different from those you’ll need on a postage-stamp-sized digital image.

Challenges

When dealing with book covers, we at Distributed Proofreaders face a number of challenges. It is our intention to reproduce the original book in its full glory, “the book, the whole book, and nothing but the book.” Of course, with the transition to a digital format, we will lose some of the artifacts of the paper medium, such as page headers and page numbers, although we often retain the latter as small notes in the margin. Similarly, book covers will have to be reinvented for our books reincarnated in their digital form.

When preparing a book for Project Gutenberg, we will address these challenges in different ways.

Locating a good quality cover

First of all, we prefer to use an image of the original cover, so if we have one, we can use that as a starting point. In that case, it often requires some digital restoration. But before we invest in the labor-intensive process of restoration, we’ll seek out alternatives. If we don’t have a good quality cover, but have some idea of what it looks like, our first step is an internet search. Surprisingly often, better-quality scans of the same cover can be found, and sometimes those can be used. We need to be sure to pick only scans of a truly matching cover (i.e., same edition and printing), both to avoid a copyright violation, and to maintain the integrity of the e-book edition we’re making. Covers tend to appear in far more variations than the book itself, even within a single print-run.

Digitally restoring a damaged cover

If our search fails to unearth a good-enough cover, we will fire up our photo-editing software to restore what we do have. My personal guideline in digital restoration is not to try to reconstruct an as-new cover (it would be nice if such a cover were still available), but only to remove mutilations like bar-codes and disfiguring damage, such as scratches and stains. Smaller aspects of wear and tear I will leave as is: it is not a shame to be old and look it. What I will also try to do is brighten up the colors, and restore color balance. Of course this involves a lot of guesswork, but again, if we can find alternative images on-line, even if tiny photographs, they can give us an indication of the original colors if our copy is particularly discolored.

Removing disfiguring stains from the cover of Van de Noordpool naar den Aequator
Improving the cover colors for Belgian Fairy Tales
Removing the bar code from the cover of The Mason Wasps

Adding titles and authors to original covers

As explained above, the original cover will often not mention the title or author at all. In such cases, to make it easier to recognize a book, we can decide to digitally add the title to the front – that is, if the original design leaves space for it, which it often does. When adding the title and author, it makes sense to use a typeface matching that of the spine (if known), or the title page. Sometimes we can also use the title from the title page directly, manipulating the color and appearance to blend in with the original cover design.

Adding the title and author to the cover of Myths of the Cherokee

Designing our own cover

Then we come to the point where we have no cover to start with at all. The book at hand is in a generic, unmarked cover, or we have none at all, for example when we work from a set of scans produced elsewhere. In that case, we will design a new cover. From here onward I will concentrate mostly on the way I do this, as other volunteers may have different procedures. It may be tempting to go overboard and design something really fancy, but here I normally try to restrain myself and keep it functional.

One starting point I often use is the scanned cloth pattern of a book’s back to serve as a generic background. I derived a range of color variants from it. I will pick one color, depending on my mood and gut feeling of what would be appropriate for the book, and will add the original title, author, and year of original publication in a centered design. If the book itself includes a suitable illustration, often the frontispiece, I will use that. If not, I will slightly emboss a generic “PGDP” design on it, but won’t use artwork not present in the source, because of the copyright implications that might have. Balancing out the letters takes some puzzling with font-sizes, splitting lines, and letter-spacing, but normally, I am able to produce a reasonable new cover in some 15 to 30 minutes. Not perfect, probably not to everybody’s taste, but better than auto-generated.

DP-created covers for Narrative of a Five Years’ Expedition Against the Revolted Negroes of Surinam, Serbian Fairy Tales, and De Hogerveldt’s Oorspronkelijk Tooneelspel in 3 Bedrijven

I normally use serif typefaces, capital letters, and symmetrical design, because that was the standard in the era most of our books were produced. Asymmetric designs only started to come into vogue after the 1920s, and thus are inconsistent with most books’ age. I still don’t feel the need to fully emulate an old style cover: I typically use somewhat brighter and larger letters, and prominently place the year of the original copy at the bottom of my design. This should immediately signal to the reader they are dealing with an old book in a new digital cover.

Some things that work less well

An alternative I regularly see is to use the title page as a replacement for a cover. I am not a big fan of that, because title pages are far more similar to each other and often black-and-white, so they lack distinctiveness. Besides that, they often include more detailed information, like the publisher’s name, author credentials, and such, given in a much smaller type. Imagine what that does to your ability to spot the book you’re looking for on a screen filled with postage-stamp-sized title pages in an e-reader.

Not all books in Project Gutenberg have book covers, so as a stopgap measure, PG has a system to generate generic covers automatically. The results are not always satisfactory, because the software we use isn’t smart enough to understand what part of the title is most significant and to tweak letter sizes and spacing accordingly to obtain a pleasing result.

Finally, a little searching on some large commercial e-book platforms will reveal a range of newly designed covers for public domain books (the texts for which are often harvested from Project Gutenberg’s offerings in bulk), which range from boring to utterly hilarious: using inappropriate photographs on designs that make serious literary classics, even non-fiction, look worse than cliché Harlequin romances. Such things should not happen at Project Gutenberg, except when we keep the original pulp magazine cover that happens to be equally cringe-worthy, such as this:

Cover of the Dutch pulp magazine Lord Lister No. 8

This post was contributed by Jeroen Hellingman, a Distributed Proofreaders volunteer.


Čħāṛᾀςŧέř ♭ῧįłďĭñġ (Character Building)

June 1, 2021

Editor’s Note: This post contains some uncommon Unicode characters, some of which may not display properly on older systems.

In May 2020 the Distributed Proofreaders (DP) site moved over to Unicode, using the UTF-8 encoding. “This is a very major change,” said General Manager Linda Hamilton. “This move has been a long-term objective for many years.” (For more information on this huge improvement, see the DP Wiki article Site conversion to Unicode.) Now, instead of being limited to about 200 assorted letters, numbers, and squiggles, DP can draw on a code space with room for over a million characters.

We started modestly, providing a very few extra characters, such as the “œ” ligature often found in older books in words like Œdipus or cœlacanth. Soon though, DP site developers were picking up the pace, providing more and more additional character suites that Project Managers can assign to their projects. We rolled out three character suites for different European languages – with letters like ĝ, Đ, or ł; Basic Greek and Polytonic Greek character suites; and one for characters found in medieval books such as Ƿ or ȳ. In addition, Project Managers can now add individual Unicode characters to a project where they’re needed. These characters, which can include less familiar specimens such as ŧ ꝓ ᴚ ♅ ◘, show up in a “Custom” character suite on the proofreading screen.

All Greek to us

How does this really benefit our work in DP? Mainly, the Unicode-based encoding allows us to support languages that use characters outside the “Latin-1” character suite, which was what we had available before then.
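To make the difference concrete, here is a small Python sketch (an illustration only, not part of the DP site code) that checks which of the characters mentioned in this post fit into Latin-1 and how many bytes each one takes in UTF-8. The function names `fits_latin1` and `utf8_length` are made up for this example.

```python
# Illustration only: which characters fit in Latin-1, and how many bytes
# each one occupies in UTF-8. The sample characters come from this post.

def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded as ISO-8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def utf8_length(ch: str) -> int:
    """Return the number of bytes the character needs in UTF-8."""
    return len(ch.encode("utf-8"))


# Escapes are used so the sample does not depend on the file's own encoding:
# e, e-acute, oe ligature, l with stroke, Greek mu, yogh, Mercury symbol.
samples = ["e", "\u00e9", "\u0153", "\u0142", "\u03bc", "\u021d", "\u263f"]

for ch in samples:
    print(f"U+{ord(ch):04X} {ch}: Latin-1={fits_latin1(ch)}, "
          f"UTF-8 bytes={utf8_length(ch)}")
```

Plain “e” and “é” fit in Latin-1, while the “œ” ligature, “ł”, Greek letters, and the symbols do not. UTF-8 can hold all of them, at the cost of using more than one byte for anything beyond basic ASCII.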

Let’s take one important example: Ancient Greek. Why important? Because many of the books we work on, from the 19th century or before, do contain Greek words or whole passages in Greek. The writers of the time took it for granted that readers of the more scholarly type of book would have learnt Greek (along with Latin) as part of their education.

We reported here a few months ago about how DP handled Grote’s History of Greece, a monumental work with thousands of footnotes containing Greek text. Much of the work there fell to the Post-Processor, the person who prepares a project for final publication after it has all been proofread. With Unicode, the proofreaders can have a share of the fun!

Formerly, the proofreaders didn’t have the use of Greek characters. To represent the text during proofreading, a roundabout process was necessary in which proofreaders produced a transliterated version of the Greek written in our familiar Roman alphabet, like [Greek: mêde nein mêde grammata], to be transformed back into the original Greek – μηδὲ νεῖν μηδὲ γράμματα – by the Post-Processor. But now, the proofreaders can produce a correct text, drawing on a complete set of Greek characters. This is how the relevant part of the proofreading screen looks for a project that includes a Greek character set:

Asking proofreaders to work with Greek letters is also more practicable now that Optical Character Recognition (OCR) software has become better at reading Greek. Take a look at what OCR made of a page of Greek in 2005:

Looks like the book was written in Klingon, doesn’t it? Now compare this, from 2020:

Still far from perfect, but good enough that proofreaders don’t have to retype the whole thing from scratch. Using the expanded character set, they can now correct the Greek text coming from OCR, just as they correct text in their own language.
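For the curious, the following Python sketch hints at why the old transliterated form left so much work for the Post-Processor. It is a toy under stated assumptions: the table `ROMAN_TO_GREEK` is hypothetical and covers only the letters in the example quoted above, not DP's actual transliteration rules. It shows that the Roman-alphabet version can recover the bare Greek letters but none of the accents.

```python
import unicodedata

# Hypothetical, deliberately incomplete table: it covers only the Roman
# letters in the example from this post, not DP's real transliteration rules.
ROMAN_TO_GREEK = {
    "m": "\u03bc",       # mu
    "\u00ea": "\u03b7",  # e with circumflex -> eta
    "d": "\u03b4",       # delta
    "e": "\u03b5",       # epsilon
    "n": "\u03bd",       # nu
    "i": "\u03b9",       # iota
    "g": "\u03b3",       # gamma
    "r": "\u03c1",       # rho
    "a": "\u03b1",       # alpha
    "t": "\u03c4",       # tau
}


def to_plain_greek(transliterated: str) -> str:
    """Map each Roman letter to a bare Greek letter; leave anything else as is."""
    text = unicodedata.normalize("NFC", transliterated)
    return "".join(ROMAN_TO_GREEK.get(ch, ch) for ch in text)


def strip_accents(greek: str) -> str:
    """Drop accents and breathings by removing combining marks after NFD."""
    decomposed = unicodedata.normalize("NFD", greek)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


original = "μηδὲ νεῖν μηδὲ γράμματα"
rebuilt = to_plain_greek("mêde nein mêde grammata")

print(rebuilt)
# True means the transliteration carried the letters but not the accents.
print(strip_accents(original) == rebuilt)
```

The rebuilt text matches the original only once the accents have been stripped from the original, which is exactly the information someone had to put back by hand.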

The Mercury goes up

It’s the same with other types of characters. If proofreaders meet an unfamiliar letter in a medieval book, instead of typing [yogh] (for example), they can now input the actual letter ȝ. And we continue to add more character sets to meet the needs of our varied projects. Among recent ones is a set of characters used in Romanized forms of languages such as Arabic, Hebrew, and Sanskrit, so we can reproduce accurate transliterations of names such as ʿAlaʾuddīn or Mahābhārata or Viṣṇu (which previously had to be proofed as Vi[s.][n.]u). There’s also a “symbols collection,” including astronomical, zodiac, apothecary, and music symbols. With this collection, if we’re proofreading an astrological book, instead of [Mercury] we can now simply add ☿. Recipe for a bygone apothecary’s potion? Not [**ounce], but ℥. And so on.
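As a rough picture of the clean-up that used to happen after proofreading, here is a Python sketch that swaps the bracketed stand-ins mentioned above for the real characters. The table `PLACEHOLDERS` and the function `replace_placeholders` are hypothetical, built only from the examples in this post. This is not a DP tool, and the full list of markups that proofreaders once used is not reproduced here.

```python
import re
import unicodedata

# Hypothetical lookup table built from the examples in this post. Characters
# are fetched by their official Unicode names so the intent stays readable.
PLACEHOLDERS = {
    "[yogh]": unicodedata.lookup("LATIN SMALL LETTER YOGH"),
    "[Mercury]": unicodedata.lookup("MERCURY"),
    "[**ounce]": unicodedata.lookup("OUNCE SIGN"),
    "[s.]": unicodedata.lookup("LATIN SMALL LETTER S WITH DOT BELOW"),
    "[n.]": unicodedata.lookup("LATIN SMALL LETTER N WITH DOT BELOW"),
}

# One pattern that matches any known placeholder literally.
_PATTERN = re.compile("|".join(re.escape(key) for key in PLACEHOLDERS))


def replace_placeholders(text: str) -> str:
    """Replace every known bracketed placeholder; leave unknown ones alone."""
    return _PATTERN.sub(lambda match: PLACEHOLDERS[match.group(0)], text)


print(replace_placeholders("Vi[s.][n.]u"))
print(replace_placeholders("[Mercury] rising, add one [**ounce]"))
```

Anything in brackets that is not in the table is left untouched, so an unfamiliar markup would still stand out for a human to check.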

So with Unicode, proofreaders know that someone else won’t need to come along and change all the awkward symbols later. Now they can do the whole job and produce a precise digital version of the original page, no matter what characters are on it!

This post was contributed by Neil M., a Distributed Proofreaders volunteer.

