My Distributed Proofreaders Journey

April 1, 2024

When I first discovered Distributed Proofreaders via a Google search, my quest was like that of many who were looking for an interesting and challenging activity during the pandemic. Turning public domain books into free e-books for Project Gutenberg – preserving history by proofreading one page at a time – seemed ideal. And one thing that particularly drew me to Distributed Proofreaders was that there was an active forum where like-minded folks could share knowledge and chat.

The Bookworm, Carl Spitzweg, c. 1850

I stalked that forum for a few days and noticed one particular forum thread was quite active – P3 Diehards. I said to myself, “I want to join that fun team.” Problem was, I was not a P3! P3 is the top of three levels of proofreaders, and the privilege must be earned.

And so my Distributed Proofreaders journey began, going through the registration process, taking the training quizzes, and starting at the lowest proofreading level, P1. I had my blinders on and was only interested in proofreading so I could qualify for level P3. Occasionally, on my breaks, I would browse the forum to see what everyone was talking about. There was always something complex that people were working through with help from others. It all sounded too bothersome and beyond my abilities at the time. So, back to proofing I would go.

And then one day it happened. A private message arrived in my inbox announcing my access to P3! I think it was maybe 30 seconds later that I became a member of the P3 Diehards team. What a great group and a great goal – to save the old books from languishing in the P3 round for more than 100 days. Of course, there are other great teams at Distributed Proofreaders; this one just spoke to me. (I even wrote a poem about it!)

Fast forward two years, and a new interest was sparked to investigate some other parts of Distributed Proofreaders. There’s Formatting, Content Providing, Project Management, and Post-Processing. I decided I wanted to try them all!

Starting with formatting seemed like the logical first step. At first it’s difficult to understand which tags (for italics, poetry, and so forth) go where, and more importantly why, but it is rewarding when the second-level formatter, F2, confirms that my page was done correctly.

Next step: Project Manager (PM). A PM guides a book project through the rounds of proofreading and formatting and answers questions in the project discussion. I was prompted to this endeavor by a plea for more content to be provided for new members. However, I couldn’t yet create my own projects to manage, so I adopted a few from others for a while. With kindness and expert knowledge, my mentor, Fay Dunn, guided me through setting up my first project and several more after that. It is fun to watch my projects going through the rounds, some more popular than others, but they are all like “my kids.”

So that meant the next step needed to be learning Content Providing (CP). CPs select books for processing through Distributed Proofreaders, harvesting page images from online sources and converting them into editable text using optical character recognition. For this step, there were nearly a dozen software applications that first needed to be downloaded. And, of course, about half of them were problematic with my device for one reason or another. BUT, there’s a forum with lots of knowledgeable people from all over the world. They came to my rescue, and finally everything was working.

It was slow going at first, what with learning all this new software AND the CP steps. One of those steps is obtaining copyright clearance from Project Gutenberg, because everything we work on has to be in the public domain. I can’t describe the thrill when I see the clearance email from the copyright team approving a project. I’ve now CP’d 10 projects and was recently handed my graduation diploma by my wonderful mentor!

Post-Processing (PP) was the last on the list of areas to tackle. PPers convert the proofread and formatted text into its final e-book format for Project Gutenberg. This turned out to be the most arduous process. There was more software to download and learn, as well as new concepts and new lingo used in the Guidelines. Luckily my PM/CP mentor felt brave enough to help me through it all. After a crash course in HTML and CSS and some peeking at finished projects, I somehow managed to crank out my first PP project recently. A good amount of time will need to elapse before I tackle another PP project, but I have so much respect for those who get these books onto the Project Gutenberg site.

The amount of intelligence in this Distributed Proofreaders pool of volunteers is mind-boggling. Whether it’s solving difficult software issues, answering proofing questions about any language known to man, or devising detailed search functions, there’s always someone to provide the answer.

This journey has been one of the best experiences of my life. I conquered some complicated (for me) challenges. I almost gave up a couple of times, but my mentor kept encouraging and reassuring me. I’m so glad I persisted, as it gave me the opportunity to rub elbows with some brilliant, clever, and creative minds from around the world. It is a privilege to be among them.

This post was contributed by Susan E., a Distributed Proofreaders volunteer.


What Are You Waiting For? (A Poem)

August 1, 2023

Are you enthralled with challenges that exercise the mind?
And do you feel some comfort with the literary kind?

Then let me tell you ‘bout this place where you’ll feel right at home.
Come join us for this worthy cause; there’s no more need to roam.

This place is called DP for short because the name is long,
And if I could, I’d add some notes and sing it like a song.

The worthy cause will bring some books to many folks indeed,
And you could play a key role in providing what they need.

If you’ve been told or know yourself that you’ve got special skills,
Then don’t wait ‘round; put them to use and reap in all the thrills.

Please take a seat so I may share the features of DP;
I think you’ll see your time spent here is better than TV.

There’s lots to put your brain to work while feeling so sublime,
Plus knowing all the good it does makes worthy all your time.

Have you been told that you excel at picking out details?
Then proofing might be just the thing to open up your sails.

Sail free and wide through punctuation, noting scanned mistakes.
Your bounty comes with knowing that your brain is all it takes!

Then share the funny, weird, inspired phrases that you find
On the friendly, helpful, worldwide forum where great minds are aligned.

Our forums are a gathering place for questions and support,
But socializing does take place, I’m happy to report!

Your questions are most welcome so feel free to post your plea;
We do not want you feeling you’re completely lost at sea.

Take one step more and join a team; there’re many you can choose.
So find a team that makes you smile and complements your views.

I have to say without a doubt, Team Diehards is my fave.
Those hard, forgotten projects are the ones we try to save.

If you like organizing things displayed all nice and square,
Your joy may be to format books, come try it if you dare.

You’ll use a nice assortment of some very nifty tags
To mark the words and wordy parts so well you’ll want to brag.

Our managers and processors pull everything together.
This type of work may suit you fine no matter what the weather.

Come rain or shine, this type of work is certainly rewarding,
Enough to make you jump from bed all fresh to start your morning.

And last, not least, there is a space where folks enjoy Smooth Reading.
Please sit right down and read a book; it’s key to us succeeding!

So if you love a challenge for your literary mind
And working with an online group whose goals are quite aligned,

Please join us at DP today; there’s so much to explore.
The final question without doubt: What are you waiting for?

This poem was contributed by Susan E., a Distributed Proofreaders volunteer.


The Story of Chamber Music

May 1, 2023

The Joachim Quartet, founded by violinist Joseph Joachim

Chamber music – sometimes called “the music of friends” – is one of the most intimate genres of classical music. In its ideal form, a handful of musicians – professional or amateur – play together in a private space for a small audience. There is no orchestra, no conductor, no soloist. In The Story of Chamber Music, you can read about its rich history and hear some of its music, thanks to the volunteers at Distributed Proofreaders and Project Gutenberg.

The author of The Story of Chamber Music, Nicholas Kilburn (1843-1923), was himself an amateur musician – his “day job” was running a locomotive repair company – with a music degree from the University of Cambridge. In his spare time, he was actively involved with a variety of British amateur musical groups as a conductor, pianist, organist, and cellist. Kilburn made a point of supporting the music of British composers, becoming a close friend of Edward Elgar. He was also an ardent disciple of Richard Wagner and wrote two books about his music.

Published in 1904, The Story of Chamber Music traces the origins of chamber music back to medieval times in Europe, when musicians entertained kings in private palace chambers. Noblemen took up the practice and used chamber music at their banquets “as a stimulus and a cover for conversation, a practice not even yet quite obsolete,” as Kilburn notes.

From these early times, Kilburn takes us to the 17th Century, when both private and public concerts of chamber music became popular entertainments. Two famous diarists of the era, Samuel Pepys and John Evelyn (both of whose diaries are available at Project Gutenberg), mention attending chamber concerts, with Pepys so transported by a wind ensemble that he “remained all night transported, so as I could not believe that ever any musique hath that real command over the soul of a man as this did upon me.”

Kilburn then shows how the musical giants of the Baroque, Classical, and Romantic Eras – Bach, Haydn, Handel, Mozart, Beethoven, Schubert, Mendelssohn, Schumann, Brahms, and others – further refined chamber music into the brilliant form we know today. The Story of Chamber Music was written at the tail-end of the Late Romantic Era, so Kilburn included studies of chamber music by later composers like Dvořák, Richard Strauss, and Bruckner. There is a chapter devoted to chamber music by Russian composers such as Glinka and Tchaikovsky. And he concludes with a chapter on contemporary chamber music, selecting for inclusion “[o]nly what is thought may prove acceptable and useful to earnest amateurs.” Kilburn speaks admiringly of the black composer Samuel Coleridge-Taylor and highlights the work of several other British composers.

The Story of Chamber Music contains over 75 music notation excerpts from string quartets and other chamber works. The e-book version at Project Gutenberg enhances the enjoyment of it by including audio files in MP3 format to accompany these excerpts. Distributed Proofreaders has a Music Team devoted to transcribing music notation and creating audio files using music software. Here, for example, is an excerpt from a Mozart string quartet, cited by Kilburn on page 67 and transcribed by a Distributed Proofreaders volunteer:

The e-book version also includes MusicXML files for the music excerpts – a standard open file format designed for sharing music notation. If you’re a musician, you can download MusicXML files into your favorite notation software for your own use.
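For readers curious about what is inside a MusicXML file, here is a minimal sketch in Python that lists the notes in an uncompressed MusicXML document. The sample fragment is hand-written for illustration and is not taken from the e-book; the helper name `note_names` is likewise just an assumption for this example. It relies only on the standard MusicXML elements `note`, `pitch`, `step`, `alter`, `octave`, and `rest`.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written MusicXML fragment (hypothetical, not from the e-book).
SAMPLE = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Violin I</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>5</octave></pitch><duration>4</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>4</duration></note>
      <note><rest/><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>
"""


def note_names(xml_text: str) -> list[str]:
    """Return simple names such as 'C5' or 'F#4' for every note, in document order."""
    root = ET.fromstring(xml_text)
    names = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:
            # A <note> without a <pitch> is treated here as a rest.
            names.append("rest")
            continue
        step = pitch.findtext("step")
        octave = pitch.findtext("octave")
        # <alter> is optional: +1 means one sharp, -1 means one flat.
        alter = int(pitch.findtext("alter", default="0"))
        accidental = "#" * alter if alter > 0 else "b" * -alter
        names.append(f"{step}{accidental}{octave}")
    return names


if __name__ == "__main__":
    print(note_names(SAMPLE))
```

To try it on a downloaded file, read the file's text and pass it to `note_names`. Compressed `.mxl` files would need to be unzipped first, which this sketch does not handle.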

Kilburn’s enthusiasm for chamber music is apparent on every page of his book. As he put it in his introduction:

[W]e may ask ourselves which of the great forms of musical composition we would plead for in case all the rest were doomed to destruction. Music for the orchestra, with its vivid colours, its strength and delicacy; the vast range of choral music; works for the organ, that huge modern plexus of pipe and reed;—these and others no doubt have strong claims on our musical affections. But, if forced to such a choice, it is certain that many a musician would, without hesitation, pledge himself to uphold the claims of Chamber music, for who can measure the almost infinite variety and charm which it affords, and that, too, with the slenderest means?

The Story of Chamber Music lets us read and listen to some of the “infinite variety and charm” of this lovely musical genre.

This post was contributed by Linda Cantoni, a Distributed Proofreaders volunteer.


The Life of a P3 Diehard

January 1, 2023

Note for those who don’t yet know the e-book creation workflow at Distributed Proofreaders: After a scanned book is turned into editable text, it goes through three rounds of proofreading. The third, P3, is the most challenging, as it requires the most expertise and the closest attention, so sometimes a project has to wait awhile until P3-qualified proofreaders can get to it. The P3 Diehards team has dedicated itself to rescuing projects that are languishing in that round.

I woke up this morning with a minor headache,
So the first line of business: There’s coffee to make!

The headache was due to some major proofreading,
But I wanted to help move some books to smooth reading.

I found my new passion at DP last year
‘Cause there’s so much to do plus there’s fellowship here.

The challenge was huge, but my spirit was keen,
And each day I leaned on my good friend: Caffeine.

Each day I made progress, but slow in my mind.
The goal seemed beyond reach; I felt so behind.

This goal was to help in the most needed place
Where projects sat languished, forgotten, misplaced.

But one thing that kept me on path to my goal
Was seeking that something that feeds mind and soul.

So, first was the hurdle of gaining the level
Where trust must be earned and to demonstrate mettle.

I thought I was hopeless to learn any more,
But my mentors worked wonders with guidance galore!

Then one day my inbox had news I had hoped for:
Clear access to P3; it made my heart soar!

Team Diehards is where I went skipping so quickly
To help with those projects abandoned and prickly.

Some projects are tricky or boring or fun,
And sharing with teammates is second to none.

The visions from Surgery of so many leeches
Are far from the thought of a bowl full of peaches!

The sad Roll of Honour brought tears to my eyes,
But the story of bravery and valor survives.

My headache is gone; I give thanks with “Amen.”
And tomorrow I can’t wait to do it again!

This poem was contributed by Susan E., a Distributed Proofreaders volunteer. Hot off the Press wishes all its readers a happy and book-filled New Year!


A Whimsical Tour of Distributed Proofreaders

October 1, 2022

Distributed Proofreaders is 22 years old today, and we’re celebrating our anniversary with a bit of poetry. Congratulations to all the volunteers who have helped in “preserving history one page at a time” all these years!

DP = Distributed Proofreaders
PM = Project Manager
PP = Post-Processor
PG = Project Gutenberg

Calliope by Meynier

Welcome to DP; please step right on in.
The tour I shall give is about to begin.

The goal of DP is to save the old books
Where some have sat gathered in dusty ole nooks.

We find them and scan them to bring back to life;
So you can view free on your electric device.

Now let’s take a tour to observe what goes on
And capture the secrets of change undergone.

The PMs are wise folks who manage our projects.
They keep the work tidy for steps that come up next.

The next step is proofing to check for mistakes
That optical scanners leave lying in wait.

Our Proofers have keen eyes that don’t miss a beat.
Our books are proofed three times; it’s really a feat!

The Formatters labor to put things in line
So reading is easy and feels so refined.

They use special coding to line up the text.
This work is what keeps their big brain muscles flexed.

Some covers are damaged and look really ratty,
But DP has Wizards to make them look natty.

These Wizards are clever with mysterious notions
Who wave their wands deftly in all sorts of motions.

When all of the proofing and formatting’s done,
The job of the PP has only begun.

They gather and sort and they move things about
Till it’s pretty and perfect and ready, no doubt.

But wait, one more step comes before a book’s ready;
Our Smooth Readers check with their eyes fixed and steady.

Some issues crop up that we’ve not seen before,
But there’s always a Mentor with guidance galore.

The talent in all teams just boggles the mind,
Plus count on support from a Mentor who’s kind.

Believe it or not we have Squirrels running ‘round,
But they aren’t the same type in the woods to be found.

Our Squirrels are equipped with knowledge all-round,
Technicians who keep this place all safe and sound.

When DP transforms these old writings anew,
They’re sent to PG as enjoyment for you.

I think you will probably all quite agree
What a gift these books are to be offered for free!

At DP you know you will always be welcome–
Whatever your interests; wherever you’re from.

This tour is concluded but not without wishing
That you will come back, and not just go fishing.

And, if you come back, you might volunteer;
You’ll be welcomed with sentiments truly sincere.

By Susan E., a DP volunteer, with special thanks to all the DP volunteers for their inspiration.


Teamwork

July 1, 2022

Which major league contains these teams: United States, Poland, Texas, Yorkshire, Midnight Crew, P3 Archers, Procrastination, and more? Why, Distributed Proofreaders (DP), of course, a “major league” producer of e-books for Project Gutenberg.

Teams? Yes, DP volunteers have the chance to join any of over 400 Teams and “talk” online to other DP users with similar interests in the Team forums.

Let’s look at the range of topics the Teams focus on; the experience of being a Team member; and which Teams have stayed the course and flourished over nearly twenty years.

Types of Teams

We can broadly classify Teams as either “social” or “technical.”

Social Teams bring together those DP users who live in a particular part of the world, use a certain language, or share an interest outside DP.

Technical Teams are linked to some part of the DP process, such as proofreading, formatting, smooth reading, etc.; or to a special skill that may be needed in some of the books we work on, such as processing illustrations. There’s even a Team for DP Bloggers!

How Teams Work

Anyone registered with DP may join up to six Teams. They’re listed in the Team list, which gives a link for you to join any that interest you, or even enables you to start a new Team if you think you’ve identified a topic likely to support one.

Your Team memberships are listed on your profile for other users to see.

Each Team has a page with a mission statement and a link to the Team discussion thread, which appears in the Team Talk area of the DP Forums. You can take part in active Team discussions – which you can do even if you’re not a formal member of a Team – or see if you can revive an old Team with a peppy new post.

DP Teams: Facts

  • Oldest Teams:
    The first batch of Teams, established in February 2003, included Nederlands, Northern Virginia, Michigan, Florida, Aussies, Canada, Graduate Students, Finland, and the ironic Team Non-Competitive (“for people who hate the concept of teams and all the competition that goes with it”).
  • Newest Teams:
    The Teams to have got off the ground in 2022 are Vancouver and P2+1 (for volunteers who work in the second proofreading round and want to qualify for the third round). The next most recent Team to get started was launched in 2020: Cookbook Lovers.
  • Team with most members: UK, with 855 members.
  • Team with most posts: Keep Your Chin Up (“for those who proofread despite having some physical difficulty”), with 29,412 posts.
  • Teams with the wackiest names:
    • Sgt. Proofers Lonely Hearts Club Team (“For every lonely heart feeling lonely and lost in this E-niversum”).
    • Distributed Avoirdupois Team (“for people who love proofing but don’t love the pounds that mysteriously attach themselves to a figure seated at a computer”).
    • No, really, I am not goofing off (“for those who should be doing their real work instead of proofing”).

Teams Then and Now

Teams became a feature of DP in 2003, not long after DP itself was born in October 2000. Since then, 413 Teams have been launched. Naturally not all of these have remained highly active, but those that are still active provide a lively environment for discussion.

Among “social” Teams, most of the active ones have a national or language focus: Western Europe makes the running with Team Germany, en français, Team Italia, Spanish, and Nederlands especially busy. There are also a few other “social” ones that have long been well attended, including Keep Your Chin Up, Proofing with Cats, and Knitters Who Read.

Active “technical” Teams, more closely focussed on DP work, include some centered on stages in the DP process: for instance, Smoooth [sic] Readers; the new P2+1 team described above; and F2 Fanatics, for volunteers working in the second formatting round. Some popular Teams offer specialist advice and assistance to other proofers, such as Music, Illustrators, Index, Turn the Tables, We’ve Got You Covered (designing custom covers for books without an original cover image), and Ad Addicts (some of our books were published with pages of adverts that present complex design issues).

The P3 round – the third stage of proofreading in which volunteers closely check each draft e-book to ensure that it matches the original – has two Teams with posts every day. One is the P3 Archers, who target projects calling for a quick finish. Then there are the P3 Diehards, who in recent months have made remarkable progress in “pushing along those P3 projects that are languishing in the list” – Diehards now usher even the largest and most challenging projects out of P3 within just a couple of months.

So, what place do Teams have within DP now? Do people still want to interact online in this way? After all, since Teams were first thought of, the rise of social media has provided many other opportunities for online socializing. The people who started the first DP Teams back in 2003 might not have predicted the proliferation of Teams and which Teams would be active now. But Teams have evolved to occupy several niches where people do find it satisfying and useful to take part, and the example of the P3 Diehards has shown how Teams can still do a lot for DP’s productivity. DP’s Teams provide both a sense of community and mutual aid in DP’s mission to “preserve history one page at a time.”

This post was contributed by Neil M., a Distributed Proofreaders volunteer.


Creating E-Book Covers

April 1, 2022

Distributed Proofreaders volunteers work hard to make the e-books they contribute to Project Gutenberg as user-friendly as possible. Among the things we do to that end is creating e-book cover images to make it easy for readers to find e-books of interest to them.

The role and requirements

Book covers in the digital age have taken on a different role. Where in the past covers and dust jackets served to protect and later also advertise the book, they now mainly serve to advertise the book and make it easy to quickly locate it on a computer or e-reader screen. With that changed role, the requirements for book covers have also changed.

In short, the role of a book cover in the digital age is to

  • Invite a potential reader to give it some attention.
  • Provide an easy-to-locate icon on e-book readers or computer screens, so it can be found quickly.
  • Provide a reasonably sized, readable short title and the author’s name, so people can ascertain they have selected the book they want.
  • Give some impression of the type of content to expect.

All of this has to work even though a digital cover is now often just the size of a postage stamp.

A short history

Historically, decorated book covers are a relatively new invention. Books started to be sold in neatly designed covers only by the end of the 19th Century, and in some countries even later. Book buyers were expected to provide their own cover and binding, as desired and fitting for their personal library. So the publisher just sold the book as a bound stack of pages with a nondescript paper cover. That is why old libraries often look very uniform, with all those similarly and often richly bound and decorated volumes. (Our 34,000th title contributed to Project Gutenberg was a manual of artistic bookbinding published in 1878.)

Since books are stored in bookcases or cabinets with only their spine visible, the publisher needed only to put identifying information, such as the title, on the spine. The cover could remain boringly neutral, or, as with some ancient bibles, heavily decorated, but there was no need to put a title on them.

Fortunately, many of the originals we work from at Project Gutenberg are late 19th- and early 20th-Century titles, which often do have nice book covers. However, even when the book we are digitizing does have a cover, it is the part of a book (after the spine) that is most likely to suffer from wear and tear: stained with ink and coffee, mutilated by repeated unprofessional repairs, and defaced by libraries that like to put stickers with shelf locations and bar-codes on it. Covers are also the part most exposed to sunlight, and so end up discolored.

Even then, such covers were designed to be attractive when placed in the book shop’s window, on a table, or when pulled from a shelf by a prospective buyer, so the requirements for large-size titles and author names are quite different from those you’ll need on a postage-stamp-sized digital image.

Challenges

When dealing with book covers, we at Distributed Proofreaders face a number of challenges. It is our intention to reproduce the original book in its full glory, “the book, the whole book, and nothing but the book.” Of course, with the transition to a digital format, we will lose some of the artifacts of the paper medium, such as page headers and page numbers, although we often retain the latter as small notes in the margin. Similarly, book covers will have to be reinvented for our books reincarnated in their digital form.

When preparing a book for Project Gutenberg, we will address these challenges in different ways.

Locating a good quality cover

First of all, we prefer to use an image of the original cover, so if we have one, we can use that as a starting point. In that case, it often requires some digital restoration. But before we invest in the labor-intensive process of restoration, we’ll seek out alternatives. If we don’t have a good quality cover, but have some idea of what it looks like, our first step is an internet search. Surprisingly often, better-quality scans of the same cover can be found, and sometimes those can be used. We need to be sure to pick only scans of a truly matching cover (i.e., same edition and printing), both to avoid a copyright violation, and to maintain the integrity of the e-book edition we’re making. Covers tend to appear in far more variations than the book itself, even within a single print-run.

Digitally restoring a damaged cover

If our search fails to unearth a good-enough cover, we will fire up our photo-editing software to restore what we do have. My personal guideline in digital restoration is not to try to reconstruct an as-new cover (it would be nice if such a cover were still available), but only to remove mutilations like bar-codes and disfiguring damage, such as scratches and stains. Smaller aspects of wear and tear I will leave as is: it is not a shame to be old and look it. What I will also try to do is brighten up the colors and restore color balance. Of course this involves a lot of guesswork, but again, if we can find alternative images online, even if tiny photographs, they can give us an indication of the original colors if our copy is particularly discolored.

Removing disfiguring stains from the cover of Van de Noordpool naar den Aequator
Improving the cover colors for Belgian Fairy Tales
Removing the bar code from the cover of The Mason Wasps

Adding titles and authors to original covers

As explained above, the original cover will often not mention the title or author at all. In such cases, to make it easier to recognize a book, we can decide to digitally add the title to the front – that is, if the original design leaves space for it, which it often does. When adding the title and author, it makes sense to use a typeface matching that of the spine (if known), or the title page. Sometimes we can also use the title from the title page directly, manipulating the color and appearance to blend in with the original cover design.

Adding the title and author to the cover of Myths of the Cherokee

Designing our own cover

Then we come to the point where we have no cover to start with at all. The book at hand is in a generic, unmarked cover, or we have none at all, for example when we work from a set of scans produced elsewhere. In that case, we will design a new cover. From here onward I will concentrate mostly on the way I do this, as other volunteers may have different procedures. It may be tempting to go overboard and design something really fancy, but here I normally try to restrict myself and keep it functional.

One starting point I often use is the scanned cloth pattern of a book’s back to serve as a generic background. I derived a range of color variants from it. I will pick one color, depending on my mood and gut feeling of what would be appropriate for the book, and will add the original title, author, and year of original publication in a centered design. If the book itself includes a suitable illustration, often the frontispiece, I will use that. If not, I will slightly emboss a generic “PGDP” design on it, but won’t use artwork not present in the source, because of the copyright implications that might have. Balancing out the letters takes some puzzling with font-sizes, splitting lines, and letter-spacing, but normally, I am able to produce a reasonable new cover in some 15 to 30 minutes. Not perfect, probably not to everybody’s taste, but better than auto-generated.

DP-created covers for Narrative of a Five Years’ Expedition Against the Revolted Negroes of Surinam, Serbian Fairy Tales, and De Hogerveldt’s Oorspronkelijk Tooneelspel in 3 Bedrijven

I normally use serif typefaces, capital letters, and symmetrical design, because that was the standard in the era most of our books were produced. Asymmetric designs only started to come into vogue after the 1920s, and thus are inconsistent with most books’ age. I still don’t feel the need to fully emulate an old-style cover: I typically use somewhat brighter and larger letters, and prominently place the year of the original copy at the bottom of my design. This should immediately signal to the reader they are dealing with an old book in a new digital cover.

Some things that work less well

An alternative I regularly see is to use the title page as a replacement for a cover. I am not a big fan of that, because title pages are far more similar to each other and often black-and-white, so they lack distinctiveness. Besides that, they often include more detailed information, like the publisher’s name, author credentials, and such, given in a much smaller type. Imagine what that does to your ability to spot the book you’re looking for on an e-reader screen filled with postage-stamp-sized title pages.

Not all books in Project Gutenberg have book covers, so as a stopgap measure, PG has a system to generate generic covers automatically. The results are not always satisfactory, because the software we use isn’t smart enough to understand what part of the title is most significant and to tweak letter sizes and spacing accordingly to obtain a pleasing result.

Finally, a little searching on some large commercial e-book platforms will reveal a range of newly designed covers for public domain books (the texts for which are often harvested from Project Gutenberg’s offerings in bulk), which range from boring to utterly hilarious: using inappropriate photographs on designs that make serious literary classics, even non-fiction, look worse than cliché Harlequin romances. Such things should not happen at Project Gutenberg, except when we keep the original pulp magazine cover that happens to be equally cringe-worthy, such as this:

Cover of the Dutch pulp magazine Lord Lister No. 8

This post was contributed by Jeroen Hellingman, a Distributed Proofreaders volunteer.


Čħāṛᾀςŧέř ♭ῧįłďĭñġ (Character Building)

June 1, 2021

Editor’s Note: This post contains some uncommon Unicode characters, some of which may not display properly on older systems.

In May 2020 the Distributed Proofreaders (DP) site moved over to using the Unicode UTF-8 character base. “This is a very major change,” said General Manager Linda Hamilton. “This move has been a long-term objective for many years.” (For more information on this huge improvement, see the DP Wiki article Site conversion to Unicode.) Now, instead of being limited to about 200 assorted letters, numbers, and squiggles, DP has over a million to choose from.

We started modestly, providing a very few extra characters, such as the “œ” ligature often found in older books in words like Œdipus or cœlacanth. Soon though, DP site developers were picking up the pace, providing more and more additional character suites that Project Managers can assign to their projects. We rolled out three character suites for different European languages – with letters like ĝ, Đ, or ł; Basic Greek and Polytonic Greek character suites; and one for characters found in medieval books such as Ƿ or ȳ. In addition, Project Managers can now add individual Unicode characters to a project where they’re needed. These characters, which can include less familiar specimens such as ŧ ꝓ ᴚ ♅ ◘, show up in a “Custom” character suite on the proofreading screen.

All Greek to us

How does this really benefit our work in DP? Mainly, the Unicode-based encoding allows us to support languages that use characters outside the “Latin-1” character suite, which was what we had available before then.
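To make the limitation concrete, here is a minimal Python sketch that checks which words fit into Latin-1 and which need Unicode. The helper name fits_latin1 is made up for this illustration; it is not part of any DP tool, and the sample words are simply taken from this post.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Sample words from this post; only the plain one fits into Latin-1.
    for word in ["History", "Œdipus", "γράμματα", "Viṣṇu"]:
        status = "fits Latin-1" if fits_latin1(word) else "needs Unicode"
        # UTF-8 can encode all of them, using one or more bytes per character.
        print(word, "->", status, "/", len(word.encode("utf-8")), "bytes in UTF-8")
```

Even the “œ” ligature falls outside Latin-1, which is why it was among the first characters to be added.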

Let’s take one important example: Ancient Greek. Why important? Because many of the books we work on, from the 19th century or before, do contain Greek words or whole passages in Greek. The writers of the time took it for granted that readers of the more scholarly type of book would have learnt Greek (along with Latin) as part of their education.

We reported here a few months ago about how DP handled Grote’s History of Greece, a monumental work with thousands of footnotes containing Greek text. Much of the work there fell to the Post-Processor, the person who prepares a project for final publication after it has all been proofread. With Unicode, the proofreaders can have a share of the fun!

Formerly, the proofreaders didn’t have the use of Greek characters. To represent the text during proofreading, a roundabout process was necessary in which proofreaders produced a transliterated version of the Greek written in our familiar Roman alphabet, like [Greek: mêde nein mêde grammata], to be transformed back into the original Greek – μηδὲ νεῖν μηδὲ γράμματα – by the Post-Processor. But now, the proofreaders can produce a correct text, drawing on a complete set of Greek characters. This is how the relevant part of the proofreading screen looks for a project that includes a Greek character set:

Asking proofreaders to work with Greek letters is also more practicable now that Optical Character Recognition (OCR) software has become better at reading Greek. Take a look at what OCR made of a page of Greek in 2005:

Looks like the book was written in Klingon, doesn’t it? Now compare this, from 2020:

Still far from perfect, but good enough that proofreaders don’t have to retype the whole thing from scratch. Using the expanded character set, they can now correct the Greek text coming from OCR, just as they correct text in their own language.

The Mercury goes up

It’s the same with other types of characters. If proofreaders meet an unfamiliar letter in a medieval book, instead of typing [yogh] (for example), they can now input the actual letter ȝ. And we continue to add more character sets to meet the needs of our varied projects. Among recent ones is a set of characters used in Romanized forms of languages such as Arabic, Hebrew, and Sanskrit, so we can reproduce accurate transliterations of names such as ʿAlaʾuddīn or Mahābhārata or Viṣṇu (which previously had to be proofed as Vi[s.][n.]u). There’s also a “symbols collection,” including astronomical, zodiac, apothecary, and music symbols. With this collection, if we’re proofreading an astrological book, instead of [Mercury] we can now simply add ☿. Recipe for a bygone apothecary’s potion? Not [**ounce], but ℥. And so on.
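For a sense of the clean-up work that used to be left for later, here is a rough Python sketch that turns a few of the bracketed placeholders mentioned above into real Unicode characters. It is an illustration only, not the tooling DP actually uses: the function name expand_markup and the small lookup table are assumptions, and it handles just the [Mercury], [yogh], and dot-below ([s.]) patterns quoted in this post.

```python
import re
import unicodedata

# Hypothetical lookup table covering only placeholders quoted in this post.
NAMED_PLACEHOLDERS = {
    "[Mercury]": "\u263F",  # MERCURY symbol
    "[yogh]": "\u021D",     # LATIN SMALL LETTER YOGH
}

COMBINING_DOT_BELOW = "\u0323"


def expand_markup(text: str) -> str:
    """Replace bracketed placeholders with the Unicode characters they stand for."""
    for placeholder, char in NAMED_PLACEHOLDERS.items():
        text = text.replace(placeholder, char)
    # "[s.]" means "s with a dot below": keep the letter, append the combining mark.
    text = re.sub(
        r"\[([A-Za-z])\.\]",
        lambda m: m.group(1) + COMBINING_DOT_BELOW,
        text,
    )
    # NFC merges letter + combining mark into a single precomposed character
    # where Unicode defines one.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(expand_markup("Vi[s.][n.]u"))
    print(expand_markup("[Mercury] in the seventh house"))
```

With Unicode available during proofreading, this conversion step is no longer needed, because the proofreader enters the real character directly.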

So with Unicode, proofreaders know that someone else won’t need to come along and change all the awkward symbols later. Now they can do the whole job and produce a precise digital version of the original page, no matter what characters are on it!

This post was contributed by Neil M., a Distributed Proofreaders volunteer.