Note for those who don’t yet know the e-book creation workflow at Distributed Proofreaders: After a scanned book is turned into editable text, it goes through three rounds of proofreading. The third, P3, is the most challenging, as it requires the most expertise and the closest attention, so sometimes a project has to wait awhile until P3-qualified proofreaders can get to it. The P3 Diehards team has dedicated itself to rescuing projects that are languishing in that round.
I woke up this morning with a minor headache,
So the first line of business: There’s coffee to make!
The headache was due to some major proofreading,
But I wanted to help move some books to smooth reading.
I found my new passion at DP last year
’Cause there’s so much to do plus there’s fellowship here.
The challenge was huge, but my spirit was keen,
And each day I leaned on my good friend: Caffeine.
Each day I made progress, but slow in my mind.
The goal seemed beyond reach; I felt so behind.
This goal was to help in the most needed place
Where projects sat languished, forgotten, misplaced.
But one thing that kept me on path to my goal
Was seeking that something that feeds mind and soul.
So, first was the hurdle of gaining the level
Where trust must be earned and to demonstrate mettle.
I thought I was hopeless to learn any more,
But my mentors worked wonders with guidance galore!
Then one day my inbox had news I had hoped for:
Clear access to P3; it made my heart soar!
Team Diehards is where I went skipping so quickly
To help with those projects abandoned and prickly.
Some projects are tricky or boring or fun,
And sharing with teammates is second to none.
The visions from Surgery of so many leeches
Are far from the thought of a bowl full of peaches!
The sad Roll of Honour brought tears to my eyes,
But the story of bravery and valor survives.
My headache is gone; I give thanks with “Amen.”
And tomorrow I can’t wait to do it again!
This poem was contributed by Susan E., a Distributed Proofreaders volunteer. Hot off the Press wishes all its readers a happy and book-filled New Year!
Distributed Proofreaders is 22 years old today, and we’re celebrating our anniversary with a bit of poetry. Congratulations to all the volunteers who have helped in “preserving history one page at a time” all these years!
Which major league contains these teams: United States, Poland, Texas, Yorkshire, Midnight Crew, P3 Archers, Procrastination, and more? Why, Distributed Proofreaders (DP), of course, a “major league” producer of e-books for Project Gutenberg.
Teams? Yes, DP volunteers have the chance to join any of over 400 Teams and “talk” online to other DP users with similar interests in the Team forums.
Let’s look at the range of topics Teams focus on, what it’s like to be a Team member, and which Teams have stayed the course and flourished over nearly twenty years.
Types of Teams
We can broadly classify Teams as either “social” or “technical.”
Social Teams bring together those DP users who live in a particular part of the world, use a certain language, or share an interest outside DP.
Technical Teams are linked to some part of the DP process, such as proofreading, formatting, smooth reading, etc.; or to a special skill that may be needed in some of the books we work on, such as processing illustrations. There’s even a Team for DP Bloggers!
How Teams Work
Anyone registered with DP may join up to six Teams. All Teams appear in the Team list, which has a link for joining any that interest you and even lets you start a new Team if you think you’ve identified a topic likely to support one.
Your Team memberships are listed on your profile for other users to see.
Each Team has a page with a mission statement and a link to the Team discussion thread, which appears in the Team Talk area of the DP Forums. You can take part in active Team discussions – which you can do even if you’re not a formal member of a Team – or see if you can revive an old Team with a peppy new post.
DP Teams: Facts
Oldest Teams: The first batch of Teams, established in February 2003, included Nederlands, Northern Virginia, Michigan, Florida, Aussies, Canada, Graduate Students, Finland, and the ironic Team Non-Competitive (“for people who hate the concept of teams and all the competition that goes with it”).
Newest Teams: The Teams to have got off the ground in 2022 are Vancouver and P2+1 (for volunteers who work in the second proofreading round and want to qualify for the third round). Before those, the most recent Team to get started was Cookbook Lovers, launched in 2020.
Team with most members: UK, with 855 members.
Team with most posts: Keep Your Chin Up (“for those who proofread despite having some physical difficulty”), with 29,412 posts.
Teams with the wackiest names:
• Sgt. Proofers Lonely Hearts Club Team (“For every lonely heart feeling lonely and lost in this E-niversum”).
• Distributed Avoirdupois Team (“for people who love proofing but don’t love the pounds that mysteriously attach themselves to a figure seated at a computer”).
• No, really, I am not goofing off (“for those who should be doing their real work instead of proofing”).
Teams Then and Now
Teams became a feature of DP in 2003, not long after DP itself was born in October 2000. Since then, 413 Teams have been launched. Naturally not all of these have remained highly active, but those that are still active provide a lively environment for discussion.
Among “social” Teams, most of the active ones have a national or language focus: Western Europe makes the running with Team Germany, en français, Team Italia, Spanish, and Nederlands especially busy. There are also a few other “social” ones that have long been well attended, including Keep Your Chin Up, Proofing with Cats, and Knitters Who Read.
Active “technical” Teams, more closely focussed on DP work, include some centered on stages in the DP process: for instance, Smoooth [sic] Readers; the new P2+1 team described above; and F2 Fanatics, for volunteers working in the second formatting round. Some popular Teams offer specialist advice and assistance to other proofers, such as Music, Illustrators, Index, Turn the Tables, We’ve Got You Covered (designing custom covers for books without an original cover image), and Ad Addicts (some of our books were published with pages of adverts that present complex design issues).
The P3 round – the third stage of proofreading in which volunteers closely check each draft e-book to ensure that it matches the original – has two Teams with posts every day. One is the P3 Archers, who target projects calling for a quick finish. Then there are the P3 Diehards, who in recent months have made remarkable progress in “pushing along those P3 projects that are languishing in the list” – Diehards now usher even the largest and most challenging projects out of P3 within just a couple of months.
So, what place do Teams have within DP now? Do people still want to interact online in this way? After all, since Teams were first thought of, the rise of social media has provided many other opportunities for online socializing. The people who started the first DP Teams back in 2003 might not have predicted the proliferation of Teams and which Teams would be active now. But Teams have evolved to occupy several niches where people do find it satisfying and useful to take part, and the example of the P3 Diehards has shown how Teams can still do a lot for DP’s productivity. DP’s Teams provide both a sense of community and mutual aid in DP’s mission to “preserve history one page at a time.”
This post was contributed by Neil M., a Distributed Proofreaders volunteer.
Distributed Proofreaders volunteers work hard to make the e-books they contribute to Project Gutenberg as user-friendly as possible. Among the things we do to that end is creating e-book cover images to make it easy for readers to find e-books of interest to them.
The role and requirements
Book covers in the digital age have taken on a different role. Where in the past covers and dust jackets served to protect and later also advertise the book, they now mainly serve to advertise the book and make it easy to quickly locate it on a computer or e-reader screen. With that changed role, the requirements for book covers have also changed.
In short, the role of a book cover in the digital age is to
Invite a potential reader to give it some attention.
Provide an easy-to-locate icon on e-book readers or computer screens, so it can be found quickly.
Provide a reasonably sized, readable short title and the author’s name, so people can ascertain they have selected the book they want.
Give some impression of the type of content to expect.
All of this must work on a digital cover that is now often just the size of a postage stamp.
A short history
Historically, decorated book covers are a relatively new invention. Books started to be sold in neatly designed covers only by the end of the 19th Century, and in some countries even later. Before then, book buyers were expected to provide their own cover and binding, as desired and fitting for their personal library. So the publisher just sold the book as a bound stack of pages with a nondescript paper cover. That is why old libraries often look very uniform, with all those similarly and often richly bound and decorated volumes. (Our 34,000th title contributed to Project Gutenberg was a manual of artistic bookbinding published in 1878.)
Since books are stored in bookcases or cabinets with only their spine visible, the publisher needed only to put identifying information, such as the title, on the spine. The cover could remain boringly neutral, or, as with some ancient bibles, heavily decorated, but there was no need to put a title on them.
Fortunately, many of the originals we work from at Project Gutenberg are late 19th- and early 20th-Century titles, which often do have nice book covers. However, even when the book we are digitizing does have a cover, that cover is the part of the book (after the spine) most likely to suffer wear and tear: stained with ink and coffee, mutilated by repeated unprofessional repairs, and defaced by libraries that like to put stickers with shelf locations and bar-codes on it. Covers are also the part most exposed to sunlight, and so end up discolored.
Even a well-preserved cover was designed to be attractive in a bookshop window, on a table, or when pulled from a shelf by a prospective buyer, so its requirements for large-size titles and author names are quite different from those you’ll need on a postage-stamp-sized digital image.
Challenges
When dealing with book covers, we at Distributed Proofreaders face a number of challenges. It is our intention to reproduce the original book in its full glory, “the book, the whole book, and nothing but the book.” Of course, with the transition to a digital format, we will lose some of the artifacts of the paper medium, such as page headers and page numbers, although we often retain the latter as small notes in the margin. Similarly, book covers will have to be reinvented for our books reincarnated in their digital form.
When preparing a book for Project Gutenberg, we will address these challenges in different ways.
Locating a good quality cover
First of all, we prefer to use an image of the original cover, so if we have one, we can use that as a starting point. In that case, it often requires some digital restoration. But before we invest in the labor-intensive process of restoration, we’ll seek out alternatives. If we don’t have a good quality cover, but have some idea of what it looks like, our first step is an internet search. Surprisingly often, better-quality scans of the same cover can be found, and sometimes those can be used. We need to be sure to pick only scans of a truly matching cover (i.e., same edition and printing), both to avoid a copyright violation, and to maintain the integrity of the e-book edition we’re making. Covers tend to appear in far more variations than the book itself, even within a single print-run.
Digitally restoring a damaged cover
If our search fails to unearth a good-enough cover, we will fire up our photo-editing software to restore what we do have. My personal guideline in digital restoration is not to try to reconstruct an as-new cover (it would be nice if such a cover were still available), but only to remove mutilations like bar-codes and disfiguring damage, such as scratches and stains. Smaller aspects of wear and tear I will leave as is: it is not a shame to be old and look it. What I will also try to do is brighten up the colors and restore the color balance. Of course this involves a lot of guesswork, but again, if we can find alternative images online, even tiny photographs, they can give us an indication of the original colors if our copy is particularly discolored.
As explained above, the original cover will often not mention the title or author at all. In such cases, to make the book easier to recognize, we can decide to digitally add the title to the front, provided the original design leaves space for it, which it often does. When adding the title and author, it makes sense to use a typeface matching that of the spine (if known) or the title page. Sometimes we can also use the title from the title page directly, manipulating the color and appearance to blend in with the original cover design.
Then there is the case where we have no usable cover to start from: the book at hand is in a generic, unmarked cover, or we have no cover at all, for example when we work from a set of scans produced elsewhere. In that case, we will design a new cover. From here onward I will concentrate mostly on the way I do this, as other volunteers may have different procedures. It may be tempting to go overboard and design something really fancy, but I normally try to restrain myself and keep it functional.
One starting point I often use is a scan of the cloth pattern on the back of a book, which serves as a generic background; I derived a range of color variants from it. I will pick one color, depending on my mood and gut feeling of what would be appropriate for the book, and will add the original title, author, and year of original publication in a centered design. If the book itself includes a suitable illustration, often the frontispiece, I will use that. If not, I will slightly emboss a generic “PGDP” design on it, but won’t use artwork not present in the source, because of the copyright implications that might have. Balancing out the letters takes some puzzling with font sizes, line breaks, and letter-spacing, but normally I am able to produce a reasonable new cover in some 15 to 30 minutes. Not perfect, probably not to everybody’s taste, but better than auto-generated.
I normally use serif typefaces, capital letters, and symmetrical design, because that was the standard in the era when most of our books were produced. Asymmetric designs only started to come into vogue after the 1920s, and thus are inconsistent with most books’ age. Still, I don’t feel the need to fully emulate an old-style cover: I typically use somewhat brighter and larger letters, and prominently place the year of original publication at the bottom of my design. This should immediately signal to readers that they are dealing with an old book in a new digital cover.
Some things that work less well
An alternative I regularly see is to use the title page as a replacement for a cover. I am not a big fan of that, because title pages are far more similar to each other and often black-and-white, so they lack distinctiveness. Besides that, they often include more detailed information, like the publisher’s name, author credentials, and such, given in much smaller type. Imagine what that does to your ability to spot the book you’re looking for on an e-reader screen filled with postage-stamp-sized title pages.
Not all books in Project Gutenberg have book covers, so as a stopgap measure, PG has a system to generate generic covers automatically. The results are not always satisfactory, because the software we use isn’t smart enough to understand which part of the title is most significant and to tweak letter sizes and spacing accordingly to obtain a pleasing result.
Finally, a little searching on some large commercial e-book platforms will reveal a variety of newly designed covers for public domain books (the texts for which are often harvested in bulk from Project Gutenberg’s offerings), ranging from boring to utterly hilarious, with inappropriate photographs on designs that make serious literary classics, even non-fiction, look worse than clichéd Harlequin romances. Such things should not happen at Project Gutenberg, except when we keep an original pulp magazine cover that happens to be equally cringe-worthy, such as this:
Editor’s Note: This post contains some uncommon Unicode characters, some of which may not display properly on older systems.
In May 2020 the Distributed Proofreaders (DP) site switched to the Unicode UTF-8 character encoding. “This is a very major change,” said General Manager Linda Hamilton. “This move has been a long-term objective for many years.” (For more information on this huge improvement, see the DP Wiki article Site conversion to Unicode.) Now, instead of being limited to about 200 assorted letters, numbers, and squiggles, DP has over a million to choose from.
We started modestly, providing a very few extra characters, such as the “œ” ligature often found in older books in words like Œdipus or cœlacanth. Soon though, DP site developers were picking up the pace, providing more and more additional character suites that Project Managers can assign to their projects. We rolled out three character suites for different European languages – with letters like ĝ, Đ, or ł; Basic Greek and Polytonic Greek character suites; and one for characters found in medieval books such as Ƿ or ȳ. In addition, Project Managers can now add individual Unicode characters to a project where they’re needed. These characters, which can include less familiar specimens such as ŧ ꝓ ᴚ ♅ ◘, show up in a “Custom” character suite on the proofreading screen.
All Greek to us
How does this really benefit our work in DP? Mainly, the Unicode-based encoding allows us to support languages that use characters outside the “Latin-1” character suite, which was what we had available before then.
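To make the Latin-1 limitation concrete, here is a small Python sketch (an illustration only, not DP site code) that checks whether some characters mentioned in this post can be stored in Latin-1 and shows the bytes UTF-8 uses for each. It assumes Python 3.8 or later, whose "latin-1" codec is ISO-8859-1. The “over a million” figure refers to the size of Unicode’s code space, U+0000 to U+10FFFF; only a fraction of those code points are assigned characters.

```python
# Illustrative sketch only: not DP site code.
# Compare Latin-1 (ISO-8859-1), which has 256 code points, with UTF-8,
# which can encode every Unicode code point.

SAMPLES = ["é", "œ", "ł", "ȝ", "ṣ", "☿", "℥", "μηδὲ"]


def fits_latin1(text: str) -> bool:
    """Return True if every character in `text` exists in Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def describe(text: str) -> str:
    """Summarize whether `text` fits Latin-1 and show its UTF-8 bytes."""
    utf8 = text.encode("utf-8")
    return f"{text}: Latin-1? {fits_latin1(text)}; UTF-8 bytes: {utf8.hex(' ')}"


if __name__ == "__main__":
    for sample in SAMPLES:
        print(describe(sample))
    # Unicode code points run from U+0000 to U+10FFFF.
    print("Unicode code space:", 0x10FFFF + 1)  # 1114112
```

Only “é” passes the Latin-1 check; everything else in the list, including the “œ” ligature, needs Unicode.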
Let’s take one important example: Ancient Greek. Why important? Because many of the books we work on, from the 19th century or before, do contain Greek words or whole passages in Greek. The writers of the time took it for granted that readers of the more scholarly type of book would have learnt Greek (along with Latin) as part of their education.
We reported here a few months ago about how DP handled Grote’s History of Greece, a monumental work with thousands of footnotes containing Greek text. Much of the work there fell to the Post-Processor, the person who prepares a project for final publication after it has all been proofread. With Unicode, the proofreaders can have a share of the fun!
Formerly, the proofreaders didn’t have the use of Greek characters. To represent the text during proofreading, a roundabout process was necessary in which proofreaders produced a transliterated version of the Greek written in our familiar Roman alphabet, like [Greek: mêde nein mêde grammata], to be transformed back into the original Greek – μηδὲ νεῖν μηδὲ γράμματα – by the Post-Processor. But now, the proofreaders can produce a correct text, drawing on a complete set of Greek characters. This is how the relevant part of the proofreading screen looks for a project that includes a Greek character set:
Asking proofreaders to work with Greek letters is also more practicable now that Optical Character Recognition (OCR) software has become better at reading Greek. Take a look at what OCR made of a page of Greek in 2005:
Looks like the book was written in Klingon, doesn’t it? Now compare this, from 2020:
Still far from perfect, but good enough that proofreaders don’t have to retype the whole thing from scratch. Using the expanded character set, they can now correct the Greek text coming from OCR, just as they correct text in their own language.
The Mercury goes up
It’s the same with other types of characters. If proofreaders meet an unfamiliar letter in a medieval book, instead of typing [yogh] (for example), they can now input the actual letter ȝ. And we continue to add more character sets to meet the needs of our varied projects. Among recent ones is a set of characters used in Romanized forms of languages such as Arabic, Hebrew, and Sanskrit, so we can reproduce accurate transliterations of names such as ʿAlaʾuddīn or Mahābhārata or Viṣṇu (which previously had to be proofed as Vi[s.][n.]u). There’s also a “symbols collection,” including astronomical, zodiac, apothecary, and music symbols. With this collection, if we’re proofreading an astrological book, instead of [Mercury] we can now simply add ☿. Recipe for a bygone apothecary’s potion? Not [**ounce], but ℥. And so on.
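Before the switch, someone had to turn those bracketed placeholders back into real characters later. Purely as an illustration, the Python sketch below shows how such a clean-up could be automated with a lookup table. The table and the function `replace_placeholders` are hypothetical: they use only the placeholder forms quoted in this post and do not describe an actual DP tool.

```python
import re

# Hypothetical lookup table built from the placeholder examples in this post.
# Escapes are used so the exact code points are unambiguous.
PLACEHOLDERS = {
    "[yogh]": "\u021d",     # ȝ LATIN SMALL LETTER YOGH
    "[Mercury]": "\u263f",  # ☿ MERCURY
    "[**ounce]": "\u2125",  # ℥ OUNCE SIGN
    "[s.]": "\u1e63",       # ṣ LATIN SMALL LETTER S WITH DOT BELOW
    "[n.]": "\u1e47",       # ṇ LATIN SMALL LETTER N WITH DOT BELOW
}

# A single pattern matching any known placeholder; re.escape neutralizes
# the regex metacharacters "[", "]", "*", and ".".
_PATTERN = re.compile("|".join(re.escape(key) for key in PLACEHOLDERS))


def replace_placeholders(text: str) -> str:
    """Replace every known bracketed placeholder with its Unicode character."""
    return _PATTERN.sub(lambda match: PLACEHOLDERS[match.group(0)], text)


if __name__ == "__main__":
    print(replace_placeholders("Vi[s.][n.]u"))         # Viṣṇu
    print(replace_placeholders("Take one [**ounce]"))  # Take one ℥
```

With Unicode available during proofreading, this extra pass simply isn’t needed: proofreaders type ṣ or ℥ directly.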
So with Unicode, proofreaders know that someone else won’t need to come along and change all the awkward symbols later. Now they can do the whole job and produce a precise digital version of the original page, no matter what characters are on it!
This post was contributed by Neil M., a Distributed Proofreaders volunteer.