Like so many of my proofreading colleagues, I never rely on my eye alone. I’m human and my eye sometimes sees what it wants to see rather than what’s there, even when I’m proofreading for clients rather than reading for pleasure.
One of my favourite tools is TextSTAT. Actually, it wasn’t created with the proofreader or editor in mind. Rather, the program was designed to enable users to analyse texts for word frequency and concordance. However, I use it to generate, very quickly, simple alphabetized word lists. Time and again those word lists have flagged up potential problems that I need to check in a proofreading project.
I strip the text from the PDF proof and dump it into a Word file. I then remove the word breaks from that Word file (using Find and Replace to search for "-^p", a hyphen followed by a paragraph mark, and replacing it with nothing) so that TextSTAT generates a list of whole words that I can compare, rather than thousands of useless broken words.
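For readers who like to see the mechanics, here's a rough Python sketch of those two steps. It is not how TextSTAT works internally, and it isn't part of my own workflow. It simply imitates the "-^p" find-and-replace and then builds an alphabetized word list with counts. The function names, the sample sentence and the decision to lowercase everything are my own assumptions for illustration.

```python
import re
from collections import Counter

def join_broken_words(text):
    # Imitate replacing "-^p" with nothing: delete a hyphen that sits
    # directly before a line break, so "begin-\nnings" becomes "beginnings".
    return re.sub(r"-\r?\n", "", text)

def word_list(text):
    # Lowercase the text (an assumption of this sketch), pull out runs of
    # letters, allowing internal apostrophes and hyphens, and return
    # (word, count) pairs in alphabetical order.
    words = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())
    return sorted(Counter(words).items())

# Made-up sample text, not taken from a real proof.
sample = "The begin-\nnings of the project. The beginings were hard."
for word, count in word_list(join_broken_words(sample)):
    print(word, count)
```

In the output, "beginings" and "beginnings" land next to each other, which is exactly the kind of pairing that deserves a second look. One limitation to bear in mind: deleting every end-of-line hyphen also closes up words that are genuinely hyphenated, so a compound that happened to break at its hyphen will appear in the list as a single closed-up word.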
Here’s a small sample from a word list I recently generated in TextSTAT. As you can see, there are several possible problems:
(The colour coding is mine and I've provided it for clarity only. TextSTAT's word lists are in plain text.)
Upon checking the actual proofs, some of these issues turned out to be fine. For example:
Some issues had to be queried. For example:
Some issues needed further checking and amending. For example:
When proofreading hard-copy or PDF proofs, would I have spotted these problems with my eye alone? I'm not confident I'd have got everything, particularly the issues with the names of the less well-known cited authors. And if "beginings" had been in 9-point italic text, my eye might have passed over the missing letter.
Where’s the context?
There is no context – that’s the point. When using TextSTAT as a word-list generation tool, we’re just looking at one word and how it compares with the words above and below it in our list. We’re not reading phrases; we’re not paying attention to grammar and syntax. It’s just a long list of words in alphabetical order. Later, we can focus on the words in context – TextSTAT’s word lists are just a tiny part of a process that helps the proofreader to provide his or her client with a polished proof.
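I make that neighbour-by-neighbour comparison by eye, but the idea can be sketched in a few lines of Python. The example below is an illustration only, not a TextSTAT feature: it walks an alphabetized list and flags adjacent entries that are nearly identical, using the standard library's difflib.SequenceMatcher. The function name, the sample words and the 0.85 similarity threshold are all assumptions I've picked for the example.

```python
from difflib import SequenceMatcher

def flag_similar_neighbours(sorted_words, threshold=0.85):
    # Compare each word with the one directly after it in the sorted list
    # and keep the pairs whose similarity ratio reaches the threshold.
    flagged = []
    for previous, current in zip(sorted_words, sorted_words[1:]):
        ratio = SequenceMatcher(None, previous, current).ratio()
        if ratio >= threshold:
            flagged.append((previous, current))
    return flagged

# Made-up word list, already in alphabetical order.
words = ["beginings", "beginnings", "hard", "of", "project", "the", "were"]
for pair in flag_similar_neighbours(words):
    print(pair)
```

A check like this only catches variants that sit side by side in the list, and it will also flag perfectly legitimate pairs such as a singular and its plural. It's a prompt to go back to the proofs, not a verdict.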
Fast, free and offline
TextSTAT isn’t the only word-list generation tool available for free. However, I love it because it can handle huge chunks of text without glitching – it will quickly generate word lists for books with hundreds of thousands of words (the sample I gave above was taken from a project of over 150,000 words, but I’ve used the program for much larger projects). It’s never crashed on me.
You can download the software to your own computer, so there’s no issue regarding confidentiality. My clients don’t want me to upload their content to third-party browser-based tools without their permission, so when I use a particular proofreading tool to augment my eye, that tool needs to be able to sit offline on my PC.
It costs nothing. Say the creators: “TextSTAT is free software. It may be used free of charge and it may be freely distributed provided the copyright and the contents of all files, including TextSTAT.zip itself, are unmodified. Commercial distribution of the programme is only allowed with permission of the author. Use TextSTAT at your own risk; the author accepts no responsibility whatsoever. The sourcecode version comes with its own license.”
Is it worth the effort?
Some might think that an hour or so trawling through a simple word list, and cross-checking any potential problems against hard copy or PDF, is a lot of extra time to build into a proofreading project. I think that time improves the quality of my work and increases my productivity. When I come to the actual reading-in-context stage, I'm confident that some really serious snags have already been attended to. That gives me peace of mind and enables me later to focus on other important issues like the page layout, the sense of the text, and more.
I've found that using this method for dense academic projects has been particularly worthwhile. However, I'll not forget a recent fiction project (a "big name"-authored book that's in its nth edition and was first published over two decades ago) where the main protagonist's name was spelled incorrectly in two places: an easy thing to miss again and again over many years and many proofreads. I caught it – not because my eyes are better than those of the proofreaders who came before me, or because I'm a better proofreader than they were, but because I used a simple tool that allowed me to concentrate on just the words.
Want to try TextSTAT?
If you want to give it a spin, it’s available from NEON - NEDERLANDS ONLINE.
Finally, the usual caveat applies: generating word lists as part of proofreading work isn't "the one and only true way". TextSTAT is an example of one tool that I and some of my colleagues utilize to improve the quality of our work. You might utilize different tools and different methods to achieve the same ends. All of which is great!
Louise Harnby is a professional proofreader and copyeditor. She curates The Proofreader's Parlour and is the author of several books on business planning and marketing for editors and proofreaders.
Visit her business website at Louise Harnby | Proofreader, say hello on Twitter at @LouiseHarnby, or connect via Facebook and LinkedIn.
All text on this blog, The Proofreader's Parlour, and on the other pages of this website (unless indicated otherwise) is in copyright © 2011–17 Louise Harnby. Please do not copy or reproduce any of the content, in whole or part, in any form, unless you ask first.
Author Member of The Alliance of Independent Authors (ALLi). I abide by its Code of Standards in regard to my status as an independent writer.
Advanced Professional Member of the Society for Editors and Proofreaders (SfEP). I'm a signatory to its code of practice as a professional editor.
Featured in The Book Designer's Carnival of the Indies: Joel Friedlander's collection of 'outstanding articles recently posted to blogs'.