Text Processing in Python

Mertz, David
Addison-Wesley 2003
ISBN 0-321-11254-7
Date finished: 2003-11-28

Text Processing in Python takes its chosen topic and carefully covers every possible aspect of it: the built-in string operations, regular expressions, parsing tools such as mxTextTools and PLY, and, reaching slightly afield, Internet-related parsing tools such as HTMLParser and the email package. (The only significant omission is XML processing, which is a really large field that already has a few Python-specific books of its own.) I learnt a number of things from it, the most notable being exactly how mxTextTools works, and the discussion of parsing will be helpful next time I need to write a parser.

The text mixes sections of reference material with tutorial explanations, and Mertz's style is enjoyable: serious but readable, without slipping into either boring formality or annoying jokiness. However, the first chapter, which combines an explanation of higher-order functions with some reference material on Python's basic data types, is an odd blend and doesn't quite fit into the book's outline. Mertz is fond of functional programming, but this material isn't necessary for text processing. (Personally, I would rather write a bunch of iterators and combine them than write complicated list comprehensions.) I wonder if readers going through the book in order will get lost in the middle of the chapter and get no farther; fortunately, the chapter is easily skipped if you find it unedifying.
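To make the iterators-versus-comprehensions preference concrete, here is a minimal sketch contrasting the two styles on a hypothetical task: dropping blank and comment lines, then stripping and lowercasing what's left. The task and the function names (`nonblank`, `uncommented`, `normalized`, `pipeline`, `comprehension`) are illustrative assumptions and do not come from the book.

```python
# Hypothetical task: keep non-blank, non-comment lines, stripped and lowercased.

def nonblank(lines):
    """Yield only lines containing something other than whitespace."""
    for line in lines:
        if line.strip():
            yield line

def uncommented(lines, marker="#"):
    """Yield only lines that don't begin (after indentation) with the marker."""
    for line in lines:
        if not line.lstrip().startswith(marker):
            yield line

def normalized(lines):
    """Yield each line with surrounding whitespace removed, lowercased."""
    for line in lines:
        yield line.strip().lower()

def pipeline(lines):
    # Small generators chained together; each stage does one job lazily.
    return normalized(uncommented(nonblank(lines)))

def comprehension(lines):
    # The same logic packed into a single list comprehension.
    return [line.strip().lower() for line in lines
            if line.strip() and not line.lstrip().startswith("#")]

sample = ["Hello World\n", "   \n", "# a comment\n", "  Spam and Eggs  \n"]
print(list(pipeline(sample)))  # ['hello world', 'spam and eggs']
print(comprehension(sample))   # ['hello world', 'spam and eggs']
```

Both produce the same result; the generator version keeps each step small, separately testable, and reusable in other pipelines, at the cost of a few more lines.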

In summary: it's a fine book that covers all the aspects of text processing that I can think of, and covers them well. I score it 4 coils out of 5.

Tagged: python

Permalink: http://books.amk.ca/2003/11/Text_Processing_Python.html

%T Text Processing in Python
%K python
%G ISBN 0-321-11254-7
%I Addison-Wesley
%D 2003
%@ 2003-11-28
%P 520pp
%A Mertz, David

