Text is everywhere. Web pages, databases, the contents of files--for almost any programming task you perform, you need to process text. Cut even the most complex text-based tasks down to size and learn how to master regular expressions, scrape information from Web pages, develop reusable utilities to process text in pipelines, and more.
Most information in the world is in text format, and programmers often find themselves needing to make sense of the data hiding within. It might be to convert it from one format to another, or to find out information about the text as a whole, or to extract information from it. But how do you do this efficiently, avoiding labor-intensive, manual work?
Text Processing with Ruby takes a practical approach. You'll learn how to get text into your Ruby programs from the file system and from user input. You'll process delimited files such as CSVs, and write utilities that interact with other programs in text-processing pipelines. Decipher character encoding mysteries, and avoid the pain of jumbled characters and malformed output.
You'll learn to use regular expressions to match, extract, and replace patterns in text. You'll write a parser and learn how to process Web pages to pull out information from even the messiest of HTML.
Before long you'll be able to tackle even the most enormous and entangled text with ease, scything through gigabytes of data and effortlessly extracting the bits that matter.
What You Need
This book requires a passing familiarity with the Ruby programming language, and assumes that you already have Ruby installed on your computer.
I love recommending this book. It's sopping with practical information you can use to do cool moves in Ruby that have nothing to do with Rails or the web. You'll learn how to efficiently digest huge datasets and leverage features of Ruby that make it absolutely pleasurable to process data with, but that often get ignored because Rails is hogging all the attention. Learn how to effectively stream buffered data, use Ruby's screaming regex engine, parse HTML, create streaming parsers, and leverage NLP.
Seriously, take a quick look at the table of contents. It delivers.
Excellent overview of the basic ins and outs of text processing using the Ruby language, with nice supplemental coverage of the *nix shell philosophy and utilities.
The book offers a well-written overview of all the things you need to take care of when you parse text in Ruby. Definitely a book that can save you hours of trouble.