I've been an O'Reilly author since 1996 and received a free copy of The Data Journalism Handbook through O'Reilly's publicity department.
The Data Journalism Handbook, published by O'Reilly Media, is a joint project of the European Journalism Centre and the Open Knowledge Foundation. The editors compiled short submissions from members of the global journalism community to share war stories, lessons learned, and best practices for organizations and individuals who want to transform large data collections into useful information. I started out as a newspaper journalism major at Syracuse University before switching to political science and later pursuing a career as a technical writer, so I have a great deal of sympathy for reporters who have to make sense of computer technologies before they can begin to make sense of the data they contain.
Data journalism is a young field and a bit hard to define, notes contributor Paul Bradshaw of Birmingham City University. "Data" and "journalism" are vague terms, so it's no surprise their combination is somewhat nebulous. The practice certainly includes analyzing large data collections for relevant trends, which might be termed "computer-assisted reporting," but letting journalists and non-journalists alike interact with the data opens up the field considerably. In the canonical recent example, the UK newspaper The Guardian investigated the expenditures of British Members of Parliament after a number of MPs were discovered to have gamed the system. The government provided 450,000 scanned expense reports, which The Guardian had to convert into an analyzable format.
The newspaper opened the data store to its readers and invited them to investigate their own MP's expenses, entering the data into a database along the way. Many readers took the paper up on its offer and, through their shared efforts, transcribed the entire collection in very short order. Throughout the process, the newspaper made already-entered records available so visitors to the site could see who was claiming what. This initial effort provided the foundation for the MPs' Expenses portal within the larger Guardian site.
The Data Journalism Handbook starts with a quick overview of the data journalism landscape, but turns very quickly to techniques journalists can use to gather data, convert it into a machine-readable format, and present it to an audience. The contributors recommend specific software packages and web services for tasks ranging from scraping web sites and converting (and comparing) PDF files to analyzing and displaying data. They also discuss softer skills such as integrating teams of computer nerds and reporters, enlisting public participation, and pursuing Freedom of Information requests.
Gray, Bounegru, and Chambers include examples from around the globe, with contributors from, among many other locales, the UK, United States, Finland, Argentina, and Italy. There is some repetition, especially regarding the tools the contributors recommend, but multiple endorsements of a tool reinforce its usefulness.
If you buy The Data Journalism Handbook as an ebook through O'Reilly, you'll receive any updates and additions to the book's contents. Even if they produce another edition in a couple of years, which I certainly hope they do, you'll have access to invaluable resources and advice that will put you well on your way to analyzing and reporting on data effectively. The tools and best practices apply as well to corporate work as they do to journalism, so I recommend this book without hesitation for everyone interested in discovering and sharing information from large data collections.
Curtis Frye is the editor of Technology and Society Book Reviews. He is the author of more than 30 books, most recently Improspectives; his list includes more than 20 books for Microsoft Press and O'Reilly Media. He has also created over a dozen online training courses for lynda.com. In addition to his writing, Curt is a keynote speaker and entertainer. You can find more information about him at www.curtisfrye.com.