Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools

This practical book teaches the skills that scientists need for turning large sequencing datasets into reproducible and robust biological findings. Many biologists begin their bioinformatics training by learning scripting languages like Python and R alongside the Unix command line. But there's a huge gap between knowing a few programming languages and being prepared to analyze large amounts of biological data.
Rather than teach bioinformatics as a set of workflows that are likely to change with this rapidly evolving field, this book demonstrates the practice of bioinformatics through data skills. Rigorous assessment of data quality and of the effectiveness of tools is the foundation of reproducible and robust bioinformatics analysis. Through open source and freely available tools, you'll learn not only how to do bioinformatics, but how to approach problems as a bioinformatician.

538 pages, Paperback

First published July 25, 2014

About the author

Vince Buffalo

1 book, 2 followers

Ratings & Reviews

Community Reviews

5 stars: 38 (53%)
4 stars: 22 (30%)
3 stars: 9 (12%)
2 stars: 2 (2%)
1 star: 0 (0%)
Displaying 1 - 9 of 9 reviews
Philipp
687 reviews, 222 followers
August 27, 2015
This is my new "go-to" recommendation for people looking to get into the field. I would not recommend it to biologists or those without any programming experience - for that, it assumes too much of the reader - but for programmers thinking of going into bioinformatics, or bioinformaticians who've dabbled a bit with the shell, this is a gold mine.

It starts off with good advice like this one:


In teaching bioinformatics, I often share this idea as the Golden Rule of Bioinformatics: NEVER EVER TRUST YOUR TOOLS (OR DATA). This isn't to make you paranoid that none of bioinformatics can be trusted, or that you must test every available program and parameter on your data. Rather, this is to train you to adopt the same cautious attitude software engineers and bioinformaticians have learned the hard way.


and covers how to structure and document your projects, then a short intro to using Bash, one chapter on Git, a bit of SSH, working with data, various useful Unix tools (you know, awk, sort, etc.), a VERY dense introduction to R, Bioconductor, GenomicRanges, dplyr and ggplot2 (this felt like the bulk of the book, possibly because R is the topic in the book I had the least experience with), an intro to some common file formats like FASTA/FASTQ/SAM/BAM and some of their related tools (samtools/bedtools), then a bit on working with streaming data, closing with a quick introduction to SQL using SQLite. All of that crammed into ~300 pages! It's a dense book.

Luckily, he introduces only the "timeless" stuff, the tools that should still be around in a few years. There's no introduction to read aligners here, for example; these change all the time. The tone is lively, direct, and not overcomplicated, but I can't stress this enough: absolute programming beginners will struggle. For them there are other introductory books (like Haddock and Dunn's 'Practical Computing for Biologists').

On the other hand, if you're an advanced bioinformatician somewhere near the end of your PhD or even past that, much of this book may bore you; even so, it's always nice to see how colleagues go about their work. For example, why have I always googled SAM flags like some caveman instead of simply using 'samtools flags'?

Yet this book fills a niche I've always wanted to see covered: the relatively tool-agnostic, "how to go about your job in the most efficient way possible" niche.
Alex Ishkin
8 reviews, 1 follower
April 18, 2019
This is literally a 'must read' for bioinformaticians. It teaches skills that will always be required, whatever software tools and technologies are around in 10 years. I personally didn't learn much new from this book, but only because I had already picked up most of the tricks and skills it describes the hard way over 10+ years on the job. For people just getting their feet wet in bioinformatics, this book is a godsend.
Samuel
49 reviews, 6 followers
June 11, 2020
Very good book, covering the hands-on craft of bioinformatics, which is seldom if ever covered in other bioinformatics books. Those tend to focus almost entirely on theory, while in practice there are *lots* of practical, logistical, organizational and other matters that are very important to get right in order to successfully perform bioinformatics and computational biology these days.

It is quite readable, but maybe not a page-turner, which is why I still hesitate to give it a full five stars (4.5, or 9 out of 10, would be fair though). We'll see if that changes after I finish it completely...
18 reviews
July 11, 2025
Generally very good technical advice and code, though I would've liked a longer primer on the theory and scientific foundation of each operation rather than just how to run it in a particular program. Overall, I would recommend it to a colleague seeking better UNIX skills for their bioinformatics work.
13 reviews, 1 follower
August 27, 2021
This book is brilliant. Such a great introduction to and overview of the approach and methods of day-to-day bioinformatics. I am a molecular biologist navigating the transition to bioinformatician, and this book greatly aided the process. A great starting point for anyone so inclined. Thank you, Vince!
Aidan Hansen
5 reviews
April 2, 2025
I will say that I have not read this book completely, but I have read a good amount of it for my class. This is a phenomenal introduction to bioinformatics and programming.
Sefa
57 reviews
December 7, 2021
It seems that this book hasn't been published yet. I went through its early release version.

Good advice on doing bioinformatics. Highly recommended for biology people who want to get some experience on the computational side. As far as more advanced bioinformaticians are concerned, although some chapters are pretty introductory, there is still some useful material on doing reproducible and organized research using mostly Unix-based tools.