Scaling up Machine Learning: Parallel and Distributed Approaches

This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs, and constraints of the available options. Solutions presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters, concurrent programming frameworks including CUDA, MPI, MapReduce, and DryadLINQ, and learning settings (supervised, unsupervised, semi-supervised, and online learning). Extensive coverage of the parallelization of boosted trees, SVMs, spectral clustering, belief propagation, and other popular learning algorithms, together with deep dives into several applications, makes the book equally useful for researchers, students, and practitioners.

492 pages, Hardcover

First published December 30, 2011

4 people are currently reading
104 people want to read

Ratings & Reviews

Community Reviews

5 stars: 5 (38%)
4 stars: 5 (38%)
3 stars: 1 (7%)
2 stars: 2 (15%)
1 star: 0 (0%)
Displaying 1 of 1 review
Carlos
2 reviews, 5 followers
August 20, 2012
I found Scaling Up Machine Learning both rich in insight and remarkably coherent given the breadth of its scope and the number of contributors.

The core chapters cover a good selection of algorithms and learning settings. They are written to a very high standard: terse in their coverage of the base algorithms, but expansive on the problems of adapting and implementing them on various programming frameworks.

The last four chapters cover particular applications in considerable depth, but still with a focus on making efficient use of the frameworks and touching on many general issues. I found them a very worthwhile read even where I wasn't very familiar with the application area.

The initial chapters present the four frameworks discussed in the book: MapReduce, DryadLINQ, IBM PML, and GPU fine-grained data parallelism. They are introductory but insightful, and even for a reader lacking previous familiarity with some of the frameworks, they are enough for a fruitful first read of the book.

The introduction is very clear and motivating (why not offer it for download on the book website?) and set me up for what turned out to be an enjoyable back-to-back read.