If you're an experienced programmer interested in crunching data, this book will get you started with machine learning--a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.
Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyze sample datasets and write simple machine learning algorithms. "Machine Learning for Hackers" is ideal for programmers from any background, including business, government, and academic research.
- Develop a naive Bayesian classifier to determine if an email is spam, based only on its text
- Use linear regression to predict the number of page views for the top 1,000 websites
- Learn optimization techniques by attempting to break a simple letter cipher
- Compare and contrast U.S. Senators statistically, based on their voting records
- Build a "whom to follow" recommendation system from Twitter data
The book is awesome. I felt it was repetitive in some areas, to help a novice programmer understand how important the notion is, but it's a 10/10 book. A must-read for becoming a better coder.
In Machine Learning for Hackers by Drew Conway and John Myles White, the reader is introduced to a number of techniques useful for creating systems that can understand and make use of data. While the book has solid topical material and is written in a fluid and easy to read manner, I don't feel that this book is really for hackers, unless the definition of hacker is vastly different from "programmer".
Much of the text is taken up with explaining how to parse strings, convert dates, and otherwise munge data into shape to be operated on by the statistical functions R provides. In fact, so much of the book is in that fashion that I ended up skipping through large portions to get back to something worth spending time reading about. I can't understand why a programmer would need significant education in string parsing. I was also put off by the vast amount of text explaining basic statistics. Maybe a recent computer science graduate is simply the wrong reader for this book?
I think it is certainly possible to learn the basic principles of machine learning from this book, and even to put them to good use with R in the same manner displayed in the examples. Indeed, the code and data available for this book would be very useful as prep for an introductory course at an academic institution. To make the best use of the text, you really should be sitting at your computer, reading the text side by side with the code, and operating on the data with R as instructed.
Personally, I found that wading through this text wasn't enjoyable, due to the lack of density of material at the depth I was looking for. Other readers may find it is just right for them, but I suspect those readers would not be hackers, contrary to the implication of the title. As best as I can figure, this book would best serve a student scientific researcher who wanted to understand what machine learning was about and did not have significant prior experience in programming or statistics. Alternatively, if you are many years removed from your time in statistics, or consider learning R one of your goals, this book could work well for you.
If this sounds like you, you can get it from O'Reilly. I wrote this post as part of their O'Reilly Blogger Review program, which is neat.
Perhaps I am biased by my data-centric training (Economics PhD, Physics AB), but even if I had a pure CS background I would dislike this book. Essentially, the authors ignore data and treat it as just another computational resource like RAM, flops, etc. The book develops no understanding of how to use the tools, and no intuition for using ML -- merely how to consume canned libraries and follow the bouncing ball. In the first half of the book, the authors fail to provide any of the big-picture, organizational wisdom that readers seek and expect in a book like this. Perhaps I will change my opinion when I have completed the book. However, if you want to learn ML, start with Andrew Ng's Coursera course and then move on to Hastie, Tibshirani, and Friedman or equivalent.
This book was very light on the theory. Each chapter gave you a shallow insight into what the algorithm would try to accomplish, the steps involved in shuffling the data around, and then the appropriate R library to use.
I realize that the goal of this book was application rather than theory, but for what it's worth, if you've already taken Ng's ML class, you're probably ahead of anything this book will give you. It may give you some tips on R libraries to try, but you can find that out using Google.
Machine Learning for Hackers is not a reference book or a standard programming tutorial on machine learning. For references, you go to Hastie, Tibshirani, and Friedman's 'The Elements of Statistical Learning.' For tutorials, there are a fair number of sources that could walk you through the use of regression, data exploration, classifiers, principal component analysis, and similar functions in R. But what MLfH gives you are Drew Conway and John Myles White. And they don't teach skills; they pass on the wisdom of how to work with data: how data needs to be explored, understood, and manipulated, and finally how machine learning methods are used to gain understanding.
In computer modeling in general and data analysis in particular, one thing that is often hard to convey is that the purpose of computing is not numbers, but insight. The effects of this problem are seen in graduates from even the best schools knowing how to drive a computer program, but not knowing how to interpret results, or how to ask a question and then take the results and ask the next question. The courses we teach and the texts we use do not help. Our courses are each siloed to present a distinct portion of the total body of knowledge. Textbooks are often either theoretical or intended to provide a glimpse of application, but always in bounded chunks. Computer application books are often built around the capabilities of the program in question, but stop at the edge of the capabilities of the application or environment. What is needed is not to teach methods or tools, but to teach wisdom. The ideal is to be able to sit side by side with an expert who can walk through a dataset, ask questions, get answers, and think about what to do next, whether the answer is what was expected or not.
This is what Conway and White do. For each topic, they open with a discussion of the problem type and the tools, sometimes with a toy example. But then they go through a substantive example, and the narrative text is where they shine. They take a messy dataset (often in its publicly available/accessible form) and work through what needs to be massaged to get it into usable form. Next is processing the data into the R data type needed for the analysis. Then come initial exploratory steps, where you gain understanding of the problem and how to analyze it, and finally analysis and presentation.
I've been taught that in learning a programming language, it is often beneficial to have two books for reference (other than tutorials): one a proper reference (i.e., how to do something), and one a morality reference (how you should approach doing something). In data analysis, you should know the theory/methodology and how to use the tools at hand to apply it, but also how to think about problems. And short of an apprenticeship with a master, MLfH does this very well.
Disclaimer: I received a free electronic copy of this book from the O'Reilly Blogger Program. More information on this book can be found at the book's web site.
The title should be "Introduction to Machine Learning with a Few Examples."
1. What did I learn from this?
I did learn a few new basic concepts, and the book serves as a refresher.
2. Outline of this work
- Using R
- Data Exploration
- Classification: Spam Filtering
- Ranking
- Regression
- Regularization
- Optimization
- PCA (Principal Component Analysis)
- MDS (Multidimensional Scaling)
- k-NN
- Social Graphs
- Model Comparison
3. What's missing?
In most introductory works, nobody seems to focus on a bottom-up approach.
Eg: Take PCA - Principal Component Analysis.
Well, we use it to find the variables that dominate the dataset the most.
How cool is it?
4. So -- How do we create PCA?
Rough example,
Ah!
So, most dominating features, eh? Or say, reduce dimensions.
One can use voodoo math symbols to speak formally.
But put that aside; the basic idea is: most dominating features.
Bottom-up approach:
- We probably need eigenvalues. Why? To find the dominating features.
- We probably need to find some sort of relationship: covariance.
- We probably need to sort them all.
- Keep them in some matrix.
Describe this in voodoo-magic formal math jargon.
Voila, Poof, there you go.
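The bottom-up recipe above (covariance, eigenvalues, sorting, a matrix) takes only a few lines to make concrete. The book works in R, but here is a generic illustrative sketch in Python/NumPy; the toy data and variable names are my own, not from the book.

```python
import numpy as np

# Toy dataset: 5 observations of 3 made-up features.
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 0.2],
])

# 1. Center the data so covariance measures spread around the mean.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix: the "relationship" between features.
C = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition: eigenvalues say how dominant each direction is.
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Sort by eigenvalue, largest first, and keep the vectors in a matrix.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

# Project onto the top component to reduce dimensions: 3 features -> 1.
reduced = Xc @ components[:, :1]
print(reduced.shape)
```

That's the whole trick: the columns of `components` are the "most dominating features," and dropping the low-eigenvalue ones is the dimension reduction.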
5. How not to explain or write?
The Wikipedia article on k-NN bombards the reader with useless noise in explaining the k-nearest neighbor algorithm.
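By contrast, the core of k-NN fits in a dozen lines. A minimal sketch (my own illustration, not taken from the book or Wikipedia): classify a query point by majority vote among the labels of its k nearest training points.

```python
from collections import Counter
import math


def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote of its k nearest training points."""
    # Distance from the query to every training point, paired with its label.
    dists = [
        (math.dist(point, query), label)
        for point, label in zip(train, labels)
    ]
    # Keep the k closest and take the most common label among them.
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Tiny example: two clusters on a line.
train = [(1.0,), (1.2,), (0.8,), (5.0,), (5.3,)]
labels = ["low", "low", "low", "high", "high"]
print(knn_predict(train, labels, (1.1,)))  # -> low
```

That is the entire algorithm; everything else (distance metrics, tie-breaking, choosing k) is refinement.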
Actually, this is a machine learning book for coders (hackers is the mighty catchy word for attracting readers).
Instead of using math symbols, the book is written in the developers' language: the programming language R. All the complex mathematical formulas are transformed into coder-friendly commands and functions. The authors don't provide detailed theory but focus on the practical implementation of machine learning through executable source code.
There is a GitHub repository providing all the R source code from the book. However, some of the code is outdated due to expired online APIs. The majority of the code still works well and can be used while reading. Interestingly, one programmer also converted those sources to Python. Although I knew that was the best way to learn, I was too lazy to do the same.
Honestly, I skipped a lot of the code in the book because of my limited knowledge of R. Therefore, I strongly recommend that those interested in this book take the time to learn the basics of R and ML before reading, for better understanding.
I'm actually not sure whether the content of the book was good back in the day, but it was published in 2012, and the field of machine learning has changed a lot since then. I bought it without knowing that and was disappointed when I opened it. If you are learning ML, you should probably find more up-to-date sources of information.
On chapter one as of 8/15/18. This book isn't for beginners. Some of the code is not easy to follow; I have to read it over and over again to get it.
I sorta buzzed through this book without sitting at my computer running the code as I went, which is almost certainly not recommended practice. I'm sure I'd get more out of it if I'd "done it right". But here we are.
Anyway, this is a fun enough little book, and does a good job of showing how much you can accomplish in R without many lines of code. Real-world case studies, as used here, are definitely a good thing for a book like this, even if changing APIs bite them in the butt occasionally. I was already reasonably familiar with R and many of the machine learning techniques they discussed, so I can't say how good the book is for learning those things from scratch. I hadn't heard of (or, at least, don't remember hearing about) "Multidimensional Scaling" (MDS), for looking at clusters, so that was a fun little learning chapter.
Honestly, I probably enjoyed reading Programming Collective Intelligence more than this book, but that may have been because most of the material was new for me for that book, and it used Python.
I know writing books is hard and all, but there were enough typos (in text, code, and even a link), and at least one mis-referenced figure, that I sort of had the feeling this book might have been rushed out the door. And I know the book was purposefully not being too detailed mathematically, but the little description of the Königsberg problem ruffled my feathers. It's Euler; it's worth doing it justice.
This book didn't have a very cohesive feel, to me. Maybe that's the result of having two authors, I don't know (at some point I figured you could take writing samples from the authors and train a model to predict who wrote which chapter). One chapter dealt with dates one way, and then another chapter used the lubridate package. In the chapter just after the built-in dist function was used, a hand-rolled version was written. The depth of coverage of some of the math also seemed inconsistent, as if some things were easy, so we'll spend a page or more on them, and some things aren't as easy, so they don't warrant any description at all. Better, in my mind, would have been little clearly-denoted "skippable" mathy sections. Finally, the graphs regularly employed colors, which were generally indistinguishable in the print version - I guess if I'd been running the code as I went, it wouldn't matter.
Anyway, enough criticism. It's a fun book with good examples and it demonstrates the power of R
I've had difficulty with rating this book and here is why:
I found that there is a big discrepancy between how Conway and Myles approached the writing of this book. Myles has a very bottom-up style, often explaining concepts that might not be appropriate for an intermediate to "hacker-level" programmer book. I think the parts of the book that explain basic regression, mean, mode, and matrix multiplication are plainly redundant. Hence, the chapters written by Myles will appeal to beginner-level programmers who have little knowledge of statistics or programming in R. (I assume those chapters are written by Myles because the code in the repository for those chapters is denoted with his name.)
Conway has a very top-down approach. His style consists in presenting the problem and then throwing a bunch of code at you without explaining much in terms of syntax. The chapters written by him appeal to an intermediate level programmer.
So yeah, this book tries to appeal to two audiences, and it is understandable why so many people complain about two seemingly contradictory things (that the book is too basic, or that it has too much advanced syntax without explanations).
It is also unfortunate that they have not packaged the code from the book as a self-contained package. The book is not very theoretical and instead shows you how to work through a bunch of prediction problems. The issue with non-theoretical or hands-on books is that if the code does not work, the reader can't follow along with the projects and instead spends most of the time debugging.
The book is a hands-on review of the main topics of machine learning (ML). All along the chapters, the authors gently walk the reader through examples with real world data, paying particular attention to the more practical aspects of the implementation of the analysis and showing a lot of R code, which becomes a de-facto replacement for the equations one would expect to see in a purely stats book.
This approach has several nice advantages, of which I'll highlight three: first, the use of real-world data and stories shows the ready applicability of ML and introduces a way of understanding the world we live in through numbers; second, the non-technical but rather practical approach makes the book much more readable than a purely math-type textbook; and third, you get to learn some cool R tricks and libraries, which is never to be underestimated.
If anything, the only downside I would point out is that it falls a bit short in introducing the statistical methods and background behind the algorithms. As a quantitative social scientist, I am used to more depth in explaining the "science" behind practice, and the book tends to skip those explanations as soon as they get a little heavy. I guess for that there are other references, and, as I said, it makes the book much more readable from cover to cover. At any rate, it's a great port of entry to the field if you like code more than equations.
While it may be a nice short introduction to machine learning algorithms and R, I found it rather shallow even as a first book on the subject: it does make an introduction, but it leaves out a lot of details (why does R syntax work this way? Must be magic) and encourages the use of black-box functions from various R packages. That's not a problem if the reader understands the maths behind the models and knows when not to apply them blindly (which is, obviously, the hardest part of machine learning itself), but the book somehow encourages you to think that machine learning is easy. It is not. And I need another book to learn more. But perhaps that means Machine Learning for Hackers was successful in hooking me on the subject. I should run a regression on that.
If you know nothing about machine learning, but need to get something up and working by next Friday, this may help.
This crash course on machine learning with R covers basic visualisation, regression, classification, clustering, dimensionality reduction, SVM, and network analysis. No maths or Greek symbols here, only R code snippets to help you get things working. Move on if you need to understand why it works.
This remains a decent overview of machine learning techniques. While easier but less complete than Pattern Recognition and Machine Learning for instance, it may better suit maths-averse readers.
Like other reviewers mentioned, I read this book without a computer handy to work the exercises, which could have impacted my rating. I found it a great hacker targeted book -- a decent but fast-paced introduction to an approach and algorithm, then an annotated, worked, example. While I am not now a machine learning expert, I feel like solving (or experimenting with) these kinds of problems is now in my grasp. And I have enough vocabulary to drill deeper should I need to.
As someone who has been working with/learning more about R recently, the examples were also helpful for just general R problem-solving with real (messy) data.
Works through everything in easy-to-follow (as much as it can be) R code.
They have all of the code up in a GitHub repo so you can follow along with updated code (some changes to ggplot2 and base R break the code in the book).
Tough to go wrong with anything that Drew Conway and John Myles White write.
I'm very much looking forward to seeing the next iteration of it (mostly to see and use whatever language they're using by then).
I don't get the hate against this book. If you want to read boring theory, there are plenty of books out there. However, if you want a practical book, there are not many. Sure, this book doesn't describe all ML algorithms and assumes some R knowledge, but what I really liked is the detail on how to prepare data. Maybe not for hackers, but still a decent PRACTICAL book on how to apply some ML stuff to real-life problems.
As hackers continue to develop noise generation attacks in an attempt to weaken the automated defense systems and solutions, the use of AI has become more widespread among the hacker community. The first step toward this objective is to gather information and identify unauthorized access by knowing common security exploits. The chances of success depend on the scale of the collected data. That is why hackers collect vast amounts of data to improve social engineering techniques.
Update: I give up. I have no desire to learn R or statistics.
I've put this book on hold until I can pick up an R book. R is really required to work with the examples. For me there is no point reading the book if I'm not going to do the examples; I end up missing out on the "wow" moments that capture my attention and interest. "Seven Languages in Seven Weeks" is a good example of that.
This is a great introductory text for people who do not have a machine learning background and are not interested in immediately diving into the math behind the algorithms. If, like me, you have any background in machine learning or have ever played around with data in R, this might be too light for you.
O'Reilly has once again done it, this time with "Machine Learning for Hackers", a very comprehensive book on using R to do machine learning. One is left with a very clear notion that R, the programming language by statisticians for statisticians, is in fact one of the best possible tools for all researchers in data science.
Mixed feelings: some good example code, but the discussion of important details (that I know about) was wrong in enough places that I don't trust the authors' coverage of the topics I am unfamiliar with. Disappointing.
This book was way too superficial and didn't focus enough on the underlying theory and algorithms. It's a good introduction to R, but you should definitely look elsewhere if you're wanting a decent introduction to machine learning concepts.
A great read on the plane and an introduction to machine learning if you are a developer. You hear the term thrown around a lot today, especially with programming, big data, and algorithms.
A collection of machine learning applications using R. The book intentionally doesn't have much theory, which can be considered both a good and a bad thing.