This book serves as an introduction to data science, focusing on the skills and principles needed to build systems for collecting, analyzing, and interpreting data. As a discipline, data science sits at the intersection of statistics, computer science, and machine learning, but it is building a distinct heft and character of its own.
In particular, the book stresses the following basic principles as fundamental to becoming a good data scientist: "Valuing Doing the Simple Things Right," laying the groundwork of what really matters in analyzing data; "Developing Mathematical Intuition," so that readers can understand on an intuitive level why these concepts were developed, how they are useful, and when they work best; and "Thinking Like a Computer Scientist, but Acting Like a Statistician," following approaches which come most naturally to computer scientists while maintaining the core values of statistical reasoning. The book does not emphasize any particular language or suite of data analysis tools, but instead provides a high-level discussion of important design principles.
This book covers enough material for an "Introduction to Data Science" course at the undergraduate or early graduate level. A full set of lecture slides for teaching this course is available at an associated website, along with data resources for projects and assignments, and online video lectures.
Other pedagogical features of this book include: "War Stories" offering perspectives on how data science techniques apply in the real world; "False Starts" revealing the subtle reasons why certain approaches fail; "Take-Home Lessons" emphasizing the big-picture concepts to learn from each chapter; "Homework Problems" providing a wide range of exercises for self-study; "Kaggle Challenges" from the online platform Kaggle; examples taken from the data science television show "The Quant Shop"; and concluding notes in each tutorial chapter pointing readers to primary sources and additional references.
A good introductory book to statistical analysis/data mining/data science. This is clearly aimed at students - the Coda at its conclusion exhorts the reader to now get a data science job (no thanks, got a real job already), and there is an expectation in the word-frequency discussion that the reader has never encountered the word defenestrate (ha! just last week I had to defenestrate an intruder!).
It's always good to get Skiena's take on things -- I've read three or four of his books now -- and this one is no exception. The statistical-learner stuff is linked more closely to standard CS topics (e.g. algorithmic complexity) than in most other texts, and the overview of linear algebra is really quite good.
The only real downside is that it doesn't do what it says on the tin. Unlike The Algorithm Design Manual, this isn't presented as a taxonomy of data science methods with a briefing of when and how each should be applied. More's the pity, as that particular book is sorely needed - even in this one, Skiena points out that most researchers become comfortable with one approach and use it for everything, rather than testing alternate approaches on new problems.
Instead, it's a standard Introduction to Data Science textbook with chapters devoted to topics of increasing complexity/sophistication. Well-written, often entertaining, with an excellent selection of exercises (including many Kaggle challenges and some publicly-available datasets - precisely the sort of project that a beginner needs to get their feet wet).
This was a nice read. The war stories are very illuminating, it eases from practice into theory quite nicely, and the funny quotes interspersed into the war stories are enjoyable. The examples given were sometimes quite illuminating. Some examples are the intuition of p-values via the concept of permutation tests and the conceptual difference between SVMs and logistic regression (maximizing the margin between the closest points from each side versus maximizing the total confidence of our classification over all points). Other times, things were supposed to be illuminating, but weren't so much (an example is the duality between points and lines in linear regression). This might have to do with the background knowledge of the reader, of course. Theoretical parts were sometimes hard to follow, because they were described very briefly; the book is by nature a summary of techniques rather than a deep dive. An example is the sudden jump into the explanation of how eigenvalues can be used for clustering, even though the explanations for clustering were otherwise insightful and simple. I settled on a 4-star rating because it was a nice book I learned a lot from, but there were bits that felt like they could use some more editing to make them more palatable to the reader, and this is what kept me from giving a 5-star one.
I started this book motivated because I thought I was going to learn something. What happened was that the more I read, the more I hated this book. I think this book is written as notes for a course, and it is only good for that. Most of the information in it can be found on many sites on the internet and in many books. Also, it does more harm than good: many math concepts are treated in language that is not clear, probably in an attempt to reach a not-so-math-oriented audience. But, at the same time, the book assumes most things are already known; it is like commenting on things already known. Finally, the language is very US-oriented, and the jokes are not funny at all outside the US. I feel really disappointed, as other books from this author are really good. But not this one.
I don't give 1 star only because I bought the hardcover version and the printing quality is incredibly good.
Nice war stories and a great chapter about visualizations - this is what is hard to find in other books, and I guess it might be a new read for many computer scientists/programmers. I also appreciate the many practical examples. A few other chapters are more or less standard, and some of them, for instance the one about distributed computing, feel as if they were thrown in just to fill space. Nevertheless, I recommend reading it.
Good textbook introducing (or in my case, refreshing) topics in data science. Readable and designed for a non-math-heavy audience, probably for computer scientists who never went beyond a couple of calculus courses and discrete math. Sometimes this works to its detriment, spending multiple pages where someone versed in mathematical formalism would understand the point in one page or less. Nonetheless, a readable and accessible book on the subject.
One of the best resources at the time for getting into Data Science. Highly recommended, as are any of S. Skiena's books (which he also recommends reading for a better understanding of some topics).
I used this for my master's thesis and it really helped with all the tasks and methods used in data science. I do wish there were a little more about verification and validation, but I found the rest of the book very useful.
Not good. I find that many math concepts presented in the book are somewhat discrete and unclear. Also, the jokes in this book are not funny, really :). For example: "... The theory of linear algebra works except when it doesn’t work...".