Malware Data Science: Attack Detection and Attribution

Malware Data Science explains how to identify, analyze, and classify large-scale malware using machine learning and data visualization.

Security has become a "big data" problem. The growth rate of malware has accelerated to tens of millions of new files per year while our networks generate an ever-larger flood of security-relevant data each day. In order to defend against these advanced attacks, you'll need to know how to think like a data scientist.

In Malware Data Science, security data scientist Joshua Saxe introduces machine learning, statistics, social network analysis, and data visualization, and shows you how to apply these methods to malware detection and analysis.

You'll learn how to:
- Analyze malware using static analysis
- Observe malware behavior using dynamic analysis
- Identify adversary groups through shared code analysis
- Catch 0-day vulnerabilities by building your own machine learning detector
- Measure malware detector accuracy
- Identify malware campaigns, trends, and relationships through data visualization

Whether you're a malware analyst looking to add skills to your existing arsenal, or a data scientist interested in attack detection and threat intelligence, Malware Data Science will help you stay ahead of the curve.

274 pages, Kindle Edition

Published September 25, 2018

47 people are currently reading
306 people want to read

About the author

Joshua Saxe

3 books, 1 follower

Ratings & Reviews

Community Reviews

5 stars: 15 (28%)
4 stars: 21 (40%)
3 stars: 11 (21%)
2 stars: 5 (9%)
1 star: 0 (0%)
Displaying 1 - 7 of 7 reviews
1 review, 5 followers
November 28, 2020
Although it contains good information, the book sometimes focuses too heavily on the machine learning side, teaching ML algorithms and how to implement them in Python, without digging deeper into the malware detection side. This book is great if you intend to work on a malware detection project using ML and want to familiarize yourself with what the process looks like.
Ben Rothke
348 reviews, 47 followers
December 31, 2018
The proverb “Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime” is known by almost everyone. In Malware Data Science: Attack Detection and Attribution, authors Joshua Saxe and Hillary Sanders artfully show the reader not only how to avoid being a victim of malicious code, but also how to actively defend against it, and even build your own systems to do that.

Malware is such a huge issue that many anti-virus vendors update their signatures hourly to deal with the never-ending set of threats. The authors work at Sophos (Chief Data Scientist and Infrastructure Data Science Team Lead, respectively) and have written a highly technical and effective guide that readers can use to implement their own defensive systems.

At the start, the authors define data science as, “A growing set of algorithmic tools that allow users to understand and make predictions about data using statistics, mathematics, and artful statistical data visualizations.” The book focuses on data science as it applies to malware, which they define as, “executable programs written with malicious intent.”

The book is meant as an introduction to applying data science to malware analysis and detection. The authors take a broad approach to the topic and discuss static malware analysis, x86 disassembly, dynamic analysis, and identifying attack campaigns using malware networks. Later chapters detail how the reader can build machine learning detectors, a neural network malware detector, and more.

The book is insightful for information security professionals in general, and particularly for those who can write and read code, especially Python. The authors provide many code and data samples and have included all of them on the book’s web site. There the reader can also find instructions for downloading and running a VirtualBox Ubuntu virtual machine which contains the book's code, data, and all of the requisite dependencies.

The book closes with a chapter on becoming a data scientist, where the authors discuss the paths to becoming a security data scientist and a day in the life of one. More importantly, they detail the traits of an effective security data scientist. For those looking to become a security data scientist, or who just want a comprehensive understanding of how to use data science to deal with malicious software, Malware Data Science: Attack Detection and Attribution is a superb reference to help you get there.
Eran
295 reviews
May 5, 2019
Since this is a technical book, just some random thoughts while reading it:
- Too shallow. It needlessly goes into unhelpful details, but then doesn't expand on the meaningful details that require more in-depth explanations. Judging by the first few chapters, I doubt even a novice would benefit at all, and the book could have begun at chapter five with a much shorter preamble before that. There's one place where it gives an "in a nutshell" explanation followed by an "in depth" one, but the in-depth one practically just repeats what was said before and calls the rest "out of scope of this book".

- Badly written. In particular, it uses a lot of phrasing that sounds foreign and sort of makes me read it with an accent. Things like "First I show you X. Second I show you Y. Then you learn to Z" or "In chapter X you learned to Y and Z, you are now learning to Q".

- It feels more like a first year of uni project that summarises another work, rather than a professional book written by experts in the field.

- The only plus to this book is that towards the end it touches on some interesting topics, and after reading it, you at least have the proper terms to be able to talk about them. But then again, so would a list of terms that link to Wikipedia and other sources, which is mostly what I read while reading this book.

- Took to reading it because it was on Tal's desk. Overall since it didn't take much time, it wasn't a complete waste to read.
Andrew Waite
48 reviews, 2 followers
January 21, 2020
Yikes! What can I say? When was the last time I demolished a technical book in one sitting? Don’t think I’ve ever done so.

Ok, so I didn’t read cover to cover in one sitting; I took the authors’ advice and skipped foundational chapters I already knew, and didn’t scan all code samples line by line. BUT the thought processes, use cases and implementations discussed throughout covered almost exactly the content I was looking for, and I can’t wait to hit the lab tomorrow to implement what I’ve learned from Malware Data Science.

Feeling MUCH more confident to handle and (crucially) visualise the datasets I’ve been working with recently. Chapters 4 and 9 covering Network graphs and visualisation respectively will be getting heavily referenced over the coming days.

Joshua, Hillary; fingers crossed I bump into you at a conference some time in the near future, I owe you several of whatever your favourite tipple is.
Raphaela
16 reviews, 1 follower
June 21, 2023
A malware-flavored machine learning book. If you're a beginner in either area you will have to do some deeper research on the side to understand what is going on, which is expected but more so than usual. For this reason, I did not appreciate the lack of citations and references used throughout the book, especially considering some models and figures were taken from research papers not published by the author (please correct me if I'm wrong!). As a malware researcher, I really wanted to like this book but was overall pretty disappointed. If this ever gets a second edition, please include references at the end of the chapters or the book.
Dancy Xia
1 review
December 5, 2019
A very good entry-level guidebook for people who are interested in the security data science area.

The many examples, along with detailed explanations, make this book a very good guide for those who want to start on security data science work.