Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists

Feature engineering is a crucial step in the machine-learning pipeline, yet this topic is rarely examined on its own. With this practical book, you’ll learn techniques for extracting and transforming features—the numeric representations of raw data—into formats for machine-learning models. Each chapter guides you through a single data problem, such as how to represent text or image data. Together, these examples illustrate the main principles of feature engineering.

Rather than simply teach these principles, authors Alice Zheng and Amanda Casari focus on practical application with exercises throughout the book. The closing chapter brings everything together by tackling a real-world, structured dataset with several feature-engineering techniques. Python packages including NumPy, pandas, scikit-learn, and Matplotlib are used in code examples.

You’ll examine:

* Feature engineering for numeric data: filtering, binning, scaling, log transforms, and power transforms
* Natural text techniques: bag-of-words, n-grams, and phrase detection
* Frequency-based filtering and feature scaling for eliminating uninformative features
* Encoding techniques of categorical variables, including feature hashing and bin-counting
* Model-based feature engineering with principal component analysis
* The concept of model stacking, using k-means as a featurization technique
* Image feature extraction with manual and deep-learning techniques

360 pages, Kindle Edition

Published March 23, 2018

104 people are currently reading
402 people want to read

About the author

Alice Zheng

6 books, 8 followers
Alice is a technical leader in the field of machine learning. Her experience spans algorithm and platform development and applications. Currently, she is a Senior Manager in Amazon's Ad Platform. Previous roles include Director of Data Science at GraphLab/Dato/Turi, machine learning researcher at Microsoft Research, Redmond, and postdoctoral fellow at Carnegie Mellon University. She received a Ph.D. in Electrical Engineering and Computer Science, and B.A. degrees in Computer Science and Mathematics, all from U.C. Berkeley.

Community Reviews

5 stars: 40 (26%)
4 stars: 60 (39%)
3 stars: 44 (28%)
2 stars: 8 (5%)
1 star: 1 (<1%)
Displaying 1 - 14 of 14 reviews
Ahmed
108 reviews, 18 followers
April 3, 2025

The book is about the idea of feature engineering: how to take data such as text, numbers, or images and prepare it so that it is ready for the model.

The chapters I liked in the book:

* Chapter 3: covers text, how to turn ordinary text into bag-of-words or n-grams, and how to clean out the excess, such as stopwords.

* Chapter 4: here it gets into tf-idf, a method that lets you identify the important words in a text instead of treating every word the same.

* Chapter 8: gets into images, and how to extract features from them such as SIFT and HOG, or use neural networks such as AlexNet.

Negatives:

* The final chapter: this is the one the book builds toward. It covers how to build a recommender system for research papers on a 2.5 GB dataset. This chapter is supposed to be the book's practical example, but it was not fully finished. The model's result at the end was not good, and the authors could have made adjustments to improve the recommendations.

In short: the chapter focused heavily on the step-by-step experiment (from importing the data to adjusting the features), but did not give much room to analyzing the results or proposing alternative solutions, because the model's result was not great. You get the feeling that the most important chapter in the book was left unfinished.

* Sometimes it goes into too much technical detail (such as PCA in Chapter 6), which can make your head spin if you are not solid in math, and at the same time it could have been shorter.
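
As a rough illustration of the bag-of-words and tf-idf ideas this review mentions, here is a minimal Python sketch. It is not the book's code: the function names, the toy sentences, and the plain tf * log(N / df) weighting are assumptions chosen for brevity, and library implementations typically use smoothed variants that give different numbers.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Count lowercase, whitespace-separated tokens in one document."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    Assumed weighting: raw term count times log(N / document frequency).
    """
    bows = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for bow in bows for word in bow)
    return [
        {word: count * math.log(n_docs / df[word]) for word, count in bow.items()}
        for bow in bows
    ]


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
print(weights[0])
```

A word that occurs in every document (here "the") gets weight zero under this weighting, while a word unique to one document (here "mat") gets the largest weight, which is the sense in which tf-idf stops treating every word the same.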

Mahmoud Rabie
5 reviews, 6 followers
April 8, 2018
I liked the book a lot. Yes, it contains a lot of math, but the authors always try to explain it in simple words.

The book doesn't focus on giving tips and tricks or a how-to guide for feature engineering; instead, it focuses on the effect of feature engineering on the data and the models,

e.g., the effect of applying a log transformation on linear regression.

Because the book is short (around 200 pages), it doesn't cover a lot of topics, but by the end your understanding of feature engineering will have changed, and you will have a good, deep idea of how the feature engineering phase affects model training and performance.
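
The log-transform point in this review can be sketched in a few lines of Python. This is an illustrative toy, not an example from the book: the data are made up, the helper `r_squared` is hypothetical, and the target is assumed to grow with the logarithm of a heavy-tailed feature.

```python
import math


def r_squared(xs, ys):
    """R^2 of a one-variable least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical heavy-tailed feature: values double at each step.
xs = [2.0 ** k for k in range(10)]
# Hypothetical target that grows with log(x), plus small alternating noise.
ys = [3.0 * math.log(x) + 1.0 + (0.1 if k % 2 else -0.1) for k, x in enumerate(xs)]

r2_raw = r_squared(xs, ys)                         # fit on the raw feature
r2_log = r_squared([math.log(x) for x in xs], ys)  # fit on the log feature
print(f"R^2 raw: {r2_raw:.3f}, R^2 log: {r2_log:.3f}")
```

Under these assumptions the straight-line fit explains far more of the variance after the log transform, because the transform turns a curved relationship into a nearly linear one.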

Rick Sam
432 reviews, 155 followers
January 30, 2020
A quick read that teaches you the basics of feature engineering. It gives you raw knowledge about feature engineering; in production or industry, one then has to practice and apply that raw knowledge.

If someone can come up with a better way to gain know-how or procedural knowledge faster, do let me know.

I have a summary of this; if you want it, PM me. It might save you time.

Here's an Outline of the Book:

0. Introduction and my thoughts
1. The Machine Learning Pipeline
2. Fancy Tricks with Simple Numbers
3. Text Data: Flattening, Filtering, and Chunking
4. The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
5. Categorical Variables: Counting Eggs in the Age of Robotic Chickens
6. Dimensionality Reduction: Squashing the Data Pancake with PCA
7. Nonlinear Featurization with K-Means Model Stacking
8. Automating the Featurizer: Image Feature Extraction and Deep Learning
9. Back to the Feature: Building an Academic Paper Recommender
10. Linear Modeling and Linear Algebra Basics

I would recommend this to Statisticians, Computer Scientists, PhD researchers, Software Engineers

Deus Vult,
Gottfried
Joe
445 reviews, 18 followers
September 20, 2019
Fine start, but there's not a lot here. There are some good tricks for people who don't have a lot of experience building predictive models.

The book also focuses on some specific types of data that lots of people won't need to work with: images and text (as in trying to get a computer to understand text). More general discussion about categorical and numeric variables would have been better, since those ideas can be translated to other contexts more easily.

This book will probably be obsolete within five years or so. I got the sense that the authors and publishers would agree with me. The book looks rushed. Most of the data visualizations are drawn by hand and scanned into the book. That's pretty lazy, since many of them could have been mocked up to look better with any visualization software (even Excel).
Auggie Heschmeyer
108 reviews, 5 followers
August 28, 2019
This book is great if you want to know the exact mathematical expressions for the "feature engineering" part of the title, but a total drag if you're looking for anything practical. This book is clearly after a depth not breadth approach, but they go deep down the wrong paths. For instance, the final chapter is billed as a case study. However, rather than engineering the right features to do the task they lay out, they engineer some bad features then slap on some okay ones and say, "There you go. Just do that and that's feature engineering." They don't even give you a way to gauge how accurate their method was.
Chaouki
77 reviews, 1 follower
June 24, 2023
I believe, in general, women write better science books than men. Male writing is most often riddled with pomp, ego, and unnecessary complexity. I guess it's because women have a thicker stomach and more patience to handle the irksome task of writing a science book. With that said, this book is just my cup of tea: a quick, loose summary of everything that's necessary, which oils up the cogs and smooths the road for further reading. Great book.
One more thing, O'Reilly books rock hard 🤘
Mehdi
23 reviews
May 16, 2020
Good book. I liked Chapters 2, 5, and 9.

I wish more time was spent on:
* Feature transformation of numerical values
* Feature selection
* Determining which transformation is adequate for which model type (NN, random forest, ...)
* The incremental impact of additional features over other selected features
Gene Ishchuk
235 reviews, 72 followers
December 23, 2018
It is quite good, though maybe not 100% noob-friendly. I will need to reread it, but it has a list of concepts I hadn't heard of before.
Virajdatt Kohir
11 reviews, 1 follower
July 14, 2020
Ummmm, good tricks. I thought it would help me (intermediate), but most of the things mentioned are for beginners. It did help me revise things.
Pawin
55 reviews, 2 followers
November 1, 2020
Concepts are not fully explained. The book is recommended for readers who already grasp all the basic concepts and just need further examples.
Maria Pantsiou
12 reviews, 1 follower
September 20, 2021
Nice introduction to feature engineering, with examples and common scenarios that data scientists encounter daily.
107 reviews, 4 followers
August 5, 2019
Good for a general review, but too basic (e.g., not explaining the data distribution suitable for an engineering technique) and having some errors (the PCA chapter, although its explanation is good and useful).