Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists

Name: Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists
Rating: 3.85 (14 reviews)
ISBN: 9781491953204

Alice Zheng, Amanda Casari

Feature engineering is a crucial step in the machine-learning pipeline, yet this topic is rarely examined on its own. With this practical book, you’ll learn techniques for extracting and transforming features—the numeric representations of raw data—into formats for machine-learning models. Each chapter guides you through a single data problem, such as how to represent text or image data. Together, these examples illustrate the main principles of feature engineering.

Rather than simply teach these principles, authors Alice Zheng and Amanda Casari focus on practical application with exercises throughout the book. The closing chapter brings everything together by tackling a real-world, structured dataset with several feature-engineering techniques. Python packages including NumPy, pandas, scikit-learn, and Matplotlib are used in code examples.

You’ll examine:

* Feature engineering for numeric data: filtering, binning, scaling, log transforms, and power transforms
* Natural text techniques: bag-of-words, n-grams, and phrase detection
* Frequency-based filtering and feature scaling for eliminating uninformative features
* Encoding techniques of categorical variables, including feature hashing and bin-counting
* Model-based feature engineering with principal component analysis
* The concept of model stacking, using k-means as a featurization technique
* Image feature extraction with manual and deep-learning techniques
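To make the numeric techniques named in the blurb concrete, here is a minimal sketch of a log transform, fixed-width binning, and min-max scaling. It is not taken from the book: the function names and the sample counts are hypothetical, and it uses only the Python standard library rather than the packages the book relies on.

```python
import math

def log_transform(values):
    """Compress a heavy-tailed, non-negative feature with log10(1 + x)."""
    return [math.log10(1 + v) for v in values]

def fixed_width_bin(values, width):
    """Map each value to the index of the fixed-width bin it falls into."""
    return [int(v // width) for v in values]

def min_max_scale(values):
    """Rescale values linearly into [0, 1]; a constant feature maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw counts spanning several orders of magnitude.
counts = [0, 9, 99, 999]
print(log_transform(counts))
print(fixed_width_bin(counts, 100))
print(min_max_scale(counts))
```

The log transform spreads out the small values and pulls in the large ones, while fixed-width bins of 100 lump the three smallest counts together. That contrast is one reason the choice of transform depends on how the feature is distributed.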

Genres: Programming, Computer Science, Science, Technology, Artificial Intelligence, Nonfiction, Technical

360 pages, Kindle Edition

Published March 23, 2018

104 people are currently reading

402 people want to read

About the author

Alice Zheng

6 books, 8 followers

Alice is a technical leader in the field of machine learning. Her experience spans algorithm and platform development and applications. Currently, she is a Senior Manager in Amazon's Ad Platform. Previous roles include Director of Data Science at GraphLab/Dato/Turi, machine learning researcher at Microsoft Research, Redmond, and postdoctoral fellow at Carnegie Mellon University. She received a Ph.D. in Electrical Engineering and Computer Science, and B.A. degrees in Computer Science and Mathematics, all from U.C. Berkeley.

Community Reviews

5 stars: 40 (26%)
4 stars: 60 (39%)
3 stars: 44 (28%)
2 stars: 8 (5%)
1 star: 1 (<1%)

Displaying 1 - 14 of 14 reviews

Ahmed

108 reviews, 18 followers

April 3, 2025

The book is about the idea of feature engineering: how to take data such as text, numbers, or images and prepare it so that it is ready for the model.

The chapters I liked in the book:

* Chapter 3: covers text, how to turn ordinary text into Bag-of-Words or n-Grams, and how to clean out the excess, such as stopwords.

* Chapter 4: here it gets into tf-idf, a method that lets you identify the important words in a text instead of treating everything the same.

* Chapter 8: moves on to images, and how to extract features from them such as SIFT and HOG, or use neural networks such as AlexNet.
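As an illustration of the text techniques mentioned for Chapters 3 and 4, the following is a minimal sketch of bag-of-words counting, n-grams, stopword filtering, and tf-idf weighting. It is not the book's code: the helper names, the two sample sentences, and the stopword set are hypothetical, and the idf used here, log(N / df), is only one of several common variants.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens, stopwords=frozenset()):
    """Count token occurrences, skipping any token listed in stopwords."""
    return Counter(t for t in tokens if t not in stopwords)

def tf_idf(docs_tokens):
    """Weight each term count by log(N / df), where df is the number of
    documents containing the term (an assumed, simple idf variant)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    weights = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        weights.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return weights

# Hypothetical two-document corpus.
docs = ["the cat sat on the mat", "the dog sat on the log"]
tokens = [tokenize(d) for d in docs]
bow = bag_of_words(tokens[0], stopwords={"the", "on"})
bigrams = ngrams(tokens[0], 2)
weights = tf_idf(tokens)
print(bow, bigrams, weights[0], sep="\n")
```

Words that appear in every document, such as "the", get a weight of zero under this idf, while words unique to one document keep a positive weight. That is the sense in which tf-idf separates the informative words from the rest.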

Cons:

* The last chapter: this is the one the book is built around. It covers how to build a recommender system for research papers (2.5 GB in size). This chapter is supposed to be the book's practical example, but it was not fully finished. The model's final result was not good, and the authors could have made adjustments to improve the recommendations.

In short: the chapter focused heavily on the step-by-step experiment (from importing the data to modifying the features), but it left little room for analyzing the results or suggesting alternative solutions. The model's result was not great, and you get the feeling that the most important chapter in the book was incomplete.

* Sometimes it goes into too much technical detail (such as PCA in Chapter 6), which can make your head spin if you are not solid in math, and at the same time it could have been shortened.

Mahmoud Rabie

5 reviews, 6 followers

April 8, 2018

I liked the book a lot. Yes, it contains a lot of math, but the authors always tried to explain it in simple words.

The book doesn't focus on giving tips and tricks or a how-to guide for feature engineering; instead, it focuses on the effect of feature engineering on the data and the models.

For example, the effect of using a log transformation on linear regression.
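The kind of effect described here can be sketched on synthetic data. The example below is not from the book: the data are made up so that the target is exactly linear in log10(x), the helper names are hypothetical, and the fit is ordinary least squares with a single feature, written with the standard library only.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic, hypothetical data: a heavy-tailed feature and a target
# constructed to be linear in log10(x).
xs = [1, 10, 100, 1000, 10000]
ys = [2 + 3 * math.log10(x) for x in xs]

r2_raw = r_squared(xs, ys)                           # fit on the raw feature
r2_log = r_squared([math.log10(x) for x in xs], ys)  # fit on the log feature
print(r2_raw, r2_log)
```

On this constructed data the line fitted to the log-transformed feature explains essentially all of the variance, while the line fitted to the raw feature does not. Real data will rarely be this clean, so the gap should be read as an illustration rather than an expected result.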

Since the book is short (around 200 pages), it doesn't cover a lot of topics, but by the end your understanding of feature engineering will have changed, and you will have a good, deep idea of the impact of the feature engineering phase on your model's training and performance.

Amir Sarabadani

77 reviews, 51 followers

April 17, 2019

Most of the graphs are hand-drawn, which was weird.

best-software-engineering-books

Rick Sam

432 reviews, 155 followers

January 30, 2020

A quick read that teaches you the basics of feature engineering. It might give you raw knowledge about feature engineering; in production or industry, however, one has to practice and apply that raw knowledge.

If someone could come up with a best way to gain know-how or procedural knowledge faster, do let me know.

I have a summary of this; if you want it, do PM me. It might save you time.

Here's an Outline of the Book:

0. Introduction and my thoughts
1. Machine Learning Pipeline
2. Fancy Tricks with Simple Numbers
3. Text Data: Flattening, Filtering and Chunking
4. Effects of Feature Scaling: From Bag of Words to TF-IDF
5. Categorical Variables: Counting Eggs in the Age of Robotic Chickens
6. Dimensionality Reduction: Squashing the Data Pancake with PCA
7. Non-linear Featurization with K-Means Model Stacking
8. Automating the Featurizer: Image Feature Extraction and Deep Learning
9. Back to the Feature: Building an Academic Paper Recommender
10. Linear Modeling and Linear Algebra Basics

I would recommend this to Statisticians, Computer Scientists, PhD researchers, Software Engineers

Deus Vult,
Gottfried

artificial-intelligence computer-science engineering

Joe

445 reviews, 18 followers

September 20, 2019

Fine start, but there's not a lot here. There are some good tricks for people who don't have a lot of experience building predictive models.

The book also focuses on some specific types of data that lots of people won't need to work with: images and text (as in trying to get a computer to understand text). More general discussion about categorical and numeric variables would have been better, since those ideas can be translated to other contexts more easily.

This book will probably be obsolete within five years or so. I got the sense that the authors and publishers would agree with me. The book looks rushed. Most of the data visualizations are drawn by hand and scanned into the book. That's pretty lazy, since many of them could have been mocked up to look better with any visualization software (even Excel).

computers math nonfiction

Auggie Heschmeyer

108 reviews, 5 followers

August 28, 2019

This book is great if you want to know the exact mathematical expressions for the "feature engineering" part of the title, but a total drag if you're looking for anything practical. This book is clearly after a depth not breadth approach, but they go deep down the wrong paths. For instance, the final chapter is billed as a case study. However, rather than engineering the right features to do the task they lay out, they engineer some bad features then slap on some okay ones and say, "There you go. Just do that and that's feature engineering." They don't even give you a way to gauge how accurate their method was.

Chaouki

77 reviews, 1 follower

June 24, 2023

I believe that, in general, women write better science books than men. Male writing is most often riddled with pomp, ego, and unrequired complexity. I guess it's because women have a thicker stomach and more patience to handle the irksome task of writing a science book. With that said, this book is just my cup of tea: a quick, loose summary of everything that's necessary, which oils up the cogs and softens up the road for further reading. Great book.
One more thing, O'Reilly books rock hard 🤘

Mehdi

23 reviews

May 16, 2020

Good book. I liked Chapters 2, 5, and 9.

I wish more time was spent on:
* Feature transformation of numerical values
* Feature selection
* Determining which transformation is adequate with which model type (NN, Random Forest, ..)
* The incremental impact of additional features over other selected features

Gene Ishchuk

235 reviews, 72 followers

December 23, 2018

It is quite good, though maybe not 100% noob-friendly. I will need to reread it, but it has a list of concepts I hadn't heard of before.

Virajdatt Kohir

11 reviews, 1 follower

July 14, 2020

Ummmm, good tricks. I thought it would help me (intermediate), but most of the things mentioned are for beginners. It did help me with revising things.

ml-dl-ds

Pawin

55 reviews, 2 followers

November 1, 2020

Concepts are not fully explained. The book is recommended for readers who already grasp all the basic concepts and just need further examples.

Maria Pantsiou

12 reviews, 1 follower

September 20, 2021

Nice introduction to feature engineering, with examples and common scenarios that data scientists encounter daily.

THN

107 reviews, 4 followers

August 5, 2019

Good for a general review, but too basic (e.g., it does not explain which data distributions suit a given engineering technique) and it has some errors (in the PCA chapter, although its explanation is good and useful).