Effective visualization is the best way to communicate information from the increasingly large and complex datasets in the natural and social sciences. But with the increasing power of visualization software today, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.
This practical book takes you through many commonly encountered visualization problems, and it provides guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Author Claus O. Wilke teaches you the elements most critical to successful data visualization.
* Explore the basic concepts of color as a tool to highlight, distinguish, or represent a value
* Understand the importance of redundant coding to ensure you provide key information in multiple ways
* Use the book's visualizations directory, a graphical guide to commonly used types of data visualizations
* Get extensive examples of good and bad figures
* Learn how to use figures in a document or report and how to employ them effectively to tell a compelling story
“All our thinking is based on stories. We get excited when we hear a good story, and we get bored when the story is bad or when there is none. Moreover, any communication creates a story in the audience’s minds. If we don’t provide a clear story ourselves, then our audience will make one up. In the best-case scenario, the story they make up is reasonably close to our own view of the material presented. However, it can be and often is much worse. The made-up story could be ‘this is boring,’ ‘the author is wrong,’ or ‘the author is incompetent.’”
Good primer for all, but this book is probably most helpful for academia/researchers (particularly for science research) who want to improve their data visualizations.
If you already have some design background, most of the design things in here might seem basic; however, you will be exposed to an approach to data visualizations from the perspective of a science researcher. For example, data replication is very important for research, so the author prefers data visualizations to be easily replicated. You will also be introduced to some new/different types of graphs than you're used to seeing.
If you're coming at this from an academic standpoint (i.e., you're a researcher who wants to improve your graphs/data visualizations), then this book should be perfect for you. There are a lot of great examples explaining why a certain graph/visual is bad, why one is ugly, and why one is preferred to the other two examples. You'll also be exposed to general rules for design.
What a strong, well-written, detailed, well-structured book! The author is very clear in tackling all the key points and fundamentals of data visualisation. The book is divided into three main parts:
* From Data to Visualization
* Principles of Figure Design
* Miscellaneous Topics
The purpose is to get the art right without getting the science wrong, and vice versa, if we define data visualisation as part art and part science.
Great intro to the fundamentals of data visualization. It's a survey of many visualization techniques, with great insights about when they work well and what their pitfalls are. I would recommend this book to anyone who regularly works with data and wants to understand the data better and make better visualizations.
An excellent resource for making better charts and visualizations. In some ways, the book is an implementation guide to the Tufte program, but it is also focused on the programmatic visualization of data, i.e., how to make good charts with computers. In no way is this of lesser value than Tufte. Wilke has his own ideas of what makes an effective and beautiful chart, and he also points out pitfalls to avoid.
All the charts shown in the book can be recreated easily, as they are all built in R with ggplot2 and an add-on package (cowplot). But no knowledge of R is needed to use this book.
If you need to present some data in a visual format, this should be your first stop. In fact, anyone who is tempted to prepare charts in Excel should read this book.
To make sure your figures work for people with color-vision deficiency (cvd), don’t just rely on specific color scales. Instead, test your figures in a cvd simulator.
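As a rough illustration of what a cvd simulator does, the Python sketch below pushes two colors through a deuteranopia transform and compares how far apart they are before and after. The 3×3 matrix is the full-severity deuteranopia matrix commonly attributed to Machado et al. (2009), quoted here as an assumption that should be checked against a maintained simulator. The function names are made up for this example, and the Euclidean RGB distance is only a crude stand-in for perceived difference.

```python
# Illustrative sketch only: a tiny deuteranopia simulation for single colors.
# ASSUMPTION: the matrix values are the severity-1.0 deuteranopia matrix usually
# attributed to Machado et al. (2009); verify them against a real simulator.
DEUTERANOPIA = (
    (0.367322, 0.860646, -0.227968),
    (0.280085, 0.672501, 0.047413),
    (-0.011820, 0.042940, 0.968881),
)


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Apply the sRGB transfer curve to one linear channel in [0, 1]."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def simulate_deuteranopia(rgb):
    """Return the simulated appearance of an sRGB color (channels in [0, 1])."""
    lin = [srgb_to_linear(c) for c in rgb]
    out = []
    for row in DEUTERANOPIA:
        value = sum(w * c for w, c in zip(row, lin))
        value = min(1.0, max(0.0, value))  # clip out-of-gamut results
        out.append(linear_to_srgb(value))
    return tuple(out)


def distance(a, b):
    """Euclidean distance between two RGB triples (a crude difference measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
before = distance(red, green)
after = distance(simulate_deuteranopia(red), simulate_deuteranopia(green))
print(f"red vs. green distance: {before:.2f} normally, {after:.2f} simulated")
```

If the simulated distance between two colors that carry different meanings shrinks a lot, that pair is a candidate for replacement or for redundant coding with shape or labels.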
If there is a clear visual ordering in your data, make sure to match it in the legend. Whenever possible, design your figures so they don’t need a legend.
To visualize several distributions at once, kernel density plots will generally work better than histograms.
We can visualize distributions with histograms or density plots. Both of these approaches are highly intuitive and visually appealing. However, as discussed in that chapter, they both share the limitation that the resulting figure depends to a substantial degree on parameters the user has to choose, such as the bin width for histograms and the bandwidth for density plots. As a result, both have to be considered as an interpretation of the data rather than a direct visualization of the data itself.
For large datasets, empirical cumulative distribution functions (ecdfs) and quantile–quantile (q-q) plots can be used. https://clauswilke.com/dataviz/ecdf-q...
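The dependence on user-chosen parameters can be seen in a few lines of Python. This is a minimal sketch with made-up numbers and hypothetical helper names: the same data binned with two widths gives two different pictures, while the ecdf needs no such choice.

```python
def histogram_counts(values, bin_width, start=0.0):
    """Count observations per bin; the result depends on the chosen bin_width."""
    counts = {}
    for v in values:
        left_edge = start + bin_width * int((v - start) // bin_width)
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


def ecdf(values):
    """Return (x, rank / n) pairs for the observations in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Made-up sample with two clusters, for illustration only.
data = [1, 2, 2, 3, 8, 9, 9, 10]

print(histogram_counts(data, bin_width=2))  # narrow bins
print(histogram_counts(data, bin_width=5))  # wide bins, different shape
print(ecdf(data))                           # no tuning parameter involved
```

The two histogram calls summarize identical data differently, which is the sense in which a histogram is an interpretation. The ecdf is determined by the data alone.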
Violins can be used whenever one would otherwise use a boxplot, and they provide a much more nuanced picture of the data. In particular, violin plots will accurately represent bimodal data whereas a boxplot will not. Before using violins to visualize distributions, verify that you have sufficiently many data points in each group to justify showing the point densities as smooth lines.
Because violin plots are derived from density estimates, they have similar shortcomings (Chapter 7). In particular, they can generate the appearance that there is data where none exists, or that the data set is very dense when actually it is quite sparse. Whenever the dataset is too sparse to justify the violin visualization, plotting the raw data as individual points will be possible. Finally, we can combine the best of both worlds by spreading out the dots in proportion to the point density at a given y coordinate. This method is called a sina plot.
Choropleths work best when the coloring represents a density (i.e., some quantity divided by surface area). There are two conditions under which we can color-map quantities that are not densities: First, if all the individual areas we color have approximately the same size and shape, then we don’t have to worry about some areas drawing disproportionate attention solely due to their size. Second, if the individual areas we color are relatively small compared to the overall size of the map and if the quantity that color represents changes on a scale larger than the individual colored areas, then again we don’t have to worry about some areas drawing disproportionate attention solely due to their size.
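The choropleth advice comes down to a simple normalization before any colors are assigned. The Python sketch below uses two invented regions (names and numbers are hypothetical) to show how the raw count and the density can point at different regions.

```python
# Hypothetical regions, invented for illustration only.
regions = {
    "Large rural county": {"population": 120_000, "area_km2": 6_000.0},
    "Small urban county": {"population": 90_000, "area_km2": 150.0},
}


def population_density(region):
    """People per square kilometre: the quantity divided by surface area."""
    return region["population"] / region["area_km2"]


# Which region would get the darkest color under each mapping?
by_count = max(regions, key=lambda name: regions[name]["population"])
by_density = max(regions, key=lambda name: population_density(regions[name]))

print("Darkest by raw count:", by_count)
print("Darkest by density:  ", by_density)
```

Mapping the raw count would give the darkest color to the region that also covers the most area on the map, which exaggerates its visual weight. Mapping the density avoids that.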
As a rule of thumb, qualitative color scales work best when there are three to five different categories that need to be colored. Once we reach eight to ten different categories or more, the task of matching colors to categories becomes too burdensome to be useful, even if the colors remain sufficiently different to be distinguishable in principle.
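One way to apply this rule of thumb in code is to check the number of categories before assigning colors. The following Python sketch is an assumption-laden illustration: it uses the Okabe-Ito palette as an example qualitative scale, the function names are hypothetical, and the "crowded" label for six or seven categories is an interpolation between the two thresholds quoted above rather than something the book states.

```python
# ASSUMPTION: the Okabe-Ito palette serves as an example qualitative color scale.
OKABE_ITO = [
    "#E69F00", "#56B4E9", "#009E73", "#F0E442",
    "#0072B2", "#D55E00", "#CC79A7", "#000000",
]


def category_load(n_categories):
    """Classify a category count against the rule of thumb."""
    if n_categories <= 5:
        return "comfortable"
    if n_categories < 8:
        return "crowded"  # interpolated label, not from the book
    return "too many"


def assign_colors(categories):
    """Map each distinct category to a color, refusing overly long legends."""
    unique = list(dict.fromkeys(categories))  # keep first-seen order
    if category_load(len(unique)) == "too many":
        raise ValueError(
            f"{len(unique)} categories: consider direct labeling, "
            "grouping, or a different design instead of more colors"
        )
    return dict(zip(unique, OKABE_ITO))


print(assign_colors(["setosa", "versicolor", "virginica", "setosa"]))
```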
Some key rules for table layout are the following:
* Do not use vertical lines.
* Do not use horizontal lines between data rows. (Horizontal lines as separator between the title row and the first data row or as frame for the entire table are fine.)
* Text columns should be left aligned.
* Number columns should be right aligned and should use the same number of decimal digits throughout.
* Columns containing single characters are centered.
* The header fields are aligned with their data, i.e., the heading for a text column will be left aligned and the heading for a number column will be right aligned.
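Most of these rules can be sketched as a small plain-text formatter in Python. The function name and the sample data are hypothetical, and the rule about centering single-character columns is left out to keep the example short.

```python
def format_table(headers, rows, decimals=2):
    """Render a plain-text table following the layout rules above.

    Assumes at least one row; a column is numeric if all its values are numbers.
    """
    n_cols = len(headers)
    numeric = [
        all(isinstance(row[i], (int, float)) for row in rows)
        for i in range(n_cols)
    ]
    # Numbers get the same number of decimal digits throughout.
    cells = [
        [f"{v:.{decimals}f}" if numeric[i] else str(v) for i, v in enumerate(row)]
        for row in rows
    ]
    widths = [
        max(len(headers[i]), *(len(row[i]) for row in cells))
        for i in range(n_cols)
    ]

    def fmt(values):
        # Text left aligned, numbers right aligned; headers follow their column.
        parts = [
            v.rjust(w) if is_num else v.ljust(w)
            for v, w, is_num in zip(values, widths, numeric)
        ]
        return "  ".join(parts).rstrip()

    # One horizontal rule below the header row; no vertical lines anywhere.
    rule = "-" * (sum(widths) + 2 * (n_cols - 1))
    return "\n".join([fmt(headers), rule] + [fmt(row) for row in cells])


# Made-up values, for illustration only.
print(format_table(["Plot type", "Score"], [("Histogram", 3.5), ("Violin", 12.25)]))
```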
Finally, there is a key distinction between figures and tables in where the caption is located relative to the display item. For figures, it is customary to place the caption underneath, whereas for tables it is customary to place it above. This caption placement is guided by the way in which readers process figures and tables. For figures, readers tend to first look at the graphical display and then read the caption for context, hence the caption makes sense below the figure. By contrast, tables tend to be processed like text, from top to bottom, and reading the table contents before reading the caption will frequently not be useful. Hence, captions are placed above the table.
Bitmaps or raster graphics store the image as a grid of individual points (called pixels), each with a specified color. By contrast, vector graphics store the geometric arrangement of individual graphical elements in the image. Vector graphics have two downsides. First, because vector graphics are redrawn on the fly by the graphics program with which they are displayed, it can happen that there are differences in how the same graphic looks in two different programs, or on two different computers. Second, for very large and/or complex figures, vector graphics can grow to enormous file sizes and be slow to render.
Even if jpeg artifacts are sufficiently subtle that they are not immediately visible to the naked eye, they can cause trouble, for example in print production. Therefore, it is a good idea to avoid the jpeg format whenever possible. In particular, you should avoid it for images containing line drawings or text, as is the case for data visualizations or screen shots. The appropriate format for those images is png or tiff. I use the jpeg format exclusively for photographic images. And if an image contains both photographic elements and line drawings or text, you should still use png or tiff. The worst case scenario with those file formats is that your image files grow large, whereas the worst case scenario with jpeg is that your final product looks ugly.
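The size trade-off between the two representations can be made concrete with a toy scatterplot in Python. This is a simplified sketch with hypothetical function names: the vector version stores one SVG element per point, while the bitmap version stores a fixed grid of pixels regardless of how many points are drawn.

```python
import random


def scatter_as_svg(points, size=100):
    """Vector form: one <circle> element per data point."""
    circles = "".join(
        f'<circle cx="{x:.1f}" cy="{y:.1f}" r="1"/>' for x, y in points
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}">{circles}</svg>'
    )


def scatter_as_bitmap(points, size=100):
    """Raster form: a size x size grid of pixels, 1 where a point falls."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x), size - 1)  # clamp points on the right/bottom edge
        row = min(int(y), size - 1)
        grid[row][col] = 1
    return grid


rng = random.Random(0)  # fixed seed so repeated runs draw the same points
for n in (10, 1_000, 100_000):
    points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
    svg_chars = len(scatter_as_svg(points))
    pixels = sum(len(row) for row in scatter_as_bitmap(points))
    print(f"{n:>7} points: {svg_chars:>9} SVG characters, {pixels} pixels")
```

The SVG grows in proportion to the number of points, whereas the bitmap stays at the same pixel count and only loses the ability to tell overlapping points apart.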
Exactly what the title says and more. Well structured, it covers both what a good chart looks like and when to actually use it and when not. It's mostly to the point, while also being critical of the topic (again, when to use which chart) as well as of itself. For example, right at the beginning, the author points out that his tips and "good" vs. "bad" examples are based purely on his own experience, so his opinion doesn't need to be shared by everyone. Really enjoyed reading it. What I didn't like so much is the IMO unnecessary repetition between the body text and the figure captions.
Excellent, clear examples accompanied by thorough but not condescending explanations of why a particular design choice is appropriate or inappropriate. Topics range from fairly introductory to more advanced visualization issues that even very experienced data people could learn from.
Would have liked it to be easier to cross-reference a particular graph with the code that produced it (the code is all available on GitHub, but you have to find it yourself rather than follow something like a line reference plus link included in the text).
This is a great primer. As the title suggests, it constitutes only fundamentals and in some moments it might seem opinionated, but I think it manages to defend well its points, clearly presents advice for newcomers into the field and for veterans can serve as an exercise to see the "bad" visuals and explain what is wrong before reading the explanation. Certainly worth keeping around also thanks to its good organization.
I would recommend this book to both beginners and professionals interested in the field of data visualization. It covers basic and somewhat more advanced stuff like correlograms and paired data which I rarely, if ever, have used.
It contains useful advice about design and storytelling. I liked his non-dogmatic approach to controversial visualizations like pie charts. One of my main takeaways is that it made me rethink some of my favourite minimalistic design preferences.
Sadly, the book is more theoretical than practical. I'd like to see the same content, but with reproducible step-by-step code samples.
The book is suitable for experienced professionals who work with Python / R / JavaScript daily and need guidelines on Data Visualization.
IMO, the book isn't -- at all -- suitable for people who want to start visualizing stuff. You'd better tinker or do some workshops in Google Colab to develop hands-on experience first.
This is a beautiful little book. It's worth its weight for the directory of visualizations alone. Two days after reading this book I was already making use of: multipanel figures, 2D histograms, phase plots, and loess smoothing. I was also led to think much more clearly about representing uncertainty, and I picked up some best practices about color use and representing 3-4 dimensional data.
Reading the first few chapters of this book helped me understand how to visualise data in a better way. I find this book compelling, and it offers insights into visualising data so that non-technical readers can grasp the whole picture. Highly recommended.
Perhaps too specific and detail-oriented at times? I don't know if this is good or bad. Also, I cannot retain or comprehend chapter 11 on visualizing nested proportions, and it has left me scarred trying to make sense of those plots.
A good book for anyone looking to learn more about data visualization. It gives a complete and in-depth perspective on the topic, revisiting the basics but also offering a great deal of new and less trivial information.
A bold, knowledgeable book about data visualization that goes through the details of different kinds of charts, good and bad examples, and how to use design elements such as color, spacing, and size. It contains useful resources and references for a deeper dive into storytelling and data visualization.
A good intro book / reference for data people. A technology-agnostic text explaining, e.g., what graphs to use to show X, or general data visualisation / graphic design principles for avoiding graphic atrocities.
There are almost as many books teaching data visualization as there are software packages for this purpose. However, these books mostly focus on the technicalities of creating visualizations. In contrast, this book discusses what makes a good figure, irrespective of how it is created. For instance, the author discusses the right choice of colour to highlight certain features of the data, how many dimensions you should include in a figure without overloading it, and which type of figure best suits which purpose. These practical principles are very clearly explained and richly illustrated with many example visualizations. The book also comes with a companion website so the interested reader can look up the implementation of a particular figure if they wish to do so. I found the book very readable and extremely useful as a reference manual. I think that every scientist, from master's student to senior researcher, should have a copy of this book.