Over the past couple of weeks, I quickly finished the book I signed up to read for this class, “Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.” This New York Times best-seller was written by Seth Stephens-Davidowitz, a New York Times op-ed contributor and former Google data scientist.
When I first chose this book before the end of last semester, I fixated on the words “Big Data” in the title because that is where my interests lie. However, I quickly realized that “Everybody Lies” is the part of the title that makes this book so great, as it serves as Stephens-Davidowitz’s thesis. Essentially, he argues throughout the book that many conventional sources of data, such as surveys, are inherently flawed because people frequently lie to themselves and to others. In response, the author argues that Google Search trends and other types of new data provide the truth researchers have been looking for.
While many book introductions can be quite elementary, this one is anything but. Using Google Search data, the author demonstrates that racism is alive and well in the United States. This racism was highly visible following the 2008 presidential election and persisted through the 2016 election of President Trump. In an eerie geographical comparison, Stephens-Davidowitz shows that the areas that supported Trump in the largest numbers were those that made the most Google searches for the n-word. In his discussion, the author makes sure to note that when Obama was elected, some traditional sources spoke of a “post-racial” United States (2). Unfortunately, they couldn’t have been further from the truth.
Since this book was written by a data junkie, the organization is quite logical. The book is broken up into three clearly defined sections. In Part I, Stephens-Davidowitz works to demystify data science. He tries to demonstrate that the best methods of data science are quite intuitive and natural, while the results are frequently counterintuitive (33). Our gut gives us a good sense of how the world works, yet we need data to sharpen the picture. He illustrates this with a lengthy example showing that the common assumption that African-American NBA players mostly come from low-socioeconomic backgrounds is actually false.
Part II is the “meat” of this book. In this section, Stephens-Davidowitz explains the four powers of big data by profiling interesting studies performed by data scientists (53-54). They are as follows:
- Big Data offers up new types of data such as words, data from pornographic sites, and much more.
- Big Data provides honest data. No one lies to the Google search bar. In fact, many turn to it in times of hardship.
- Big Data allows us to zoom in on small subsets of people. Stephens-Davidowitz puts it best when he says “We can compare say the number of people who dream of cucumbers versus those who dream of tomatoes” (54).
- The last power of Big Data is the ability to conduct many causal experiments. These tests are typically used by businesses but have the potential to expand into social science because they go beyond testing for correlation.
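To make the fourth power a little more concrete, here is a minimal Python sketch of the kind of randomized experiment (often called an A/B test) that the book describes businesses running. This is my own illustration, not code from the book: the function names, the two page versions, and the click rates of 10% and 15% are all made-up assumptions. The key idea is that users are assigned to a version at random, so a difference in outcomes can be attributed to the version itself rather than to who chose to see it.

```python
import math
import random


def run_ab_test(n_users, rate_a, rate_b, seed=0):
    """Simulate randomly assigning users to version A or B (hypothetical rates)."""
    rng = random.Random(seed)
    true_rate = {"A": rate_a, "B": rate_b}
    shown = {"A": 0, "B": 0}
    clicked = {"A": 0, "B": 0}
    for _ in range(n_users):
        # Random assignment is what makes the comparison causal.
        version = rng.choice("AB")
        shown[version] += 1
        if rng.random() < true_rate[version]:
            clicked[version] += 1
    return shown, clicked


def two_proportion_z(shown, clicked):
    """Z statistic for the difference in click rates (B minus A)."""
    p_a = clicked["A"] / shown["A"]
    p_b = clicked["B"] / shown["B"]
    pooled = (clicked["A"] + clicked["B"]) / (shown["A"] + shown["B"])
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown["A"] + 1 / shown["B"]))
    return (p_b - p_a) / std_err


# Assumed, made-up click rates: 10% for version A and 15% for version B.
shown, clicked = run_ab_test(n_users=20000, rate_a=0.10, rate_b=0.15)
z = two_proportion_z(shown, clicked)
print("Click rate A:", clicked["A"] / shown["A"])
print("Click rate B:", clicked["B"] / shown["B"])
print("Z statistic:", z)
```

A large positive Z statistic suggests that version B really does outperform version A, and because assignment was random, that conclusion goes beyond mere correlation.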
In the author’s discussion of data re-imagination, he tells an anecdote that was, in my opinion, the most interesting part of the entire book. This story, titled “Bodies as Data,” discusses the most recent Triple Crown winner, American Pharoah (62-71). As some of you may know, the rich love owning racehorses. And, of course, they want their horses to be winners. Traditionally, a horse’s price is determined by its pedigree. However, at a 2013 Saratoga Springs horse auction, this methodology was turned on its head by data scientist Jeff Seder. Seder had analyzed every physical aspect of horses and compared the data he collected to past winners. Eventually, he discovered that the size of a horse’s left ventricle was the best predictor of success. Given that American Pharoah had a left ventricle in the 99.61st percentile, Seder advised his client to buy back the horse, which the client had previously owned. The rest is history.
The final section of the book, Part III, is short, yet important. It covers the caveats associated with Big Data and its increasing popularity. Stephens-Davidowitz talks about the “curse of dimensionality,” the problem that arises when you analyze a lot of variables with only a few observations: some variables will appear to matter purely by luck. One independent variable may seem statistically significant even though it has not been tested on nearly enough data. The ethical issues surrounding Big Data are also examined. The author believes it is dangerous for corporations and the government to have this much data because it provides the means for coercion. For that reason, we as a society must keep a watchful eye on the development of this new field of information overload.
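The curse of dimensionality is easier to appreciate with a quick simulation. The Python sketch below is my own illustration, not something from the book, and the sizes (20 observations, 1,000 candidate variables) are arbitrary assumptions. Every variable is pure random noise, so none of them truly predicts the outcome, yet the best of the 1,000 still ends up looking strongly correlated with it.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_variables, n_observations, seed=0):
    """Largest absolute correlation between a noise outcome and noise variables."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        # Each candidate variable is pure noise, unrelated to the outcome.
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        best = max(best, abs(pearson(candidate, outcome)))
    return best


# Arbitrary, assumed sizes: 20 observations, 1 versus 1,000 candidate variables.
one_variable = best_spurious_correlation(n_variables=1, n_observations=20)
many_variables = best_spurious_correlation(n_variables=1000, n_observations=20)
print("Best correlation with 1 variable:", one_variable)
print("Best correlation with 1,000 variables:", many_variables)
```

The more variables you test against the same small set of observations, the more impressive the "best" one looks, which is why a single significant-looking result needs to be checked against new data before anyone trusts it.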
If any of you have any interest in Big Data or innovative thinking, I would strongly recommend reading this book. Seth Stephens-Davidowitz has a great, casual writing style that allowed me to read it quickly. The author brings in work from many economists and social scientists to support his claims. Most importantly, he remains as objective as possible. Aside from a few remarks about President Trump sprinkled throughout the book, he stays focused on the facts and does not judge others, whether they are white nationalists or people searching for obscure videos on the internet.
Had I not read this book, I would never have known that Strawberry Frosted Pop-Tarts (a childhood breakfast favorite of mine) are the most popular item sold at Walmart before big storms. I also would not have known that when students from similar backgrounds get into both Harvard and Penn State, their choice of school has very little effect on their career incomes. The former is a nice tidbit. The latter, in my opinion, is groundbreaking. For more “need to know” facts like these, get yourself a copy of this book.