Big Data? Data Science?
Ever since humans entered the digital era, an astronomical amount of data has been pouring into the world every second. I remember that at one point last year, when a student was struggling to find an internship that matched his interests, the professor replied, “too much data means no data.” What he meant, I think, is that the size of a dataset alone doesn’t necessarily indicate how useful the data is. This coincides with Seth’s claim that too many data scientists today are accumulating massive sets of data and telling us very little of importance (21). He says we need the right data and the right question, not a massive body of data.
As the book title suggests, Seth Stephens-Davidowitz proposes the Internet as an alternative (and the right) source of data to more conventional ones like surveys and polls. The reason is that everybody tends to lie on surveys, because people want to feel better about themselves even in a fully anonymous setting. He then adopts Google searches as his main source of data and analyzes everything from the most random facts to more serious ones. It is fascinating how our everyday activities, like Google searches or Facebook activity, can be taken seriously and analyzed so well that many “serious researchers” are now looking into Google data for their research.
Following the articulate introduction on why we should care about Big Data, Part I demystifies data science. As scary as it sounds, data science can be as simple as a grandmother’s wise words based on her experience. But because our gut feelings or a grandmother’s words can be faulty, computer science and analysis come in to correct them. Seth uses himself as an example: he assumed that great NBA players come from low-income, single-parent families; however, an in-depth analysis with Big Data refuted his assumption, showing that intuition can sometimes be misleading.
Power of Big Data
The second part of the book introduces four powers of Big Data, or four reasons we should care about it.
- Offering up new types of data: numerical, linguistic, visual, morphological and many more
- Providing honest data: Google gives people an incentive to be honest that surveys can’t, namely the information they want to find
- Allowing us to zoom in on small subsets of people
- Allowing us to do many causal experiments
My favorite part of the book is when Seth talks about the different types of data that can be dissected. Right now, many people think of lists of numbers or characters when they hear the word “data.” However, data can also be visual, linguistic and even morphological. As the example of American Pharoah shows, body structure can sometimes account for the success of a racehorse. Had it not been for these data scientists, we still would have no idea how such a normal-looking racehorse, number 85, could become the best horse in decades.
Of all these data types, the one that caught my attention was visual data. Using cutting-edge technology, scientists created pictures of the “average” American featured in yearbooks, and one can see that Americans started smiling as time went on. Who knew that people first thought taking pictures was like sitting for a portrait, and that’s why they kept their stern faces! And the reason they started smiling was simply marketing and business. Another obvious but important example is that GDP is related to how much light there is. In Indonesia, the amount of light dropped sharply during the Asian Financial Crisis because people could no longer afford electricity.
The Truth About Hate and Prejudice
Another intriguing (and thought-provoking) part is the section where he reveals the truth about hate and prejudice. When the media reported the names of two gunmen that sounded Muslim, the top Google search in California containing the word “Muslims” was “kill Muslims.” Yes, we are well aware of Islamophobia and xenophobia. But during Obama’s speech asking Americans to “not forget that freedom is more powerful than fear,” nearly every negative search related to Muslims almost tripled. Vaguely knowing what hatred is like and seeing the actual number of Google searches felt very different; the latter felt far more real. As I mentioned above, Big Data can dig out the truth and provide the most honest results. The truth can’t always be unicorns and rainbows, and negative truths are something we should pay close attention to.
I would highly recommend this book to everyone, especially those who are new to the field of big data. At the beginning of the book, Seth Stephens-Davidowitz makes the bold claim that everyone is a data scientist, in the sense that data science is not as grand as many think, but more natural and “grandma-like.” If everyone is in many ways a data scientist, why not dig in and find out the nuts and bolts of it and how we can use such a powerful tool? Also, I genuinely enjoyed this book because he gives so many interesting examples in his straightforward voice. One example is the 2016 presidential election: it is truly disappointing to see substantial evidence that support for Trump in an area correlated strongly with how often people there searched for the word “nigger.” Such results are devastating to those who thought a post-racial era had finally come.
I also want to recommend this book to young entrepreneurs and anyone in TechTrek. Many of the companies we’ll be visiting are leaders that acknowledge the power of big data. I’m not completely sure to what extent they incorporate data science into their business, but given that the author himself has worked as a data scientist at Google, the growing importance of using datasets correctly is certain.
Lastly, as much as piles of data can provide the tools to dissect the truth about certain people, we now know that, if used correctly, they can help us change the world with the truth we discover. This book truly made me believe in the power of data and data science. Therefore, I highly recommend it to anyone!