Last week, I tweeted a BBC article, “AAAS: Machine learning ‘causing science crisis’”. (AAAS stands for the American Association for the Advancement of Science. It is the world’s largest general scientific society, with over 120,000 members, and the publisher of the well-known journal Science.) The alarming title suggests that ML (machine learning) is doing bad things right under our noses. Browsing the replies to the reporter’s Twitter account @BBCPallab, I found that many labeled it fake news, while others attributed the science crisis to human error. The diverging views made me question the credibility of the article.
In the article, Dr. Genevera Allen from Rice University warned scientists about the use of ML and presented her research at AAAS in Washington. Since the correspondent did not provide any link to the research, many readers doubted the validity of the story.
After some internet searches, I found that Dr. Allen is, indeed, a professor in the Department of Statistics at Rice University, with a PhD in Statistics from Stanford University. The presentation abstract can also be found online.
But the BBC article only tells a biased portion of the story.
Dr. Allen: Problem with ML
According to a news release “Can we trust scientific discoveries made using machine learning?” from Rice University, Dr. Allen explains the problem with ML.
Allen said much attention in the ML field has focused on developing predictive models that allow ML to make predictions about future data based on its understanding of data it has studied. “A lot of these techniques are designed to always make a prediction,” she said. “They never come back with ‘I don’t know,’ or ‘I didn’t discover anything,’ because they aren’t made to.”
She continued that uncorroborated data-driven discoveries from recently published ML studies of cancer data are a good example:
“In precision medicine, it’s important to find groups of patients that have genomically similar profiles so you can develop drug therapies that are targeted to the specific genome for their disease,” Allen said. “People have applied machine learning to genomic data from clinical cohorts to find groups, or clusters, of patients with similar genomic profiles.
“But there are cases where discoveries aren’t reproducible; the clusters discovered in one study are completely different than the clusters found in another,” she said. “Why? Because most machine-learning techniques today always say, ‘I found a group.’ Sometimes, it would be far more useful if they said, ‘I think some of these are really grouped together, but I’m uncertain about these others.’”
In essence, the problem with machine learning, according to Allen, is that it’s trained to look for patterns even where none exist. The solution, she suspects, will be in next-generation algorithms that are better able to evaluate how reliable the predictions they make are.
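To make Allen’s point concrete, here is a toy sketch (my own illustration, not from her research): a bare-bones k-means clustering run on pure Gaussian noise. Even though the data has no group structure at all, the algorithm still partitions it into exactly the number of clusters you ask for; it has no way to answer “I didn’t discover anything.”

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's-algorithm k-means: it always returns k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Pure Gaussian noise: there is no real group structure here.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))

# The algorithm still reports clusters -- it is built to.
labels = kmeans(X, k=3)
print("clusters reported:", len(set(labels)))
```

Run the same code on a second noise sample and you will typically get completely different “groups” — the non-reproducibility Allen describes in the genomics studies.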
Human Error: Overfitting & Underfitting
The issue of finding patterns even where none exist also connects with another fundamental error: overfitting. Overfitting is a modeling error that occurs when a function or model is made overly complex in order to explain idiosyncrasies (noise) in the data rather than the underlying trend.
In the illustration above, circles and crosses represent two different groups. Here the model suggests a trend separating two groups.
In fact, the pattern can be captured with a simpler green line; the initial model is an example of overfitting. Underfitting is the exact opposite: the model is too simple to fit the data well.
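The same effect can be shown numerically. Below is a small illustrative sketch (my own, with made-up data): fitting a high-degree polynomial versus a straight line to noisy samples from a simple linear trend. The complex model hugs the training noise, which is exactly what overfitting means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a simple linear trend y = 2x.
x_train = np.linspace(0, 1, 15)
y_train = 2 * x_train + rng.normal(scale=0.2, size=x_train.size)

# A dense grid of held-out points from the true trend.
x_test = np.linspace(0, 1, 200)
y_test = 2 * x_test

def errors(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

simple_train, simple_test = errors(1)   # the simple "green line"
complex_train, complex_test = errors(9) # an overly wiggly polynomial

# The complex model always scores at least as well on the data it
# was trained on, even though it is chasing noise.
print(f"degree 1: train={simple_train:.4f}, test={simple_test:.4f}")
print(f"degree 9: train={complex_train:.4f}, test={complex_test:.4f}")
```

Comparing the two printed lines, the degree-9 fit wins on training error while the straight line tracks the true trend far better on held-out points.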
In my opinion, the BBC article uses a “clickbait” title to attract media attention and exaggerates the impact of statistical errors in machine learning. As one Twitter comment put it, “there is not such a thing a wrong technique, only wrong application”. It is also important to keep a critical eye when assessing the impact of novel, potentially revolutionary technology.
Recommended ML Accounts to Follow:
Martin Ford is the best-selling author of Rise of the Robots, a New York Times bestseller in 2015.
Pedro Domingos is the author of The Master Algorithm, a deep, comprehensive guide to machine learning.
Ever wondered who is doing big data and AI research at Facebook? Get insights from Facebook AI researcher Soumith Chintala.
Richard Yonck is an AI researcher, futurist, and author of the best-seller Heart of the Machine.