Machine Learning Causing Science Crisis?

Last week, I tweeted a BBC article, “AAAS: Machine learning ‘causing science crisis’”. (AAAS stands for the American Association for the Advancement of Science. It is the world’s largest general scientific society, with over 120,000 members, and is the publisher of the well-known scientific journal Science.) The terrifying title suggests that ML (machine learning) is doing bad things right under our noses. Browsing through the replies on the reporter’s Twitter account, @BBCPallab, I found that many labeled it fake news, while others attributed the science crisis to human error. The diverging views made me question the credibility of the article.


In the article, Dr. Genevera Allen from Rice University warned scientists about the use of ML as she presented her research at AAAS in Washington. Since the correspondent did not provide a link to the research, many readers doubted the validity of the story.

After some internet searching, I found that Dr. Allen is indeed a professor in the Department of Statistics at Rice University, with a PhD in Statistics from Stanford University. The presentation abstract can also be found online.

But the BBC article tells only a biased portion of the story.

Dr. Allen: Problem with ML

In a Rice University news release, “Can we trust scientific discoveries made using machine learning?”, Dr. Allen explains the problem with ML.

Allen said much attention in the ML field has focused on developing predictive models that allow ML to make predictions about future data based on its understanding of data it has studied. “A lot of these techniques are designed to always make a prediction,” she said. “They never come back with ‘I don’t know,’ or ‘I didn’t discover anything,’ because they aren’t made to.”

She pointed to uncorroborated, data-driven discoveries from recently published ML studies of cancer data as a good example:

“In precision medicine, it’s important to find groups of patients that have genomically similar profiles so you can develop drug therapies that are targeted to the specific genome for their disease,” Allen said. “People have applied machine learning to genomic data from clinical cohorts to find groups, or clusters, of patients with similar genomic profiles.

“But there are cases where discoveries aren’t reproducible; the clusters discovered in one study are completely different than the clusters found in another,” she said. “Why? Because most machine-learning techniques today always say, ‘I found a group.’ Sometimes, it would be far more useful if they said, ‘I think some of these are really grouped together, but I’m uncertain about these others.’”

In essence, the problem with machine learning, according to Allen, is that it’s trained to look for patterns even where none exist. The solution, she suspects, will be in next-generation algorithms that are better able to evaluate how reliable the predictions they make are.
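To make the "always finds a group" point concrete, here is a minimal sketch in plain Python. It is my own illustration, not Dr. Allen's method: it runs a basic k-means clustering on purely random points, where no real groups exist, and then compares two runs that differ only in their random starting centers. The function names (make_noise, kmeans, rand_index) and the choice of 200 points and 3 clusters are assumptions made for the example.

```python
import random

def make_noise(n, seed):
    """Uniform random points in the unit square: no real groups exist."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, seed, iters=50):
    """Basic k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = []
    for _ in range(iters):
        # Every point is forced into its nearest cluster; there is no "none" option.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster happens to be empty
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair in the same group, or both split it)."""
    n = len(a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (a[i] == a[j]) == (b[i] == b[j]):
                agree += 1
    return agree / (n * (n - 1) / 2)

points = make_noise(200, seed=0)
run_a = kmeans(points, k=3, seed=1)  # same data, different starting centers
run_b = kmeans(points, k=3, seed=2)

print("cluster sizes, run A:", [run_a.count(j) for j in range(3)])
print("cluster sizes, run B:", [run_b.count(j) for j in range(3)])
print("agreement between runs (Rand index):", round(rand_index(run_a, run_b), 3))
```

The algorithm hands back a label for every point even though the data is noise. Comparing repeated runs is one simple way to ask "how sure are we?": an agreement score well below 1 would hint that the reported groups are not stable. This is only a rough check, not a replacement for the uncertainty-aware algorithms Dr. Allen has in mind.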

Human Error: Overfitting & Underfitting

The issue of finding patterns even where none exist also connects to another fundamental error: overfitting. Overfitting is a modeling error that occurs when the model builds an overly complex explanation to account for idiosyncrasies in the data.

In the illustration above, circles and crosses represent two different groups. Here the model suggests a trend separating two groups.

In fact, the pattern can be captured by the simpler green line, and the initial model is an example of overfitting. Underfitting is the opposite: the model is too simple to fit the data well enough.
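For readers who want to see overfitting in numbers, here is a small sketch in plain Python. Everything in it is assumed for illustration: the data is synthetic (two groups split at x = 0.5, with about 20% of labels flipped as noise), the "complex" model is a 1-nearest-neighbor rule that memorizes the training points, and the "simple" model is a single vertical line. The function names are hypothetical.

```python
import random

def make_data(n, seed, noise=0.2):
    """Points in the unit square; the true group is x > 0.5,
    and a fraction of labels is flipped to imitate noisy data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append(((x, y), label))
    return data

def predict_1nn(train, point):
    """Complex model: copy the label of the closest training point."""
    nearest = min(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    return nearest[1]

def fit_threshold(train):
    """Simple model: one vertical line x > t, with t picked on the training set."""
    candidates = [i / 20 for i in range(21)]
    return max(candidates, key=lambda t: sum(int(p[0] > t) == lab for p, lab in train))

def accuracy(predict, data):
    """Share of points whose predicted label matches the recorded label."""
    return sum(predict(p) == lab for p, lab in data) / len(data)

train = make_data(200, seed=0)
test = make_data(500, seed=1)  # fresh data the models have never seen
t = fit_threshold(train)

nn_train = accuracy(lambda p: predict_1nn(train, p), train)
nn_test = accuracy(lambda p: predict_1nn(train, p), test)
line_train = accuracy(lambda p: int(p[0] > t), train)
line_test = accuracy(lambda p: int(p[0] > t), test)

print("1-nearest-neighbor: train", nn_train, "test", nn_test)
print("single line x >", t, ": train", line_train, "test", line_test)
```

The nearest-neighbor rule scores perfectly on its own training data because each point is its own nearest neighbor, noise included. The telling comparison is on the fresh test set, where a memorizing model of this kind typically loses ground to the simple line.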

In my opinion, the BBC article uses a “clickbait” title to seek media attention, and it exaggerates the impact of statistical errors in machine learning. As one of the Twitter comments put it, “there is not such a thing a wrong technique, only wrong application”. It is also important to keep a critical eye when assessing the impact of novel and potentially revolutionary technology.

Recommended ML Accounts to Follow:

Martin Ford is the best-selling author of Rise of the Robots, a New York Times bestseller in 2015.

Pedro Domingos is the author of The Master Algorithm, a deep, comprehensive guide to machine learning.

Ever wondered who is doing big data and AI research for Facebook? Get insights from Facebook AI researcher Soumith Chintala.

Richard Yonck is an AI researcher, a futurist, and the author of the best-seller Heart of the Machine.

5 thoughts on “Machine Learning Causing Science Crisis?”

  1. Such an interesting take on ML and AI in general! It’s funny because my blogpost is also on the potential drawbacks of ML (but on cybersecurity). It seems like finally in the midst of societal worship at AI, more and more people are raising doubts about how it can possibly harm human beings. And this also relates to what Prof. Kane talked about in last week’s class how we are in the period of “positive hype” for AI, once more and more people realize the risk in overusing AI, things might start to change. Great post!


  2. This is an interesting article, Cecilia. I had no idea that Machine Learning had such significant modeling errors. That is especially concerning when different studies discover different clusters in the same data. Similar to our class discussion on AI, it seems to be very important that there are still humans behind these ML technologies to ensure proper usage and application.


  3. This is a really interesting blogpost. As advancements in ML continue, it is imperative to know what is actually going on behind the scenes. As you wrote, ML basically can’t take no or unsure for an answer. The thing is though, that not everyone knows this which is why it could dangerously spread data.


  4. Great post especially your description of overfitting. I think the author of the article raises a good point that these algorithms are built to find connections and so will more often than not report some finding, though I wouldn’t be surprised if we can then test the significance of these findings using standard statistical methods. More concerning for me is the fact that we do not understand the inner workings of these algorithms and so while we may think the machine is exploring one correlation, it could be basing these findings on something completely different. For instance, based off the readings for this week an algorithm finding possible customers for tickets to Vegas targeting individuals with bipolar disorder about to enter the manic phase. ML is incredible but very scary stuff.


  5. This is such an interesting topic! Modern science in general is currently undergoing a replication crisis, so the advent of machine learning just contributes, I think, to an existing problem. I wonder if machine learning algorithms could be designed to look for the absence of patterns, precisely the opposite of the way they’re currently used. That could potentially address both the issue you’ve raised here and help find bias or unconfirmed results in scientific literature as a whole.

