Last week I retweeted an article about DeepMind and their new project to teach their AI to play a card game called Hanabi, which requires theory of mind and deeper reasoning than other card games. I wanted to bring it up in the Twitter discussion, but we decided to table the topic of AI until this week. However, I couldn’t get it out of my mind, so I decided to look further into it.
DeepMind and Hanabi
As we know, Google acquired DeepMind in 2014, and DeepMind is focused on advancing artificial intelligence to solve complex problems on its own. One of their recent ventures is to get their AI to play Hanabi, a cooperative game in which each player can see everyone’s hand but their own. The AI must give effective hints to help the other players succeed, while simultaneously converting the other players’ hints into useful information. DeepMind hopes this will improve their AI’s ability to cooperate with humans, which the entire game depends on. The core skill the game requires, theory of mind, means comprehending others’ mental states and understanding that they differ from our own. This quality is crucial to daily human interaction, and if AI can adopt it, it could change the quality of human-AI interaction tremendously.
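To make the hint mechanic concrete, here is a toy sketch of the kind of decision a Hanabi player (human or AI) faces: seeing a partner’s hand, pick the color-or-rank hint that identifies the most playable cards. The rules are simplified and the function is my own invention, not part of DeepMind’s actual environment.

```python
from collections import Counter

def best_hint(partner_hand, fireworks):
    """Pick the color-or-rank hint touching the most playable cards.

    partner_hand: list of (color, rank) tuples we can see.
    fireworks: dict mapping color -> highest rank played so far.
    """
    # A card is playable if it is the next rank in its color's stack.
    playable = [(c, r) for c, r in partner_hand
                if r == fireworks.get(c, 0) + 1]
    if not playable:
        return None
    # Count how many playable cards each possible hint would identify.
    hints = Counter()
    for color, rank in playable:
        hints[("color", color)] += 1
        hints[("rank", rank)] += 1
    return hints.most_common(1)[0][0]

hand = [("red", 1), ("blue", 3), ("green", 1)]
fireworks = {"red": 0, "blue": 2, "green": 0}
print(best_hint(hand, fireworks))  # ("rank", 1) identifies two playable cards
```

A real Hanabi agent must go much further than this: a good hint also accounts for what the partner will infer from it, which is exactly where theory of mind comes in.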
DeepMind’s plan to advance this mission is to have the community actively participate in an open-source Hanabi environment. This approach isn’t new to DeepMind; it’s a common method in line with their “culture of collaboration and shared progress” (deepmind.com). DeepMind regularly releases open-source code, environments, and data sets to bring in the community and further the progress of their work. In this case, DeepMind released the Hanabi Learning Environment, in which people can write code that achieves a high score without aid from other AI, and “test and train” (Wiggers) AI players to play and cooperate with both other AI and humans.
Why Theory of Mind?
Theory of mind, which Hanabi can help AI develop, is crucial to bridging the gap in AI-human interaction, and appears to be the next step for AI. Children develop theory of mind at approximately age four, and it remains crucial to our social interactions. It can give AI a kind of “common sense” that helps it understand people’s needs. We saw in our readings for the week that a large struggle with machine learning is the “black box”: we cannot effectively communicate a lot of the information we know, because much of it is tacit. If we cannot put our knowledge into words, how can we code it into a robot? I think this is where theory of mind can make a difference. Programmers wouldn’t necessarily need to tell the robot what to do; it would have the “mental” capacity to understand a human’s needs and respond appropriately.
DeepMind’s strategy for developing theory of mind includes a system of neural networks called ToMnet. The three separate networks have different functions: the first learns from an observed agent’s past behavior, the second builds a picture of that agent’s current mental state, including its beliefs and intentions, and the third predicts the agent’s next actions based on the outputs of the other two. Here you can see how the AI learns from observation: by watching and reflecting, it can adapt and pick up new skills. However, this isn’t a fully formed theory of mind. Hopefully, in the process of learning to play Hanabi, the AI can develop a stronger understanding of others’ mindsets and how they differ from its own, building toward a more complete theory of mind.
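The three-part structure above can be sketched in plain Python. This is only an illustration of the division of labor between the networks; the class names and the trivial “learning” logic are mine, not DeepMind’s actual implementation (which uses trained neural networks, not frequency counts).

```python
class CharacterNet:
    """Summarizes an agent's past episodes into a 'character' profile."""
    def embed(self, past_episodes):
        # Stand-in for learning: record how often each action was taken.
        counts = {}
        for episode in past_episodes:
            for action in episode:
                counts[action] = counts.get(action, 0) + 1
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()}

class MentalStateNet:
    """Summarizes the current episode so far into a 'mental state'."""
    def embed(self, current_episode):
        # Stand-in: just keep the most recent observed action.
        return current_episode[-1] if current_episode else None

class PredictionNet:
    """Predicts the agent's next action from character + mental state."""
    def predict(self, character, mental_state):
        # Stand-in: expect the agent's historically most frequent action.
        return max(character, key=character.get)

past = [["left", "left", "grab"], ["left", "grab"]]
character = CharacterNet().embed(past)   # {"left": 0.6, "grab": 0.4}
state = MentalStateNet().embed(["left"])
print(PredictionNet().predict(character, state))  # "left"
```

The point of the split is that the first summary is stable across episodes (what kind of agent this is) while the second changes moment to moment (what the agent currently believes), and the predictor combines both.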
As I mentioned above, ToMnet is DeepMind’s theory-of-mind AI, which has learned many things from observation but needs to continue to improve. In one experiment, it watched three characters with different abilities move around a room grabbing colored boxes to gain points. The blind character tended to stay near the walls, while the character that could not remember past runs went to the closest box. Finally, the character that could both see and remember past runs developed a strategy to gain the most points, and improved with every run. ToMnet improved to the point that it could differentiate the characters and predict each one’s future moves. The AI is improving without being explicitly programmed for each case, but it is still lacking in the area of theory of mind.
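A toy re-creation of that experiment helps show what “differentiate and predict” means. Everything here is invented for illustration: a tiny gridworld, two of the three character types, and an observer that classifies an agent from its trajectory, standing in for what ToMnet learns to do.

```python
SIZE = 5
BOXES = {(2, 2): "green", (1, 3): "red"}  # boxes in the room's interior

def blind_agent(pos, _boxes):
    # A blind character hugs the walls, walking the room's perimeter.
    x, y = pos
    if y == 0 and x < SIZE - 1: return (x + 1, y)
    if x == SIZE - 1 and y < SIZE - 1: return (x, y + 1)
    if y == SIZE - 1 and x > 0: return (x - 1, y)
    return (x, y - 1)

def greedy_agent(pos, boxes):
    # A sighted but memoryless character steps toward the nearest box.
    target = min(boxes, key=lambda b: abs(b[0]-pos[0]) + abs(b[1]-pos[1]))
    x, y = pos
    if x != target[0]: return (x + (1 if target[0] > x else -1), y)
    if y != target[1]: return (x, y + (1 if target[1] > y else -1))
    return pos

def rollout(agent, start=(0, 0), steps=6):
    pos, path = start, [start]
    for _ in range(steps):
        pos = agent(pos, BOXES)
        path.append(pos)
    return path

def classify(trajectory):
    # The 'observer' guesses the character type from behavior alone,
    # the way ToMnet learns to tell the characters apart.
    if all(x in (0, SIZE - 1) or y in (0, SIZE - 1) for x, y in trajectory):
        return "blind"
    return "greedy"

print(classify(rollout(blind_agent)))   # "blind"
print(classify(rollout(greedy_agent)))  # "greedy"
```

ToMnet, of course, learns its classifier from data rather than using a hand-written rule, but the task is the same: infer what kind of mind is behind a trajectory, then use that inference to predict the next move.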
An important thing to remember is that this example shows AI learning to read the “minds” of other computers, not humans. However, these characters simulate what a person with the corresponding abilities would likely do. The missing piece is understanding human mindsets and how they differ from one another and from the AI’s own. Although these advancements appear to be occurring quickly, there is still a big step from here to understanding on a human level.
I cannot believe our trip is only two weeks away, and I cannot wait! This research has definitely opened up new questions for me to investigate when we visit Google. See you all Wednesday!