Artificial Intelligence: Comparing Children with RL Agents in Unified Environments

Despite recent advances in artificial intelligence (AI) research, human children are still by far the best learners we know of, learning impressive skills like language and high-level reasoning from very little data. Children’s learning is supported by highly efficient, hypothesis-driven exploration: in fact, they explore so well that many machine learning researchers have been inspired to put videos like the one below in their talks to motivate research into exploration methods. However, because applying results from studies in developmental psychology can be difficult, this video is often the extent to which such research actually connects with human cognition.

A time-lapse of a baby playing with toys

Why is directly applying research from developmental psychology to problems in AI so hard? For one, the environments in which human children and artificial agents are typically studied can be very different. Traditionally, reinforcement learning (RL) research takes place in grid-world-like settings or other 2D games, whereas children act in the real world, which is rich and three-dimensional.

Furthermore, comparisons between children and AI agents are difficult to make because the experiments are not controlled and often have an objective mismatch: much of the developmental psychology research involves children engaged in free exploration, whereas most AI research is goal-driven. Finally, it can be hard to ‘close the loop’, that is, not only to build agents inspired by children but also to learn about human cognition from outcomes in AI research. By studying children and artificial agents in the same controlled 3D environment, we can potentially alleviate many of the problems above and ultimately advance research in both AI and cognitive science.

We have developed a platform and framework for directly contrasting agent and child exploration, based on KidMind Lab, a first-person 3D navigation and puzzle-solving environment originally built for testing agents in mazes with rich visuals.

What do we actually know about how children explore?

The main thing we know about children’s exploration is that children form hypotheses about how the world works and explore in order to test those hypotheses. For example, studies such as the one from Liz Bonawitz et al., 2007 showed that preschoolers’ exploratory play is affected by the evidence they observe. The authors concluded that if there seem to be multiple ways a toy could work but it is not clear which one is right (in other words, the evidence is causally confounded), children engage in hypothesis-driven exploration and explore the toy for significantly longer than when the dynamics and outcome are simple (in which case they quickly move on to a new toy).

Stahl and Feigenson, 2015 showed that when babies as young as 11 months are presented with objects that violate physical laws in their environment, they explore those objects more and even engage in hypothesis-testing behaviours that reflect the particular kind of violation seen. For example, if they see a car floating in the air (as in the video on the left), they find this surprising and subsequently bang the toy on the table to explore how it works. In other words, these violations guide the children’s exploration in a meaningful way.

How do AI agents explore?

Classic work in computer science and AI focused on developing search methods that seek out a goal. For example, a depth-first search strategy keeps exploring down a particular path until it reaches either the goal or a dead end. If it reaches a dead end, it backtracks until it finds the next unexplored path and then proceeds down that path. However, unlike children’s exploration, methods like these have no notion of exploring more in response to surprising evidence, gathering information, or testing hypotheses. More recent work in RL has produced other types of exploration algorithms. For example, intrinsic motivation methods provide a bonus for exploring interesting regions, such as those that have been visited less often or those that are surprising. While these seem in principle more similar to children’s exploration, they are typically used to expose agents to a diverse set of experiences during training, rather than to support rapid learning and exploration at decision time.
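To make the contrast concrete, here is a minimal Python sketch of the two ideas described above. It is illustrative only and is not the platform’s code: the grid maze (with '#' as walls), the neighbour order, and the names `dfs` and `CountBonus` are hypothetical. The intrinsic-motivation part assumes one common count-based form, a bonus of beta / sqrt(N(s)) added to the extrinsic reward, where N(s) is how often state s has been visited; the value of beta is an arbitrary illustrative choice.

```python
import math
from collections import Counter
from typing import Hashable, List, Set, Tuple

Cell = Tuple[int, int]
MOVES = ((0, 1), (1, 0), (0, -1), (-1, 0))  # right, down, left, up (assumed order)


def dfs(grid: List[str], cell: Cell, goal: Cell,
        visited: Set[Cell], path: List[Cell]) -> bool:
    """Goal-seeking depth-first search on a hypothetical grid maze ('#' = wall).

    Follows one path as deep as possible; on a dead end it pops the cell
    off `path` (backtracks) and tries the next unexplored neighbour.
    """
    visited.add(cell)
    path.append(cell)
    if cell == goal:
        return True
    r, c = cell
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                and grid[nr][nc] != "#" and (nr, nc) not in visited):
            if dfs(grid, (nr, nc), goal, visited, path):
                return True
    path.pop()  # dead end: backtrack to the previous cell
    return False


class CountBonus:
    """Count-based intrinsic motivation (assumed beta / sqrt(N) form):
    rarely visited states earn a larger bonus than familiar ones."""

    def __init__(self, beta: float = 0.1) -> None:
        self.beta = beta  # bonus scale; illustrative value, not from the article
        self.counts: Counter = Counter()

    def reward(self, state: Hashable, extrinsic: float = 0.0) -> float:
        self.counts[state] += 1
        return extrinsic + self.beta / math.sqrt(self.counts[state])


maze = ["..#",
        ".##",
        "..."]
route: List[Cell] = []
dfs(maze, (0, 0), (2, 2), set(), route)
print(route)  # (0, 1) was explored but was a dead end, so it is not on the route
```

Note that neither piece reacts to surprising evidence the way children do: `dfs` only cares about reaching the goal, and `CountBonus` rewards novelty without forming or testing any hypothesis. The recursive `dfs` is kept short for readability; on large mazes an explicit stack would avoid Python’s recursion limit.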

Author Details

Amit Gaurav

Sr. Director, Routeget Technologies Limited

Amit Gaurav, a 20-year veteran of the information technology industry, serves as Sr. Director of MENA and APAC for Routeget Technologies Limited. He is responsible for the overall performance of the company’s operations across APAC, MENA, and the Indian subcontinent.

In this role, Amit is responsible for the long-term strategic development and execution of the company’s global operations and engineering efforts. His key priorities include aligning core business functions, including corporate financials, with global supply chain operations, and delivering continuous improvement (Lean) across the operations and engineering functions. Other focus areas include establishing and maintaining policies and initiatives related to Quality, Health, and Safety.

Amit Gaurav has a wealth of experience in business management, new business acquisition, and account management, as well as extensive, successful experience with enterprise solution suites and business development management.

A family man, proud father of cutie “Aahana” and a through-and-through Barcelona & CSK supporter, Amit enjoys nothing more than kicking back at the weekend to play games with his daughter.