MaCro Philosophy
12-May-2019 (revised 24-May-2019)

Hidden Tests and the Syllabus (Part 1)

The last post looked at the motivations for the competition itself. This week we will introduce the first half of the syllabus and look at some of the issues with designing a competition built around hidden tests.

If you just want information on the syllabus, skip ahead to the syllabus section below.

Hidden Tests

In machine learning it is standard practice to split a data set into a training set and a test set. The training set is used for learning and the test set is used to test the results. Because the test set is only used for testing and contains problems not seen during training, success at test time means the system has learned more than just to repeat correct answers. If it passes novel tests, the system is shown to solve the problem it was trained for in a generalisable way.
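The split described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the competition code; the 20% test fraction and the fixed seed are arbitrary choices for the example.

```python
import random

def train_test_split(dataset, test_fraction=0.2, seed=0):
    """Shuffle the dataset and split it into a training set and a test set."""
    rng = random.Random(seed)
    items = list(dataset)
    rng.shuffle(items)
    n_test = int(len(items) * test_fraction)
    # The test items are held out entirely: they are never seen during training.
    return items[n_test:], items[:n_test]

# Example: 100 labelled problems, with 20% held out for testing.
problems = [f"problem_{i}" for i in range(100)]
train_set, test_set = train_test_split(problems)
print(len(train_set), len(test_set))  # 80 20
assert not set(train_set) & set(test_set)  # no overlap between the two sets
```

Success on the held-out items then indicates the system has generalised rather than memorised its training data, which is exactly the property the hidden tests are meant to probe.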

The tests shouldn't require any skills beyond those needed to solve the problems in the training set, and it shouldn't make a difference if elements of the two sets had been swapped prior to any training. This is similar to providing students with mock exams: the mocks form a training set meant to be representative of the actual test.

We are doing something a bit different. There is no sensible set from which to create a train-test distinction and it would make a difference if the train and test sets were swapped.

You could consider every possible configuration of our environment to be the data set, and it is true that all the tests belong to this set. However, we have not created the tests by sampling from every possible configuration. The tests occupy a very small fraction of the space of possible configurations: those believed to test for the kinds of cognitive abilities that biological entities (sometimes) possess. Our tests are not representative of a random problem. We are looking for agents capable of solving problems that require cognitive abilities, not agents capable of solving random problems.

All this means that we are doing something that might be challenging for standard approaches in machine learning, so we need to be careful to make the competition accessible. Along with mock exams, students are also given a syllabus so they know what kind of content the exams will cover and can direct their studies accordingly. This is something we can also do here. We hope the syllabus provides a helpful guide to anyone entering the competition.

The Syllabus (Part 1 - Simple tasks)

This post introduces only the first four entries in the syllabus. These are the introductory categories, each focusing on a single element of the environment. They are a bit less exciting, but they are prerequisites for the more complex tasks that will come later. As the categories will be equally weighted, it will be important to do well here as well as in the more complex tests. Next time, Murray Shanahan will introduce the first advanced category, Internal Models, which we expect to be incredibly challenging for standard Deep Reinforcement Learning approaches.

Note that all tasks may include external walls to constrain the space in the arena. This is not mentioned individually in the categories and will only ever be used to change the arena size. The suggested training files are meant to be representative of the kind of problems in the category, but do not cover everything. Being able to solve the suggested problems perfectly would not guarantee solving all the problems in the category, but it would be a very good start.

Basic Categories

1. Food

This category contains the most basic tasks in the competition. There will be no objects in the environment except food, but there may be multiple food items in the same environment and the food may move around. This is important because most animal cognition tests rely on an animal's motivation to acquire food. Retrieving food is an essential skill that forms the basis of all future tests.

Allowed objects:

  • GoodGoal, GoodGoalMove, GoodGoalBounce
  • GoodGoalMulti, GoodGoalMultiMove, GoodGoalMultiBounce

Suggested Basic Training:

  • Any food and nothing else
  • See e.g. configs/justFood.yaml

2. Preferences

This category tests agent preferences for achieving higher rewards in the environment. Almost all animals display preferences for more food or easier-to-reach food, although the exact details differ between species. Some animals possess the ability to make complex decisions about the most rewarding course of action.

Allowed objects:

  • All Food
  • Walls

Suggested Basic Training:

  • 1-5 green, yellow, red food of any size (stationary).
  • See e.g. configs/preferences.yaml

3. Obstacles

This category contains barriers that impede the ease with which the agent can move around. It tests the agent’s ability to navigate its environment and is the first time that the food might not be easily visible. Behaviours tested for include foraging and exploration. This category also includes movable objects that can be pushed. Whilst the more complex tasks involving pushing objects all appear in later categories, the agent must be able to push some objects to solve all the tasks here.

Allowed objects:

  • Food
  • Immovable objects
  • Movable objects

Suggested Basic Training:

  • 1 green food (stationary) and 1-5 immovable or movable objects
  • See e.g. configs/obstacles.yaml

4. Avoidance

This category introduces the pain and death zones: areas that give a negative reward when touched by the agent. Animals show avoidance responses to aversive stimuli, and the death zones simulate an extreme version of this by also ending the episode. It is especially important that agents are capable of avoiding the death zones, as they appear in many future tasks.

Allowed objects:

  • DeathZone
  • PainZone (not yet implemented)
  • All Food
  • Walls

Suggested Basic Training:

  • 1 green food (stationary) and 1-2 red zones
  • See e.g. configs/avoidance.yaml

Summary of the issues with hidden tests

Finally, we sum up some of the issues with running this competition and our proposed solutions. We will be monitoring these issues as the competition progresses.

In each item below, the first phrase describes train-test from the same distribution, the second describes hidden tasks, and the third is our proposed solution:

  • No “unfair” surprises vs. potential for “unfair” surprises. Solution: All problems are built from elements in the playground environment using the same configuration files available to the participants.
  • Training always useful vs. training may not be productive. Solution: We are releasing information about the testing categories, including sample problems and sample playground configurations.
  • Easy to know how well you are doing vs. hard to know how well you are doing. Solution: We will keep a live leaderboard showing each agent's overall score. We may also show the breakdown by category or even more detailed information.
  • Easy to guess algorithmic improvements vs. hard to know what will lead to success. Solution: All tasks involve food retrieval, so there is a common goal, and we expect approaches that can learn robust behaviours to do well. After the competition we will make the details of the tests public for discussion and for use as an ongoing testbed for AI.
  • Fits into standard paradigms vs. creates a new paradigm. Solution: We believe this is a good thing! It is important to be able to create AI capable of solving the kinds of problems that can occur in the real world.


Since the last post, v0.2 and v0.3 have been released. Highlights include:

Find more information at the competition website or download from the GitHub page.
