MaCro Philosophy
12-May-2019 (revised 24-May-2019)

Hidden Tests and the Syllabus (Part 1)

The last post looked at the motivations for the competition itself. This week we will introduce the first half of the syllabus and look at some of the issues with designing a competition built around hidden tests.

If you just want information on the syllabus, jump ahead to that section.

Hidden Tests

In machine learning it is standard practice to split a data set into a training set and a test set. The training set is used for learning and the test set is used to test the results. Because the test set is only used for testing and contains problems not seen during training, success at test time means the system has learned more than just to repeat correct answers. If it passes novel tests, the system is shown to solve the problem it was trained for in a generalisable way.

The tests shouldn't require any skills beyond those needed to solve the problems in the training set, and it shouldn't make a difference if elements of the two sets had been swapped before training. This is similar to providing students with mock exams: the mocks form a training set meant to be representative of the actual test.

We are doing something a bit different. There is no sensible set from which to create a train-test distinction and it would make a difference if the train and test sets were swapped.

You could consider every possible configuration of our environment to be the data set, and it is true that all the tests belong to this set. However, we have not created the tests by sampling from every possible configuration. The tests come from a very small subset of possible configurations: those believed to test for the kinds of cognitive abilities that biological entities (sometimes) possess. Our tests are not representative of a randomly sampled problem. We are looking for agents capable of solving problems that require cognitive abilities, not agents capable of solving random problems.

All this means that we are doing something that might be challenging for standard approaches in machine learning, and we therefore need to be careful to make the competition accessible. Along with mock exams, students are also given a syllabus so they know what kind of content the exams will cover and can direct their studies accordingly. This is something we can also do here. We hope the syllabus provides a helpful guide to anyone entering the competition.

The Syllabus (Part 1 - Simple tasks)

This post only introduces the first 4 entries in the syllabus. These are the introductory categories, each focusing on a single element of the environment. They are a bit less exciting, but they are prerequisites for the more complex tasks that will come later. As the categories will be equally weighted, it will be important to do well here as well as in the more complex tests. Next time, Murray Shanahan will introduce the first advanced category, Internal Models, which we expect to be incredibly challenging for standard Deep Reinforcement Learning approaches.

Note that all tasks may include external walls to constrain the space in the arena. This is not mentioned individually in the categories, and walls will only ever be used to change the arena sizes. The suggested training files are meant to be representative of the kinds of problems in the category, but they do not cover everything. Being able to solve the suggested problems perfectly would not guarantee solving all the problems in the category, but it would be a very good start.

Basic Categories

1. Food

Most animals are motivated by food and this is exploited in animal cognition tests. The same is true here. Food items provide the only positive reward in the environment and the goal of each test is to get as much food as possible before the time runs out (usually this means just getting 1 piece of food). This introductory category tests the agent's ability to reliably retrieve food and does not contain any obstacles.

Allowed objects:

  • All goals
  • Nothing else

Suggested Basic Training:

  • Just an arena with food in it.
  • See e.g. examples/configs/1-Food.yaml
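As a rough illustration, a minimal config for this category might look like the sketch below. The schema and object names (`!ArenaConfig`, `!Arena`, `!Item`, `GoodGoal`, `t`) are assumed from the released example configs rather than copied from them; consult the files in examples/configs/ for the authoritative format.

```yaml
!ArenaConfig
arenas:
  0: !Arena
    t: 250               # episode length in steps (assumed value)
    items:
    - !Item
      name: GoodGoal     # a single piece of green food
      # positions/sizes omitted; we assume this randomises placement
```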

2. Preferences

This category tests an agent's ability to choose the most rewarding course of action. Almost all animals display preferences for more food or for easier-to-obtain food, although the exact details differ between species. Some animals possess the ability to make complex decisions about the most rewarding long-term course of action.

Allowed objects:

  • All except zones.

Suggested Basic Training:

  • An arena with different types and sizes of food.
  • See e.g. examples/configs/2-Preferences.yaml
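A hypothetical sketch of a preferences-style arena, offering a choice between several small foods and one large one (object names and size conventions are assumptions based on the released examples, not the definitive format):

```yaml
!ArenaConfig
arenas:
  0: !Arena
    t: 250
    items:
    - !Item
      name: GoodGoalMulti          # assumed name: small foods that can all be collected
      sizes:
      - !Vector3 {x: 1, y: 1, z: 1}
      - !Vector3 {x: 1, y: 1, z: 1}
    - !Item
      name: GoodGoal               # one larger, more rewarding food
      sizes:
      - !Vector3 {x: 3, y: 3, z: 3}
```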

3. Obstacles

This category contains immovable barriers that might impede the agent's navigation. To succeed in this category, the agent may have to explore its environment; exploration is a key component of animal behaviour. Whilst the more complex tasks involving pushing objects all appear in later categories, the agent must be able to push some objects to solve all the tasks here.

Allowed objects:

  • All except zones.

Suggested Basic Training:

  • One food with multiple movable and immovable objects
  • See e.g. examples/configs/3-Obstacles.yaml
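One way such an arena might be configured, sketched under the same assumed schema (the `Wall` and `Cardbox1` object names are guesses at the environment's identifiers; check the example configs before relying on them):

```yaml
!ArenaConfig
arenas:
  0: !Arena
    t: 500
    items:
    - !Item
      name: Wall              # immovable barrier between agent and food
      positions:
      - !Vector3 {x: 20, y: 0, z: 20}
      sizes:
      - !Vector3 {x: 10, y: 3, z: 1}
    - !Item
      name: Cardbox1          # movable box the agent can push aside
    - !Item
      name: GoodGoal          # the food to retrieve
```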

4. Avoidance

This category introduces the hot zones and death zones, areas which give a negative reward if they are touched by the agent. A critical capacity possessed by biological organisms is the ability to avoid negative stimuli. The red zones are our versions of these, creating no-go areas that reset the tests if the agent moves over them. This category of tests identifies an agent’s ability to detect and avoid such negative stimuli.

Allowed objects:

  • At this point all the objects have been introduced and these and future tasks can contain any type of object.

Suggested Basic Training:

  • 1 green food (stationary) and 1-2 red zones
  • See e.g. examples/configs/4-Avoidance.yaml
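A hedged sketch of an avoidance arena combining one stationary food with the two zone types (again, `DeathZone` and `HotZone` are assumed identifiers; the released examples/configs files are authoritative):

```yaml
!ArenaConfig
arenas:
  0: !Arena
    t: 250
    items:
    - !Item
      name: GoodGoal          # 1 stationary green food
    - !Item
      name: DeathZone         # red zone: touching it ends the test with negative reward
      sizes:
      - !Vector3 {x: 5, y: 0, z: 5}
    - !Item
      name: HotZone           # gives negative reward while the agent is on it
      sizes:
      - !Vector3 {x: 4, y: 0, z: 4}
```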

Summary of the issues with hidden tests

Finally we sum up some of the issues with running this competition and our proposed solutions. We will be monitoring these issues as we go.

| Train-test from same distribution | Hidden tasks | Our solution |
| --- | --- | --- |
| No “unfair” surprises | Potential for “unfair” surprises | All problems are built from elements in the playground environment using the same configuration files available to the participants. |
| Training always useful | Training may not be productive | We are releasing information about the test categories, including sample problems and sample playground configurations. |
| Easy to know how well you are doing | Hard to know how well you are doing | We will keep a live leaderboard showing each agent's overall score. We may also show the breakdown by category, or even more detailed information. |
| Easy to guess algorithmic improvements | Hard to know what will lead to success | All tasks involve food retrieval, so there is a common goal, and we expect approaches that can learn robust behaviours to do well. After the competition we will make the details of the tests public for discussion and for use as an ongoing testbed for AI. |
| Fits into standard paradigms | Creates a new paradigm | We believe this is a good thing! It is important to be able to create AI capable of solving the kinds of problems that can occur in the real world. |


Since the last post v0.2 and v0.3 have been released. Highlights include:

Find more information at the competition website or download from the github page.

Back to top