MaCro Philosophy
12-May-2019

Hidden Tests and the Syllabus (Part 1)

Each week I will post a development blog about a different aspect of designing, building or running the competition. Last week we looked at the motivations for the competition itself. This week we will introduce the first half of the syllabus and look at some of the issues with designing a competition built around hidden tests.

If you just want information on the syllabus, jump straight to here.

Hidden Tests

In machine learning it is standard practice to split a data set into a training set and a test set. The training set is used for learning and the test set is used to evaluate the result. Because the test set is only used for evaluation and contains problems not seen during training, success at test time means the system has learned more than just to repeat correct answers. If it passes novel tests, the system has shown that it can solve the problem it was trained for in a generalisable way.

The tests shouldn't require any skills beyond those needed for the problems in the training set, and it shouldn't make a difference if elements of the two sets had been swapped before training. This is similar to providing students with mock exams: the mocks form a training set meant to be representative of the actual test.

We are doing something a bit different. There is no sensible set from which to create a train-test distinction, and it would make a difference if the train and test sets were swapped.

You could consider every possible configuration of our environment to be the data set, and it is true that all the tests belong to this set. However, we have not created the tests by sampling from every possible configuration. The tests occupy a very small fraction of the possible configurations: those believed to test for the kinds of cognitive abilities that biological entities (sometimes) possess. Our tests are not representative of a randomly drawn problem. We are looking for agents capable of solving problems that require cognitive abilities, not agents capable of solving random problems.

All this means that we are doing something that might be challenging for standard approaches in machine learning, and we therefore need to be careful to make the competition accessible. Along with mock exams, students are also given a syllabus so they know what the exams will cover and can direct their studies accordingly. This is something we can also do here. We hope the syllabus provides a helpful guide for anyone entering the competition.

The Syllabus (Part 1 - Simple tasks)

We give only the first five entries of the syllabus here; the rest will come in future weeks. The first five categories are all very simple, each focusing on a single element of the environment. They are prerequisites for the more complex tasks in the later categories. As the categories will be equally weighted, it will be important to do well on these, and they should also give everyone a chance to solve some tasks in the competition.

1. Basic Food Retrieval

This category contains the most basic tasks in the competition. There will be no objects in the environment except food (normal size), though there may be multiple food items in the same environment. Food is central because most animal cognition tests rely on an animal's motivation to acquire it. Retrieving food gets a whole category of its own because it is a skill used in all future tests. A minimal configuration file is sketched after the lists below.

Allowed objects:

  • Green food (size 1, stationary)
  • Yellow food (size 1, stationary)

Suggested Training:

  • Any food (size 1, stationary) and nothing else
  • justFood.yaml
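
For reference, the kind of arena described above can be written in the configuration format documented on the github page. The !ArenaConfig structure and the object name GoodGoal (green food) are taken from that documentation; treat this as a rough sketch of what a file like justFood.yaml might contain, not its exact contents.

    !ArenaConfig
    arenas:
      0: !Arena
        t: 250            # maximum episode length in steps
        items:
        - !Item
          name: GoodGoal  # green food; no position given, so it is placed randomly
          sizes:
          - !Vector3 {x: 1, y: 1, z: 1}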

2. Preferences

This category contains rewards of different values to test whether agents prefer to obtain higher rewards in the environment. Almost all animals will display preferences for more food or easier-to-reach food, although the exact details differ between species. There may be some walls to give variety to the environment, but not to provide any navigational challenges. A configuration sketch follows the suggested training list below.

Allowed objects:

  • Green food (any size, stationary)
  • Yellow food (any size, stationary)
  • Red food (any size, stationary)
  • Walls

Suggested Training:

  • 1-5 green, yellow, red food of any size (stationary).
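
In the same configuration format, the three food colours correspond to different object names: following the github documentation, GoodGoal is green, GoodGoalMulti is yellow and BadGoal is red (double-check the names there, as this is only a sketch). Leaving out positions lets the foods spawn randomly:

    !ArenaConfig
    arenas:
      0: !Arena
        t: 250
        items:
        - !Item
          name: GoodGoal        # green food
          sizes:
          - !Vector3 {x: 1, y: 1, z: 1}
        - !Item
          name: GoodGoalMulti   # yellow food
          sizes:
          - !Vector3 {x: 3, y: 3, z: 3}
        - !Item
          name: BadGoal         # red food
          sizes:
          - !Vector3 {x: 2, y: 2, z: 2}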

3. Avoidance Learning

This category introduces the death zones: areas that end the episode with a negative reward if the agent touches them. Animals show avoidance responses to aversive stimuli, and we hope agents can match this behaviour here. Agents capable of avoiding the death zones are especially important, as the zones can then be used to create no-go areas in future tasks. A configuration sketch is given below.

Allowed objects:

  • Red zone
  • Green food (stationary)
  • Walls (not creating navigational challenges)

Suggested Training:

  • 1 green food (stationary) and 1-2 red zones
  • avoidance.yaml
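
As before, a sketch in the configuration format (DeathZone is the name used for the red zone in the github documentation; the positions and sizes below are only illustrative, and randomised placement works just as well):

    !ArenaConfig
    arenas:
      0: !Arena
        t: 250
        items:
        - !Item
          name: GoodGoal    # green food to retrieve
          positions:
          - !Vector3 {x: 30, y: 0, z: 30}
        - !Item
          name: DeathZone   # red zone; touching it ends the episode with a negative reward
          positions:
          - !Vector3 {x: 15, y: 0, z: 15}
          sizes:
          - !Vector3 {x: 8, y: 0, z: 8}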

4. Obstacles

This category contains immovable barriers that get in the way of the agent moving around. It tests the agent's ability to navigate its environment and is the first time the food might not be easily visible. Behaviours tested for include foraging and exploration. A configuration sketch is given below.

Allowed objects:

  • Green food (stationary)
  • Immovable objects

Suggested Training:

  • 1 green food (stationary) and 1-3 immovable objects
  • obstacles.yaml
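
The immovable objects also have their own names in the configuration files; Wall is used in the sketch below, following the github documentation, and the position, rotation and size are purely illustrative:

    !ArenaConfig
    arenas:
      0: !Arena
        t: 250
        items:
        - !Item
          name: GoodGoal   # green food, possibly hidden behind the wall
        - !Item
          name: Wall       # an immovable barrier
          positions:
          - !Vector3 {x: 20, y: 0, z: 20}
          rotations: [90]
          sizes:
          - !Vector3 {x: 1, y: 3, z: 10}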

5. Moving Targets

This is the final category introducing basic concepts. Now the food can move, and the agent may have to chase it down or avoid it. Many animals must track down moving food in the wild, and the ability to track moving objects is important for many of the later tests.

Allowed objects:

  • Green food
  • Yellow food
  • Red food
  • Death Zones
  • Immovable Objects

Suggested Training:

  • Moving food of different types.

6+. Testing for more complicated abilities

The rest of the syllabus goes beyond the basics and will be presented here in future weeks.

Summary of the issues with hidden tests

Finally, we sum up some of the issues with running this kind of competition and our proposed solutions, comparing the standard setting (train and test sets from the same distribution) with our hidden-test setting. We will be monitoring these issues as we go.

  • Standard train-test: no "unfair" surprises. Hidden tests: potential for "unfair" surprises. Our solution: all problems are built from elements in the playground environment, using the same configuration files available to the participants.
  • Standard train-test: training is always useful. Hidden tests: training may not be productive. Our solution: we will release information about the test categories, including sample problems and sample playground configurations.
  • Standard train-test: easy to know how well you are doing. Hidden tests: hard to know how well you are doing. Our solution: we will keep a live leaderboard showing each agent's overall score, and may also show the breakdown by category or even more detailed information.
  • Standard train-test: easy to guess algorithmic improvements. Hidden tests: hard to know what will lead to success. Our solution: all tasks involve food retrieval, so there is a common goal, and we expect approaches that can learn robust behaviours to do well. After the competition we will make the details of the tests public for discussion and for use as an ongoing testbed for AI.
  • Standard train-test: fits into standard paradigms. Hidden tests: creates a new paradigm. Our solution: we believe this is a good thing! It is important to be able to create AI capable of solving the kinds of problems that can occur in the real world.

Changelog this week:

Find more information at the competition website or download from the github page.

