Table of contents

  1. Overview
    1. Phase 1, phase 2
    2. Why two phases?
  2. Seed database
    1. Construction
    2. Download
    3. Format
    4. API
  3. Deciders
    1. Definition
    2. Undecided machines index file
  4. Reproducibility and verifiability statement

Overview

With the Busy Beaver Challenge we want to decide the halting problem of all 5-state Turing machines (from all-0 tape). That way we will learn BB(5), the 5th busy beaver value. See Story.

In order to achieve this goal we need to analyse the behavior of every single 5-state Turing machine. We quickly run into a problem: there are roughly 16 trillion 5-state Turing machines ( 211021^{10} to be exact).

Thankfully most of this space is not useful to us and only a fraction needs to be studied in order to find BB(5). This is for instance because there are 4!=244! = 24 ways to permute the states (aside from the start state) of a machine and 2 ways to permute the move directions which does not change behavior hence only one of these 48 machines needs to be studied^[1. These 48 machines are said to be isomorphic.].

Hence, we aim at sparsely enumerating the space of 5-state Turing machines: that is trying to enumerate the least amount of machines that are necessary to study in order to find BB(5).

Phase 1, phase 2

The method that we present to sparsely enumerate the space of 5-state Turing machines and analyse their behavior is fundamentally inspired by [Marxen and Buntrock, 1990] with some notable differences that we will outline. The first difference is that our method is divided into two successive and independent phases:

  1. Phase 1: seed database. Sparsely enumerate the space of 5-state Turing machines and mark as undecided any machine that exceeded the set time or space limits. Undecided machines are put in the seed database which seeds the Busy Beaver Challenge.

  2. Phase 2: deciders. Write independent deciders, i.e. programs that will decide the behavior of families of machines in the seed database. We aim to classify these families in the zoology and to come up with deciders for each family.

Phase 1 was completed in December 2021:

  • it enumerated 125,479,953 Turing machines in 30 hours^[2. Splitting the task among several computers in parallel.]. See these metrics for more.
  • it marked 88,664,064 machines as undecided and they are stored in the seed database. We refer to undecided 5-state machines thanks to their index in the seed database (e.g. Machine #7,410,754).

Although Phase 1 of the project was completed, it needs to be reproduced independently in order to confirm its results and increase trust. See Contribute.

Phase 2 started in January 2022 and you are invited to write your own deciders for the remaining (or yet-unknown) families and to reproduce or verify existing ones! See Contribute.

Currently applied deciders

Currently applied deciders are listed on the forum and you are invited to Contribute.

Why two phases?

Phase 2’s deciders could be integrated into the enumeration algorithm of phase 1 in order to mark a lot less than 88,664,064 machines as undecided to begin with. This is the approach taken by [Marxen and Buntrock, 1990].

However we strongly advocate for our model where responsibility is split into two independent phases, that is because:

  1. Splitting responsibilities yields shorter and easier to verify/test/debug code for both phases. In particular, it is very important to establish trust in the seed database of undecided machines hence the simpler the code that generates it the better.

  2. Some deciders require a lot more resources than others and might only be relevant to a very small and targetted subset of machines. Hence we don’t want to execute them on all enumerated machines which would considerably slow down the enumeration process.

Our approach provides modularity and hopefully facilitates reproducibility, peer reviewing, and trust in the final result.

Seed database

The Busy Beaver Challenge is based on a downloadable seed database of 88,664,064 undecided 5-state machines which was constructed during phase 1 of the project, completed in December 2021. You are more than invited to reproduce this result, see Contribute.

The code to construct the seed database is available at https://github.com/bbchallenge/bbchallenge-seed. This code is open source and was built with readibility and concision in mind: it “only” consists of 675 lines of Go^[3. Go is ideal for lightweight parallelisation which is very useful in this case.] and 105 lines of C and is unit tested. See our reproducibility and verifiability statement.

This is to be compared to the unpublished ≈8000 lines of C reported by [Marxen and Buntrock, 1990] or the ≈6000 uncommented lines of Pascal of https://skelet.ludost.net/bb/nreg.html and justifies our clear separation between phase 1 and phase 2 in this project.

The main aim of the Busy Beaver Challenge is to decide every machine in the seed database.

Construction

The algorithm that we implement to sparsely enumerate the space of 5-state Turing machines is a variation of [Marxen and Buntrock, 1990] but the core idea is the same.

The algorithm recursively constructs the tree of 5-state Turing machines starting from the following common ancestor^[4. By symmetry this common ancestor can use tape movement R and go to state B without loss of generality (if it goes to state A, that machine will obviously never halt). Writing a 1 is motivated by the fact that for any machine which writes a 0 first, we can simulate that machine forward until it writes it’s first 1 and then permute the states so that that state (which wrote the first 1) is the start state. This new machine will have the exact same behavior as the original machine (except that it runs less than 4 steps less). A post-analysis of all machines can find all examples like this.]:

0 1
A 1RB ---
B --- ---
C --- ---
D --- ---
E --- ---

Each machine (i.e. node of the tree) is simulated until either:

  1. time or space limits are met
  2. the machine exceeds BB(4) = 107 time steps while having visited only 4 states out of 5
  3. an undefined transition is met

In case 1. the machine is marked as undecided and is inserted in the seed database. Note that introducing the idea of a space limit is novel compared to [Marxen and Buntrock, 1990]. We conjecture that BB_SPACE(5) = 12,289.

In case 2. the machine is marked as non-halting, see Story for more details on BB(4).

In case 3. there are naïvely 225=202*2*5=20 choices for the undefined transition that was encountered. This number of choices is reduced by imposing an order on the set of states as this allows not to visit machines that are the same up to renaming of the states (isomorphic machines). Further pruning methods are implemented to discard redundant machines. The algorithm is then applied recursively to the machines equipped of their new transition.

Complete pseudo-code and details of the construction are available on the forum.

Thanks to (a) using a space limit, (b) using low level code for the simulation algorithm and (c) using 2021’s computers we do not need to burden the algorithm’s code with simulation speed-ups as in [Marxen and Buntrock, 1990].

Time and space limits

During the enumeration algorithm we need a criterion to stop simulating machines that have been running for too long and mark them as undecided. We use the conjectured value of BB(5) = 47,176,870 steps as a cut-off time limit [Aaronson, 2020] [Marxen and Buntrock, 1990].

We introduce the idea of a space limit. Indeed the busy beaver value is traditionally concerned with time only. But we can also ask an analogous question about space: “what is the maximum number of memory cells that a 5-state machine can visit before halting?“.

BB_SPACE

We define BB_SPACE:

BB_SPACE(n) =
Maximum number of memory cells visited by a halting Turing machine with n states starting from all-0 memory tape

Note that BB_SPACE is not Rado’s Σ\Sigma function^[5. Rado’s Σ(n)\Sigma(n) is the maximum number of 1s on the final tape of a n-state halting Turing machine from all-0 tape. It does not provide a convenient space cut-off as the number of 1s on the final tape is not necessarily the maximum number of 1s seen in the execution of the machine.][Rado, 1962]](https://cs.famaf.unc.edu.ar/~hoffmann/cc18/Rado-On-non-computable.pdf).

We conjecture:

BB_SPACE(5) = 12,289

Which is the number of memory cells visited by the 5-state busy beaver time champion.

It turns that BB_SPACE(5) is a much more practical cut-off to use in the enumeration algorithm than BB(5) as many more machines will visit more than 12,289 memory cells before they exceed 47,176,870 time steps.

Note that if our conjecture is false, i.e. if BB_SPACE(5) > 12,289, the true BB_SPACE winner is necessarily in the seed database and should hopefully be discovered through the effort of deciding the database. Same if BB(5) > 47,176,870.

Metrics

The enumeration algorithm was run in December 2021 and here are some metrics about the enumerated space of 5-state Turing machines:

# machines # machines # machines
total halting (undefined transition met) 34,104,723
total non-halting (using BB(4)) 2,711,178
total pruned 944 579
total undecided (time limit) 14,322,029
total undecided (space limit) 74,342,035
total undecided 88,664,064
total enumerated 125,479,989

Download

Mirrors

The seed database of 88,664,064 undecided 5-state machines is available for download at:

The zipped database is 243M and approx 2G unzipped, each machine is encoded on 30 bytes and the first 30 bytes consist of a reserved header, see format.

Database shasum:

  1. zipped: 2576b647185063db2aa3dc2f5622908e99f3cd40
  2. unzipped: e57063afefd900fa629cfefb40731fd083d90b5e

You are welcome to host the database on your own mirror (as long as preserving shasum), see

Contribute.

Format

The database is a binary file where each machine is described on 30 bytes. It starts with a 30-byte reserved header which currently contains the following information (first 13 bytes):

  1. 14,322,029: number of undecided machines that exceeded the 47M steps time limit. 4-byte big endian integer
  2. 74,342,035: number of undecided machines that exceeded the 12k cells space limit. 4-byte big endian integer
  3. 88,664,064: total number of undecided machines. 4-byte big endian integer
  4. 1: a boolean indicating that the machines were lexicographically sorted. The first 14,322,029 undecided machines (47M time limit exceeded) were lexicographically sorted independently of the next 74,342,035 undecided machines (12k space limit exceeded). 1-byte boolean

Then, each machine is encoded on 30 bytes. First come the 14,322,029 machines that exceeded the time limit and then the 74,342,035 machines that exceeded the space limit, see time and space limits. These two sets of machines are both lexicographically sorted.

The 30-byte encoding of a 5-state Turing machine is better understood with an example, for instance with machine #7,103,458 of the databse:

0 1
A 1RB 0LD
B 0LC 1LE
C 1LD 1LC
D 0RA ---
E 1RB 1RE

The machine is encoded using the following 30-byte array, with R=0 and L=1:

[1,R,2, 0,L,4,
 0,L,3, 1,L,5,
 1,L,4, 1,L,3,
 0,R,1, 0,0,0,
 1,R,2, 1,R,5]

Note that states are indexed starting at A=1 as the state value 0 is used to encode undefined transitions. Write and direction bytes of undefined transitions are set to 0 as well.

Use the database

Here are some routines that you can use to extract machines from the database:

Python

def get_header(machine_db_path):
  with open(machine_db_path, "rb") as f:
    return f.read(30)

def get_machine_i(machine_db_path, i, db_has_header=True):
  with open(machine_db_path, "rb") as f:
    c = 1 if db_has_header else 0
    f.seek(30*(i+c))
    return f.read(30)

More Python utils at https://github.com/bbchallenge/bbchallenge-py/

Go

type TM [30]byte

func GetMachineI(db []byte, i int, hasHeader bool) (tm TM, err error) {
  if i < 0 || i > len(db)/30 {
    err := errors.New("invalid db index")
    return tm, err
  }

  offset := 0
  if hasHeader {
    offset = 1
  }

  copy(tm[:], db[30*(i+offset):30*(i+offset+1)])
  return tm, nil
}

More go utils at https://github.com/bbchallenge/bbchallenge-go/

API

You can also query the database through the following API:

GET https://api.bbchallenge.org/machine/<machine_id>

For instance, https://api.bbchallenge.org/machine/12345678 will return:

{
	"machine_code": "1RB1LC_1RC1RC_1LB1RD_1LA1LE_---0RA",
	"machine_id": 12345678,
	"status": "decided"
}
  • The field “machine_code” is the string representation of the machine.

  • The field “machine_id” is the ID that you queried.

  • The field “status” keeps track of the deciders that have been applied to the database, the goal is for all machines to become “decided”. For instance, this machine was decided by the decider for Cyclers.

The goal is for all the machines of the database to eventually be decided by Deciders.

Deciders

Definition

A decider is a program that outputs true if it is able to tell whether a given machine halts or not. Deciders are applied to the machines of the seed database in order to reduce the number of undecided machines from 88,664,064 to 0.

We expect that almost all machines of the seed database do not halt hence deciders are primarily focused on deciding that machines do not halt.

To be trusted, a decider should be accompanied with a proof of correctness which certifies that the machines that it recognises do not halt. The decider’s code should also be tested on a significant number of examples and counterexamples machines, see Contribute.

Deciders are closely related to the zoology of 5-state machines as we aim to decide each family of the zoo. For instance:

Deciders are not necessarily directly connected to a family of the zoology, a good example of this case is the decider for Backward Reasoning a notion developed in [Marxen and Buntrock, 1990].

Writing, testing and proving deciders is a collaborative task, see the decider section of the forum, and you are invited to Contribute.

Trusted deciders are applied to the seed database.

Undecided machines index file

Download the index file containing all the indices in the seed database of the currently undecided machines (i.e. machines that remain undecided after all trusted deciders are applied).

Machines’ indices are stored in order as 4-byte big endian integers. Here are some routines to extract these indices from an index file:

Python

def get_indices_from_index_file(index_file_path):
  index_file_size = os.path.getsize(index_file_path)

  machines_indices = []
  with open(index_file_path, "rb") as f:
    for i in range(index_file_size//4):
      chunk = f.read(4)
      machines_indices.append(int.from_bytes(chunk, byteorder="big"))

  return machines_indices

Go

func GetIndicesFromIndexFile(indexFilePath string) (
  machinesIndices []uint32, err error) {

  var rawIndex []byte
  rawIndex, err = ioutil.ReadFile(indexFilePath)

  if err != nil {
    return machinesIndices, err
  }

  for i := 0; i < len(rawIndex)/4; i += 4 {
    machinesIndices = append(
    machinesIndices, binary.BigEndian.Uint32(rawIndex[i:i+4]))
  }

  return machinesIndices, err
}

Reproducibility and verifiability statement

Any result coming from the Busy Beaver Challenge will be fundamentally based on the numerous programs involved in the project such as the seed database’s generating code or the deciders.

Because programs can contain bugs (and often do), such computer-based results tend to struggle gaining trust among the scientific community, where the gold standard is mathematical proof in a peer-reviewed publication.

Because we aim to achieve this standard, the following principles are at the core of the Busy Beaver Challenge. Any program involved in the project must be:

  1. Open source
  2. Open to collaboration
  3. Modular, concise and clear
  4. Documented and unit tested
  5. Reproducible with clear build/run instructions
  6. Eventually accompanied by a proof of correctness

For deciders, we detail the reproducibility rules in this validation process.

We would encourage the use of automatic proving tools such as Lean or Coq although it would be an extremely demanding endeavour.

You are invited to Contribute at making the Busy Beaver Challenge more reproducible and verifiable.