Where do these arrays and numbers come from? : learnprogramming

TL;DR how exactly did these people come up with these arrays?

I’m making this chess engine, well, attempting to make a chess engine. A few of the confusing things that keep popping up are these seemingly arbitrary arrays of numbers from 0 to 63, the so-called “De Bruijn sequence” and a function that no one on the internet seems to be fully understood. A person tried to figure it out here. the code goes like this:

const int BitTable[64] = {
63, 30, 3, 32, 25, 41, 22, 33, 15, 50, 42, 13, 11, 53, 19, 34, 61, 29, 2,
51, 21, 43, 45, 10, 18, 47, 1, 54, 9, 57, 0, 35, 62, 31, 40, 4, 49, 5, 52,
26, 60, 6, 23, 44, 46, 27, 56, 16, 7, 39, 48, 24, 59, 14, 12, 55, 38, 28,
58, 20, 37, 17, 36, 8
};

int pop_1st_bit(uint64 *bb) {
uint64 b = *bb ^ (*bb - 1);
unsigned int fold = (unsigned) ((b & 0xffffffff) ^ (b >> 32));
*bb &= (*bb - 1);
return BitTable[(fold * 0x783a9b23) >> 26];
}

uint64 index_to_uint64(int index, int bits, uint64 m) {
int i, j;
uint64 result = 0ULL;
for(i = 0; i < bits; i++) {
j = pop_1st_bit(&m);
if(index & (1 << i)) result |= (1ULL << j);
}
return result;
}

the post I specified above did a great deal of explaining what the code is doing (finding the least significant bit and turning it into a 64 array index and vice versa), but it is still a mystery as for the how it’s doing it. The origins of the mysterious BitTable[64] array and the 0x783a9b23 magic number are up in the air.

To add further confusion that is not the only mysterious array and magic number that pops up in chess programming, here are a few more taken from the chess programming wiki:

const int index64[64] = {
0,  1, 48,  2, 57, 49, 28,  3,
61, 58, 50, 42, 38, 29, 17,  4,
62, 55, 59, 36, 53, 51, 43, 22,
45, 39, 33, 30, 24, 18, 12,  5,
63, 47, 56, 27, 60, 41, 37, 16,
54, 35, 52, 21, 44, 32, 23, 11,
46, 26, 40, 15, 34, 20, 31, 10,
25, 14, 19,  9, 13,  8,  7,  6
};

/**
* bitScanForward
* u/author Martin Läuter (1997)
*         Charles E. Leiserson
*         Harald Prokop
*         Keith H. Randall
* "Using de Bruijn Sequences to Index a 1 in a Computer Word"
* u/param bb bitboard to scan
* u/precondition bb != 0
* u/return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
assert (bb != 0);
return index64[((bb & -bb) * debruijn64) >> 58];
}

That 25 year old code uses a whole different array of seemingly random numbers and the “magic number” 0x03f79d71b4cb0a89 (?) and shifts a bit expression bb & -bb by 58 of all numbers (why!?). here’s another one:

const int index64[64] = {
0, 47,  1, 56, 48, 27,  2, 60,
57, 49, 41, 37, 28, 16,  3, 61,
54, 58, 35, 52, 50, 42, 21, 44,
38, 32, 29, 23, 17, 11,  4, 62,
46, 55, 26, 59, 40, 36, 15, 53,
34, 51, 20, 43, 31, 22, 10, 45,
25, 39, 14, 33, 19, 30,  9, 24,
13, 18,  8, 12,  7,  6,  5, 63
};

/**
* bitScanForward
* u/author Kim Walisch (2012)
* u/param bb bitboard to scan
* u/precondition bb != 0
* u/return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
assert (bb != 0);
return index64[((bb ^ (bb-1)) * debruijn64) >> 58];
}

The same confusing “magic number”, but a different array and bit expression bb ^ (bb-1). And of course, another one.

const int lsb_64_table[64] =
{
63, 30,  3, 32, 59, 14, 11, 33,
60, 24, 50,  9, 55, 19, 21, 34,
61, 29,  2, 53, 51, 23, 41, 18,
56, 28,  1, 43, 46, 27,  0, 35,
62, 31, 58,  4,  5, 49, 54,  6,
15, 52, 12, 40,  7, 42, 45, 16,
25, 57, 48, 13, 10, 39,  8, 44,
20, 47, 38, 22, 17, 37, 36, 26
};

/**
* bitScanForward
* u/author Matt Taylor (2003)
* u/param bb bitboard to scan
* u/precondition bb != 0
* u/return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
unsigned int folded;
assert (bb != 0);
bb ^= bb - 1;
folded = (int) bb ^ (bb >> 32);
return lsb_64_table[folded * 0x78291ACF >> 26];
}

That one has a different array, magic number, and bit shifts it by 26 (again, why!?). the point is, I want to know how do people come up with these arrays and numbers, that is the core of my question.

EDIT: I think I got the second one. bear with me here, I’m gonna really dumb my wordings down so even someone with basic knowledge could understand. But before that, some basic bitwise operations and the DeBruijn sequence:

  1. The De Bruijn sequence

Basically, The De Bruijn sequences (DBS for short) are cyclic “strings” of digits of base k of which substring of length n appears once (notated by B(k, n)). We only need to concern ourselves with binary operations (B(2, n)). for example, say you want B(2, 2), there are 4 (22, notice the pattern here) possible combinations of a binary number with length 2, namely 00, 01, 10, 11. So, the DBS for that is 0011 (notice again the length of the DBS of 4) with 00, 01, 10, and 11 appearing only once in that sequence, remember, DBS are cyclic.

{11}00
1{10}0
11{00}
1}10{0

Now, a chess board contains 64 squares, indexed from 0 to 63, and to represent 63 in binary, we need at least 6 bits (111111 in binary is 63) so we need a B(2, 6) DBS.

2. Bit manipulation

I’m assuming we all know the basic , ~, &, etc. operations, the thing I’m trying to picture here are the expressions used bb ^ (bb-1), bb & -bband the variations used in the example I gave, using more digestible sizes of bb, take bb as an unsigned 8-bit instead of 64-bit:

bb       = 0110 0100 = 100
-bb      = 1001 1100 = 156
bb & -bb = 0000 0100 = 8 //The value is irrelevant, we just need to mask the least significant bit

So, what is going on here? well, since bb is an unsigned 8-bit number, it can store values ​​from 0 to 255 (28 different values). What happens when you add a minus sign to an unsigned integer? think of it this way, if moving forwards from 0 leads you to 0, 1, 2, 3, etc. then moving backward will loop you back to the highest value, so 0, 255, 254, 253, etc. Thus, if bb has a value of 100 (100 steps forward from 0) then -bb will have a value of 156 (100 steps backward from 0)

a similar thing happens when you inverse a binary number, bb = 0110 0100 which has a value of 100 (100 steps forward from 0) will of course has an inverse of ~bb = 1001 1011 which has a value of 155 (100 steps backward from 255again notice the pattern here)

with a bit of logic, one can conclude that -bb = ~bb + 1. the point is we can now isolate the least significant bit on the number, so bb & -bb isolates the least significant bit (LSB)

now onto the next expression b^(b-1):

bb            = 0110 0100
bb - 1        = 0110 0011
bb ^ (bb - 1) = 0000 0111

so, the separation bb ^ (bb-1) contains all bits set including and below the LS1B

const int index64[64] = {
0,  1, 48,  2, 57, 49, 28,  3,
61, 58, 50, 42, 38, 29, 17,  4,
62, 55, 59, 36, 53, 51, 43, 22,
45, 39, 33, 30, 24, 18, 12,  5,
63, 47, 56, 27, 60, 41, 37, 16,
54, 35, 52, 21, 44, 32, 23, 11,
46, 26, 40, 15, 34, 20, 31, 10,
25, 14, 19,  9, 13,  8,  7,  6
};

/**
* bitScanForward
* u/author Martin Läuter (1997)
*         Charles E. Leiserson
*         Harald Prokop
*         Keith H. Randall
* "Using de Bruijn Sequences to Index a 1 in a Computer Word"
* u/param bb bitboard to scan
* u/precondition bb != 0
* u/return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
assert (bb != 0);
return index64[((bb & -bb) * debruijn64) >> 58];
}

this one is the most straightforward, that array is based on that 0x03f79d71b4cb0a89 number which is actually a B(2, 6) DBS, 0000 0011 1111 0111 1001 1101 0111 0001 1011 0100 1100 1011 0000 1010 1000 1001 to be precise. Now, that array is generated by locating the numbers 0 to 63 in the DBS, for example 0 = 000000 is the 0th substring on the DBS, 1 = 000001 is the 1st, 2 = 000010 is the 48th (here 0000 0011 1111 0111 1001 1101 0111 0001 1011 0100 1100 1011 [0000 10]10 1000 1001), and so on.

say we have a number bb = 0x00FF00000000FF00 or 0000 0000 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000we know the LSB of bb is of index 8

we use the bb & -bb operator

bb       = 0x00FF00000000FF00
debruijn = 0x03f79d71b4cb0a89
         = 0000 0011 1111 0111 1001 1101 0111 0001 1011 0100 1100 1011 0000 1010 1000 1001
bb & -bb = 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000

we multiply bb & -bb with the DBS, and since bb & -bb is just 2indexthis is the same as shifting the DBS (index) to the right (DBS << index)

(bb & -bb) * debruijn = 1111 0111 1001 1101 0111 0001 1011 0100 1100 1011 0000 1010 1000 1001 0000 0000

shifting the DBS << 8 which is the same as taking the 8th substring in the DBS, to isolate that substring we need to shift the number 64-6=58 to the left

eighth substring in the DBS = 0000 0011 [1111 01]11 1001 1101 0111 0001 1011 0100 1100 1011 0000 1010 1000 1001
isolated eighth substring   = 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 = 61

if we index 61 to our array it spits out 8 which is the number that we want. still no insight on the other three though. the one signed by Matt Taylor takes a 32-bit DBS and he was able to somehow generate an index array with 64 elements in it, the one signed by Kim Walisch multiplies the DBS with not only the LSB but also all the bits below it, and have an array corresponding to that product, how? I am aware that there are some high-level mathematics involved, but surely you don’t have to have a Phd in computer science to pick up a hobby in chess programming… right?

Leave a Comment