Archive‎ > ‎Fall 2009‎ > ‎Course Project‎ > ‎Buddy Suite‎ > ‎Milestone 3: Chess Game Logic‎ > ‎

Huffman encoding

This is an alternative chess board representation. The information contained in this page has been copied from here. Be careful, the information might not be correct or complete. This page only gives you an overview of this type of chess board representation.

Inspired by Huffman coding, chess positions can be represented with bit patterns in such a way that more common board elements (empty squares and pawns) are stored with less bits than less common board elements:
Empty square = 0
Pawn = 10c
Bishop = 1100c
Knight = 1101c
Rook = 1110c
Queen = 11110c
King = 11111c

Where c is a bit representing the color of the piece (1 = LIGHT, 0 = DARK).

Additional bits are needed for:
50-move rule (7 bits)
en-passant column (4 bits)
color to move (1 bit)
castling rights (4 bits).

Huffman encoding schemes allow a complete board state (starting position) to be represented in just 23 bytes.

For the fifty rule move, a number representing one of 101 possibilities is needed: a pawn was just moved or a piece just captured, a pawn was moved or a piece was captured 1..99 moves ago, or a pawn was moved or a piece captured 100 or more moves ago. This fits in 7 bits.

Any board may have a maximum of 5 seemingly en-passant-capturable pawns. Thus a number representing one of 6 possibilities is needed. This fits in 3 bits.

Trailing empty squares can be omitted. If the last piece is a king, it can by definition be omitted without loss of information, saving six more bits in some cases. The en passant column is only needed when an opportunity seems to exist. Castling rights need be stored only for those rooks for whom it seems permissible.

With the above, a complete board state can be represented in a maximum of 25 bytes (actually 24 and 3 bits extra).

Huffman encodings are fairly CPU intensive, in comparison to other board representations which seek to minimize required processor and memory cycles. However, the small size of the final representation makes this approach well suited to storage of long term knowledge, for instance in storing positions in an opening book, where minimizing the board representation's size is more important than minimizing CPU cycles. Huffman encoded boards are also sometimes used in transposition tables for shallow entries.

Huffman fine-tuning

An even more compact variant sacrifices the two to four bit representation of as many empty squares for encoding two to nine empty squares in five bits. If another zero follows, the number is extended by two bits, allowing ten to 41 empty squares to be encoded in eight bits.

One empty square               = 0
nnn+2 (2-9) empty squares = 00nnn
nnnnn+10 (10-41) empty squares = 00nnn0nn


nnnn+10 (10-25) empty squares = 00nnn0n

For up to 41-length gaps, like the initial or end game boards, this saves up to 33 bits. For all sparse boards where there are pieces or pawns at near nine-square intervals or much longer gaps, there are also nice savings, e.g. 16 bits for four gaps of length nine. This is offset by the extra bits for short gaps. To always have the optimal encoding, one extra bit can mark whether straightforward or compact empty square encoding is used.

In addition the coordinates of the kings can be stored and their fields ignored in the bit sequence instead. This also takes 6 bits each. But the queen can then be coded in five bits. And gaps to the right and left can then be coalesced, making for one longer gap which is often a candidate for better compacting.

Four extra bits can store the number of pieces of the first color encountered. After the last piece of that color, the color bit can be skipped. For the initial boards, this saves 12 bits. Alternately or additionally, when the maximal possible number of one piece of the first color has been stored, i.e. eight black pawns, or two white rooks, all further pawns, rooks, etc. must be of the other color and don't require the color bit.

Another extra bit appears when the number of pieces of the first color is eight or less. It can mark the fact that no pawns are left. In this case the first bit of all pieces can be skipped:

Bishop       = 100c
Knight = 101c
Rook = 110c
Queen = 1110c
King = 1111c

Huffman multi-table approach

Additionally you can consider that for each rook there are 3 possibilities related to its home square: it has left it or it is there, either with or without castling right. When combining this for all rooks, that creates 81 different board categories, going from those with all rooks being home with castling rights, to none being home. If you store each category in a separate table, that table holds implicit information about rooks and castling rights, i.e. you only need to encode those rooks in those tables which mean it is not home. Moreover, in 25 of these categories, on each side at least one rook has its castling right, meaning that the king's position is implicitly known and need not be encoded either. And if the king's position is implicit, the queen's last bit is not needed or where both rooks are home, its shorter bit sequence can be used for the queen.

This means that for the nine tables with both rooks home, and at least one on each side having castling rights, you always save 38 bits (20 bits for four rooks, 12 bits for two kings and 2 bits for the queens and, as in all cases, 4 bits for the castling rights). At the other extreme, with no rooks home, you only save the 4 castling bits. So the gains are much higher for an opening data base than for end games.