Playing chess using decoder-only transformer models
Introduction
Chess, a game of strategy and intellect, is one of the world's most widely played board games. Its origins
can be traced back to India, where it was first known as "Chaturanga" around the 7th century CE. Chess has
long been a benchmark for artificial intelligence, and modern engines such as Stockfish and AlphaZero now
play better than even the strongest grandmasters.
In recent years, the decoder-only transformer has emerged as a powerful architecture, revolutionizing tasks
that require sequential understanding and generation, with LLMs such as ChatGPT and Gemini being prominent
examples of its capabilities. Given the sequential nature of chess, where each move depends on prior
positions and strategies, decoder-only transformer models are well suited to playing the game.
Objective
We aim to train a lightweight yet performant chess model based on the decoder-only transformer
architecture.
Methodology
We created a tiny Llama-based decoder-only transformer model for chess, consisting of just 23M parameters.
The dataset consists of 3M high-quality games sourced from Lichess, played by elite players from around the
world. The model uses the UCI format for input and output, making it easy to integrate into chess
applications. It was trained for 5 epochs with a batch size of 16 on a single Nvidia L4 GPU for 18 hours,
using Google Cloud's Vertex AI platform.
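The exact data pipeline is not described here, but the sketch below shows one plausible way to turn a game
record into the kind of UCI move sequence the model consumes, using the python-chess library; the function
name and file path are illustrative, not part of the actual training code.

# Illustrative sketch (not the actual pipeline): convert a PGN game into a
# sequence of UCI move strings, which can then be mapped to vocabulary indices.
import chess.pgn

def game_to_uci_tokens(pgn_path: str) -> list[str]:
    """Read the first game in a PGN file and return its moves in UCI notation."""
    with open(pgn_path) as f:
        game = chess.pgn.read_game(f)
    return [move.uci() for move in game.mainline_moves()]

# Example output: ['e2e4', 'e7e5', 'g1f3', ...]
print(game_to_uci_tokens("elite_game.pgn"))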
Hyperparameters
Total Parameters: 23,001,600
Layers: 8
Model Dimensions: 512
FFN Dimensions: 1024
Attention Heads: 8
Key/Value Heads: 8
Peak Learning Rate: 0.0005
Activation Function: SiLU
Vocabulary Size: 1974
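A model with these hyperparameters can be reproduced with Hugging Face's transformers library. The snippet
below is a reconstruction from the table above rather than the actual training code; in particular, the
untied input/output embeddings and the interpretation of the vocabulary are assumptions.

# Reconstructing the model from the hyperparameters above (assumptions noted in comments).
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=1974,            # presumably one token per UCI move plus special tokens (assumption)
    hidden_size=512,            # model dimensions
    intermediate_size=1024,     # FFN dimensions
    num_hidden_layers=8,
    num_attention_heads=8,
    num_key_value_heads=8,      # equal to the attention heads, i.e. standard multi-head attention
    hidden_act="silu",
    tie_word_embeddings=False,  # assumption; with untied embeddings the count below works out
)
model = LlamaForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # 23,001,600 for this configuration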
Results & Analysis
The model plays with an expected Elo rating of 1400, and 99.1% of the moves it makes are legal. It
significantly outperforms the global average rating of 620 on chess.com. It also outperforms other
decoder-only transformer-based chess models of similar size, such as those based on the GPT-2 architecture.
When competing against Stockfish, the leading chess engine and one stronger than the best human players,
our model wins at skill level 0 but starts to fall behind at higher skill levels, which is expected.
Compared to Stockfish, it should have a more human-like feel in its gameplay, mainly because the training
dataset consists of real-world games. Thanks to its small size and Llama-based architecture, it runs very
fast (<0.1 s per move) even on modern mobile devices.
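As an illustration of how the legal-move rate can be measured, the snippet below checks a model-generated
UCI string against python-chess; the candidate move here is a placeholder, since no inference code is
included in this report.

# Illustrative legality check for model outputs (the candidate move is a placeholder).
import chess

def is_legal(board: chess.Board, uci_move: str) -> bool:
    try:
        return chess.Move.from_uci(uci_move) in board.legal_moves
    except ValueError:  # a malformed UCI string also counts as illegal
        return False

board = chess.Board()
candidate = "e2e4"      # stand-in for a move sampled from the model
if is_legal(board, candidate):
    board.push_uci(candidate)  # apply the move and continue the game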
Conclusion
The results and analysis indicate the following:
- 99.1% of the moves played by the model are legal, showing that the attention mechanism can learn the
complex rules of chess.
- Small language models are capable of performing domain-specific tasks while maintaining high performance
and efficiency, allowing for a framework in which multiple smaller models work together to perform a
diverse set of tasks.
- Inference with small transformer models is memory-bound rather than compute-bound (a rough estimate
follows this list). As a result, small language models are extremely fast and can be deployed on edge and
mobile devices.
- The combination of the model's small size, fast inference speed, and human-like play makes it a good
choice for resource-constrained environments such as mobile applications and websites.
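A back-of-envelope estimate illustrates the memory-bound claim; the bandwidth and compute figures below are
assumed, illustrative values for a mobile SoC, not measurements.

# Rough per-token decode cost for a 23M-parameter model at batch size 1.
params = 23_001_600
weight_bytes = params * 2          # fp16 weights: ~46 MB streamed per generated token
bandwidth = 40e9                   # ~40 GB/s memory bandwidth (assumed mobile SoC)
flops = 1e12                       # ~1 TFLOP/s fp16 compute (assumed)

t_mem = weight_bytes / bandwidth   # ~1.2 ms spent moving weights
t_math = 2 * params / flops        # ~0.05 ms of arithmetic (2 FLOPs per parameter per token)
print(f"{t_mem * 1e3:.2f} ms memory vs {t_math * 1e3:.3f} ms compute per token")
# A UCI move is only a few tokens, so well under 0.1 s per move even on mobile hardware.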