Chess-Llama

Playing chess using decoder-only transformer models

Introduction

Chess, a game of strategy and intellect, is one of the world's most widely played board games. Its origins can be traced back to India, where an early form known as "Chaturanga" was played around the 7th century CE. Chess has long been a benchmark for artificial intelligence, and modern engines such as Stockfish and AlphaZero now play better than even the strongest human grandmasters.
In recent years, the decoder-only transformer has emerged as a powerful architecture for tasks that require sequential understanding and generation, with LLMs such as ChatGPT and Gemini as prominent examples. Because a chess game is itself a sequence in which each move depends on the moves played before it, decoder-only transformer models are a natural fit for playing chess.

Objective

We aim to train a lightweight and performant chess model that is based on the decoder-only transformer architecture.

Methodology

We created a tiny Llama-based decoder-only transformer model for chess play, consisting of just 23M parameters. The training dataset consists of 3M high-quality games sourced from Lichess (lichess.org), played by elite players from around the world. The model uses UCI move notation (for example, e2e4) for both input and output, making it easy to integrate into chess applications. It was trained for 5 epochs with a batch size of 16 on a single Nvidia L4 GPU for 18 hours, using Google Cloud's Vertex AI platform.
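
To make the UCI format concrete, the sketch below enumerates every move string that UCI notation can express on an 8x8 board and checks that a short opening is covered. It is only an illustration of the notation: the model's actual tokenizer is not described here, and the helper names (`square`, `all_uci_moves`) are hypothetical.

```python
FILES = "abcdefgh"


def square(file_idx, rank_idx):
    """Convert 0-based file/rank indices to a square name such as 'e4'."""
    return FILES[file_idx] + str(rank_idx + 1)


def all_uci_moves():
    """Enumerate every from-to string some piece could play, plus promotions."""
    moves = set()
    for f in range(8):
        for r in range(8):
            for tf in range(8):
                for tr in range(8):
                    df, dr = tf - f, tr - r
                    if (df, dr) == (0, 0):
                        continue
                    straight = df == 0 or dr == 0          # rook-like
                    diagonal = abs(df) == abs(dr)          # bishop-like
                    knight = (abs(df), abs(dr)) in {(1, 2), (2, 1)}
                    if straight or diagonal or knight:
                        moves.add(square(f, r) + square(tf, tr))
    # Promotions: a pawn steps or captures onto the last rank, for either
    # colour, and the promoted piece is appended as a fifth character.
    for from_rank, to_rank in ((6, 7), (1, 0)):
        for f in range(8):
            for tf in (f - 1, f, f + 1):
                if 0 <= tf < 8:
                    for piece in "qrbn":
                        moves.add(square(f, from_rank) + square(tf, to_rank) + piece)
    return moves


UCI_MOVES = all_uci_moves()
game = "e2e4 e7e5 g1f3 b8c6".split()  # a game is a plain sequence of moves
print(len(UCI_MOVES))                  # 1968
print(all(move in UCI_MOVES for move in game))
```

Castling needs no special case because UCI writes it as a king move (for example, e1g1). The 1968 strings counted here are in the same range as the model's reported vocabulary size of 1974; how the remaining entries are used (for instance, as special tokens) is not stated in the source and would be an assumption.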

Hyperparameters

Hyperparameter         Value
Total Parameters       23,001,600
Layers                 8
Model Dimensions       512
FFN Dimensions         1024
Attention Heads        8
Key/Value Heads        8
Peak Learning Rate     0.0005
Activation Function    SiLU
Vocabulary Size        1974
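
As a sanity check on the figures above, the parameter total can be reproduced from the other hyperparameters. The sketch assumes a standard Llama layout: no bias terms, a gated FFN with three projection matrices, two RMSNorm weight vectors per layer plus a final one, and separate (untied) input and output embeddings. These layout details are assumptions, not stated in the source; the function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab, dim, ffn_dim, layers, heads, kv_heads,
                      tied_embeddings=False):
    """Count weights in a Llama-style decoder under the assumptions above."""
    head_dim = dim // heads
    # Query and output projections are dim x dim; key and value projections
    # shrink only when there are fewer key/value heads than attention heads.
    attention = 2 * dim * dim + 2 * dim * (kv_heads * head_dim)
    ffn = 3 * dim * ffn_dim           # gate, up and down projections
    norms = 2 * dim                   # pre-attention and pre-FFN RMSNorm
    per_layer = attention + ffn + norms

    embedding = vocab * dim
    lm_head = 0 if tied_embeddings else vocab * dim
    final_norm = dim
    return layers * per_layer + embedding + lm_head + final_norm


total = llama_param_count(vocab=1974, dim=512, ffn_dim=1024,
                          layers=8, heads=8, kv_heads=8)
print(total)  # 23001600
```

Under these assumptions the count matches the reported total exactly. With tied embeddings the same function gives 21,990,912, which suggests (but does not prove) that the input and output embeddings are stored separately.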

Results & Analysis

The model plays at an estimated Elo rating of about 1400, and 99.1% of the moves it made were legal. This is well above the global average rating of 620 on Chess.com. It also outperforms other decoder-only transformer chess models of similar size, such as those based on the GPT-2 architecture. Against Stockfish, the leading chess engine, which at full strength is stronger than the best human players, the model wins at skill level 0 but starts to fall behind at higher skill levels, as expected. Compared to Stockfish, its play should feel more human-like, mainly because the training dataset consists of real-world games. Thanks to its small size and Llama-based architecture, it runs very fast (under 0.1 s per move), even on modern mobile devices.

Figure: Chess-Llama vs Stockfish
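
To give a feel for what a rating gap of this size means, the standard Elo expected-score formula can be applied to the two numbers quoted above. This is a minimal sketch that treats both ratings as if they were on the same scale, which may not hold: the model's rating is an estimate, and Chess.com ratings come from that site's own player pool.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B.

    The result is the expected points per game (win = 1, draw = 0.5).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


model_rating = 1400    # estimated rating quoted above
average_rating = 620   # Chess.com global average quoted above

score = expected_score(model_rating, average_rating)
print(f"Expected score vs. an average player: {score:.3f}")
```

Under that assumption, the formula predicts that the model would score close to 99% of the available points against an average-rated opponent.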

Conclusion

The results and analysis indicate the following: