Quick Start: Training a Sudoku Solver
This guide provides a hands-on tutorial to train an expert-level Sudoku solving AI using the Hierarchical Reasoning Model. This example demonstrates the core workflow of the project on a task that can be run on a modern consumer GPU.
Goal
The goal is to train a model that can solve extremely difficult 9x9 Sudoku puzzles. We will use a small subset of the full dataset (1,000 puzzles) with data augmentation to achieve this quickly.
Step 1: Build the Sudoku Dataset
Before training, you must process the raw data into the format required by the model. The provided script handles this, including data augmentation.
Run the following command from the root of the repository:
python dataset/build_sudoku_dataset.py --output-dir data/sudoku-extreme-1k-aug-1000 --subsample-size 1000 --num-aug 1000
Let's break down this command:
- `--output-dir data/sudoku-extreme-1k-aug-1000`: Specifies the directory where the processed dataset will be saved.
- `--subsample-size 1000`: Uses only 1,000 puzzles from the original training set.
- `--num-aug 1000`: Creates 1,000 augmented variations of each of the 1,000 original puzzles. Augmentations include shuffling the digits and permuting rows/columns in ways that keep the puzzles valid (see the sketch below).
This script creates a training and test set in the specified output directory, ready for the training script.
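The validity-preserving augmentations mentioned above follow the standard Sudoku symmetries: relabeling the nine digits, permuting rows within a band (and the bands themselves), doing the same for columns, and transposing the grid. Below is a minimal NumPy sketch of the idea; it is illustrative only, not the actual implementation in `dataset/build_sudoku_dataset.py`, and the function name is made up for this example.

```python
import numpy as np

def augment_sudoku(puzzle: np.ndarray, solution: np.ndarray, rng: np.random.Generator):
    """Apply random validity-preserving transforms to a 9x9 puzzle/solution pair.

    Illustrative sketch only; the repository's build script may differ in detail.
    """
    # Relabel digits: 0 (blank cell) stays 0, digits 1-9 are shuffled consistently.
    relabel = np.concatenate(([0], rng.permutation(np.arange(1, 10))))
    puzzle, solution = relabel[puzzle], relabel[solution]

    # Permute the three row bands, and the rows inside each band.
    row_perm = np.concatenate([band * 3 + rng.permutation(3) for band in rng.permutation(3)])
    # Same construction for column stacks and the columns inside each stack.
    col_perm = np.concatenate([stack * 3 + rng.permutation(3) for stack in rng.permutation(3)])
    puzzle, solution = puzzle[row_perm][:, col_perm], solution[row_perm][:, col_perm]

    # Transposition is also a validity-preserving symmetry; apply it half the time.
    if rng.random() < 0.5:
        puzzle, solution = puzzle.T, solution.T
    return puzzle, solution
```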
Step 2: Start Training
With the dataset prepared, you can start the training process using the `pretrain.py` script. The following command is configured for a single-GPU setup, such as a high-end laptop or desktop machine.
OMP_NUM_THREADS=8 python pretrain.py \
data_path=data/sudoku-extreme-1k-aug-1000 \
epochs=20000 \
eval_interval=2000 \
global_batch_size=384 \
lr=7e-5 \
puzzle_emb_lr=7e-5 \
weight_decay=1.0 \
puzzle_emb_weight_decay=1.0
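Before launching a run of this length, it can be worth a quick sanity check that the Step 1 dataset is in place and that PyTorch can see a GPU. This is a generic check, separate from `pretrain.py` itself:

```python
from pathlib import Path

import torch

# Quick pre-flight check before committing to a multi-hour training run.
data_path = Path("data/sudoku-extreme-1k-aug-1000")
assert data_path.exists(), f"Dataset not found at {data_path}; run Step 1 first."

# Training is intended to run on a GPU; make sure one is visible.
assert torch.cuda.is_available(), "No CUDA device detected."
print(f"Using GPU: {torch.cuda.get_device_name(0)}")
```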
Here's what the key parameters mean:
- `data_path`: Points to the dataset we just created.
- `epochs`: The total number of training epochs.
- `eval_interval`: How often (in epochs) to run evaluation on the test set (see the worked example below).
- `global_batch_size`: The total number of examples processed in one optimizer step. A smaller size like 384 is suitable for a single GPU with limited VRAM.
- `lr`, `weight_decay`, etc.: Hyperparameters for the optimizer. These have been tuned for this task.
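As a quick worked example of how `epochs` and `eval_interval` interact: assuming evaluation fires at every multiple of `eval_interval`, this configuration evaluates the model on the test set 10 times over the course of training.

```python
epochs, eval_interval = 20_000, 2_000

# Assuming evaluation runs at every multiple of eval_interval,
# this schedule yields 10 evaluation points: 2000, 4000, ..., 20000.
eval_epochs = list(range(eval_interval, epochs + 1, eval_interval))
print(len(eval_epochs), eval_epochs)
```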
Step 3: Monitor and Evaluate
As the model trains, progress will be printed to the console. All metrics, including training loss, learning rate, and evaluation accuracy, are automatically logged to your Weights & Biases account.
- Expected Runtime: Approximately 10 hours on an NVIDIA RTX 4070 laptop GPU.
- Expected Performance: The model should reach near-perfect accuracy on the test set.
Note: For this small dataset, late-stage overfitting can sometimes lead to numerical instability. It is advisable to monitor the training and use early stopping once the test accuracy plateaus near 100%.
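If you prefer to check for that plateau programmatically rather than watching the dashboard, the Weights & Biases public API can pull a run's logged history. The sketch below uses placeholder entity/project/run identifiers, and the metric key `eval/exact_accuracy` is illustrative; use whatever key `pretrain.py` actually logs.

```python
import wandb

# Pull logged metrics for a run via the W&B public API.
# "my-entity/my-project/run-id" and "eval/exact_accuracy" are placeholders.
api = wandb.Api()
run = api.run("my-entity/my-project/run-id")

history = run.history(keys=["eval/exact_accuracy"], pandas=True)
if not history.empty:
    latest = history["eval/exact_accuracy"].iloc[-1]
    best = history["eval/exact_accuracy"].max()
    print(f"latest={latest:.4f}  best={best:.4f}")
    # If recent evaluations have stopped improving near 1.0, stopping the
    # training run manually is a reasonable choice for this small dataset.
```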
Once training is complete, you will have a trained model checkpoint in the `checkpoints/` directory, which can be used for further evaluation.
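How you consume that checkpoint depends on the project's evaluation tooling, but a generic way to take a first look at a PyTorch checkpoint is sketched below. The file name is a placeholder, and the assumption that the file is loadable with `torch.load` and contains a dictionary is illustrative rather than guaranteed.

```python
import torch

# Illustrative only: the actual file name and layout depend on how
# pretrain.py saves its state.
ckpt = torch.load("checkpoints/<run-name>/latest.pt", map_location="cpu")

# Peek at what was saved before wiring it into an evaluation script.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```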