Configuration Management
The project uses Hydra to manage all configurations for training and evaluation. This allows for a clean separation of settings in YAML files and easy overriding of parameters from the command line.
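Conceptually, a dotted command-line override rewrites one leaf of the nested configuration before training starts. The toy stdlib sketch below illustrates that behaviour; it is not Hydra's actual implementation, which additionally handles type validation, lists, config groups, and sweeps:

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply a single 'path.to.key=value' override to a nested config dict."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    # Best-effort scalar parsing: try int, then float, else keep the string.
    for cast in (int, float):
        try:
            node[leaf] = cast(raw_value)
            return
        except ValueError:
            pass
    node[leaf] = raw_value

cfg = {"lr": 1e-4, "arch": {"hidden_size": 512}}
apply_override(cfg, "lr=7e-5")
apply_override(cfg, "arch.hidden_size=256")
# cfg is now {"lr": 7e-5, "arch": {"hidden_size": 256}}
```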
Configuration File Structure
The main configuration files are located in the `config/` directory:

- `config/cfg_pretrain.yaml`: The main configuration file that sets default values for training.
- `config/arch/hrm_v1.yaml`: The configuration file specific to the HRM model architecture.
Main Configuration (cfg_pretrain.yaml)
This file contains hyperparameters and settings related to the training process, data, and logging.
```yaml
# config/cfg_pretrain.yaml

# Data path
data_path: data/arc-aug-1000

# Hyperparams - Training
global_batch_size: 768
epochs: 100000
eval_interval: 10000
checkpoint_every_eval: True

lr: 1e-4
lr_min_ratio: 1.0
lr_warmup_steps: 2000

# Standard hyperparameter settings for LM, as used in Llama
beta1: 0.9
beta2: 0.95
weight_decay: 0.1
puzzle_emb_weight_decay: 0.1

# Hyperparams - Puzzle embeddings training
puzzle_emb_lr: 1e-2
```
Key Parameters:
- `data_path`: Path to the processed dataset directory.
- `global_batch_size`: Total batch size across all GPUs.
- `epochs`: Total number of training epochs.
- `eval_interval`: Run evaluation every N epochs.
- `lr`: Peak learning rate for the main model parameters.
- `puzzle_emb_lr`: Peak learning rate for the sparse puzzle embeddings.
- `weight_decay`: Weight decay for the main model and puzzle embeddings.
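The `lr`, `lr_min_ratio`, and `lr_warmup_steps` values suggest a standard warmup-then-decay schedule. The sketch below assumes linear warmup followed by cosine decay toward `lr * lr_min_ratio`; the actual schedule is defined in the training code, and note that with `lr_min_ratio: 1.0` the rate simply stays at the peak value after warmup:

```python
import math

def learning_rate(step: int, base_lr: float = 1e-4, min_ratio: float = 1.0,
                  warmup_steps: int = 2000, total_steps: int = 100_000) -> float:
    """Linear warmup to base_lr, then cosine decay to base_lr * min_ratio."""
    if step < warmup_steps:
        # Ramp linearly from ~0 up to base_lr over the warmup window.
        return base_lr * (step + 1) / warmup_steps
    # Cosine goes from 1 at the end of warmup to 0 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)
```

With the defaults above (`min_ratio=1.0`), `learning_rate(2000)` onward returns the constant peak `1e-4`; setting `min_ratio=0.1` instead would decay to `1e-5` by the final step.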
Architecture Configuration (arch/hrm_v1.yaml)
This file defines the structure and hyperparameters of the `HierarchicalReasoningModel_ACTV1`.
```yaml
# config/arch/hrm_v1.yaml
name: hrm.hrm_act_v1@HierarchicalReasoningModel_ACTV1

loss:
  name: losses@ACTLossHead
  loss_type: stablemax_cross_entropy

halt_exploration_prob: 0.1
halt_max_steps: 16

H_cycles: 2
L_cycles: 2
H_layers: 4
L_layers: 4

hidden_size: 512
num_heads: 8
expansion: 4
puzzle_emb_ndim: ${.hidden_size}

pos_encodings: rope
```
Key Parameters:
- `name`: The model class to instantiate.
- `loss`: Configuration for the loss function, including the `ACTLossHead`.
- `halt_max_steps`: Maximum number of recurrent steps for the ACT mechanism.
- `H_cycles`, `L_cycles`: Number of update cycles for the high-level and low-level modules.
- `H_layers`, `L_layers`: Number of Transformer blocks in each module.
- `hidden_size`, `num_heads`: Standard Transformer dimensions.
- `puzzle_emb_ndim`: Dimensionality of the per-puzzle embeddings.
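The width settings imply the usual derived Transformer dimensions. A small sketch, assuming the standard conventions (the exact feed-forward width may differ if the model uses a gated MLP variant such as SwiGLU):

```python
def derived_dims(hidden_size: int = 512, num_heads: int = 8, expansion: int = 4):
    """Dimensions implied by the architecture config above."""
    assert hidden_size % num_heads == 0, "num_heads must divide hidden_size"
    head_dim = hidden_size // num_heads   # per-head width for attention (and RoPE)
    ffn_dim = hidden_size * expansion     # feed-forward inner width
    return head_dim, ffn_dim

head_dim, ffn_dim = derived_dims()  # (64, 2048) with the defaults above
```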
Overriding Configuration via Command Line
Hydra's primary benefit is the ability to easily override any configuration value from the command line when launching a script.
Syntax: `python <script>.py path.to.key=value`
Example: To run the Sudoku experiment, you might override the data path, batch size, and learning rate:
```bash
python pretrain.py \
  data_path=data/sudoku-extreme-1k-aug-1000 \
  global_batch_size=384 \
  lr=7e-5
```
This command tells Hydra to use `data/sudoku-extreme-1k-aug-1000` for `data_path` instead of the default value in `cfg_pretrain.yaml`, and likewise overrides the batch size and learning rate. This makes it easy to manage and run multiple experiments without editing the YAML files directly.