Fine-Tuning DeepSeek Math to Solve Emoji Equations: A Step-by-Step Guide

March 8, 2024

This blog post guides you through the process of fine-tuning the DeepSeek Math model to solve math problems expressed using emojis. This is a fun and engaging way to explore the capabilities of large language models (LLMs) and their ability to understand and reason with abstract concepts.

Prerequisites

  • A Google Colab environment (or a similar environment with a GPU)
  • Basic understanding of Python and machine learning concepts
  • A Hugging Face account (optional, for easier model loading)

Setting Up the Environment

Install the required packages:

!pip install -qqq loralib==0.1.1
!pip install -qqq einops==0.6.1
!pip install -qqq transformers accelerate peft datasets bitsandbytes torch

Package Breakdown:

  • loralib: Low-rank adaptation (LoRA) for efficient fine-tuning.
  • einops: Tensor operations with Einstein notation.
  • transformers: Hugging Face library for pre-trained models.
  • accelerate: Distributed training and mixed-precision support.
  • peft: Parameter-Efficient Fine-Tuning.
  • datasets: Easy loading and manipulation of datasets.
  • bitsandbytes: 8-bit optimizers and low-bit quantization support.
  • torch: PyTorch deep learning framework (also used in the quick GPU check below).
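
Before loading a 7B-parameter model, it is worth confirming that the runtime actually has a GPU attached. This is just a sanity check, not part of the fine-tuning pipeline itself:

import torch

# Quick sanity check: a CUDA-capable GPU is required for this guide.
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"GPU: {gpu.name}, VRAM: {gpu.total_memory / 1e9:.1f} GB")
else:
    print("No GPU detected - switch the Colab runtime to a GPU before continuing.")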

Loading the Model and Tokenizer

import os, torch, re
import bitsandbytes as bnb
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
    TrainingArguments, Trainer, GenerationConfig
)
from peft import LoraConfig, get_peft_model
from datasets import Dataset

model_name = "deepseek-ai/deepseek-math-7b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

Key Points:

  • AutoTokenizer & AutoModelForCausalLM: Load tokenizer and model.
  • torch_dtype=torch.bfloat16: Loads the weights in bfloat16 for a good balance of speed and memory (an optional 4-bit variant is sketched below).
  • device_map="auto": Automatically assign layers to available GPUs.
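
If your GPU is short on memory, the BitsAndBytesConfig imported above can be used to load the base model in 4-bit instead of bfloat16. This is an optional alternative to the loading call above, sketched here under the assumption that bitsandbytes quantization is acceptable for your setup:

# Optional: 4-bit quantized loading for smaller GPUs (alternative to the bfloat16 load above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)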

Configuring LoRA for Efficient Fine-Tuning

def get_num_layers(model):
    # Find the highest layer index that appears in any parameter name.
    numbers = set()
    for name, _ in model.named_parameters():
        for number in re.findall(r'\d+', name):
            numbers.add(int(number))
    return max(numbers) if numbers else None

def get_last_layer_linears(model):
    # Collect the names of the linear modules in the last transformer block;
    # these become the LoRA target modules.
    names = []
    num_layers = get_num_layers(model)
    if num_layers is None:
        return names
    for name, module in model.named_modules():
        if str(num_layers) in name and "encoder" not in name:
            if isinstance(module, torch.nn.Linear):
                names.append(name)
    return names

lora_config = LoraConfig(
    r=20,
    lora_alpha=40,
    target_modules=get_last_layer_linears(model),
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)

LoRA Parameters:

  • r: Rank of the LoRA matrices
  • lora_alpha: Scaling factor
  • target_modules: Linear layers of the last transformer block (the resulting trainable footprint is checked below)
  • lora_dropout: Dropout for regularization
  • bias: "none" = No bias term
  • task_type: "CAUSAL_LM" for language modeling
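
Before training, it helps to confirm how small the trainable footprint actually is. PEFT models expose print_trainable_parameters() for exactly this:

# Show how many parameters LoRA actually trains vs. the frozen base model
print("LoRA target modules:", lora_config.target_modules)
model.print_trainable_parameters()

With r=20 applied only to the last block's linear layers, the trainable share should come out well under 1% of the 7B base parameters.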

Preparing the Emoji Math Dataset

emoji_dataset_examples = [
    {"prompt": "🍎 + 🍎 + 🍎 = 12 β†’ 🍎", "completion": "4"},
    {"prompt": "πŸš— + πŸš— = 10 β†’ πŸš—", "completion": "5"},
    ...
    {"prompt": "πŸͺ‘ + πŸͺ‘ + πŸͺ‘ = 15 β†’ πŸͺ‘", "completion": "5"}
]

tokenizer.pad_token = tokenizer.eos_token
data = Dataset.from_list(emoji_dataset_examples)

Creating Prompts and Tokenizing the Data

def generate_prompt(example):
    return (
        f"Emoji Math Solver:\n"
        f"Problem: {example['prompt']}\n"
        f"Solution: {example['completion']}"
    )

def tokenize_function(example):
    full_prompt = generate_prompt(example)
    tokens = tokenizer(full_prompt, truncation=True, padding="max_length", max_length=256)
    # Causal LM training needs labels; without them the Trainer cannot compute a loss.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

data = data.map(tokenize_function)

Notes:

  • generate_prompt(): Formats each example into the training prompt (previewed below).
  • tokenizer(): Converts the prompt into token IDs with padding/truncation.
  • labels: A copy of input_ids, so the Trainer can compute the causal language modeling loss.
  • data.map(): Applies tokenization across the dataset.
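
It is worth printing one formatted prompt and confirming it round-trips through the tokenizer. This quick check is optional, but it tends to catch formatting mistakes before any GPU time is spent:

# Preview one training example and its round-trip through the tokenizer
sample = emoji_dataset_examples[0]
print(generate_prompt(sample))
print(tokenizer.decode(tokenize_function(sample)["input_ids"], skip_special_tokens=True))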

Training the Model

With the dataset prepared and tokenized, we can now fine-tune the model using the Trainer API from Hugging Face. We'll define training arguments and then launch the training process.

# Define training arguments
training_args = TrainingArguments(
    output_dir="./emoji-math-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=True,  # Match the bfloat16 weights the model was loaded in
    save_strategy="epoch",
    report_to="none"
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data,
    tokenizer=tokenizer
)

# Start training
trainer.train()

Explanation of key parameters:

  • per_device_train_batch_size: Number of samples per forward/backward pass on each device; weight updates happen after gradient accumulation.
  • gradient_accumulation_steps: Helps simulate larger batches by accumulating gradients over multiple steps.
  • num_train_epochs: Number of times to iterate over the entire dataset.
  • bf16: Enables bfloat16 mixed-precision training, matching the dtype the model was loaded in.
  • save_strategy: Saves a checkpoint at the end of each epoch (saving just the LoRA adapter is shown below).
  • report_to="none": Disables logging to external tools like WandB or TensorBoard.

Evaluating the Fine-Tuned Model

Once training is complete, let’s test the model with new emoji math problems to see how well it has learned.

# Function to generate predictions
def predict(prompt):
    # Pass the attention mask along with input_ids to avoid pad-token warnings
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Try a few examples
test_problems = [
    "πŸͺ + πŸͺ = 10 β†’ πŸͺ",
    "🧊 + 🧊 + 🧊 = 15 β†’ 🧊",
    "πŸ§ƒ + πŸ§ƒ + πŸ§ƒ + πŸ§ƒ = 32 β†’ πŸ§ƒ",
    "🎈 + 🎈 = 6 β†’ 🎈"
]

for problem in test_problems:
    prompt = f"Emoji Math Solver:\nProblem: {problem}\nSolution:"
    result = predict(prompt)
    print(f"{prompt}\n{result}\n{'-'*50}")

What to expect:

The model should return the correct numerical value for each emoji based on the problem. This is a simple but fun way to evaluate whether the model has successfully learned the pattern of emoji math.
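
If you want a rough accuracy number rather than eyeballing the output, you can pull the digits after "Solution:" with a small regex and compare them to the answers worked out from the test problems above:

# Rough accuracy check: extract the first number after "Solution:" and compare
expected = ["5", "5", "8", "3"]  # answers implied by the four test problems above

correct = 0
for problem, answer in zip(test_problems, expected):
    prompt = f"Emoji Math Solver:\nProblem: {problem}\nSolution:"
    output = predict(prompt)
    match = re.search(r"Solution:\s*(\d+)", output)
    if match and match.group(1) == answer:
        correct += 1
print(f"{correct}/{len(expected)} correct")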

Conclusion

You’ve now learned how to fine-tune the DeepSeek Math 7B model to solve emoji-based math puzzles. This walkthrough demonstrates how powerful and flexible modern LLMs can be, even in whimsical or abstract problem domains.