This blog post guides you through the process of fine-tuning the DeepSeek Math model to solve math problems expressed using emojis. This is a fun and engaging way to explore the capabilities of large language models (LLMs) and their ability to understand and reason with abstract concepts.
Prerequisites
- A Google Colab environment (or a similar environment with a GPU)
- Basic understanding of Python and machine learning concepts
- A Hugging Face account (optional, for easier model loading)
Setting Up the Environment
Install the required packages:
!pip install -qqq loralib==0.1.1
!pip install -qqq einops==0.6.1
!pip install -qqq transformers accelerate peft datasets bitsandbytes torch
Package Breakdown:
- loralib: Low-rank adaptation (LoRA) for efficient fine-tuning.
- einops: Tensor operations with Einstein notation.
- transformers: Hugging Face library for pre-trained models.
- accelerate: Distributed training and mixed-precision support.
- peft: Parameter-Efficient Fine-Tuning.
- datasets: Easy loading and manipulation of datasets.
- bitsandbytes: 8-bit optimizer support.
- torch: PyTorch deep learning framework.
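Before loading anything heavy, a small sanity-check cell (optional, and assuming a CUDA-capable Colab runtime) confirms that the GPU is visible and the freshly installed libraries import cleanly:

import torch
import transformers
import peft

# Quick check: is a GPU available, and do the key libraries import?
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)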
Loading the Model and Tokenizer
import os, torch, re
import bitsandbytes as bnb
from transformers import (
AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
TrainingArguments, Trainer, GenerationConfig
)
from peft import LoraConfig, get_peft_model
from datasets import Dataset
model_name = "deepseek-ai/deepseek-math-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id
Key Points:
- AutoTokenizer & AutoModelForCausalLM: Load the tokenizer and model.
- torch_dtype=torch.bfloat16: Balanced performance and memory.
- device_map="auto": Automatically assign layers to available GPUs.
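The imports above already bring in bitsandbytes and BitsAndBytesConfig, even though the bfloat16 load does not use them. If your GPU cannot hold the 7B model in bfloat16, one optional variation (a sketch, not part of the original walkthrough) is to load the base model quantized to 4-bit instead:

# Optional: load the base model in 4-bit to fit on smaller GPUs.
# Reuses model_name, torch, and the transformers imports from the cell above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

If you go this route, it is common to also call peft's prepare_model_for_kbit_training(model) before attaching the LoRA adapter in the next step.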
Configuring LoRA for Efficient Fine-Tuning
def get_num_layers(model):
    # Find the highest layer index that appears in any parameter name
    numbers = set()
    for name, _ in model.named_parameters():
        for number in re.findall(r'\d+', name):
            numbers.add(int(number))
    return max(numbers) if numbers else None

def get_last_layer_linears(model):
    # Collect the linear modules belonging to the last transformer block
    names = []
    num_layers = get_num_layers(model)
    if num_layers is None:
        return names
    for name, module in model.named_modules():
        if str(num_layers) in name and "encoder" not in name:
            if isinstance(module, torch.nn.Linear):
                names.append(name)
    return names
lora_config = LoraConfig(
r=20,
lora_alpha=40,
target_modules=get_last_layer_linears(model),
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
LoRA Parameters:
- r: Rank of the LoRA matrices.
- lora_alpha: Scaling factor.
- target_modules: Linear layers of the last transformer block.
- lora_dropout: Dropout for regularization.
- bias="none": No bias terms are trained.
- task_type="CAUSAL_LM": Causal language modeling.
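Before moving on, it is worth confirming which layers were targeted and how small the trainable footprint actually is. A quick check using PEFT's built-in helper:

# Which linear layers of the last transformer block were chosen as LoRA targets
print(lora_config.target_modules)

# PEFT reports the trainable parameter count versus the full 7B model
model.print_trainable_parameters()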
Preparing the Emoji Math Dataset
emoji_dataset_examples = [
    {"prompt": "🍕 + 🍕 + 🍕 = 12 → 🍕", "completion": "4"},
    {"prompt": "🍎 + 🍎 = 10 → 🍎", "completion": "5"},
    ...
    {"prompt": "🍪 + 🍪 + 🍪 = 15 → 🍪", "completion": "5"}
]
tokenizer.pad_token = tokenizer.eos_token
data = Dataset.from_list(emoji_dataset_examples)
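A handful of handwritten examples is enough to demonstrate the pipeline, but the model learns the pattern more reliably with more variety. One way to pad out the toy dataset programmatically (a sketch with an arbitrary emoji pool, assuming the ... placeholder above is replaced with real examples) is:

import random

# Purely illustrative pool of emojis to build synthetic problems from
emoji_pool = ["🍕", "🍎", "🍪", "🍩", "🍇", "🍒"]

def make_example():
    emoji = random.choice(emoji_pool)
    count = random.randint(2, 4)   # how many times the emoji appears on the left side
    value = random.randint(2, 9)   # the hidden value of the emoji
    lhs = " + ".join([emoji] * count)
    return {"prompt": f"{lhs} = {count * value} → {emoji}", "completion": str(value)}

extra_examples = [make_example() for _ in range(200)]
data = Dataset.from_list(emoji_dataset_examples + extra_examples)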
Creating Prompts and Tokenizing the Data
def generate_prompt(example):
    # Format one example as a single training string
    return (
        f"Emoji Math Solver:\n"
        f"Problem: {example['prompt']}\n"
        f"Solution: {example['completion']}"
    )

def tokenize_function(example):
    full_prompt = generate_prompt(example)
    return tokenizer(full_prompt, truncation=True, padding="max_length", max_length=256)
data = data.map(tokenize_function)
Notes:
- generate_prompt(): Formats each example into a single training string.
- tokenizer(): Converts the prompt into token IDs with padding/truncation.
- data.map(): Applies tokenization across the dataset.
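A quick look at one processed row (an optional check, not required for training) confirms that the map added token IDs and that decoding them reproduces the formatted prompt:

# Inspect one tokenized example: original fields plus input_ids / attention_mask
sample = data[0]
print(sample.keys())

# Decoding (and skipping the padding tokens) should give back the formatted prompt
print(tokenizer.decode(sample["input_ids"], skip_special_tokens=True))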
Let's train the model!
With the dataset prepared and tokenized, we can now fine-tune the model using the Hugging Face Trainer API. We'll define the training arguments, add a data collator that copies the input IDs into labels (which the Trainer needs in order to compute a causal language-modeling loss), and then launch the training run.
from transformers import DataCollatorForLanguageModeling

# Define training arguments
training_args = TrainingArguments(
    output_dir="./emoji-math-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,  # Use mixed precision for faster training
    save_strategy="epoch",
    report_to="none"
)

# The collator copies input_ids into labels so the Trainer can compute a causal LM loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data,
    data_collator=data_collator,
    tokenizer=tokenizer
)

# Start training
trainer.train()
Explanation of key parameters:
- per_device_train_batch_size: Number of samples processed per device before a backward/update pass.
- gradient_accumulation_steps: Simulates a larger batch by accumulating gradients over multiple steps.
- num_train_epochs: Number of passes over the entire dataset.
- fp16: Enables mixed-precision training for better performance on GPUs.
- save_strategy: Saves a model checkpoint at the end of each epoch.
- report_to="none": Disables logging to external tools like WandB or TensorBoard.
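Once trainer.train() finishes, it is worth persisting the LoRA adapter explicitly so it can be reloaded later without retraining. With a PEFT-wrapped model, save_pretrained() writes only the small adapter weights (the directory name below is arbitrary):

# Save just the LoRA adapter plus the tokenizer
model.save_pretrained("./emoji-math-adapter")
tokenizer.save_pretrained("./emoji-math-adapter")

# Later, the adapter can be reattached to a freshly loaded base model:
# from peft import PeftModel
# base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
# model = PeftModel.from_pretrained(base, "./emoji-math-adapter")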
Evaluating the Fine-Tuned Model
Once training is complete, let's test the model with new emoji math problems to see how well it has learned.
# Function to generate predictions
def predict(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=20)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Try a few examples
test_problems = [
    "🍪 + 🍪 = 10 → 🍪",
    "🧁 + 🧁 + 🧁 = 15 → 🧁",
    "🧃 + 🧃 + 🧃 + 🧃 = 32 → 🧃",
    "🍩 + 🍩 = 6 → 🍩"
]

for problem in test_problems:
    prompt = f"Emoji Math Solver:\nProblem: {problem}\nSolution:"
    result = predict(prompt)
    print(f"{prompt}\n{result}\n{'-'*50}")
What to expect:
The model should return the correct numerical value for each emoji based on the problem. This is a simple but fun way to evaluate whether the model has successfully learned the pattern of emoji math.
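Because predict() returns the full decoded text (the prompt followed by the model's continuation), pulling out just the numeric answer makes it easier to score the model automatically. A small illustrative helper for that (the parsing logic is an assumption, not part of the original walkthrough) could look like this:

import re

def extract_answer(generated_text):
    # Take the text after the last "Solution:" marker and grab the first integer in it
    tail = generated_text.split("Solution:")[-1]
    match = re.search(r"-?\d+", tail)
    return int(match.group()) if match else None

# Example: score a couple of test problems against the values we expect
expected = {"🍪 + 🍪 = 10 → 🍪": 5, "🍩 + 🍩 = 6 → 🍩": 3}
for problem, target in expected.items():
    prompt = f"Emoji Math Solver:\nProblem: {problem}\nSolution:"
    answer = extract_answer(predict(prompt))
    print(problem, "->", answer, "(expected", target, ")")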
Conclusion
You've now learned how to fine-tune the DeepSeek Math 7B model to solve emoji-based math puzzles. This walkthrough demonstrates how powerful and flexible modern LLMs can be, even in whimsical or abstract problem domains.