Supervised Learning

Supervised learning (SL) in the context of language model fine-tuning means learning an input-output mapping from labeled data. Technically, this involves minimizing a weighted cross-entropy loss on token sequences—equivalently, maximizing the log-probability of specified target tokens.
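
The mapping to code is direct. As a rough sketch (using PyTorch tensors purely for illustration; nothing here is Bios-specific), the weighted loss looks like this:

import torch
import torch.nn.functional as F

# Toy example: logits for a 4-position sequence over a 10-token vocabulary
logits = torch.randn(4, 10)                  # model outputs, one row per position
target_tokens = torch.tensor([3, 7, 2, 9])   # next-token targets
weights = torch.tensor([0., 0., 1., 1.])     # 0 = prompt/context, 1 = completion

# Per-token negative log-likelihood, weighted so only completion tokens count
nll = F.cross_entropy(logits, target_tokens, reduction="none")
loss = (nll * weights).sum() / weights.sum()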

Supervised Learning in LLM Pipelines

SL is a foundational technique in LLM fine-tuning, used for instruction tuning, domain adaptation, and behavior specialization. Bios makes SL training efficient through distributed GPU orchestration and LoRA parameter adaptation.

Common Use Cases for Supervised Learning

SL is commonly used in two key scenarios within LLM fine-tuning pipelines:

1. Instruction Tuning

The foundational step in post-training pipelines. Applied to UltraSafe expert models to enhance domain-specific capabilities while maintaining general reasoning and instruction-following abilities.

Purpose:

  • Demonstrate correct response format and style
  • Boost reasoning capabilities in specific domains
  • Improve instruction-following for enterprise tasks
  • Adapt general models to specialized use cases

2. Context Distillation / Prompt Distillation

Condense long system prompts or instructions into model behavior. When system messages become impractically long or start being ignored, distill them into the model's weights through supervised fine-tuning.

Purpose:

  • Reduce token overhead from lengthy system prompts
  • Internalize behavioral guidelines into model weights
  • Create specialized models for narrow prompt distributions
  • Improve adherence to complex, multi-part instructions

Instruction Tuning with Bios

Instruction tuning adapts UltraSafe expert models to follow task-specific instructions while maintaining their domain expertise. This is particularly valuable for enterprise applications requiring specialized behavior.

Instruction Tuning Example

Fine-tune a healthcare model to follow clinical documentation guidelines. The complete Healthcare Instruction Tuning example appears below, after the Quick Start walkthrough.

Quick Start: Running Your First SL Experiment

The Bios Cookbook provides a ready-to-run supervised learning implementation. Run your first training job in minutes using the built-in training loop.

Run the Example

The Cookbook includes sl_basic.py, which fine-tunes an UltraSafe model on a curated instruction-following dataset:

Run Basic SL Example
# From bios-cookbook directory
python -m bios_cookbook.recipes.sl_basic

What This Script Does

  • Fine-tunes ultrasafe/usf-mini on a high-quality instruction dataset
  • Uses the built-in training loop from train_cli.py
  • Automatically configures LoRA with optimal hyperparameters
  • Saves checkpoints and metrics to /tmp/bios-examples/sl_basic

What You'll See During Training

The training script provides real-time feedback on training progress:

Training Metrics

Each step prints train and test loss, along with timing statistics:

Step 0: train_loss=1.856, test_loss=1.823, time=2.3s
Step 10: train_loss=1.789, test_loss=1.791, time=2.1s
Step 20: train_loss=1.772, test_loss=1.776, time=2.2s
...

Data Visualization

The script visualizes training data with color-coded tokens:

<|im_start|>user: What is supervised learning?<|im_end|>(weight=0, context)
<|im_start|>assistant: Supervised learning trains models on labeled data...<|im_end|>(weight=1, completion)
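
To reproduce this view for your own data, a sketch along these lines works, continuing from a renderer and tokenizer setup like the one in the instruction tuning example below; the tokenizer.decode call is an assumption about the tokenizer interface, not a documented Bios utility:

# tokens/weights come from the renderer, as in the training examples below
tokens, weights = renderer.build_supervised_example(messages)
for tok, w in zip(tokens, weights):
    label = "completion" if w else "context"
    print(f"{tokenizer.decode([tok])!r:<20} weight={w} ({label})")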

Checkpoints

Automatic checkpointing at configurable intervals with paths displayed:

Saved checkpoint: bios://usf-mini-abc123/step_100
Saved checkpoint: bios://usf-mini-abc123/step_200

Understanding the Output Files

The training script writes logs and checkpoints to the log_path directory. Here's what each file contains:

metrics.jsonl

Training metrics that were printed to console. Each line is a JSON object with step number, losses, and timing.

Load and Visualize:

import pandas as pd
import matplotlib.pyplot as plt

# Load metrics
df = pd.read_json("/tmp/bios-examples/sl_basic/metrics.jsonl", lines=True)

# Plot training curves
plt.figure(figsize=(10, 6))
plt.plot(df['train_mean_nll'], label='Train Loss', linewidth=2)
plt.plot(df['test/nll'].dropna(), label='Test Loss', linewidth=2)
plt.xlabel('Training Steps')
plt.ylabel('Negative Log Likelihood')
plt.title('Supervised Learning Training Progress')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

checkpoints.jsonl

All checkpoints saved during training with their paths and metadata. Contains two checkpoint types:

  • /sampler_weights/ - Weights only (for inference testing)
  • /weights/ - Full optimizer state (for resuming training)

Resume Training: If you interrupt and re-run the script, it will detect existing checkpoints and offer to resume from the last saved state.
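
To pick a checkpoint programmatically, something like the following works; the "path" field name is an assumption about the checkpoints.jsonl schema, so adjust it to match what your run actually writes:

import json

def latest_full_checkpoint(filepath):
    """Return the last logged checkpoint that contains full optimizer state."""
    latest = None
    with open(filepath) as f:
        for line in f:
            entry = json.loads(line)
            if "/weights/" in entry.get("path", ""):
                latest = entry  # later lines were written later in training
    return latest

print(latest_full_checkpoint("/tmp/bios-examples/sl_basic/checkpoints.jsonl"))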

config.json

Complete training configuration including model name, LoRA settings, learning rate, batch size, and all hyperparameters used for the run. Useful for reproducibility and experiment tracking.
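
Since the file is plain JSON, you can read it back directly when comparing runs (the exact keys depend on your configuration):

import json

with open("/tmp/bios-examples/sl_basic/config.json") as f:
    config = json.load(f)

# Print the recorded hyperparameters for this run
for key, value in sorted(config.items()):
    print(f"{key}: {value}")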

Using Your Own Dataset

The sl_basic script includes example code for loading custom datasets in JSONL format:

Custom Dataset Format
# conversations.jsonl format
# Each line is a JSON object with a "messages" field

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
...

# Load custom dataset in your training script
import json

def load_custom_data(filepath):
    conversations = []
    with open(filepath, 'r') as f:
        for line in f:
            data = json.loads(line)
            conversations.append(data['messages'])
    return conversations

# Use in training
custom_data = load_custom_data('path/to/conversations.jsonl')
# ... process and train as shown in examples above

Dataset Quality Matters

Supervised learning performance depends heavily on data quality. Use high-quality, diverse examples that demonstrate the desired behavior. A small dataset (100-1000 examples) of excellent quality often outperforms a large dataset with noisy or inconsistent examples.
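
A lightweight sanity pass over a conversations.jsonl file can catch empty targets and exact duplicates before you spend GPU time; the specific checks below are illustrative, not a Bios utility:

import json

def check_conversations(filepath):
    seen = set()
    kept, dropped = [], 0
    with open(filepath) as f:
        for line in f:
            messages = json.loads(line)["messages"]
            # Require at least one non-empty assistant turn to train on
            has_target = any(
                m["role"] == "assistant" and m["content"].strip() for m in messages
            )
            # Drop exact duplicates, which add size without adding signal
            key = json.dumps(messages, sort_keys=True)
            if not has_target or key in seen:
                dropped += 1
                continue
            seen.add(key)
            kept.append(messages)
    print(f"Kept {len(kept)} examples, dropped {dropped}")
    return kept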

Healthcare Instruction Tuning
import bios
from bios import types
from bios_cookbook import renderers

# Initialize for healthcare domain
service_client = bios.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-healthcare",
    rank=32
)

# Get renderer
tokenizer = training_client.get_tokenizer()
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Instruction tuning dataset
instruction_data = [
    {
        "messages": [
            {"role": "system", "content": "Generate clinical notes following SOAP format."},
            {"role": "user", "content": "Patient presents with persistent cough and fever."},
            {"role": "assistant", "content": "S: Patient reports persistent cough x 5 days, fever (101°F)\nO: Lung auscultation reveals bilateral crackles\nA: Likely viral respiratory infection\nP: Supportive care, fluids, monitor for 48h"}
        ]
    },
    # ... more instruction examples
]

# Process and train
for example in instruction_data:
    tokens, weights = renderer.build_supervised_example(example["messages"])
    datum = types.Datum(
        model_input=types.ModelInput.from_ints(tokens[:-1]),
        loss_fn_inputs={'target_tokens': tokens[1:], 'weights': weights[1:]}
    )
    training_client.forward_backward([datum], "cross_entropy")
    training_client.optim_step()

# Save instruction-tuned model
training_client.save_state(name="healthcare_soap_v1")

Instruction Tuning Benefits

  • Format Consistency: Model learns to produce outputs in desired format (SOAP notes, reports, etc.)
  • Domain Reasoning: Enhances model's ability to apply domain knowledge correctly
  • Task Specialization: Adapts general expert model to specific enterprise workflows

Context Distillation

Context distillation addresses the challenge of lengthy system prompts. When system messages grow too long or models start ignoring parts of complex instructions, distill the behavior into model weights through targeted SL.

The Problem: Long System Prompts

Challenges with Lengthy Prompts

  • System messages can become impractically long (500+ tokens)
  • Models may ignore or forget parts of complex multi-part instructions
  • Every inference call incurs token overhead from the repeated system prompt
  • Attention dilution across very long contexts reduces effectiveness

The Solution: Distill Into Weights

Create a supervised dataset on a narrow prompt distribution with shorter, targeted instructions:

Context Distillation Example
import bios
from bios import types
from bios_cookbook import renderers

# Initialize client
service_client = bios.ServiceClient()

# Original: Long system prompt (500 tokens)
original_system = """
You are a financial analyst assistant. Always include:
1. Risk assessment with confidence scores
2. Market context and relevant economic indicators
3. Compliance considerations for SEC regulations
4. Historical comparisons where applicable
5. Quantitative metrics with sources
6. Executive summary at the start
... (continues for many more instructions)
"""

# Distilled: Short, targeted instruction
distilled_system = "Provide financial analysis with risk metrics and compliance notes."

# Create distillation dataset
# Generate responses using LONG prompt, but train with SHORT prompt
distillation_data = []
for query in financial_queries:
    # Generate ideal response using long prompt
    long_prompt_messages = [
        {"role": "system", "content": original_system},
        {"role": "user", "content": query}
    ]
    # ... sample ideal response ...

    # Create training example with short prompt
    short_prompt_messages = [
        {"role": "system", "content": distilled_system},
        {"role": "user", "content": query},
        {"role": "assistant", "content": ideal_response}  # From long prompt
    ]
    distillation_data.append(short_prompt_messages)

# Train on distillation dataset
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-finance",
    rank=32
)

tokenizer = training_client.get_tokenizer()
renderer = renderers.get_renderer('ultrasafe', tokenizer)

for messages in distillation_data:
    tokens, weights = renderer.build_supervised_example(messages)
    datum = types.Datum(
        model_input=types.ModelInput.from_ints(tokens[:-1]),
        loss_fn_inputs={'target_tokens': tokens[1:], 'weights': weights[1:]}
    )
    training_client.forward_backward([datum], "cross_entropy")
    training_client.optim_step()

# Now the model follows complex guidelines with minimal prompt
training_client.save_state(name="finance_distilled_v1")

Distillation Benefits

  • Token Efficiency: Reduce 500+ token system prompts to <50 tokens
  • Improved Adherence: Model internalizes guidelines rather than referencing them
  • Cost Reduction: Lower token costs on every inference call
  • Faster Inference: Shorter prompts mean lower latency

SL Training Loop Implementations

The Bios Cookbook provides two training loop implementations to suit different needs: a simple, self-contained version for learning and an optimized version for production use.

sl_loop.py

Simple, self-contained training loop for learning and prototyping. Defines data loading inline without external dataset classes.

Best for:

  • Learning how Bios works under the hood
  • Prototyping custom training logic
  • Writing your own training loops
  • Educational examples

📁 bios_cookbook/recipes/sl_loop.py

supervised/train.py

Production-optimized training loop with performance enhancements, efficient data loading, and additional features like periodic evaluations.

Best for:

  • Production training pipelines
  • Large-scale fine-tuning
  • Maximum performance and efficiency
  • Periodic evaluation during training

📁 bios_cookbook/supervised/train.py

Implementation Similarity

Both implementations follow the same core logic; the difference is in code organization and optimization. The sl_loop.py version is easier to understand for learning, while supervised/train.py includes production optimizations like async pipelining and efficient batching.

Cross-Entropy Loss for SL

Supervised learning in Bios uses the cross_entropy loss function, which maximizes the log-probability of target tokens:

Mathematical Objective

L(θ) = −E_x [ Σ_i weights_i · log p_θ(x_i | x_<i) ]

where weights_i is 0 for prompt tokens and 1 for completion tokens, allowing you to train only on the desired outputs.

Using Cross-Entropy Loss
# Process supervised example
tokens, weights = renderer.build_supervised_example(messages)

# Create training datum
datum = types.Datum(
    model_input=types.ModelInput.from_ints(tokens[:-1]),
    loss_fn_inputs={
        'target_tokens': tokens[1:],
        'weights': weights[1:]  # 0 for context, 1 for completion
    }
)

# Train with cross-entropy
training_client.forward_backward([datum], loss_fn="cross_entropy")
training_client.optim_step()
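
The one-token shift between model_input and target_tokens is easy to get wrong, so here is a small worked illustration (the token IDs are made up; only the slicing pattern matters):

# Toy rendered example: 3 prompt tokens followed by 2 completion tokens
tokens  = [101, 15, 27, 88, 102]
weights = [0,   0,  0,  1,  1]

model_input    = tokens[:-1]   # [101, 15, 27, 88]   -> what the model reads
target_tokens  = tokens[1:]    # [15, 27, 88, 102]   -> the next token at each position
target_weights = weights[1:]   # [0, 0, 1, 1]        -> loss applies only to completion targets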

Complete Supervised Learning Pipeline

Here's a production-ready supervised learning pipeline for fine-tuning UltraSafe expert models:

Production SL Pipeline
import bios
from bios import types
from bios_cookbook import renderers
from bios_cookbook.hyperparam_utils import get_lora_lr_over_full_finetune_lr
import asyncio

async def supervised_learning_pipeline(
    base_model: str,
    training_conversations: list,
    validation_conversations: list,
    num_epochs: int = 3
):
    """
    Complete supervised learning pipeline with Bios
    """
    # Initialize
    service_client = bios.ServiceClient()
    training_client = await service_client.create_lora_training_client_async(
        base_model=base_model,
        rank=32
    )

    # Setup renderer
    tokenizer = training_client.get_tokenizer()
    renderer = renderers.get_renderer('ultrasafe', tokenizer)

    # Calculate optimal learning rate
    full_ft_lr = 1e-5
    lr_scale = get_lora_lr_over_full_finetune_lr(base_model)
    lora_lr = full_ft_lr * lr_scale

    print(f"Training with LoRA LR: {lora_lr}")

    # Training loop
    for epoch in range(num_epochs):
        epoch_loss = 0

        for step, messages in enumerate(training_conversations):
            # Render to tokens with weights
            tokens, weights = renderer.build_supervised_example(messages)

            # Create datum
            datum = types.Datum(
                model_input=types.ModelInput.from_ints(tokens[:-1]),
                loss_fn_inputs={
                    'target_tokens': tokens[1:],
                    'weights': weights[1:]
                }
            )

            # Training step
            fwd_future = await training_client.forward_backward_async(
                [datum], "cross_entropy"
            )
            opt_future = await training_client.optim_step_async(
                types.AdamParams(learning_rate=lora_lr)
            )

            # Get results
            fwd_result = await fwd_future
            await opt_future

            epoch_loss += fwd_result.loss

            if step % 100 == 0:
                print(f"Epoch {epoch}, Step {step}: Loss = {fwd_result.loss:.4f}")

        # Validation (evaluate_model is a user-supplied helper that computes
        # mean loss over validation_conversations; it is not defined here)
        val_loss = await evaluate_model(
            training_client,
            validation_conversations,
            renderer
        )
        print(f"Epoch {epoch} complete: Train Loss = {epoch_loss/len(training_conversations):.4f}, Val Loss = {val_loss:.4f}")

        # Checkpoint
        checkpoint_path = training_client.save_state(
            name=f"sl_epoch_{epoch}"
        ).result().path
        print(f"Saved checkpoint: {checkpoint_path}")

    return training_client

# Run pipeline
asyncio.run(supervised_learning_pipeline(
    base_model="ultrasafe/usf-healthcare",
    training_conversations=train_data,
    validation_conversations=val_data,
    num_epochs=3
))

Supervised Learning Best Practices

✓ Do

  • Use high-quality, diverse training examples
  • Scale LoRA rank with dataset size (rule of thumb: LoRA params ≥ completion tokens; see the sketch after these lists)
  • Use correct LR scaling (20-100x larger for LoRA)
  • Apply LoRA to all weight matrices (attention + MLP)
  • Validate on held-out data to prevent overfitting
  • Use renderers for consistent prompt/completion boundaries

✗ Don't

  • Don't use full fine-tuning LR for LoRA (will underperform)
  • Don't train only on attention layers (include MLP)
  • Don't use very large batch sizes with LoRA (sensitivity issue)
  • Don't forget to check weight=1 tokens vs LoRA capacity
  • Don't mix different weighting schemes without validation
  • Don't skip evaluation on diverse test cases
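
To apply the "LoRA params ≥ completion tokens" rule of thumb from the Do list above, a back-of-the-envelope check like the following can help; the layer shapes and counts are illustrative assumptions, not values read from a Bios model:

# Rough capacity check: a LoRA adapter on a (d_out x d_in) matrix adds
# rank * (d_in + d_out) parameters.
def lora_param_count(matrix_shapes, rank):
    return sum(rank * (d_in + d_out) for (d_out, d_in) in matrix_shapes)

# Illustrative shapes for one transformer block (attention + MLP projections)
hidden, mlp, layers = 2048, 8192, 24
block_shapes = [
    (hidden, hidden), (hidden, hidden), (hidden, hidden), (hidden, hidden),  # q, k, v, o
    (mlp, hidden), (hidden, mlp),                                            # MLP up/down
]
adapter_params = layers * lora_param_count(block_shapes, rank=32)

completion_tokens = 500_000  # total weight=1 tokens across your dataset
print(f"LoRA params: {adapter_params:,}  completion (weight=1) tokens: {completion_tokens:,}")
print("Rank looks sufficient" if adapter_params >= completion_tokens else "Consider a higher rank")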

Bios Cookbook Examples

The Bios Cookbook provides production-ready implementations of supervised learning techniques. Find complete examples in the supervised/ directory.

📁 Cookbook Structure

bios-cookbook/
  supervised/
    instruction_tuning.py
    context_distillation.py
    domain_adaptation.py
    multi_task_learning.py

🔧 Cookbook Features

  • Pre-configured domain-specific examples
  • Data preprocessing utilities
  • Hyperparameter tuning helpers
  • Evaluation and metrics tracking

📥 Clone the Bios Cookbook to access complete SL implementations:

git clone https://github.com/ultrasafe-ai/bios-cookbook.git
cd bios-cookbook/supervised
python instruction_tuning.py --model ultrasafe/usf-finance