Prompt Distillation

Prompt distillation is a training technique that optimizes a model to behave as though it had been provided with a long, complex prompt—without requiring that prompt during inference. This dramatically reduces token overhead while preserving the behavioral guidance encoded in detailed instructions.

Two-Step Process

1. Create distillation data: Teacher model uses detailed prompt to generate high-quality responses
2. Train student model: Student learns to reproduce teacher behavior without the prompt

Mathematical Overview

Let fT and fS denote the teacher and student models, respectively. Given an instruction prompt P and a query qi, the teacher generates a response:

Distillation Formulation

Teacher Generation:

ri = fT([P, qi])

Distillation Dataset:

T = {(qi, ri) | 1 ≤ i ≤ D}

(D query-response pairs; the original prompt P is excluded from the dataset)

Student Training Objective:

ℓ(fS(qi), ri) = ℓ(fS(qi), fT([P, qi]))

Student minimizes cross-entropy loss to match teacher outputs
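
Summed over the whole dataset, the objective the student minimizes can be restated in LaTeX as follows (same symbols as above; the only addition is the averaging over the D examples):

% Dataset-level training objective: average per-example loss against teacher outputs
\min_{f_S} \; \frac{1}{D} \sum_{i=1}^{D} \ell\bigl(f_S(q_i),\, f_T([P, q_i])\bigr)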

Key Insight

The prompt P is concatenated with each query for teacher generation, but excluded from the training dataset. The student learns to implicitly reproduce the prompt's behavioral guidance through supervised learning on teacher outputs.
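
Concretely, only the teacher's input ever contains P; the stored example pairs the bare query with the teacher's answer. The toy sketch below illustrates that bookkeeping (teacher_generate is a stand-in for sampling from the teacher, not a real API):

# Toy illustration: what the teacher sees vs. what the dataset stores
def teacher_generate(text: str) -> str:
    # Stand-in for f_T; in practice this is a sampling call to the teacher model
    return "<teacher response>"

P = "You are an expert financial analyst. For every query, provide ..."  # long instruction prompt
q = "Analyze Tesla's Q3 2024 earnings report"

r = teacher_generate(f"{P}\n\n{q}")              # teacher is conditioned on [P, q]
training_example = {"query": q, "response": r}   # P is deliberately left out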

Example: Financial Analysis Distillation

The Bios Cookbook provides a prompt distillation recipe. We'll demonstrate with a financial analysis task, distilling a detailed analyst prompt into model weights.

Step 1: Generate Training Data

Create distillation data using the teacher model with a detailed financial analysis prompt:

Generate Distillation Dataset

# Run the data generation script
python -m bios_cookbook.recipes.prompt_distillation.create_data \
  output_file=/tmp/bios-datasets/financial_analysis_distilled.jsonl \
  teacher_model=ultrasafe/usf-finance \
  num_examples=1000

What This Command Does:

  • Uses configured teacher model (ultrasafe/usf-finance) with detailed analyst prompt
  • Generates financial analysis examples on diverse queries
  • Saves distilled dataset to specified output file (JSONL format)
  • Creates training examples suitable for student model fine-tuning
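
Each line of the output file is a standalone JSON record. A quick way to sanity-check the schema after generation (this assumes the chat-style "messages" format produced by the pipeline code later in this guide):

import json

# Inspect the first record of the generated dataset
with open("/tmp/bios-datasets/financial_analysis_distilled.jsonl") as f:
    record = json.loads(f.readline())

# Expected shape (assumed): a bare user query plus the teacher's response,
# with the detailed analyst prompt deliberately absent.
for message in record["messages"]:
    print(message["role"], "->", message["content"][:80])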

Step 2: Train the Student Model

Fine-tune a student model on the distillation data to internalize the prompt guidance:

Train Student Model

# Fine-tune student model
python -m bios_cookbook.recipes.prompt_distillation.train \
  data_file=/tmp/bios-datasets/financial_analysis_distilled.jsonl \
  student_model=ultrasafe/usf-finance \
  output_dir=/tmp/bios-models/distilled_analyst

Training Process:

  • Loads generated distillation dataset
  • Applies optimized training configurations (validated LR, batch size)
  • Fine-tunes student model using cross-entropy loss
  • Saves checkpoints and metrics for evaluation

Step 3: Test Your Distilled Model

Verify the distilled model's performance by sampling without the original prompt:

Test Distilled Model

import bios
from bios import types

# Load distilled student model
service_client = bios.ServiceClient()
sampling_client = service_client.create_sampling_client(
    model_path="bios://distilled_analyst/final"
)

# Tokenizer for encoding the query and decoding the output
tokenizer = sampling_client.get_tokenizer()

# Test with minimal prompt (no lengthy instructions needed!)
test_query = "Analyze Tesla's Q3 2024 earnings report"
prompt = types.ModelInput.from_ints(
    tokenizer.encode(test_query)
)

# Sample from distilled model
result = sampling_client.sample(
    prompt,
    sampling_params=types.SamplingParams(max_tokens=512, temperature=0.3)
).result()

# Model provides detailed analysis without lengthy prompt
print(tokenizer.decode(result.sequences[0].tokens))

✓ Distillation Success

The student model now provides comprehensive financial analysis (with risk metrics, compliance notes, etc.) using just the query—no 500+ token system prompt required. The behavioral guidelines have been internalized into the model weights.
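
A quick way to see what distillation bought you is to run the same bare query through the distilled checkpoint and the un-distilled base model; only the former should produce the structured risk/compliance/metrics sections. A minimal sketch, reusing the client APIs shown above (the checkpoint path is the one from Step 2):

import bios
from bios import types

service_client = bios.ServiceClient()
query = "Analyze Tesla's Q3 2024 earnings report"
params = types.SamplingParams(max_tokens=512, temperature=0.3)

def sample_text(client):
    # Encode the bare query, sample, and decode with the client's own tokenizer
    tok = client.get_tokenizer()
    result = client.sample(
        types.ModelInput.from_ints(tok.encode(query)),
        sampling_params=params
    ).result()
    return tok.decode(result.sequences[0].tokens)

# Distilled student: bare query, no system prompt
distilled = service_client.create_sampling_client(model_path="bios://distilled_analyst/final")

# Un-distilled base model on the same bare query, for contrast
base = service_client.create_sampling_client(base_model="ultrasafe/usf-finance")

print("DISTILLED:\n", sample_text(distilled))
print("BASE:\n", sample_text(base))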

Teacher-Student Model Configuration

The teacher and student models can be identical or different, depending on your requirements:

Same Model (Self-Distillation)

Use the same UltraSafe model as both teacher and student. The student learns to internalize complex prompt instructions into its base weights.

Example:

Teacher: ultrasafe/usf-finance
Student: ultrasafe/usf-finance

Different Models (Cross-Distillation)

Use a larger/more capable model as teacher to generate training data for a smaller/faster student model.

Example:

Teacher: ultrasafe/usf-finance
Student: ultrasafe/usf-mini
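
With the recipe commands, cross-distillation just means passing different models to the two steps. A sketch, assuming usf-mini is available as a student base model and reusing the flags from Steps 1-2 (the output paths are placeholders):

# Step 1: generate data with the larger teacher
python -m bios_cookbook.recipes.prompt_distillation.create_data \
  output_file=/tmp/bios-datasets/financial_analysis_distilled.jsonl \
  teacher_model=ultrasafe/usf-finance \
  num_examples=1000

# Step 2: train the smaller student on that data
python -m bios_cookbook.recipes.prompt_distillation.train \
  data_file=/tmp/bios-datasets/financial_analysis_distilled.jsonl \
  student_model=ultrasafe/usf-mini \
  output_dir=/tmp/bios-models/distilled_analyst_mini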

Complete Distillation Pipeline

Here's a complete implementation of prompt distillation for financial analysis:

Financial Analyst Distillation

import asyncio
import json

import bios
from bios import types
from bios_cookbook import renderers

# Define detailed teacher prompt (500+ tokens)
TEACHER_PROMPT = """
You are an expert financial analyst. For every query, provide:

1. RISK ASSESSMENT
   - Quantify risk scores (0-100) with confidence intervals
   - Identify systematic and idiosyncratic risks
   - Reference relevant market conditions

2. COMPLIANCE ANALYSIS
   - Cite applicable SEC regulations
   - Note disclosure requirements
   - Flag potential compliance issues

3. QUANTITATIVE METRICS
   - Calculate key financial ratios
   - Provide historical comparisons (3-5 year trends)
   - Include data sources and methodologies

4. EXECUTIVE SUMMARY
   - Lead with 2-3 sentence overview
   - Highlight critical findings
   - Provide clear actionable insights

Use professional terminology and cite sources.
"""

async def generate_distillation_data(
    teacher_model: str,
    queries: list[str],
    output_file: str
):
    """Generate distillation dataset using teacher model"""
    service_client = bios.ServiceClient()

    # Create sampling client for teacher
    teacher_client = service_client.create_sampling_client(
        base_model=teacher_model
    )

    tokenizer = teacher_client.get_tokenizer()
    renderer = renderers.get_renderer('ultrasafe', tokenizer)

    distillation_data = []

    for query in queries:
        # Build teacher prompt with detailed instructions
        teacher_messages = [
            {"role": "system", "content": TEACHER_PROMPT},
            {"role": "user", "content": query}
        ]

        teacher_prompt = renderer.build_generation_prompt(teacher_messages)
        stop_sequences = renderer.get_stop_sequences()

        # Generate teacher response
        teacher_output = await teacher_client.sample_async(
            teacher_prompt,
            sampling_params=types.SamplingParams(
                max_tokens=800,
                temperature=0.3,
                stop=stop_sequences
            ),
            num_samples=1
        )
        teacher_response, _ = renderer.parse_response(
            teacher_output.sequences[0].tokens
        )

        # Create student training example (WITHOUT teacher prompt)
        student_example = {
            "messages": [
                {"role": "user", "content": query},
                {"role": "assistant", "content": teacher_response["content"]}
            ]
        }
        distillation_data.append(student_example)

        print(f"Generated example {len(distillation_data)}/{len(queries)}")

    # Save distillation dataset (one JSON record per line)
    with open(output_file, 'w') as f:
        for example in distillation_data:
            f.write(json.dumps(example) + '\n')

    print(f"Saved {len(distillation_data)} examples to {output_file}")

# Generate data
financial_queries = [
    "Analyze Amazon's cloud computing revenue trends",
    "Evaluate Microsoft's AI investment strategy",
    # ... more queries
]

asyncio.run(generate_distillation_data(
    teacher_model="ultrasafe/usf-finance",
    queries=financial_queries,
    output_file="/tmp/financial_distilled.jsonl"
))

Step 2: Train Student Model

Student Training

# Train student model on distilled data
python -m bios_cookbook.recipes.prompt_distillation.train \
  data_file=/tmp/financial_distilled.jsonl \
  student_model=ultrasafe/usf-finance \
  lora_rank=32 \
  num_epochs=3 \
  log_path=/tmp/distillation_logs

Advanced Configuration

Customize the distillation recipe for different scenarios:

Teacher Model Selection

Choose different base models based on capacity requirements. Larger teacher models can provide higher-quality responses but are more expensive to run.

Sampling Strategies

Adjust temperature and other generation parameters. Lower temperature (0.3-0.5) provides more consistent teacher outputs suitable for distillation.
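
For example, the teacher call in the pipeline above is tuned directly through SamplingParams; the values below are illustrative starting points, not validated settings:

from bios import types

# Lower temperature (0.3-0.5) keeps teacher outputs consistent for distillation
consistent_params = types.SamplingParams(max_tokens=800, temperature=0.3)

# Higher temperature trades consistency for more diverse training examples
diverse_params = types.SamplingParams(max_tokens=800, temperature=0.8)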

Data Volume

Scale the number of generated examples based on task complexity. Simple tasks: 100-500 examples. Complex reasoning: 1000-5000 examples.

Training Hyperparameters

Fine-tune learning rate, batch size, and other settings. Use get_lr() for validated starting points.
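
A hypothetical usage sketch of that helper is shown below; the import path and signature are assumptions, so check the cookbook's hyperparameter utilities for the real interface:

# Hypothetical: module path and signature may differ in the actual cookbook
from bios_cookbook.hyperparam_utils import get_lr

lr = get_lr("ultrasafe/usf-finance")  # validated starting learning rate for this model
print(f"Suggested learning rate: {lr}")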

Prompt Distillation Benefits

  • 90%+ token reduction: eliminates 500+ token system prompts, leaving only a minimal task description
  • 3-5x faster inference: shorter prompts mean significantly lower latency per request
  • 80%+ cost savings: reduced token consumption translates to lower API costs
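
The token-reduction figure follows from simple arithmetic. With an assumed 500-token system prompt and a 50-token query (illustrative numbers, not measurements), dropping the prompt removes roughly 91% of the input tokens per request:

# Illustrative numbers only: substitute your actual prompt and query lengths
system_prompt_tokens = 500
query_tokens = 50

before = system_prompt_tokens + query_tokens   # input tokens with the full prompt
after = query_tokens                           # input tokens after distillation

reduction = 1 - after / before
print(f"Input token reduction: {reduction:.0%}")   # ~91%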