Rendering to Tokens
Rendering converts message-based conversation data into the token representations required for model training and inference. While conceptually similar to chat templates, Bios's rendering system is designed for the complete training lifecycle, supporting supervised learning, reinforcement learning, and production deployment.
The Renderer Class
The Renderer class is the main interface for message-to-token conversion. It provides Python-based rendering logic that's easier to write and maintain than template-based approaches, especially when handling token-level loss weights.
Available in: bios_cookbook.renderers
Example Conversation
We'll use this multi-turn conversation throughout the examples below:
messages = [
    {
        'role': 'system',
        'content': 'Answer concisely; at most one sentence per response'
    },
    {
        'role': 'user',
        'content': 'What is the longest-lived rodent species?'
    },
    {
        'role': 'assistant',
        'content': 'The naked mole rat, which can live over 30 years.'
    },
    {
        'role': 'user',
        'content': 'How do they live so long?'
    },
    {
        'role': 'assistant',
        'content': 'They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.'
    }
]
Inference: Generating Messages
The renderer lets you work with the model at the message level: conversations go in, assistant messages come out. To sample assistant responses, use three key methods:
- build_generation_prompt(): convert the conversation into a sampling prompt
- get_stop_sequences(): get model-specific stop sequences
- parse_response(): convert sampled tokens back into a message
Initialize the Renderer
from bios_cookbook import renderers, tokenizer_utils

# Get tokenizer for UltraSafe model
tokenizer = tokenizer_utils.get_tokenizer('ultrasafe/usf-mini')

# Create renderer
renderer = renderers.get_renderer('ultrasafe', tokenizer)
Generate Alternative Response
Remove the last assistant message and generate an alternative response:
# Build prompt from conversation (excluding the last assistant turn)
prompt = renderer.build_generation_prompt(messages[:-1])

print("Prompt object:")
print(prompt)
print('-' * 50)
print("Decoded prompt:")
print(tokenizer.decode(prompt.to_ints()))
The output shows the ModelInput structure and the decoded tokens:
Prompt object:
ModelInput(chunks=[EncodedTextChunk(tokens=[151644, 8948, 198, ...], type='encoded_text')])
--------------------------------------------------
Decoded prompt:
<|im_start|>system
Answer concisely; at most one sentence per response<|im_end|>
<|im_start|>user
What is the longest-lived rodent species?<|im_end|>
<|im_start|>assistant
The naked mole rat, which can live over 30 years.<|im_end|>
<|im_start|>user
How do they live so long?<|im_end|>
<|im_start|>assistant
ModelInput Structure
The ModelInput object contains a list of chunks. For text-only data, you'll have EncodedTextChunk objects. This structure supports future multi-modal extensions.
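If you want to look inside, the chunks and token ids are directly accessible. A minimal sketch, assuming the prompt object exposes the chunks attribute and the to_ints() method shown in the output above:

# Inspect each chunk of the ModelInput (text-only prompts contain
# EncodedTextChunk objects, as in the printed structure above)
for chunk in prompt.chunks:
    print(chunk.type, len(chunk.tokens), "tokens")

# Flatten all chunks back into a single token list and decode it
print(tokenizer.decode(prompt.to_ints()))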
Sampling and Parsing Responses
Sample from the model and parse the token output back into a message format:
import bios
from bios.types import SamplingParams

# Create sampling client
service_client = bios.ServiceClient()
sampling_client = service_client.create_sampling_client(
    base_model='ultrasafe/usf-mini'
)

# Get stop sequences from renderer
stop_sequences = renderer.get_stop_sequences()
print(f"Stop sequences: {stop_sequences}")

# Configure sampling parameters
sampling_params = SamplingParams(
    max_tokens=100,
    temperature=0.5,
    stop=stop_sequences
)

# Sample from model
output = sampling_client.sample(
    prompt,
    sampling_params=sampling_params,
    num_samples=1
).result()

print(f"Sampled tokens: {output.sequences[0].tokens}")

# Parse tokens back to message
sampled_message, parse_success = renderer.parse_response(
    output.sequences[0].tokens
)

print(f"Sampled message: {sampled_message}")
print(f"Parse success: {parse_success}")
Example output:
Stop sequences: [151645]
Sampled tokens: [45, 7741, 34651, 31410, 614, 4911, 76665, ...]
Sampled message: {
    'role': 'assistant', 
    'content': 'Naked mole rats have unique adaptations, including a highly efficient immune system and a very low metabolic rate, which contribute to their longevity.'
}
Parse success: True
Stop Sequences
The stop sequence (e.g., 151645) corresponds to the model's end-of-message token (like <|im_end|>). Using the correct stop sequences ensures clean message boundaries.
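If you want to confirm which text a stop id corresponds to, you can decode it with the same tokenizer. A small sketch, assuming tokenizer.decode accepts a list of token ids as in the earlier example:

# Decode each stop id to verify it is the end-of-message marker
for stop_id in renderer.get_stop_sequences():
    print(stop_id, repr(tokenizer.decode([stop_id])))
# Expected output for this model: 151645 '<|im_end|>'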
Training: Supervised Learning
For supervised learning (and algorithms like DPO), we need to distinguish between prompt tokens (context) and completion tokens (what the model should learn). The renderer provides per-token loss weights to achieve this.
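Conceptually, the weights act as a per-token mask on the loss: prompt tokens get weight 0 and contribute nothing, while completion tokens get weight 1. A schematic sketch of how such weights enter a weighted cross-entropy loss (not Bios's actual implementation):

# Schematic weighted cross-entropy: logprobs[t] is the model's
# log-probability of target_tokens[t]; weights[t] is 0 for context
# tokens and 1 for completion tokens.
def weighted_cross_entropy(logprobs, weights):
    total = sum(-lp * w for lp, w in zip(logprobs, weights))
    return total / max(sum(weights), 1)  # average over weighted tokens only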
Build Supervised Example
Use build_supervised_example() to get tokens with corresponding loss weights:
from bios_cookbook import renderers, tokenizer_utils
from bios_cookbook.utils.format_colorized import format_colorized

# Initialize renderer
tokenizer = tokenizer_utils.get_tokenizer('ultrasafe/usf-conversation')
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Build supervised example with weights
tokens, weights = renderer.build_supervised_example(messages)

# Visualize with color-coding
print(format_colorized(tokens, weights, tokenizer))
Visualizing Token Weights
The output shows prompt tokens (weight=0, green) and completion tokens (weight=1, red):
<|im_start|>system↵
Answer concisely; at most one sentence per response<|im_end|>↵
<|im_start|>user↵
What is the longest-lived rodent species?<|im_end|>↵
<|im_start|>assistant↵
The naked mole rat, which can live over 30 years.<|im_end|>↵
<|im_start|>user↵
How do they live so long?<|im_end|>↵
<|im_start|>assistant↵
They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.<|im_end|>
Legend: Green = Prompt (weight=0), Red = Completion (weight=1)
Note: ↵ indicates newlines for clarity (not actual tokens)
Key Insight: Completion Selection
Only the final assistant message is treated as the completion (weight=1). All previous context, including earlier assistant responses, becomes part of the prompt (weight=0). This trains the model to continue conversations contextually rather than just answer isolated questions.
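You can verify this behavior directly by decoding only the weighted positions. A quick sketch using the renderer and tokenizer from the example above:

tokens, weights = renderer.build_supervised_example(messages)

# Keep only the positions that carry loss; they should reproduce the final
# assistant message (plus its end-of-message marker) and nothing earlier
completion_tokens = [t for t, w in zip(tokens, weights) if w > 0]
print(tokenizer.decode(completion_tokens))
print(f"{sum(1 for w in weights if w > 0)} of {len(tokens)} tokens carry loss")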
Using the Renderer in Training
Integrate the renderer into your training pipeline for proper message formatting:
import bios
from bios import types
from bios_cookbook import renderers, tokenizer_utils

# Setup
service_client = bios.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-conversation",
    rank=16
)

# Get tokenizer and renderer
tokenizer = training_client.get_tokenizer()
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Process conversation data
def process_conversation(messages):
    """Convert message list to training Datum"""
    tokens, weights = renderer.build_supervised_example(messages)

    # Create model input and targets
    input_tokens = tokens[:-1]
    target_tokens = tokens[1:]  # Shifted for next-token prediction
    weights = weights[1:]

    return types.Datum(
        model_input=types.ModelInput.from_ints(tokens=input_tokens),
        loss_fn_inputs={
            'target_tokens': target_tokens,
            'weights': weights
        }
    )

# Process your conversation dataset
training_data = [
    process_conversation(conv)
    for conv in conversation_dataset
]

# Training loop
for epoch in range(num_epochs):
    for batch in training_data:
        training_client.forward_backward([batch], "cross_entropy")
        training_client.optim_step()

print("Training complete!")
Complete Message-Based Workflow
End-to-end example showing training on messages and sampling message responses:
import bios
from bios import types
from bios_cookbook import renderers, tokenizer_utils

# Initialize
service_client = bios.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-healthcare",
    rank=16
)
tokenizer = training_client.get_tokenizer()
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Training data: medical Q&A conversations
conversations = [
    [
        {'role': 'system', 'content': 'Provide medical information accurately.'},
        {'role': 'user', 'content': 'What causes Type 2 diabetes?'},
        {'role': 'assistant', 'content': 'Type 2 diabetes is caused by...'}
    ],
    # ... more conversations
]

# Process and train
for conv in conversations:
    tokens, weights = renderer.build_supervised_example(conv)
    datum = types.Datum(
        model_input=types.ModelInput.from_ints(tokens[:-1]),
        loss_fn_inputs={'target_tokens': tokens[1:], 'weights': weights[1:]}
    )
    training_client.forward_backward([datum], "cross_entropy")
    training_client.optim_step()

# Save and create sampler
sampling_client = training_client.save_weights_and_get_sampling_client(
    name="medical_qa_v1"
)

# Inference: Generate response to new query
new_conversation = [
    {'role': 'system', 'content': 'Provide medical information accurately.'},
    {'role': 'user', 'content': 'What are symptoms of hypoglycemia?'}
]

# Build prompt and sample
prompt = renderer.build_generation_prompt(new_conversation)
stop_sequences = renderer.get_stop_sequences()
result = sampling_client.sample(
    prompt,
    sampling_params=types.SamplingParams(
        max_tokens=150,
        temperature=0.3,
        stop=stop_sequences
    ),
    num_samples=1
).result()

# Parse response
response_message, success = renderer.parse_response(result.sequences[0].tokens)
print(f"Assistant: {response_message['content']}")
print(f"Parse successful: {success}")
Design Philosophy: Python Over Templates
Bios uses Python-based rendering instead of template engines (like Jinja2) for several technical advantages:
✓ Python Advantages
- Precise control: Exact whitespace and special token handling
- Type safety: IDE autocomplete and static type checking
- Token-level weights: Native support for loss weighting
- Debugging: Standard Python debugging tools work
- Composability: Easy to extend for custom formats
- Testing: Unit tests for rendering logic
✗ Template Limitations
- Whitespace errors: Hard to get spacing exactly right
- Limited logic: Complex rendering requires workarounds
- No weight support: Can't specify token-level weights
- Debugging difficulty: Template errors are cryptic
- No training awareness: Not designed for SL/RL workflows
- Maintenance burden: Jinja syntax vs familiar Python
Advanced Rendering Patterns
Custom Renderer Implementation
Create custom renderers for specialized message formats:
from bios_cookbook.renderers import BaseRenderer
from bios import types

class FinancialRenderer(BaseRenderer):
    """Custom renderer for financial analysis conversations"""

    def build_generation_prompt(self, messages):
        """Add financial context and formatting"""
        # Add system context for the financial domain if none is present
        if messages[0]['role'] != 'system':
            messages = [
                {'role': 'system', 'content': 'Expert financial analyst'},
                *messages
            ]

        return super().build_generation_prompt(messages)

    def build_supervised_example(self, messages):
        """Custom weighting for financial terminology"""
        tokens, weights = super().build_supervised_example(messages)

        # Increase weights on financial terms
        # (custom logic to detect and upweight domain vocabulary)
        adjusted_weights = self._apply_term_weighting(tokens, weights)

        return tokens, adjusted_weights

    def _apply_term_weighting(self, tokens, weights):
        """Apply higher weights to financial terms"""
        # Detect financial terms and increase their weights
        # Implementation details...
        return weights

# Usage
renderer = FinancialRenderer(tokenizer)
tokens, weights = renderer.build_supervised_example(financial_conversation)
Multi-Turn Conversation Context
Understanding how the renderer handles multi-turn contexts is crucial for effective training.
Context Window Structure
As shown in the colorized example above, every turn except the final assistant response is rendered as weight-0 context, and only the final response carries weight 1. This weighting strategy teaches the model to do the following (a per-turn training sketch follows the list):
- Maintain conversation context across multiple turns
- Reference previous exchanges when formulating responses
- Generate contextually appropriate follow-up answers
- Distinguish between initial questions and clarification requests
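Because only the final assistant turn carries weight, earlier assistant responses in a conversation contribute no gradient signal on their own. If you want every assistant turn to be trained on, one common approach is to build one supervised example per assistant turn from successively longer prefixes. A hypothetical helper, not part of the cookbook API:

def examples_per_assistant_turn(renderer, messages):
    """Build one supervised example per assistant message, so each
    response becomes the weighted completion of its own example."""
    examples = []
    for i, msg in enumerate(messages):
        if msg['role'] == 'assistant':
            tokens, weights = renderer.build_supervised_example(messages[:i + 1])
            examples.append((tokens, weights))
    return examples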
Rendering Best Practices
Validate Rendering Output
Always visualize a few examples with format_colorized() to verify correct prompt/completion boundaries.
# Debug first few examples
for i, conv in enumerate(conversations[:3]):
    tokens, weights = renderer.build_supervised_example(conv)
    print(f"\nExample {i}:")
    print(format_colorized(tokens, weights, tokenizer))
Handle Special Tokens Consistently
Ensure special tokens (<|im_start|>, <|im_end|>, etc.) are handled consistently between training and inference. The renderer manages this automatically when you use its methods.
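One way to sanity-check that consistency is to compare the generation prompt against the supervised rendering of the same conversation; if the renderer is consistent, the supervised tokens begin with the generation-prompt tokens. A rough sketch, assuming both calls return plain token lists as in the examples above:

tokens, weights = renderer.build_supervised_example(messages)
prompt_tokens = renderer.build_generation_prompt(messages[:-1]).to_ints()

# The shared context (everything up to the assistant header) should match exactly
assert tokens[:len(prompt_tokens)] == prompt_tokens, "training/inference prompts diverge"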
Test Parse Success
Always check the parse_success flag when parsing model outputs to catch malformed responses:
message, success = renderer.parse_response(tokens)
if not success:
    print("Warning: Failed to parse model output")
    # Handle parsing failure
else:
    print(f"Parsed: {message['content']}")
Next Steps
Master rendering before moving on to advanced training techniques.