Rendering to Tokens
Rendering converts message-based conversation data into the token representations required for model training and inference. While conceptually similar to chat templates, Bios's rendering system is designed for the complete training lifecycle, supporting supervised learning, reinforcement learning, and production deployment.
The Renderer Class
The Renderer class is the main interface for message-to-token conversion. It provides Python-based rendering logic that's easier to write and maintain than template-based approaches, especially when handling token-level loss weights.
Available in: bios_cookbook.renderers
Example Conversation
We'll use this multi-turn conversation throughout the examples below:
messages = [
    {
        'role': 'system',
        'content': 'Answer concisely; at most one sentence per response'
    },
    {
        'role': 'user',
        'content': 'What is the longest-lived rodent species?'
    },
    {
        'role': 'assistant',
        'content': 'The naked mole rat, which can live over 30 years.'
    },
    {
        'role': 'user',
        'content': 'How do they live so long?'
    },
    {
        'role': 'assistant',
        'content': 'They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.'
    }
]
Inference: Generating Messages
The renderer lets you work with the model at the message level: conversations go in, assistant messages come out. To sample assistant responses, use three key methods:
- build_generation_prompt(): convert the conversation into a sampling prompt
- get_stop_sequences(): get model-specific stop sequences
- parse_response(): convert sampled tokens back into a message
Initialize the Renderer
from bios_cookbook import renderers, tokenizer_utils

# Get tokenizer for UltraSafe model
tokenizer = tokenizer_utils.get_tokenizer('ultrasafe/usf-mini')

# Create renderer
renderer = renderers.get_renderer('ultrasafe', tokenizer)
Generate Alternative Response
Remove the last assistant message and generate an alternative response:
# Build prompt from conversation (excluding the last assistant turn)
prompt = renderer.build_generation_prompt(messages[:-1])

print("Prompt object:")
print(prompt)
print('-' * 50)
print("Decoded prompt:")
print(tokenizer.decode(prompt.to_ints()))
The output shows the ModelInput structure and the decoded tokens:
Prompt object:
ModelInput(chunks=[EncodedTextChunk(tokens=[151644, 8948, 198, ...], type='encoded_text')])
--------------------------------------------------
Decoded prompt:
<|im_start|>system
Answer concisely; at most one sentence per response<|im_end|>
<|im_start|>user
What is the longest-lived rodent species?<|im_end|>
<|im_start|>assistant
The naked mole rat, which can live over 30 years.<|im_end|>
<|im_start|>user
How do they live so long?<|im_end|>
<|im_start|>assistant
ModelInput Structure
The ModelInput object contains a list of chunks. For text-only data, you'll have EncodedTextChunk objects. This structure supports future multi-modal extensions.
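If you want to look inside, the chunks and token ids are directly accessible. A minimal sketch, assuming the prompt object exposes the chunks attribute and the to_ints() method shown in the output above:

# Inspect each chunk of the ModelInput (text-only prompts contain
# EncodedTextChunk objects, as in the printed structure above)
for chunk in prompt.chunks:
    print(chunk.type, len(chunk.tokens), "tokens")

# Flatten all chunks back into a single token list and decode it
print(tokenizer.decode(prompt.to_ints()))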
Sampling and Parsing Responses
Sample from the model and parse the token output back into a message format:
import bios
from bios.types import SamplingParams

# Create sampling client
service_client = bios.ServiceClient()
sampling_client = service_client.create_sampling_client(
    base_model='ultrasafe/usf-mini'
)

# Get stop sequences from renderer
stop_sequences = renderer.get_stop_sequences()
print(f"Stop sequences: {stop_sequences}")

# Configure sampling parameters
sampling_params = SamplingParams(
    max_tokens=100,
    temperature=0.5,
    stop=stop_sequences
)

# Sample from model
output = sampling_client.sample(
    prompt,
    sampling_params=sampling_params,
    num_samples=1
).result()

print(f"Sampled tokens: {output.sequences[0].tokens}")

# Parse tokens back to message
sampled_message, parse_success = renderer.parse_response(
    output.sequences[0].tokens
)

print(f"Sampled message: {sampled_message}")
print(f"Parse success: {parse_success}")
Example output:
Stop sequences: [151645]
Sampled tokens: [45, 7741, 34651, 31410, 614, 4911, 76665, ...]
Sampled message: {
    'role': 'assistant', 
    'content': 'Naked mole rats have unique adaptations, including a highly efficient immune system and a very low metabolic rate, which contribute to their longevity.'
}
Parse success: True
Stop Sequences
The stop sequence (e.g., 151645) corresponds to the model's end-of-message token (like <|im_end|>). Using the correct stop sequences ensures clean message boundaries.
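If you want to confirm which text a stop id corresponds to, you can decode it with the same tokenizer. A small sketch, assuming tokenizer.decode accepts a list of token ids as in the earlier example:

# Decode each stop id to verify it is the end-of-message marker
for stop_id in renderer.get_stop_sequences():
    print(stop_id, repr(tokenizer.decode([stop_id])))
# Expected output for this model: 151645 '<|im_end|>'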
Training: Supervised Learning
For supervised learning (and algorithms like DPO), we need to distinguish between prompt tokens (context) and completion tokens (what the model should learn). The renderer provides per-token loss weights to achieve this.
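Conceptually, the weights act as a per-token mask on the loss: prompt tokens get weight 0 and contribute nothing, while completion tokens get weight 1. A schematic sketch of how such weights enter a weighted cross-entropy loss (not Bios's actual implementation):

# Schematic weighted cross-entropy: logprobs[t] is the model's
# log-probability of target_tokens[t]; weights[t] is 0 for context
# tokens and 1 for completion tokens.
def weighted_cross_entropy(logprobs, weights):
    total = sum(-lp * w for lp, w in zip(logprobs, weights))
    return total / max(sum(weights), 1)  # average over weighted tokens only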
Build Supervised Example
Use build_supervised_example() to get tokens with corresponding loss weights:
from bios_cookbook import renderers, tokenizer_utils
from bios_cookbook.utils.format_colorized import format_colorized

# Initialize renderer
tokenizer = tokenizer_utils.get_tokenizer('ultrasafe/usf-conversation')
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Build supervised example with weights
tokens, weights = renderer.build_supervised_example(messages)

# Visualize with color-coding
print(format_colorized(tokens, weights, tokenizer))
Visualizing Token Weights
The output shows prompt tokens (weight=0, green) and completion tokens (weight=1, red):
<|im_start|>system↵
Answer concisely; at most one sentence per response<|im_end|>↵
<|im_start|>user↵
What is the longest-lived rodent species?<|im_end|>↵
<|im_start|>assistant↵
The naked mole rat, which can live over 30 years.<|im_end|>↵
<|im_start|>user↵
How do they live so long?<|im_end|>↵
<|im_start|>assistant↵
They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.<|im_end|>
Legend: Green = Prompt (weight=0), Red = Completion (weight=1)
Note: ↵ indicates newlines for clarity (not actual tokens)
Key Insight: Completion Selection
Only the final assistant message is treated as the completion (weight=1). All previous context, including earlier assistant responses, becomes part of the prompt (weight=0). This trains the model to continue conversations contextually rather than just answer isolated questions.
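You can verify this behavior directly by decoding only the weighted positions. A quick sketch using the renderer and tokenizer from the example above:

tokens, weights = renderer.build_supervised_example(messages)

# Keep only the positions that carry loss; they should reproduce the final
# assistant message (plus its end-of-message marker) and nothing earlier
completion_tokens = [t for t, w in zip(tokens, weights) if w > 0]
print(tokenizer.decode(completion_tokens))
print(f"{sum(1 for w in weights if w > 0)} of {len(tokens)} tokens carry loss")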
Using the Renderer in Training
Integrate the renderer into your training pipeline for proper message formatting:
import bios
from bios import types
from bios_cookbook import renderers, tokenizer_utils

# Setup
service_client = bios.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-conversation",
    rank=16
)

# Get tokenizer and renderer
tokenizer = training_client.get_tokenizer()
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Process conversation data
def process_conversation(messages):
    """Convert message list to training Datum"""
    tokens, weights = renderer.build_supervised_example(messages)

    # Create model input and targets
    input_tokens = tokens[:-1]
    target_tokens = tokens[1:]  # Shifted for next-token prediction
    weights = weights[1:]

    return types.Datum(
        model_input=types.ModelInput.from_ints(tokens=input_tokens),
        loss_fn_inputs={
            'target_tokens': target_tokens,
            'weights': weights
        }
    )

# Process your conversation dataset
training_data = [
    process_conversation(conv)
    for conv in conversation_dataset
]

# Training loop
for epoch in range(num_epochs):
    for batch in training_data:
        training_client.forward_backward([batch], "cross_entropy")
        training_client.optim_step()

print("Training complete!")
Complete Message-Based Workflow
End-to-end example showing training on messages and sampling message responses:
import bios
from bios import types
from bios_cookbook import renderers, tokenizer_utils

# Initialize
service_client = bios.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-healthcare",
    rank=16
)
tokenizer = training_client.get_tokenizer()
renderer = renderers.get_renderer('ultrasafe', tokenizer)

# Training data: medical Q&A conversations
conversations = [
    [
        {'role': 'system', 'content': 'Provide medical information accurately.'},
        {'role': 'user', 'content': 'What causes Type 2 diabetes?'},
        {'role': 'assistant', 'content': 'Type 2 diabetes is caused by...'}
    ],
    # ... more conversations
]

# Process and train
for conv in conversations:
    tokens, weights = renderer.build_supervised_example(conv)
    datum = types.Datum(
        model_input=types.ModelInput.from_ints(tokens[:-1]),
        loss_fn_inputs={'target_tokens': tokens[1:], 'weights': weights[1:]}
    )
    training_client.forward_backward([datum], "cross_entropy")
    training_client.optim_step()

# Save and create sampler
sampling_client = training_client.save_weights_and_get_sampling_client(
    name="medical_qa_v1"
)

# Inference: Generate response to new query
new_conversation = [
    {'role': 'system', 'content': 'Provide medical information accurately.'},
    {'role': 'user', 'content': 'What are symptoms of hypoglycemia?'}
]

# Build prompt and sample
prompt = renderer.build_generation_prompt(new_conversation)
stop_sequences = renderer.get_stop_sequences()
result = sampling_client.sample(
    prompt,
    sampling_params=types.SamplingParams(
        max_tokens=150,
        temperature=0.3,
        stop=stop_sequences
    ),
    num_samples=1
).result()

# Parse response
response_message, success = renderer.parse_response(result.sequences[0].tokens)
print(f"Assistant: {response_message['content']}")
print(f"Parse successful: {success}")
Design Philosophy: Python Over Templates
Bios uses Python-based rendering instead of template engines (like Jinja2) for several technical advantages:
✓ Python Advantages
- Precise control: Exact whitespace and special token handling
- Type safety: IDE autocomplete and static type checking
- Token-level weights: Native support for loss weighting
- Debugging: Standard Python debugging tools work
- Composability: Easy to extend for custom formats
- Testing: Unit tests for rendering logic
✗ Template Limitations
- Whitespace errors: Hard to get spacing exactly right
- Limited logic: Complex rendering requires workarounds
- No weight support: Can't specify token-level weights
- Debugging difficulty: Template errors are cryptic
- No training awareness: Not designed for SL/RL workflows
- Maintenance burden: Jinja syntax vs familiar Python
Advanced Rendering Patterns
Custom Renderer Implementation
Create custom renderers for specialized message formats:
from bios_cookbook.renderers import BaseRenderer
from bios import types

class FinancialRenderer(BaseRenderer):
    """Custom renderer for financial analysis conversations"""

    def build_generation_prompt(self, messages):
        """Add financial context and formatting"""
        # Add system context for the financial domain if none is present
        if messages[0]['role'] != 'system':
            messages = [
                {'role': 'system', 'content': 'Expert financial analyst'},
                *messages
            ]

        return super().build_generation_prompt(messages)

    def build_supervised_example(self, messages):
        """Custom weighting for financial terminology"""
        tokens, weights = super().build_supervised_example(messages)

        # Increase weights on financial terms
        # (custom logic to detect and upweight domain vocabulary)
        adjusted_weights = self._apply_term_weighting(tokens, weights)

        return tokens, adjusted_weights

    def _apply_term_weighting(self, tokens, weights):
        """Apply higher weights to financial terms"""
        # Detect financial terms and increase their weights
        # Implementation details...
        return weights

# Usage
renderer = FinancialRenderer(tokenizer)
tokens, weights = renderer.build_supervised_example(financial_conversation)
Multi-Turn Conversation Context
Understanding how the renderer handles multi-turn contexts is crucial for effective training.
Context Window Structure
As shown in the colorized example above, every turn except the final assistant response is rendered as weight-0 context, and only the final response carries weight 1. This weighting strategy teaches the model to do the following (a per-turn training sketch follows the list):
- Maintain conversation context across multiple turns
- Reference previous exchanges when formulating responses
- Generate contextually appropriate follow-up answers
- Distinguish between initial questions and clarification requests
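Because only the final assistant turn carries weight, earlier assistant responses in a conversation contribute no gradient signal on their own. If you want every assistant turn to be trained on, one common approach is to build one supervised example per assistant turn from successively longer prefixes. A hypothetical helper, not part of the cookbook API:

def examples_per_assistant_turn(renderer, messages):
    """Build one supervised example per assistant message, so each
    response becomes the weighted completion of its own example."""
    examples = []
    for i, msg in enumerate(messages):
        if msg['role'] == 'assistant':
            tokens, weights = renderer.build_supervised_example(messages[:i + 1])
            examples.append((tokens, weights))
    return examples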
Rendering Best Practices
Validate Rendering Output
Always visualize a few examples with format_colorized() to verify correct prompt/completion boundaries.
# Debug first few examples
for i, conv in enumerate(conversations[:3]):
    tokens, weights = renderer.build_supervised_example(conv)
    print(f"\nExample {i}:")
    print(format_colorized(tokens, weights, tokenizer))
Handle Special Tokens Consistently
Ensure special tokens (<|im_start|>, <|im_end|>, etc.) are handled consistently between training and inference. The renderer manages this automatically when you use its methods.
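One way to sanity-check that consistency is to compare the generation prompt against the supervised rendering of the same conversation; if the renderer is consistent, the supervised tokens begin with the generation-prompt tokens. A rough sketch, assuming both calls return plain token lists as in the examples above:

tokens, weights = renderer.build_supervised_example(messages)
prompt_tokens = renderer.build_generation_prompt(messages[:-1]).to_ints()

# The shared context (everything up to the assistant header) should match exactly
assert tokens[:len(prompt_tokens)] == prompt_tokens, "training/inference prompts diverge"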
Test Parse Success
Always check the parse_success flag when parsing model outputs to catch malformed responses:
message, success = renderer.parse_response(tokens)
if not success:
    print("Warning: Failed to parse model output")
    # Handle parsing failure
else:
    print(f"Parsed: {message['content']}")
Next Steps
Master rendering before moving on to advanced training techniques.