Different Ways to Generate Responses
When working with AI models, you can interact at different levels of detail, much as you can drive a car with the steering wheel and pedals or, in principle, control every individual component of the engine. Most of the time the higher-level interface is what you want, but sometimes you need finer control.
Two Levels of Control
Bios offers two ways to generate responses: a simple, conversation-based approach for most use cases, and a detailed, low-level approach for specialized training scenarios. Understanding which to use helps you build more efficiently.
Understanding the Two Approaches
Think of these as two different steering wheels for the same car—each optimized for different driving conditions:
Conversation-Level (High-Level)
You provide messages in a conversation (like a chat interface), and the model responds with complete, natural language messages. This is what most applications use.
Best for:
- Regular chatbots and assistants
- Customer service applications
- Content generation
- Most production use cases
Token-Level (Low-Level)
You work with the individual building blocks (tokens) that make up text. This gives you precise control but requires understanding technical details.
Best for:
- Advanced RL training algorithms
- Custom training workflows
- Research and experimentation
- Very specialized applications
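The difference is easiest to see in the shape of the data each level works with. The sketch below is illustrative only: the whitespace "tokenizer" is a stand-in for a real subword tokenizer, and the request shapes are assumptions, not Bios's actual types.

```python
# Toy contrast between the two levels. `encode` is a hypothetical
# stand-in for a real tokenizer; Bios's actual request types may differ.

def encode(text):
    """Map each whitespace-separated word to an integer id."""
    vocab = {}
    ids = []
    for word in text.split():
        ids.append(vocab.setdefault(word, len(vocab)))
    return ids

# Conversation-level: you hand over whole messages and get messages back.
conversation_request = [
    {"role": "user", "content": "Write a haiku about autumn."},
]

# Token-level: you work with the integer ids underneath the text.
token_request = encode("Write a haiku about autumn.")

print(conversation_request[0]["content"])  # natural language in
print(token_request)                       # raw building blocks
```

Both requests describe the same input; the token-level form simply drops down to the representation the model actually consumes.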
Which Approach Should You Use?
For the vast majority of applications, the conversation-level approach is what you want. Here's the breakdown:
✓ Use Conversation-Level
- Building applications: You're creating chatbots, assistants, or content generators
- Using pre-trained models: You're just using existing models via API
- Standard training: Regular fine-tuning on conversation data
- Prototyping: Testing ideas quickly without low-level complexity
- Production systems: Most real-world applications work at this level
⚙️ Use Token-Level
- Advanced RL training: Implementing custom reinforcement learning algorithms
- Research projects: Experimenting with novel training techniques
- Precise control needed: You need to manipulate individual parts of responses
- Custom algorithms: Building new training methods from scratch
- Deep expertise: You have an ML research background and specific technical needs
A Simple Analogy
Think about controlling a car:
Driver's Interface
You use the steering wheel, pedals, and gear shift. You don't think about fuel injection timing or transmission hydraulics.
= Conversation-Level (what most people need)
Mechanic's Interface
You can adjust timing belts, tune fuel mixtures, and modify individual engine parameters. More control, but you need expertise.
= Token-Level (for specialists only)
What This Means for Your Project
In practice, here's how this affects your work:
Most Applications: Conversation-Level
If you're building a chatbot, customer service tool, content generator, or any typical AI application, you'll work with conversations—just like talking to a person. You send questions or instructions, get back responses.
Example: You send: "Write a product description for noise-canceling headphones" → You get back: A complete, natural language product description.
Special Cases: Token-Level
Only needed when you're implementing advanced RL training algorithms or conducting research that requires manipulating the fundamental building blocks of text generation.
When you'd care: You're a researcher implementing a new training algorithm, or you need very specific control over text generation that conversation-level doesn't provide.
Practical Implications
Here's what this means for common scenarios:
🤖 Building Apps
Use: Conversation-level
Why: Simple, intuitive, matches how users think
Benefit: Fast development, easy maintenance
🎓 Standard Training
Use: Conversation-level
Why: Training data is naturally in conversation format
Benefit: Matches how you organize data
🔬 RL Research
Use: Token-level
Why: RL algorithms need precise control
Benefit: Full flexibility for custom algorithms
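Why does RL need the token level at all? A typical policy-gradient update weights each generated token's log-probability by how good the whole response turned out to be, which requires per-token numbers that a conversation-level interface never exposes. A minimal REINFORCE-style sketch, with made-up numbers and illustrative names (real RL fine-tuning adds baselines, clipping, KL penalties, and more):

```python
import math

def reinforce_loss(token_logprobs, advantage):
    """REINFORCE-style loss: scale the summed log-probability of the
    sampled tokens by a scalar advantage. Illustrative only."""
    return -advantage * sum(token_logprobs)

# Per-token log-probs for a 4-token sampled response (made-up numbers).
logprobs = [math.log(0.5), math.log(0.25), math.log(0.8), math.log(0.9)]
advantage = 1.5  # this response scored above the baseline

loss = reinforce_loss(logprobs, advantage)
print(round(loss, 3))
```

The key point is structural: the loss is a function of individual token probabilities, so the training loop has to see tokens, not finished messages.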
The Good News: Bios Handles Both
You don't need to understand the technical details of how these work—Bios provides both interfaces and handles the complexity:
What You Control
- Choosing which level fits your needs
- Providing inputs (conversations or tokens)
- Deciding what to do with outputs
- Setting basic parameters (temperature, length)
What Bios Handles
- Converting between formats automatically
- Managing technical details of generation
- Optimizing performance and efficiency
- Ensuring consistency and reliability
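Temperature, one of the basic parameters you control, is commonly defined as a rescaling of the model's raw scores before sampling. A minimal sketch of that standard definition (the exact knobs Bios exposes may differ):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into sampling probabilities. Lower temperature
    sharpens the distribution toward the top token; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=1.0))
print(softmax_with_temperature(logits, temperature=0.1))  # near-greedy
```

At temperature 1.0 the model samples from its natural distribution; pushing the temperature toward 0 makes generation nearly deterministic, which is why low temperatures are a common choice for factual tasks.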
Start Simple
Unless you have a specific technical reason to use token-level control (like implementing a custom RL algorithm), stick with the conversation-level approach. It's simpler, more intuitive, and covers 95% of use cases.
The Bottom Line
Bios gives you options for how to interact with AI models—a simple conversation-level interface for everyday use, and a detailed token-level interface for specialized scenarios. Most projects never need to think about this distinction.
Use the conversation-level approach unless you're implementing advanced RL algorithms or conducting research. It's simpler, more natural to work with, and what your users expect from AI interactions.