Different Ways to Generate Responses
When working with AI models, you can interact at different levels of detail, much as you can drive a car with the steering wheel and pedals or, in principle, control every individual component of the engine. Most of the time the higher-level interface is what you want, but sometimes you need finer control.
Two Levels of Control
Bios offers two ways to generate responses: a simple, conversation-based approach for most use cases, and a detailed, low-level approach for specialized training scenarios. Understanding which to use helps you build more efficiently.
Understanding the Two Approaches
Think of these as two different steering wheels for the same car—each optimized for different driving conditions:
Conversation-Level (High-Level)
You provide messages in a conversation (like a chat interface), and the model responds with complete, natural language messages. This is what most applications use.
Best for:
- Regular chatbots and assistants
- Customer service applications
- Content generation
- Most production use cases
Token-Level (Low-Level)
You work with the individual building blocks (tokens) that make up text. This gives you precise control but requires understanding technical details.
Best for:
- Advanced RL training algorithms
- Custom training workflows
- Research and experimentation
- Very specialized applications
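The difference is easiest to see in the shape of the data each level works with. The sketch below is illustrative only: the whitespace "tokenizer" is a stand-in for a real subword tokenizer, and the request shapes are assumptions, not Bios's actual types.

```python
# Toy contrast between the two levels. `encode` is a hypothetical
# stand-in for a real tokenizer; Bios's actual request types may differ.

def encode(text):
    """Map each whitespace-separated word to an integer id."""
    vocab = {}
    ids = []
    for word in text.split():
        ids.append(vocab.setdefault(word, len(vocab)))
    return ids

# Conversation-level: you hand over whole messages and get messages back.
conversation_request = [
    {"role": "user", "content": "Write a haiku about autumn."},
]

# Token-level: you work with the integer ids underneath the text.
token_request = encode("Write a haiku about autumn.")

print(conversation_request[0]["content"])  # natural language in
print(token_request)                       # raw building blocks
```

Both requests describe the same input; the token-level form simply drops down to the representation the model actually consumes.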
Which Approach Should You Use?
For the vast majority of applications, the conversation-level approach is what you want. Here's the breakdown:
✓ Use Conversation-Level
- Building applications: You're creating chatbots, assistants, or content generators
- Using pre-trained models: You're just using existing models via API
- Standard training: Regular fine-tuning on conversation data
- Prototyping: Testing ideas quickly without low-level complexity
- Production systems: Most real-world applications work at this level
⚙️ Use Token-Level
- Advanced RL training: Implementing custom reinforcement learning algorithms
- Research projects: Experimenting with novel training techniques
- Precise control needed: You need to manipulate individual parts of responses
- Custom algorithms: Building new training methods from scratch
- Deep expertise: You have an ML research background and specific technical needs
A Simple Analogy
Think about controlling a car:
Driver's Interface
You use the steering wheel, pedals, and gear shift. You don't think about fuel injection timing or transmission hydraulics.
= Conversation-Level (what most people need)
Mechanic's Interface
You can adjust timing belts, tune fuel mixtures, and modify individual engine parameters. More control, but you need expertise.
= Token-Level (for specialists only)
What This Means for Your Project
In practice, here's how this affects your work:
Most Applications: Conversation-Level
If you're building a chatbot, customer service tool, content generator, or any typical AI application, you'll work with conversations—just like talking to a person. You send questions or instructions, get back responses.
Example: You send: "Write a product description for noise-canceling headphones" → You get back: A complete, natural language product description.
Special Cases: Token-Level
Only needed when you're implementing advanced RL training algorithms or conducting research that requires manipulating the fundamental building blocks of text generation.
When you'd care: You're a researcher implementing a new training algorithm, or you need very specific control over text generation that conversation-level doesn't provide.
Practical Implications
Here's what this means for common scenarios:
🤖 Building Apps
Use: Conversation-level
Why: Simple, intuitive, matches how users think
Benefit: Fast development, easy maintenance
🎓 Standard Training
Use: Conversation-level
Why: Training data is naturally in conversation format
Benefit: Matches how you organize data
🔬 RL Research
Use: Token-level
Why: RL algorithms need precise control
Benefit: Full flexibility for custom algorithms
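Why does RL need the token level at all? A typical policy-gradient update weights each generated token's log-probability by how good the whole response turned out to be, which requires per-token numbers that a conversation-level interface never exposes. A minimal REINFORCE-style sketch, with made-up numbers and illustrative names (real RL fine-tuning adds baselines, clipping, KL penalties, and more):

```python
import math

def reinforce_loss(token_logprobs, advantage):
    """REINFORCE-style loss: scale the summed log-probability of the
    sampled tokens by a scalar advantage. Illustrative only."""
    return -advantage * sum(token_logprobs)

# Per-token log-probs for a 4-token sampled response (made-up numbers).
logprobs = [math.log(0.5), math.log(0.25), math.log(0.8), math.log(0.9)]
advantage = 1.5  # this response scored above the baseline

loss = reinforce_loss(logprobs, advantage)
print(round(loss, 3))
```

The key point is structural: the loss is a function of individual token probabilities, so the training loop has to see tokens, not finished messages.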
The Good News: Bios Handles Both
You don't need to understand the technical details of how these work—Bios provides both interfaces and handles the complexity:
What You Control
- Choosing which level fits your needs
- Providing inputs (conversations or tokens)
- Deciding what to do with outputs
- Setting basic parameters (temperature, length)
What Bios Handles
- Converting between formats automatically
- Managing technical details of generation
- Optimizing performance and efficiency
- Ensuring consistency and reliability
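Temperature, one of the basic parameters you control, is commonly defined as a rescaling of the model's raw scores before sampling. A minimal sketch of that standard definition (the exact knobs Bios exposes may differ):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into sampling probabilities. Lower temperature
    sharpens the distribution toward the top token; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=1.0))
print(softmax_with_temperature(logits, temperature=0.1))  # near-greedy
```

At temperature 1.0 the model samples from its natural distribution; pushing the temperature toward 0 makes generation nearly deterministic, which is why low temperatures are a common choice for factual tasks.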
Start Simple
Unless you have a specific technical reason to use token-level control (like implementing a custom RL algorithm), stick with the conversation-level approach. It's simpler, more intuitive, and covers 95% of use cases.
The Bottom Line
Bios gives you options for how to interact with AI models—a simple conversation-level interface for everyday use, and a detailed token-level interface for specialized scenarios. Most projects never need to think about this distinction.
Use the conversation-level approach unless you're implementing advanced RL algorithms or conducting research. It's simpler, more natural to work with, and what your users expect from AI interactions.