What Your Model Learns to Optimize
When you train a model, it needs to know what "success" looks like. Should it match training examples exactly? Should it maximize user satisfaction? Should it prefer responses that humans liked? The training objective, usually expressed as a loss function that training tries to minimize, defines this goal.
The Core Idea
Think of it like giving a salesperson clear targets: are they measured on total sales, customer satisfaction, repeat business, or some combination? Different goals lead to different behaviors. AI models work the same way: the training objective shapes what the model learns to prioritize.
Standard Training Objectives
Bios provides three built-in objectives that cover most use cases. Each optimizes for a different goal:
Learning from Examples
The model learns to reproduce the responses in your training data as accurately as possible. Like teaching by showing someone exactly what to do, repeatedly, until they can do it themselves.
When to use: Regular training where you have good examples of correct responses. This is the default for most fine-tuning tasks.
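Here is a minimal sketch of what "learning from examples" usually means in practice: the loss is the average negative log-likelihood (cross-entropy) of the example's tokens, so the model is penalized whenever it assigns low probability to the response you showed it. This is plain Python written for illustration, not Bios's internal implementation; the function names and toy numbers are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def example_loss(step_logits, target_ids):
    """Average negative log-likelihood of the example's tokens.

    step_logits: one list of vocabulary scores per position.
    target_ids: the token the training example has at each position.
    """
    total = 0.0
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        total += -math.log(probs[target])  # low probability on the target -> high loss
    return total / len(target_ids)


# Toy vocabulary of 3 tokens; the example response is token 0, then token 1.
confident = example_loss([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]], [0, 1])
unsure = example_loss([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], [0, 1])
```

A model that confidently predicts the example's tokens scores close to zero, while one that spreads its probability evenly over three options scores log 3 ≈ 1.10 per token.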
Maximizing Rewards
The model learns to generate responses that maximize some score or reward. Like training a salesperson to maximize customer satisfaction scores rather than just following a script.
When to use: Reinforcement learning where you can score quality but don't have perfect examples to copy.
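One classic way to turn scores into a training signal is the REINFORCE policy-gradient loss, sketched below. It assumes you already have, for each generated response, the total log-probability the model assigned to it and a reward from your scorer. The function name and the mean-reward baseline are illustrative choices, not Bios's API, and in a real framework the log-probabilities would be differentiable tensors; plain floats are used here only to show the arithmetic.

```python
def reward_loss(sample_logprobs, rewards):
    """REINFORCE-style loss for a batch of responses the model generated.

    sample_logprobs: total log-probability the model gave each response.
    rewards: the score each response received from your scorer.
    """
    baseline = sum(rewards) / len(rewards)  # mean reward as a simple baseline
    advantages = [r - baseline for r in rewards]
    # Optimizers minimize, so negate: lowering this loss raises the
    # log-probability of above-average responses and lowers it for
    # below-average ones.
    return -sum(a * lp for a, lp in zip(advantages, sample_logprobs)) / len(rewards)


# Two sampled responses: the first scored 1.0, the second 0.0.
loss = reward_loss([-1.0, -2.0], [1.0, 0.0])
```

Subtracting the average reward means only responses that beat the batch average get reinforced; without it, every positively scored response would be pushed up, which makes training noisier.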
Balanced Improvement
A more cautious version of reward maximization that prevents the model from changing too dramatically at once. Like allowing a sales team to try new approaches but keeping guardrails so they don't stray too far from proven methods.
When to use: Advanced RL training where stability is important, especially for production systems.
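The source doesn't say how Bios implements this guardrail. One widely used technique with exactly this behavior is the clipped objective from PPO (Proximal Policy Optimization), sketched here under that assumption. The ratio compares how likely a response is under the updated model versus the model that generated it, and clipping it to [1 - ε, 1 + ε] removes any incentive to move further than that in a single update. Names and the default ε = 0.2 are illustrative.

```python
import math


def clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate loss (to be minimized).

    new_logprobs: log-probabilities under the model being updated.
    old_logprobs: log-probabilities under the model that generated the samples.
    advantages: how much better than average each sample scored.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # how much this sample's probability changed
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # Keep the more pessimistic estimate, so moving the ratio outside
        # [1 - eps, 1 + eps] earns no extra credit.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Same positive-advantage sample, after a moderate vs. a drastic update.
moderate = clipped_loss([math.log(1.2)], [0.0], [1.0])
drastic = clipped_loss([math.log(2.0)], [0.0], [1.0])
```

With ε = 0.2, a response whose probability doubled (ratio 2.0) earns the same credit as one that rose by 20% (ratio 1.2), so there is nothing to gain by changing the model more aggressively.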
When Do You Need Something Custom?
The built-in objectives cover 95% of use cases. Custom objectives are only needed for very specific research or business requirements:
✓ Standard Objectives Work
- Regular training: Learning from example input-output pairs
- RL training: Optimizing based on scores or preferences
- Preference learning: Training on human comparison data
- Most production systems: Standard goals align with business needs
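Preference learning works from pairs of responses where a human picked one over the other. A common loss for this, sketched below as an assumption since the source doesn't say which formulation Bios uses, is the Bradley-Terry pairwise loss, -log σ(s_chosen - s_rejected), where s is a score assigned to each response (for example, a reward model's output). The function names are hypothetical.

```python
import math


def softplus(x):
    """log(1 + e^x), computed without overflow for large |x|."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def preference_loss(chosen_scores, rejected_scores):
    """Average Bradley-Terry loss, -log(sigmoid(chosen - rejected)), over pairs."""
    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        total += softplus(-margin)  # identical to -log(sigmoid(margin))
    return total / len(chosen_scores)


tie = preference_loss([0.0], [0.0])    # both responses scored the same
right = preference_loss([2.0], [0.0])  # the human-preferred response scores higher
wrong = preference_loss([0.0], [2.0])  # the model prefers the rejected response
```

When both responses score the same, the loss is log 2 ≈ 0.69; it falls toward zero as the chosen response pulls ahead and grows roughly linearly when the model prefers the wrong one.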
🔬 Custom Needed When
- Novel research: Experimenting with new training methods from academic papers
- Unique business metrics: Optimizing for very specific KPIs not captured by standard methods
- Multi-objective optimization: Balancing several competing goals with custom weighting
- Advanced techniques: Implementing cutting-edge algorithms from recent research
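Multi-objective optimization often comes down to combining several loss values with weights you choose. The sketch below assumes the simplest scheme, a weighted sum; the objective names and weights are hypothetical placeholders, and picking good weights is usually the hard part.

```python
def combined_loss(losses, weights):
    """Weighted sum of several objective values, keyed by objective name."""
    # Fail loudly if an objective is missing a weight (or vice versa),
    # rather than silently dropping it from the total.
    if set(losses) != set(weights):
        raise ValueError("every objective needs exactly one weight")
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical objectives: match the examples, but also discourage long answers.
total = combined_loss(
    losses={"example_match": 1.2, "length_penalty": 0.5},
    weights={"example_match": 1.0, "length_penalty": 0.1},
)
```

Raising the length_penalty weight would trade some accuracy on the examples for shorter answers, which is exactly the kind of business-specific balance a built-in objective can't know in advance.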
Understanding the Trade-offs
Custom objectives give you flexibility, but that flexibility comes at a cost.
The Speed Cost
Custom objectives require extra computation during training, making each training iteration 2-3x slower. Over a long training run this adds up: a run that takes 8 hours with a standard objective might take 16-24 hours with a custom one.
Making the Right Choice
Here's how to decide between standard and custom objectives:
Start with Standard
Unless you have a very specific reason to use a custom objective, start with the built-in options. They're fast, well-tested, and handle the vast majority of training scenarios successfully.
Consider Custom If
You've tried standard methods and they don't quite capture what you need, or you're implementing a specific research paper that requires a novel training approach.
Reality check: If you're asking whether you need custom objectives, you probably don't. Teams that need them usually know exactly why from the start.
Weigh the Cost
Custom objectives make training 2-3x slower. Ask yourself: will the benefit of this specific objective outweigh the extra time and compute cost? Often, the answer is no, and standard objectives work fine.
Real-World Scenarios
Here's what teams typically use for different applications:
💬 Chatbots
Objective: Standard (learning from examples)
Why: You have conversation data to learn from
Works well: 99% of chat applications
🎨 Creative Content
Objective: Reward maximization or preference learning
Why: Quality is subjective, learned from feedback
Works well: Writing, design, recommendations
🔬 Research Projects
Objective: Sometimes custom
Why: Testing novel training algorithms
Trade-off: Slower, but enables new techniques
The Bottom Line
The training objective is what your model learns to optimize for, whether that's matching examples exactly, maximizing rewards, or following human preferences. For most real-world applications, the standard objectives work well.
Custom objectives are powerful for specialized research or unique business requirements, but they come with a 2-3x training speed penalty. Only go custom if you have a clear, specific reason that standard objectives can't address.