What Your Model Learns to Optimize

When you train a model, it needs to know what "success" looks like. Should it match training examples exactly? Should it maximize user satisfaction? Should it prefer responses that humans liked? The training objective defines this goal. In practice it is expressed as a loss function: a single number that training tries to minimize.

The Core Idea

Think of it like giving a salesperson clear targets: are they measured on total sales, customer satisfaction, repeat business, or some combination? Different goals lead to different behaviors. Same with AI models—the training objective shapes what the model learns to prioritize.

Standard Training Objectives

Bios provides three built-in objectives that cover most use cases. Each optimizes for a different goal:

📚

Learning from Examples

The model learns to reproduce the responses in your training data as accurately as possible. Like teaching by showing someone exactly what to do, repeatedly, until they can do it themselves.

When to use: Regular training where you have good examples of correct responses. This is the default for most fine-tuning tasks.
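As a rough illustration of "reproducing the training data as accurately as possible", the usual way to measure this is cross-entropy: the average negative log-probability the model assigns to the correct tokens. The sketch below is plain Python with made-up probabilities; the function name is hypothetical and this is not Bios's own implementation.

```python
import math

def cross_entropy_loss(token_probs):
    """Average negative log-likelihood of the target tokens.

    token_probs: probability the model gave to each correct target
    token (hypothetical values, for illustration only).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# A model that is confident about the right tokens gets a lower loss.
confident = cross_entropy_loss([0.9, 0.8, 0.95])
unsure = cross_entropy_loss([0.3, 0.2, 0.25])
print(confident < unsure)
```

Training pushes this number down, which is the same as pushing the probability of the example responses up.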

🎯

Maximizing Rewards

The model learns to generate responses that maximize some score or reward. Like training a salesperson to maximize customer satisfaction scores rather than just following a script.

When to use: Reinforcement learning where you can score quality but don't have perfect examples to copy.
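One common way to turn scores into a training signal is a REINFORCE-style policy-gradient loss: responses that scored above average have their log-probability pushed up, and those below average pushed down. The sketch below assumes that formulation and uses hypothetical names and numbers; the source does not say which exact formula Bios uses.

```python
def policy_gradient_loss(logprobs, rewards):
    """REINFORCE-style loss with a mean-reward baseline.

    logprobs: log-probability the model gave each sampled response.
    rewards:  score assigned to each response (any numeric scale).
    """
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    # Minimizing this raises logprobs where the advantage is positive.
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(rewards)

rewards = [1.0, 0.0]  # first response scored higher
before = policy_gradient_loss([-1.0, -1.0], rewards)
after = policy_gradient_loss([-0.5, -1.0], rewards)  # good response more likely
print(after < before)
```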

⚖️

Balanced Improvement

A more cautious version of reward maximization that prevents the model from changing too dramatically at once. Like allowing a sales team to try new approaches but keeping guardrails so they don't stray too far from proven methods.

When to use: Advanced RL training where stability is important, especially for production systems.
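A widely used way to add this kind of guardrail is the clipped surrogate objective from PPO, which stops rewarding the model once its probability for a response has moved more than a set fraction away from the previous policy. The sketch below assumes that technique; the function name and the `epsilon` default are illustrative, not taken from Bios.

```python
import math

def clipped_objective(new_logprob, old_logprob, advantage, epsilon=0.2):
    """PPO-style clipped surrogate for a single sample (to be maximized)."""
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # Taking the minimum removes any incentive to move past the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)

# The new policy is 1.5x more likely to produce a good response,
# but credit is capped at 1 + epsilon = 1.2.
capped = clipped_objective(math.log(1.5), 0.0, advantage=1.0)
print(round(capped, 3))
```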

When Do You Need Something Custom?

The built-in objectives cover 95% of use cases. Custom objectives are only needed for very specific research or business requirements:

✓ Standard Objectives Work

• Regular training: Learning from example input-output pairs
• RL training: Optimizing based on scores or preferences
• Preference learning: Training on human comparison data
• Most production systems: Standard goals align with business needs

🔬 Custom Needed When

• Novel research: Experimenting with new training methods from academic papers
• Unique business metrics: Optimizing for very specific KPIs not captured by standard methods
• Multi-objective optimization: Balancing several competing goals with custom weighting
• Advanced techniques: Implementing cutting-edge algorithms from recent research
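To make the multi-objective case concrete, a custom objective of that kind often boils down to a weighted sum of several loss terms. The sketch below shows only that arithmetic; the loss names, weights, and function are hypothetical, and it says nothing about how a custom objective is actually registered in Bios.

```python
def combined_loss(losses, weights):
    """Weighted sum of named loss terms.

    losses:  mapping of loss name -> current value.
    weights: mapping of loss name -> importance (same keys as losses).
    """
    if losses.keys() != weights.keys():
        raise ValueError("losses and weights must have the same names")
    return sum(weights[name] * losses[name] for name in losses)

total = combined_loss(
    losses={"accuracy": 0.4, "brevity": 1.0},
    weights={"accuracy": 1.0, "brevity": 0.1},  # accuracy matters 10x more
)
print(round(total, 3))
```

Choosing the weights is the hard part: they encode how much of one goal you are willing to trade for another.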

Understanding the Trade-offs

Custom objectives give you flexibility but come with costs:

Standard Objectives

✓ Optimized for speed

✓ Well-tested and stable

✓ Cover most use cases

✓ Easy to use

Custom Objectives

✓ Full flexibility

✓ Novel approaches possible

✗ 2-3x slower training

✗ Requires deep expertise

The Speed Cost

Custom objectives require extra computation during training, making each training iteration 2-3x slower. For long training runs, this adds up—what takes 8 hours with a standard objective might take 16-24 hours with a custom one.
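The arithmetic behind that estimate is simple enough to run for your own workloads. The helper below is a hypothetical back-of-the-envelope calculator that applies the 2-3x range quoted above; real slowdowns will depend on the objective and hardware.

```python
def custom_run_hours(standard_hours, slowdown_low=2.0, slowdown_high=3.0):
    """Estimated (best, worst) duration of a run with a custom objective."""
    return standard_hours * slowdown_low, standard_hours * slowdown_high

low, high = custom_run_hours(8)
print(f"{low:.0f}-{high:.0f} hours")  # 16-24 hours
```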

Making the Right Choice

Here's how to decide between standard and custom objectives:

Start with Standard

Unless you have a very specific reason to use a custom objective, start with the built-in options. They're fast, well-tested, and handle the vast majority of training scenarios successfully.

Consider Custom If

You've tried standard methods and they don't quite capture what you need, or you're implementing a specific research paper that requires a novel training approach.

Reality check: If you're asking whether you need custom objectives, you probably don't. Teams that need them usually know exactly why from the start.

Weigh the Cost

Custom objectives make training 2-3x slower. Ask yourself: will the benefit of this specific objective outweigh the extra time and compute cost? Often, the answer is no, and standard objectives work fine.

Real-World Scenarios

Here's what teams typically use for different applications:

💬 Chatbots

Objective: Standard (learning from examples)

Why: You have conversation data to learn from

Works well: 99% of chat applications

🎨 Creative Content

Objective: Reward maximization or preference learning

Why: Quality is subjective, learned from feedback

Works well: Writing, design, recommendations

🔬 Research Projects

Objective: Sometimes custom

Why: Testing novel training algorithms

Trade-off: Slower, but enables new techniques

The Bottom Line

The training objective is what your model learns to optimize for, whether that's matching examples exactly, maximizing rewards, or following human preferences. For most real-world applications, the standard objectives work well.

Custom objectives are powerful for specialized research or unique business requirements, but they come with a 2-3x training speed penalty. Only go custom if you have a clear, specific reason that standard objectives can't address.
