API Reference
Complete reference documentation for the Bios distributed training API. This guide covers all training functions, configuration options, and best practices for fine-tuning UltraSafe expert models.
API Design Philosophy
Bios provides a minimal, composable API for distributed training. You write simple training loops against the API, and Bios runs them on UltraSafe's proprietary GPU infrastructure, handling distributed orchestration, gradient synchronization, and fault tolerance automatically.
Quick Navigation
Supported Models
UltraSafe expert models available for fine-tuning: healthcare, finance, code, conversation, and general-purpose models.
Training Functions
Simplified high-level train() API for distributed training on UltraSafe's GPU cloud, plus sampling and checkpoint management.
LoRA Configuration
Configure LoRA parameters for efficient fine-tuning: rank, alpha, target modules, and dropout settings.
State Management
Checkpoint creation, restoration, and model export for production deployment and experiment tracking.
Core API Components
ServiceClient
The entry point to Bios. Discovers available models and creates training clients.
import bios

# Initialize service client (auto-detects BIOS_API_KEY)
service_client = bios.ServiceClient()

# Or with explicit API key
service_client = bios.ServiceClient(api_key="your-key-here")

# Discover available models
capabilities = service_client.get_server_capabilities()
for model in capabilities.supported_models:
    print(model.model_name)

TrainingClient
Represents a fine-tuning session. Execute training operations, manage state, and sample from trained models.
# Create LoRA training client
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-finance",
    rank=16,  # LoRA rank
    alpha=32  # LoRA alpha (typically 2x rank)
)

# Simplified training API - runs on UltraSafe's GPU cloud
result = training_client.train(data, loss_fn="cross_entropy", learning_rate=1e-4)

# Inspect the result
print(f"Loss: {result.loss:.4f}")

SamplingClient
Generate text from your fine-tuned model. Created from saved training checkpoints.
# Save weights and get sampling client
sampling_client = training_client.save_weights_and_get_sampling_client(
    name="my-finance-model"
)

# Generate text
result = sampling_client.sample(
    prompt=bios.types.ModelInput.from_ints(tokens),
    sampling_params=bios.types.SamplingParams(
        max_tokens=256,
        temperature=0.7,
        top_p=0.9
    ),
    num_samples=4
).result()

Essential Training Functions
The core functions you'll use in every training script:
| Function | Purpose | Returns | 
|---|---|---|
| forward_backward() | Execute forward pass and compute gradients with custom loss function | Future[ForwardBackwardResult] | 
| optim_step() | Apply accumulated gradients to update model parameters | Future[OptimStepResult] | 
| sample() | Generate text from current model state or saved checkpoint | Future[SampleResult] | 
| save_state() | Create checkpoint with model weights and optimizer state | Future[StateResult] | 
| load_state() | Restore training session from checkpoint | TrainingClient | 
| get_tokenizer() | Get the tokenizer for the base model | Tokenizer | 
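Most of these functions are demonstrated elsewhere on this page; load_state() is the exception. Below is a minimal sketch of checkpointing and resuming a session. Calling load_state() on the training client and passing the checkpoint path are assumptions; the table above only documents that it returns a TrainingClient.

# Create a checkpoint containing model weights and optimizer state
checkpoint = training_client.save_state(name="run-42-step-1000").result()
print(f"Saved: {checkpoint.path}")

# Later: restore the session from that checkpoint.
# The receiver and argument here are assumptions; only the return type
# (TrainingClient) is documented in the table above.
resumed_client = training_client.load_state(checkpoint.path)

# The resumed client exposes the same training API as before
tokenizer = resumed_client.get_tokenizer()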
Asynchronous Operations
Bios API functions return futures immediately, allowing efficient pipelining of operations:
# Queue forward/backward
fwd_future = training_client.forward_backward(data, "cross_entropy")

# Queue optimizer step
opt_future = training_client.optim_step(adam_params)

# Wait for completion
fwd_result = fwd_future.result()  # Blocks until complete
opt_result = opt_future.result()  # Already completed (pipelined)

Async variants of the training calls can also be awaited inside an asyncio event loop:

import asyncio

async def train_step(training_client, data):
    # Use async version for better concurrency
    result = await training_client.train_async(
        data,
        loss_fn="cross_entropy",
        learning_rate=1e-4
    )

    return result.loss

# Run async training
asyncio.run(train_step(training_client, batch))

Performance Tip
Always submit both forward_backward() and optim_step() before calling .result(). This allows Bios to pipeline the operations for maximum GPU utilization.
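Because optim_step() applies the gradients accumulated so far (see the table above), several forward_backward() calls can also be queued ahead of a single optimizer step. A rough sketch, assuming micro_batches is an iterable of prepared batches and adam_params is defined as in the example above:

# Queue several micro-batches; their gradients accumulate before the optimizer step
fwd_futures = [
    training_client.forward_backward(micro_batch, "cross_entropy")
    for micro_batch in micro_batches
]

# Queue a single optimizer step that applies the accumulated gradients
opt_future = training_client.optim_step(adam_params)

# Resolve the futures only after everything is submitted, keeping the pipeline full
losses = [f.result().loss for f in fwd_futures]
opt_future.result()
print(f"Mean micro-batch loss: {sum(losses) / len(losses):.4f}")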
Type System
Bios uses a strongly-typed API through the bios.types module:
Data Types
- Datum: Single training example
- ModelInput: Input tokens
- ModelOutput: Model predictions
Parameter Types
- AdamParams: Adam optimizer config
- SamplingParams: Generation settings
- LoraConfig: LoRA configuration
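A rough illustration of how these types fit together is below. ModelInput.from_ints and the SamplingParams fields appear in the examples on this page; the keyword arguments shown for Datum, AdamParams, and LoraConfig, and the tokenizer.encode() call, are illustrative assumptions.

from bios import types

# Wrap token IDs as a ModelInput (encode() is assumed to be the tokenizer's method)
prompt_tokens = tokenizer.encode("Quarterly revenue grew 12% year over year.")
model_input = types.ModelInput.from_ints(prompt_tokens)

# A single training example (the keyword argument is an illustrative assumption)
datum = types.Datum(model_input=model_input)

# Optimizer configuration (field name is an illustrative assumption)
adam_params = types.AdamParams(learning_rate=1e-4)

# Generation settings (fields shown in the sampling examples above)
sampling_params = types.SamplingParams(max_tokens=256, temperature=0.7, top_p=0.9)

# LoRA configuration mirroring the rank/alpha values used earlier; target-module
# and dropout settings are covered under LoRA Configuration
lora_config = types.LoraConfig(rank=16, alpha=32)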
Complete Training Workflow
A typical training workflow combines all API components:
import bios
from bios import types

# 1. Initialize
service_client = bios.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="ultrasafe/usf-finance",
    rank=16,
    alpha=32
)

# 2. Prepare data
tokenizer = training_client.get_tokenizer()
# ... tokenize and prepare training data ...

# 3. Training loop
for epoch in range(num_epochs):
    for batch in dataloader:
        # Single train() call handles everything
        result = training_client.train(
            batch,
            loss_fn="cross_entropy",
            learning_rate=1e-4
        )

        print(f"Loss: {result.loss:.4f}")

    # 4. Save checkpoint
    checkpoint = training_client.save_state(
        name=f"epoch_{epoch}"
    ).result()
    print(f"Saved: {checkpoint.path}")

# 5. Deploy model
sampling_client = training_client.save_weights_and_get_sampling_client(
    name="production-model"
)

# 6. Generate predictions
result = sampling_client.sample(
    prompt=types.ModelInput.from_ints(prompt_tokens),
    sampling_params=types.SamplingParams(
        max_tokens=256,
        temperature=0.7
    )
).result()

print(tokenizer.decode(result.sequences[0].tokens))

Explore the API
Dive deeper into specific API components: