Model choice

Routing is an eval-backed rollout system

1chat should never switch models just because a candidate is cheaper or scores well on a public benchmark. It should switch only when your own labels and traces show that the candidate is good enough for your use case.

Policy fields

{
  "mode": "balanced",
  "aggressiveness": 48,
  "qualityFloor": 0.96,
  "shadowTrafficPercent": 12,
  "conversationStickiness": true
}

The policy lives at the project level. API keys point to projects, so teams can run a conservative production project and a more experimental staging project from the same account.
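A minimal TypeScript sketch of how the policy above might be typed and checked before it is saved. The type `RoutingPolicy`, the helpers `validatePolicy` and `policyForApiKey`, the in-memory maps, and the example keys and project ids are all hypothetical and not part of a published 1chat API. The ranges are assumptions: `aggressiveness` follows the 0-100 bands in the Aggressiveness section, and `qualityFloor` is treated as a fraction.

```typescript
// Hypothetical types and helpers; field names match the JSON policy above.
type RoutingMode = "off" | "observe" | "shadow" | "conservative" | "balanced" | "aggressive";

interface RoutingPolicy {
  mode: RoutingMode;
  aggressiveness: number;        // assumed integer, 0 to 100
  qualityFloor: number;          // assumed fraction, 0 to 1
  shadowTrafficPercent: number;  // assumed percentage, 0 to 100
  conversationStickiness: boolean;
}

const MODES: readonly string[] = ["off", "observe", "shadow", "conservative", "balanced", "aggressive"];

// Collects every problem instead of stopping at the first; an empty array means valid.
function validatePolicy(p: RoutingPolicy): string[] {
  const errors: string[] = [];
  if (!MODES.includes(p.mode)) errors.push(`unknown mode: ${p.mode}`);
  if (!Number.isInteger(p.aggressiveness) || p.aggressiveness < 0 || p.aggressiveness > 100) {
    errors.push("aggressiveness must be an integer from 0 to 100");
  }
  // Written as !(in range) so that NaN is rejected too.
  if (!(p.qualityFloor >= 0 && p.qualityFloor <= 1)) errors.push("qualityFloor must be between 0 and 1");
  if (!(p.shadowTrafficPercent >= 0 && p.shadowTrafficPercent <= 100)) {
    errors.push("shadowTrafficPercent must be between 0 and 100");
  }
  return errors;
}

// Hypothetical lookup: an API key resolves to a project, and the policy lives on the project.
const projectByApiKey = new Map<string, string>([
  ["key-prod", "proj-production"],
  ["key-staging", "proj-staging"],
]);

const policyByProject = new Map<string, RoutingPolicy>([
  ["proj-production", { mode: "conservative", aggressiveness: 15, qualityFloor: 0.98, shadowTrafficPercent: 5, conversationStickiness: true }],
  ["proj-staging", { mode: "balanced", aggressiveness: 48, qualityFloor: 0.96, shadowTrafficPercent: 12, conversationStickiness: true }],
]);

function policyForApiKey(apiKey: string): RoutingPolicy | undefined {
  const project = projectByApiKey.get(apiKey);
  return project === undefined ? undefined : policyByProject.get(project);
}
```

Keeping the policy on the project rather than on the key is what lets a production key and a staging key under the same account resolve to different postures.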

Routing modes

off: Pin exactly what the request asked for. Traces and billing still work.
observe: Collect traces, labels, costs, and recommendations without changing production output.
shadow: Run candidate models out of band on a sampled slice of traffic to build evidence.
conservative: Shift small slices of traffic only after candidate models clear a high confidence threshold.
balanced: Pursue meaningful savings under quality monitoring, and roll back when labels degrade.
aggressive: Prioritize cost and latency improvements while accepting a higher chance of model churn.
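One way to encode the modes, sketched under assumptions. The `MODE_BEHAVIOR` table records only what the descriptions state (whether production output can change, whether candidates run out of band), plus one assumption: the three routing modes keep shadowing so evidence keeps accumulating. Shadow sampling hashes a request id with 32-bit FNV-1a into one of 100 buckets and compares the bucket with `shadowTrafficPercent`; the hash choice and the names `MODE_BEHAVIOR`, `fnv1a`, and `shouldShadow` are illustrative, not 1chat internals.

```typescript
type RoutingMode = "off" | "observe" | "shadow" | "conservative" | "balanced" | "aggressive";

interface ModeBehavior {
  changesProductionOutput: boolean; // may serve a different model than the request asked for
  runsShadowCandidates: boolean;    // runs candidate models out of band on sampled traffic
}

// off and observe never run candidates; shadow runs them without touching production.
// Assumption: the three routing modes keep shadowing so evidence keeps accumulating.
const MODE_BEHAVIOR: Record<RoutingMode, ModeBehavior> = {
  off:          { changesProductionOutput: false, runsShadowCandidates: false },
  observe:      { changesProductionOutput: false, runsShadowCandidates: false },
  shadow:       { changesProductionOutput: false, runsShadowCandidates: true },
  conservative: { changesProductionOutput: true,  runsShadowCandidates: true },
  balanced:     { changesProductionOutput: true,  runsShadowCandidates: true },
  aggressive:   { changesProductionOutput: true,  runsShadowCandidates: true },
};

// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Deterministic sampling: the same request id always lands in the same bucket (0 to 99).
function shouldShadow(mode: RoutingMode, requestId: string, shadowTrafficPercent: number): boolean {
  if (!MODE_BEHAVIOR[mode].runsShadowCandidates) return false;
  return fnv1a(requestId) % 100 < shadowTrafficPercent;
}
```

Hashing instead of calling `Math.random()` keeps the sample stable: retrying the same request id lands in the same bucket, which makes shadow results easier to reproduce from traces.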

Aggressiveness

Aggressiveness controls how quickly 1chat explores cheaper or faster models once the evidence looks promising. It should influence candidate selection, rollout percentage, rollback tolerance, and how much recent data is required before traffic increases.

0-25

High caution. Collect evidence, shadow more, route only low-risk categories.

26-60

Default startup posture. Move tasks with stable labels and clear cost deltas.

61-100

Cost-forward. Explore faster, monitor labels tightly, and keep rollback thresholds explicit.
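The bands above map directly onto a small function. The hypothetical `rolloutKnobs` helper goes one step further and interpolates two of the levers the text names (rollout percentage and how much recent data is required) between a cautious and a bold endpoint. The endpoint values (1 versus 25 percentage points per step, 500 versus 100 recent labels) are made-up illustrations, not 1chat defaults.

```typescript
type Band = "high-caution" | "default" | "cost-forward";

// Bands follow the documented ranges: 0-25, 26-60, 61-100.
function aggressivenessBand(value: number): Band {
  if (!Number.isFinite(value) || value < 0 || value > 100) {
    throw new RangeError("aggressiveness must be between 0 and 100");
  }
  if (value <= 25) return "high-caution";
  if (value <= 60) return "default";
  return "cost-forward";
}

interface RolloutKnobs {
  maxTrafficStepPercent: number; // largest single increase in routed traffic
  minRecentLabels: number;       // labels required before the next increase
}

// Hypothetical endpoints for aggressiveness 0 and 100.
const CAUTIOUS: RolloutKnobs = { maxTrafficStepPercent: 1, minRecentLabels: 500 };
const BOLD: RolloutKnobs = { maxTrafficStepPercent: 25, minRecentLabels: 100 };

// Linear interpolation between the two endpoints, rounded to whole numbers.
function rolloutKnobs(aggressiveness: number): RolloutKnobs {
  aggressivenessBand(aggressiveness); // reuse the range check
  const t = aggressiveness / 100;
  const lerp = (a: number, b: number): number => Math.round(a + (b - a) * t);
  return {
    maxTrafficStepPercent: lerp(CAUTIOUS.maxTrafficStepPercent, BOLD.maxTrafficStepPercent),
    minRecentLabels: lerp(CAUTIOUS.minRecentLabels, BOLD.minRecentLabels),
  };
}
```

With the example policy's `aggressiveness` of 48, this sketch lands in the default band and allows steps of up to 13 percentage points after 308 recent labels. Candidate selection and rollback tolerance could be derived the same way; a stepwise mapping per band would fit the text equally well.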

Candidate model scoring

The router should score each candidate by a weighted blend of label pass rate, task similarity, confidence, cost delta, latency delta, provider health, context support, tool compatibility, and prior regressions.

score =
  qualityConfidence * 0.45 +
  taskSimilarity * 0.20 +
  costSavings * 0.15 +
  latencyGain * 0.10 +
  providerHealth * 0.05 +
  compatibility * 0.05
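A direct TypeScript translation of the formula, assuming every signal has already been normalized to [0, 1] with higher meaning better (so a larger cost delta becomes a higher `costSavings`). Because the weights sum to 1.0, the score also lands in [0, 1]. How the factors listed in the prose collapse into six terms is an assumption (for example, context support and tool compatibility folded into `compatibility`), and prior regressions, which the formula does not weight, are left out. `scoreCandidate` and `rankCandidates` are hypothetical names.

```typescript
// Each signal is assumed to be normalized to [0, 1], where higher is better.
interface CandidateSignals {
  qualityConfidence: number;
  taskSimilarity: number;
  costSavings: number;
  latencyGain: number;
  providerHealth: number;
  compatibility: number;
}

// Weights from the formula above; they sum to 1.0.
const WEIGHTS: CandidateSignals = {
  qualityConfidence: 0.45,
  taskSimilarity: 0.2,
  costSavings: 0.15,
  latencyGain: 0.1,
  providerHealth: 0.05,
  compatibility: 0.05,
};

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum of clamped signals.
function scoreCandidate(s: CandidateSignals): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof CandidateSignals)[]) {
    total += WEIGHTS[key] * clamp01(s[key]);
  }
  return total;
}

// Highest score first.
function rankCandidates(candidates: { model: string; signals: CandidateSignals }[]): { model: string; score: number }[] {
  return candidates
    .map((c) => ({ model: c.model, score: scoreCandidate(c.signals) }))
    .sort((a, b) => b.score - a.score);
}
```

The score only orders candidates that are allowed to compete. Hard requirements such as the quality floor and context length belong in the rollout guardrails rather than in the weighted sum, where a large cost saving could otherwise offset a missing capability.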

Conversation stickiness

Stickiness keeps a multi-turn conversation on the same selected model unless there is a deliberate boundary. This prevents a user from getting subtly different behavior midway through a support, tax, or agent workflow.

Default recommendation

Keep conversation stickiness on. Let routing change between conversations or after explicit project-level rollout events, not randomly in the middle of a user session.
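A minimal sketch of stickiness as a per-conversation pin, assuming an in-memory map and a `currentRoute` callback that returns whatever model the project's rollout currently selects. `StickyRouter`, `modelFor`, and `release` are hypothetical names. The first turn pins the routed model, later turns reuse the pin even if the rollout has moved on, and `release` stands in for a deliberate boundary.

```typescript
// Hypothetical in-memory pin store; a real deployment would persist pins with the conversation.
class StickyRouter {
  private readonly pins = new Map<string, string>(); // conversationId -> pinned model
  private readonly stickiness: boolean;
  private readonly currentRoute: () => string;

  constructor(stickiness: boolean, currentRoute: () => string) {
    this.stickiness = stickiness;
    this.currentRoute = currentRoute;
  }

  // Returns the model for this turn, pinning it on the first turn when stickiness is on.
  modelFor(conversationId: string): string {
    if (!this.stickiness) return this.currentRoute();
    const pinned = this.pins.get(conversationId);
    if (pinned !== undefined) return pinned;
    const chosen = this.currentRoute();
    this.pins.set(conversationId, chosen);
    return chosen;
  }

  // A deliberate boundary: the next turn picks up whatever the rollout currently selects.
  release(conversationId: string): void {
    this.pins.delete(conversationId);
  }
}
```

Under this sketch, a rollout change reaches new conversations immediately but reaches an existing conversation only after its pin is released.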

Rollout guardrails

  • Do not route if the account has no relevant eval set for the task class.
  • Do not route if recent human labels fall below the configured quality floor.
  • Do not route if the candidate lacks required context length, tool calling, JSON mode, or image support.
  • Pause a candidate when inferred negative labels spike after the model answers.
  • Expose every route decision in the trace viewer with the selected model, alternatives, and reason.
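The guardrails read naturally as a filter that runs before any traffic moves. This sketch checks each ranked candidate, records why it was blocked, and returns a decision record with the selected model, the alternatives, and a reason, which is the information the trace viewer is meant to show. Every type and field name is hypothetical; `paused` stands in for whatever signal marks a spike in inferred negative labels, and falling back to the requested model is an assumed default.

```typescript
// Hypothetical shapes; these are not 1chat API types.
interface Requirements {
  minContextTokens: number;
  needsTools: boolean;
  needsJsonMode: boolean;
  needsImages: boolean;
}

interface Candidate {
  model: string;
  contextTokens: number;
  supportsTools: boolean;
  supportsJsonMode: boolean;
  supportsImages: boolean;
  recentHumanPassRate: number; // share of recent human labels that passed, 0 to 1
  paused: boolean;             // set when inferred negative labels spike after this model answers
}

interface RouteContext {
  hasEvalSetForTaskClass: boolean;
  qualityFloor: number;
  requirements: Requirements;
}

interface RouteDecision {
  selectedModel: string;
  alternatives: { model: string; blockedBy: string[] }[];
  reason: string;
}

// Returns every guardrail the candidate fails; an empty array means it may receive traffic.
function blockingReasons(c: Candidate, ctx: RouteContext): string[] {
  const reasons: string[] = [];
  const r = ctx.requirements;
  if (!ctx.hasEvalSetForTaskClass) reasons.push("no relevant eval set for task class");
  if (c.recentHumanPassRate < ctx.qualityFloor) reasons.push("recent human labels below quality floor");
  if (c.contextTokens < r.minContextTokens) reasons.push("insufficient context length");
  if (r.needsTools && !c.supportsTools) reasons.push("no tool calling");
  if (r.needsJsonMode && !c.supportsJsonMode) reasons.push("no JSON mode");
  if (r.needsImages && !c.supportsImages) reasons.push("no image support");
  if (c.paused) reasons.push("paused after negative-label spike");
  return reasons;
}

// Assumes `ranked` is already sorted by score, best first.
function decideRoute(requestedModel: string, ranked: Candidate[], ctx: RouteContext): RouteDecision {
  const alternatives = ranked.map((c) => ({ model: c.model, blockedBy: blockingReasons(c, ctx) }));
  const winner = alternatives.find((a) => a.blockedBy.length === 0);
  if (winner === undefined) {
    return { selectedModel: requestedModel, alternatives, reason: "no candidate passed every guardrail; kept the requested model" };
  }
  return { selectedModel: winner.model, alternatives, reason: "highest-ranked candidate that passed every guardrail" };
}
```

Recording the blocked alternatives alongside the winner means a reviewer can see not only which model served a request but also why cheaper options were skipped.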