How Clipie picks the right model for each shot
Behind every Auto model choice is a small router that looks at job_type, resolution, duration, and live provider health. Here's the logic.
When you submit a job with model: "auto", the router chooses the cheapest model whose max_resolution covers your input and whose job_type matches. For text→video it additionally filters on max_duration_s.
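Here's a minimal sketch of that filter-then-pick-cheapest step. The Capability shape, the cost_per_second field, the "text_to_video" string, and the pick_auto name are assumptions made for illustration; only job_type, max_resolution, and max_duration_s come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Capability:
    # Hypothetical catalog entry; field units are assumptions.
    model: str
    job_type: str
    max_resolution: int     # assumed: vertical pixels, e.g. 1080
    max_duration_s: float
    cost_per_second: float  # assumed pricing unit


def pick_auto(
    catalog: Iterable[Capability],
    job_type: str,
    resolution: int,
    duration_s: Optional[float] = None,
) -> Optional[Capability]:
    """Return the cheapest capability that can serve the job, or None."""
    candidates = [
        c for c in catalog
        if c.job_type == job_type and c.max_resolution >= resolution
    ]
    # The duration filter applies only to text-to-video jobs.
    if job_type == "text_to_video" and duration_s is not None:
        candidates = [c for c in candidates if c.max_duration_s >= duration_s]
    return min(candidates, key=lambda c: c.cost_per_second, default=None)
```

Returning None when nothing qualifies leaves it to the caller to decide whether to reject the job or ask for a lower resolution.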
Priority
Capabilities in the catalog carry a priority value (lower is preferred). Ties are broken by the most recent health probe: a provider that 401'd in the last five minutes falls below an otherwise equal-priority rival.
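One way to express that ordering is a two-part sort key. This is a sketch, not the router's actual code: the Candidate class, the last_failure_at field, and the rank function are hypothetical names, and the only values taken from the text are "lower priority wins" and the five-minute window.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

FAILURE_WINDOW_S = 5 * 60  # "the last five minutes"


@dataclass
class Candidate:
    # Hypothetical shape for illustration.
    model: str
    priority: int                            # lower is preferred
    last_failure_at: Optional[float] = None  # epoch seconds of last failed probe


def rank(candidates: List[Candidate], now: Optional[float] = None) -> List[Candidate]:
    """Sort by priority, then push recently failed providers behind equal-priority peers."""
    now = time.time() if now is None else now

    def key(c: Candidate):
        recently_failed = (
            c.last_failure_at is not None
            and now - c.last_failure_at < FAILURE_WINDOW_S
        )
        # False sorts before True, so healthy providers win ties.
        return (c.priority, recently_failed)

    return sorted(candidates, key=key)
```

Because priority comes first in the key, a recent failure only reorders providers that share a priority; it never demotes a provider below a lower-priority one.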
Region
We track a region tag per capability (us / eu / apac / global). When the request carries a geo hint from the LB header, we prefer in-region providers to keep latency low; otherwise we fall through to global.
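A rough sketch of the region preference, assuming capabilities are plain dicts with a "region" key and that the geo hint has already been parsed into one of the region tags. The prefer_region name is hypothetical, and so is the choice to fall back to global when the hint matches no provider.

```python
from typing import Dict, List, Optional


def prefer_region(capabilities: List[Dict[str, str]], geo_hint: Optional[str]) -> List[Dict[str, str]]:
    """Return in-region capabilities when a hint matches, else the global ones."""
    if geo_hint:
        in_region = [c for c in capabilities if c["region"] == geo_hint]
        if in_region:
            return in_region
    # No hint, or (by assumption) no in-region provider: fall through to global.
    return [c for c in capabilities if c["region"] == "global"]
```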
Health probes
At resolve time we normally call provider.health(). The exception is cross-region hot paths (US worker → CN provider endpoint), where the probe itself dominates latency. There we pass skip_health=True and rely on the Temporal retry policy to catch transient provider outages at the submit step.
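In code, the switch might look roughly like this. provider.health() and skip_health=True come from the text above; the resolve function, the ProviderUnavailable exception, the provider's name attribute, and the assumption that health() returns a bool are all illustrative.

```python
class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider fails its health probe."""


def resolve(provider, skip_health: bool = False):
    """Return the provider, probing it first unless the caller opts out."""
    # Assumption: provider.health() returns True when the provider is up.
    if not skip_health and not provider.health():
        raise ProviderUnavailable(getattr(provider, "name", repr(provider)))
    # With skip_health=True no probe is made; outages surface later at
    # the submit step, where the retry policy is expected to handle them.
    return provider
```

The trade-off is that a skipped probe moves failure detection from resolve to submit, so this only makes sense where submit is already wrapped in retries.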
Why not pick the "best" model?
Because "best" depends on what you're shooting. Seedance is a clear win for cinematic 5-10s clips; Kling's human motion is unmatched; Sora is the only option if you need 20 seconds with native audio. Auto just keeps you in the cheapest / most-available bucket unless you've explicitly asked for a specific model.
