Features

Everything you need to ship with AI.

A composable platform: pre-built agents, raw model access, tool calling, streaming — under one roof.


Specialised agents

Pre-built workers, ready to plug in.

Each agent ships with a tuned system prompt, tool set, and output format. Use as-is or override.

Researcher

Searches the web, dedupes sources, builds an evidence-based answer with footnoted citations. Best for fact-finding and competitive intel.

Writer

Long-form drafting with style controls — tone, audience, length, structure. Outputs Markdown by default. Great for posts, emails, proposals.

Coder

Generates, refactors, and explains code in TypeScript, Python, Go, Rust, SQL and more. Returns fenced blocks with context-aware imports.

Analyst

Reads CSVs and tables, extracts insights, builds summary tables, suggests visualisations. Useful for ad-hoc data questions.

Critic

Reviews drafts, flags weak claims, suggests edits with reasons. Good as a second pair of eyes before you ship.

Translator

Translation that preserves tone, idiom, and formatting. 80+ languages. Optionally returns a back-translation for verification.

Conversationalist

Open-ended chat with persistent context, friendly tone, and clarifying follow-ups. Default agent for the Playground.

Scheduler

Triggers agents on a cadence. "Every Monday, summarise last week's news in my niche." Send results via email or webhook.

Orchestrator

Chains agents into multi-step workflows. Researcher feeds Writer feeds Critic — one prompt, three passes, a final draft.
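A chain like that is just function composition over drafts. A minimal sketch — the `Agent` type and `orchestrate` helper are illustrative stand-ins, not the platform's actual API:

```typescript
// Each agent pass takes the previous pass's output as its input.
type Agent = (input: string) => string;

// Run the prompt through every stage in order: Researcher feeds
// Writer feeds Critic, producing one final draft.
function orchestrate(stages: Agent[], prompt: string): string {
  return stages.reduce((draft, stage) => stage(draft), prompt);
}
```

In practice each stage would be an API call to one agent; the shape of the pipeline — output of one pass becomes input of the next — stays the same.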


Tools & primitives

The building blocks underneath.

Streaming responses

Token-by-token streaming over Server-Sent Events. Drop-in compatible with the OpenAI SDK pattern — no SDK lock-in. Cancel mid-stream when the user stops reading.
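On the wire, an SSE stream in the OpenAI-style convention is a sequence of `data:` lines, each carrying one JSON delta, terminated by `data: [DONE]`. A minimal sketch of consuming it — the `delta` field and `[DONE]` sentinel are assumptions borrowed from that convention, not confirmed details of this API:

```typescript
// Pull the token deltas out of one SSE chunk. Assumes each event is
// a line of the form `data: {"delta":"..."}` and the stream closes
// with `data: [DONE]`.
function parseSSEChunk(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    tokens.push(JSON.parse(payload).delta);
  }
  return tokens;
}
```

Cancelling mid-stream is the standard `fetch` pattern: pass an `AbortSignal` with the request and call `abort()` on its controller when the user navigates away — the connection drops and no further tokens are billed for rendering.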

Tool calling

Define tools as JSON schemas. The model decides when to call them; you execute and return the result. Loop until the agent stops asking.
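That loop can be sketched in a few lines. The `Tool` and `ModelTurn` shapes below are hypothetical simplifications for illustration — a real integration would use the provider's own message and tool-call types:

```typescript
// A tool pairs the JSON-schema description the model sees with the
// implementation you run locally.
type Tool = {
  name: string;
  parameters: object;          // JSON schema shown to the model
  run: (args: any) => string;  // executed on your side
};

// Each model turn either requests a tool or returns a final answer.
type ModelTurn =
  | { type: "tool_call"; name: string; args: any }
  | { type: "final"; text: string };

// Loop: call the model, run whatever tool it asks for, feed the
// result back, and stop when it answers instead of asking.
function agentLoop(
  callModel: (history: string[]) => ModelTurn,
  tools: Record<string, Tool>,
  prompt: string,
): string {
  const history = [prompt];
  for (;;) {
    const turn = callModel(history);
    if (turn.type === "final") return turn.text;
    const result = tools[turn.name].run(turn.args);
    history.push(`tool:${turn.name} -> ${result}`);
  }
}
```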

Long context

Up to 200k tokens of context on Smart and Pro tiers. Drop in a whole codebase or a 300-page PDF and ask questions about it.

Structured output

Force responses to fit a JSON schema you specify. No more parsing prose — get strongly-typed objects you can pipe straight into your app.
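Even with schema-constrained output, it is worth shape-checking the response at the boundary before piping it into your app. A toy validator for flat schemas — a real integration would use a full JSON Schema validator rather than this sketch:

```typescript
// Minimal check for a flat, JSON-schema-like spec: every required
// field must exist and match its declared primitive type.
type FlatSchema = {
  required: string[];
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
};

function matchesSchema(value: unknown, schema: FlatSchema): boolean {
  if (typeof value !== "object" || value === null) return false;
  const obj = value as Record<string, unknown>;
  return schema.required.every(
    (key) => typeof obj[key] === schema.properties[key]?.type,
  );
}
```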

System prompts

Bake in role, voice, format, and constraints. Reusable across every conversation an agent has. Override per-request when needed.

Conversation memory

Multi-turn history that survives across sessions. Server-side persistence optional — by default everything is in-memory and ephemeral.
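The default mode — nothing persisted server-side — amounts to a history map that lives and dies with your process. A minimal sketch of that shape, purely illustrative:

```typescript
// Ephemeral, in-memory conversation store keyed by session id.
// Nothing survives a restart unless you opt into server-side
// persistence.
class MemoryStore {
  private sessions = new Map<string, string[]>();

  append(sessionId: string, message: string): void {
    const history = this.sessions.get(sessionId) ?? [];
    history.push(message);
    this.sessions.set(sessionId, history);
  }

  history(sessionId: string): string[] {
    return this.sessions.get(sessionId) ?? [];
  }
}
```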


Production-ready

The boring stuff, done right.

Secrets stay server-side

Your API key never ships to the browser. The frontend talks to your backend, the backend talks to Ruflo.

Predictable latency

Tiered models with documented p50/p95 latency budgets. Swap tiers without code changes.

Automatic retries

Transient upstream errors are retried with exponential backoff so they never reach your users.
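The same pattern is easy to apply on your side of the fence too. A generic retry wrapper with doubling delays — the attempt count and base delay below are illustrative defaults, not the platform's documented schedule:

```typescript
// Retry a flaky async call with exponential backoff:
// wait baseDelayMs, then 2x, then 4x, ... between attempts.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Double the wait after each failed attempt.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```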

Be first to try Ruflo.

Join the waitlist for early access.