Docs — Ruflo

Quickstart

The fastest way to try Ruflo is the Playground — no signup, no setup. To use the API from your own code, send a POST request to /api/chat:

curl https://ruflo.io/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ruflo-free",
    "messages": [
      { "role": "user", "content": "Say hello in 5 words." }
    ]
  }'

Responses follow the standard chat-completions shape:

{
  "choices": [
    {
      "message": { "role": "assistant", "content": "Hi there, how are you?" }
    }
  ]
}
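
The same request can be made from JavaScript. This is a minimal sketch, assuming a runtime with a global fetch (modern browsers or Node 18+). The names askRuflo and extractReply are illustrative and not part of the Ruflo API.

```javascript
// Pull the assistant's text out of a chat-completions style response.
// Returns '' when the expected fields are missing.
function extractReply(response) {
  return response?.choices?.[0]?.message?.content ?? '';
}

// Send one user prompt to the endpoint shown above and return the reply text.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function askRuflo(prompt, fetchImpl = fetch) {
  const res = await fetchImpl('https://ruflo.io/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'ruflo-free',
      messages: [{ role: 'user', content: prompt }]
    })
  });
  if (!res.ok) throw new Error(`Ruflo request failed with HTTP ${res.status}`);
  return extractReply(await res.json());
}
```

A call such as askRuflo('Say hello in 5 words.').then(console.log) would print whatever text the model returns.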

Model tiers

Four tiers, one consistent shape. Pick the cheapest one that does the job.

ruflo-free — quick answers, no credit needed. Good default.
ruflo-fast — sub-second responses, ideal for chat UIs.
ruflo-smart — balanced reasoning, most everyday tasks.
ruflo-pro — deepest reasoning, hard multi-step problems.

Authentication

The hosted Playground uses a key stored on the server — you never see it. For your own deployment, set OPENCODE_API_KEY as an environment variable on the backend that proxies to Ruflo.

Never embed the key in browser-side JavaScript. Anyone viewing your page would be able to read and reuse it.
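
One way to keep the key on the server is to attach it only in the backend, just before the request is forwarded. The sketch below covers that single step. How the upstream expects the key is not specified in these docs, so the Bearer Authorization header is an assumption, and buildUpstreamHeaders is an illustrative name.

```javascript
// Build headers for the server-to-upstream request.
// ASSUMPTION: the upstream accepts the key as a Bearer token; confirm the
// actual scheme for your deployment before relying on this.
function buildUpstreamHeaders(env = process.env) {
  const key = env.OPENCODE_API_KEY;
  if (!key) {
    // Fail loudly at startup or first request rather than sending an empty key.
    throw new Error('OPENCODE_API_KEY is not set on the backend');
  }
  return {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${key}`
  };
}
```

Because the key is read from the environment inside the backend process, it never appears in any response sent to the browser.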

Chat completions

Endpoint:

POST /api/chat

Request body:

{
  "model": "ruflo-smart",
  "messages": [
    { "role": "system",    "content": "You are a concise editor." },
    { "role": "user",      "content": "Trim this paragraph in half: ..." }
  ],
  "max_tokens": 800,
  "stream": false
}

Fields:

model — one of the four tier ids.
messages — array of {role, content}. Roles: system, user, assistant.
max_tokens — cap on the response length.
stream — set true for token-by-token SSE streaming.
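
Checking a body against these fields before sending can catch the most common 400 errors early. The sketch below is one possible client-side check. The tier ids and roles come from this page. The rules that messages must be non-empty and max_tokens must be a positive integer are assumptions, and the server's own validation may differ. validateChatBody is an illustrative name.

```javascript
// Tier ids and roles as listed in this document.
const KNOWN_TIERS = ['ruflo-free', 'ruflo-fast', 'ruflo-smart', 'ruflo-pro'];
const KNOWN_ROLES = ['system', 'user', 'assistant'];

// Return a list of problems found in a request body.
// An empty list means the body passed every check below.
function validateChatBody(body) {
  const problems = [];
  if (!KNOWN_TIERS.includes(body?.model)) {
    problems.push('model must be one of: ' + KNOWN_TIERS.join(', '));
  }
  if (!Array.isArray(body?.messages) || body.messages.length === 0) {
    problems.push('messages must be a non-empty array');
  } else {
    body.messages.forEach((m, i) => {
      if (!KNOWN_ROLES.includes(m?.role)) problems.push(`messages[${i}].role is invalid`);
      if (typeof m?.content !== 'string') problems.push(`messages[${i}].content must be a string`);
    });
  }
  // ASSUMPTION: max_tokens, when present, is a positive integer.
  if (body?.max_tokens !== undefined &&
      !(Number.isInteger(body.max_tokens) && body.max_tokens > 0)) {
    problems.push('max_tokens must be a positive integer');
  }
  if (body?.stream !== undefined && typeof body.stream !== 'boolean') {
    problems.push('stream must be a boolean');
  }
  return problems;
}
```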

Streaming

Add "stream": true. The response is then a stream of Server-Sent Events (SSE): each event has a data: line containing a JSON delta, and the stream ends with data: [DONE]. Render the partial content as it arrives:

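// Streams a reply and calls onDelta with each piece of text as it arrives.
async function streamChat(messages, onDelta) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'ruflo-fast', messages, stream: true })
  });
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    // Events end with a blank line. The last piece may be incomplete,
    // so keep it in the buffer until the next chunk arrives.
    const events = buf.split('\n\n');
    buf = events.pop();
    for (const evt of events) {
      const line = evt.split('\n').find(l => l.startsWith('data:'));
      if (!line) continue;
      const data = line.slice(5).trim();
      if (data === '[DONE]') return;
      const json = JSON.parse(data);
      onDelta(json.choices?.[0]?.delta?.content || '');
    }
  }
}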
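
The buffering step is the part that is easiest to get wrong, because a network chunk can end in the middle of an event. Below is a minimal sketch of a parser that holds back the incomplete tail until the rest arrives. It assumes events are separated by a blank line (\n\n) and that the stream ends with data: [DONE]. createSseParser is an illustrative name.

```javascript
// Incremental parser for "data:" events separated by blank lines.
// feed(chunk) returns the text deltas completed by that chunk.
function createSseParser() {
  let buf = '';
  let done = false;
  return {
    feed(chunk) {
      const deltas = [];
      if (done) return deltas;
      buf += chunk;
      const events = buf.split('\n\n');
      buf = events.pop(); // the last piece may be an incomplete event
      for (const evt of events) {
        const line = evt.split('\n').find(l => l.startsWith('data:'));
        if (!line) continue;
        const data = line.slice(5).trim();
        if (data === '[DONE]') { done = true; break; }
        const content = JSON.parse(data).choices?.[0]?.delta?.content;
        if (content) deltas.push(content);
      }
      return deltas;
    },
    // True once the [DONE] sentinel has been seen.
    get finished() { return done; }
  };
}
```

Keeping the parser separate from the network code means it can be fed hand-written chunks, including ones that split an event in half, without calling the API.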

Errors

Errors come back as JSON with HTTP status >= 400:

{ "error": { "message": "Body must include `model` and `messages[]`" } }

Common cases:

400 — malformed body or unknown tier id.
413 — conversation too long for the public demo.
429 — rate limit hit. Slow down or upgrade your tier.
500 — server-side configuration issue. Check /api/health.
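
A client can branch on these status codes rather than treating every failure the same way. The sketch below maps each listed code to a suggested action and reads the message from the error body. The action labels and the retry delays (exponential backoff starting at 1 second, capped at 30 seconds) are assumptions made for illustration and are not values published by Ruflo.

```javascript
// Map the status codes listed above to a suggested client action.
function suggestedAction(status) {
  switch (status) {
    case 400: return 'fix-request';   // malformed body or unknown tier id
    case 413: return 'shorten-input'; // conversation too long
    case 429: return 'retry-later';   // rate limit hit
    case 500: return 'check-health';  // server-side configuration issue
    default:  return status >= 400 ? 'unknown-error' : 'ok';
  }
}

// Read the human-readable message from an error body, with a fallback.
function errorMessage(body, status) {
  return body?.error?.message ?? `Request failed with HTTP ${status}`;
}

// Delay before retry attempt n (0-based): 1s, 2s, 4s, ... capped at 30s.
// ASSUMPTION: these numbers are illustrative, not documented limits.
function retryDelayMs(attempt) {
  return Math.min(30000, 1000 * 2 ** attempt);
}
```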

Rate limits

The hosted Playground enforces a soft limit of ~16 KB of input per request and reasonable per-IP throttling. For higher limits, run your own backend with the same shape — Ruflo's tier names work identically there.
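
A request can be measured before it is sent to see whether it is likely to fit. This is a rough sketch: it assumes the ~16 KB limit applies to the UTF-8 bytes of the JSON body and that 1 KB means 1024 bytes, neither of which is confirmed here. The function names are illustrative.

```javascript
// ASSUMPTION: the limit is counted on the serialized JSON body in UTF-8 bytes.
const MAX_INPUT_BYTES = 16 * 1024;

// Size of the request body as it would be sent over the wire.
function requestSizeBytes(body) {
  return new TextEncoder().encode(JSON.stringify(body)).length;
}

// True when the body is within the assumed Playground limit.
function fitsPlaygroundLimit(body) {
  return requestSizeBytes(body) <= MAX_INPUT_BYTES;
}
```

Counting bytes rather than characters matters for non-ASCII text, where one character can take several bytes.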

Privacy

The Playground keeps conversations in your browser tab only. Nothing is logged to disk on our side beyond standard request metrics such as status code and latency; message contents are not logged. Close the tab and the conversation is gone.

Need something the docs don't cover? Open the Playground and ask.