Date: 2026-02-22
Author: Ely (via Claude Code)
Aivy (the Aivible team assistant on LINE) needed a model replacement. The previous models had critical issues:
| Model | Provider | Problem |
|---|---|---|
| Llama 3.3 70B | DO AI Inference | Unreliable tool use, hallucinated, stalled on file reads |
| Gemini 2.5 Flash | Google (via proxy) | Reliable, but free tier caps at 20 requests/day |
Gemini proxy workaround (context):
- The `openai-completions` provider sends `"store": false` in API requests — Gemini rejects these with HTTP 400
- A local proxy on `127.0.0.1:19999` strips the `store` field before forwarding
- Runs as `gemini-proxy.service` (systemd) for persistence

Researched alternatives after Gemini quota exhaustion:
| Provider | Model | Free Tier | Notes |
|---|---|---|---|
| Cerebras | GPT-OSS 120B | 1M tokens/day | Custom silicon, fastest inference |
| Groq | Llama 3.3 70B | 1K req/day | Same problematic model |
| SambaNova | Llama 3.1 405B | 200K tokens/day | Larger model but lower quota |
| Gemini | 2.5 Flash | 20 req/day | Too low for team of 4 |
Signed up at cloud.cerebras.ai (free, no credit card required).
Discovered available models (Llama 3.3 70B is no longer offered on Cerebras):
- `llama3.1-8b` — small, basic
- `gpt-oss-120b` — 120B params, built-in reasoning, fast
- `qwen-3-235b-a22b-instruct-2507` — huge MoE (not accessible on free tier)
- `zai-glm-4.7` — GLM variant

Tested both accessible models:
| Test | llama3.1-8b | gpt-oss-120b |
|---|---|---|
| Basic response | Pass | Pass |
| Tool use (function calling) | Pass | Pass + reasoning |
| Response time | 59ms | 32ms |
| Model size | 8B | 120B |
Selected gpt-oss-120b — bigger, faster, has chain-of-thought reasoning.
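The model discovery step can be reproduced against the OpenAI-style `GET /models` endpoint that the `openai-completions` API shape implies. A minimal sketch — `model_ids` and `list_models` are our helper names, and the endpoint path is an assumption based on the OpenAI convention:

```python
import json
import os
import urllib.request

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style GET /models response body."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = "https://api.cerebras.ai/v1") -> list[str]:
    """List the models visible to the API key (reads CEREBRAS_API_KEY from env)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))
```

Running `list_models()` with a valid key should show whether `gpt-oss-120b` is still offered on the account's tier.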
Added Cerebras as custom provider in ~/.openclaw/openclaw.json:
```json
"cerebras": {
  "baseUrl": "https://api.cerebras.ai/v1/",
  "apiKey": "<CEREBRAS_API_KEY>",
  "api": "openai-completions",
  "models": [
    {
      "id": "gpt-oss-120b",
      "name": "GPT-OSS 120B (Cerebras)",
      "reasoning": true,
      "contextWindow": 131072,
      "maxTokens": 8192
    }
  ]
}
```
Set as default: `agents.defaults.model.primary = cerebras/gpt-oss-120b`
API key stored in `~/.openclaw/.env` as `CEREBRAS_API_KEY`.
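The provider can be smoke-tested directly before wiring it into the agent. A sketch using the OpenAI-style chat-completions shape that the `openai-completions` API setting implies — `build_chat_request` and `chat` are our helper names, not OpenClaw internals:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.cerebras.ai/v1/"

def build_chat_request(prompt: str, model: str = "gpt-oss-120b") -> dict:
    """OpenAI-style chat.completions payload, mirroring the provider config."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8192,  # matches maxTokens in openclaw.json
    }

def chat(prompt: str) -> str:
    """POST one prompt to Cerebras and return the assistant's reply text."""
    req = urllib.request.Request(
        BASE_URL + "chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

A quick `chat("who are you?")` from the droplet confirms auth and latency independently of the gateway.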
Via openclaw agent CLI:
| Test | Result |
|---|---|
| Identity ("who are you?") | Responds as Aivy |
| Project context (elevator pitch, deadlines) | Correct, detailed |
| Tool use (read TEAM.md) | Read file, formatted table correctly |
| Temporal awareness (today's date, weekly plan) | Correct date, actionable plan |
Via LINE (live DM):
| Metric | Value |
|---|---|
| Provider | cerebras |
| Model | gpt-oss-120b |
| Message 1 response time | ~1 second |
| Message 2 response time | ~4 seconds (includes memory_search) |
| Errors | None (isError=false) |
| Tool use | memory_search called successfully |
```
LINE → Cloudflare Tunnel → OpenClaw Gateway (port 18789)
        ↓
Cerebras API (gpt-oss-120b)
https://api.cerebras.ai/v1/
        ↓
Memory Search → Gemini Embeddings
(via proxy :19999)
```
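The store-stripping proxy on `:19999` is conceptually tiny: drop the `store` key from each JSON body, then forward. A minimal sketch — the upstream URL and handler wiring here are illustrative, not the actual proxy code:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://example.invalid"  # illustrative; the real proxy targets Gemini's endpoint

def strip_store(body: bytes) -> bytes:
    """Drop the top-level "store" field that Gemini rejects with HTTP 400."""
    payload = json.loads(body)
    payload.pop("store", None)
    return json.dumps(payload).encode()

class StripStoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = strip_store(self.rfile.read(length))
        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": self.headers.get("Authorization", ""),
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(resp.read())

if __name__ == "__main__":
    # In production a systemd unit (gemini-proxy.service) keeps this alive
    HTTPServer(("127.0.0.1", 19999), StripStoreHandler).serve_forever()
```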
(Embeddings still route through the proxy, which strips the `store` field, so that path works fine.)

Costs:

| Component | Monthly Cost |
|---|---|
| Cerebras API | $0 (free tier, 1M tokens/day) |
| DO Droplet | $24 |
| Gemini Embeddings | $0 (separate free tier) |
| LINE | $0 (free tier) |
| Total | $24/month |
Files changed:
- `/home/openclaw/.openclaw/openclaw.json` — added Cerebras provider, set as default
- `/home/openclaw/.openclaw/.env` — added `CEREBRAS_API_KEY`
- `gemini-proxy.service` — still running for embeddings