Local SQL generation
Irodori Table can generate SQL from natural language with a lightweight local model, designed so correctness comes from a real SQL grammar rather than from model size. A tiny quantized model runs entirely on-device; every token it emits is forced through a schema-specialized SQL grammar, and the result is parsed back and validated against the schema. Deterministic completion stays the default — this is an opt-in, offline generation path.
Why it’s correct by construction
A small model is unreliable on its own. The design removes that risk:
- Grammar-backed SQL structure — a real, dialect-aware SQL parser + AST + GBNF grammar
(
irodori-sql:ast.rs,parser.rs,grammar.rs). - Grammar-constrained decoding — the model can only sample tokens the GBNF grammar allows, so output is always syntactically valid.
- Schema projection — table/column names become closed grammar terminals, so a hallucinated relation is literally unsamplable.
- Planning before decoding — tables and foreign-key joins are resolved deterministically before the model runs, shrinking its job so a 0.5B model suffices.
- Scoped grammar — on large schemas the GBNF is projected down to just the planned tables (+ their FK neighbors), so the grammar a big DB produces stays small: faster decode, lower memory, and the model can’t even mention an irrelevant table.
- Validate + repair — output is parsed and every identifier is proven to exist; anything else is rejected, not returned.
Pipeline
NL prompt + connection schema
→ project (GenSchema → GBNF grammar + SchemaIndex)
→ plan (mentioned tables + FK joins + compact prompt)
→ decode (llama.cpp, grammar-constrained, greedy)
→ verify (parse + schema-validate + canonical re-render)
→ SQL inserted into the editor (never executed)
Crates
irodori-sql—ast,parser,grammar(the shared syntax tree). Pure Rust, dialect-parameterized.irodori-generate—project,plan,verify,runtimeand the orchestrator. Thellamafeature adds the embedded runtime (llama.rs). Without it (or without a model) the engine returnsunsupported/not found.- desktop (
src-tauri/src/ai) — theai_generate_sql,ai_engine_status,ai_set_provider, andai_get_providercommands. The schema comes from the existing completion metadata cache; the dialect from the connection engine.
Runtime & model
- Embedded llama.cpp via
llama-cpp-2, CPU only, behind thellamacargo feature (the C++ build stays opt-in, likeduckdb). - Default model: Qwen2.5-Coder-0.5B-Instruct Q4_K_M (~0.4 GB), strong at text-to-SQL at its size.
- Lightness: mmapped weights, model loaded once and reused, a fresh context per
request (KV cache freed between calls), small
n_ctx, capped threads, and the grammar prunes the token space so fewer decode steps are needed.
Providers — connect any model (extensibility)
Everything talks to one trait, GrammarModel (runtime.rs), so backends are
fully pluggable. The verify gate is what makes every provider safe: output is
parsed and schema-validated regardless of source, so a backend that can’t honor
the GBNF grammar just gets its mistakes rejected instead of prevented — never
returned as invalid/hallucinated SQL.
Built-in providers:
| Provider | Crate item | Feature | Notes |
|---|---|---|---|
| Embedded llama.cpp | LlamaSqlModel | llama | GBNF-constrained, fully local/offline |
| Ollama | OllamaModel | http | any local Ollama model (7B/14B/32B…) |
| OpenAI-compatible API | OpenAiCompatModel | http | OpenAI, Azure, OpenRouter, gateways, many self-hosted/Anthropic-compatible |
| External CLI | CommandModel | (none) | Claude Code, Codex, Copilot, or any command — reuses your subscription |
| Echo (tests) | EchoModel | (none) | deterministic stand-in |
The desktop selects a provider at runtime via ai_set_provider /
ai_get_provider (AiProviderConfig { kind, model, endpoint, apiKey, program,
args }), surfaced in the generate dialog’s “Model provider” section. The http
and CLI providers are compiled in by default, so external/subscription models
work without the heavy llama build; the embedded model stays opt-in. API keys
are held in memory only (persist via the OS keychain through
security_store_secret).
Adding another provider (e.g. Anthropic-native, a local server, a queue) is a new
impl GrammarModel plus one AiProviderKind arm — no pipeline changes.
Build & use
# Standard build: AI generation compiled out, everything else unaffected.
cargo build -p irodori-table-desktop
# With local generation (compiles llama.cpp; set TMPDIR for the C++ build):
TMPDIR=.irodori-local/cc-tmp cargo build -p irodori-table-desktop --features llama
The embedded model is read only when it already exists in the app data dir under
models/. The desktop UI does not start a background download job; preinstall
the model for local generation, or use Ollama / API / CLI providers.
Safety
AI generation is opt-in, only ever inserts SQL into the editor (it never runs it), and references only objects that exist in the connected schema. This keeps the A5SQL-style read-only / AI-disabled posture intact.
Follow-ups
- Optional admin tooling for preinstalling local models outside the desktop UI.
- Expand the grammar/AST beyond
SELECT(DML/DDL) over later iterations. - Optional idle-unload timer to drop the model from memory after inactivity.