
t2sql-framework

An explicit-stage text-to-SQL pipeline with tracing, evaluation harness, and a data-driven iteration loop.


Conceptual Goal

Current text-to-SQL libraries wrap everything in an agent loop (see https://github.com/Text2SqlAgent/text2sql-framework, accessed 01.04.2026). This project takes the opposite approach: four hard stage boundaries enforced in code, so every failure is isolable, every stage is independently testable, and token budgets are deterministic. The design was derived by systematically assessing these capabilities in a scientific review and updating them for modern LLM infrastructure.

This costs flexibility but buys observability. If observability matters less than shipping speed, the text2sql-framework agent loop is a legitimate alternative; adopting it is postponed for now.

The pipeline flow is:

question
  → [stage 2] schema linking — retrieve relevant DDL + FK hints
  → [stage 3] generation    — assemble prompt + call LLM + extract SQL
  → [stage 4] refinement    — execute → if error, correct (up to 3 rounds)
  → (sql, dataframe, trace)
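The flow above could be wired as in this minimal sketch. The function names and signatures are illustrative assumptions, not the actual orchestrator.py API:

```python
def ask(question: str, link_schema, generate, refine):
    """Run stages 2 -> 3 -> 4 in sequence; every stage appends to the same trace."""
    trace: list[dict] = []
    ddl = link_schema(question, trace)           # stage 2: relevant DDL + FK hints
    sql = generate(question, ddl, trace)         # stage 3: prompt + LLM call + SQL extraction
    sql, df = refine(sql, trace, max_rounds=3)   # stage 4: execute, correct on error
    return sql, df, trace
```

Because each stage is a plain callable, any one of them can be replaced by a stub in a unit test, which is the point of the hard boundaries.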

Every ask() call returns a structured QueryTrace. Accumulated traces feed AnalysisEngine, which surfaces retrieval noise, synonym mismatches, and schema gaps — the evidence base for deciding what to improve next.
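A hypothetical shape for such a trace, plus one of the mismatch checks mentioned above. The real QueryTrace and AnalysisEngine fields are assumptions here, not the project's actual dataclasses:

```python
from dataclasses import dataclass, field

@dataclass
class QueryTrace:
    question: str
    retrieved_tables: list[str] = field(default_factory=list)  # stage 2 output
    used_tables: list[str] = field(default_factory=list)       # tables in final SQL
    attempts: int = 0                                          # stage 4 rounds

def retrieval_noise(trace: QueryTrace) -> set[str]:
    """Tables stage 2 retrieved that the final SQL never referenced."""
    return set(trace.retrieved_tables) - set(trace.used_tables)
```

Accumulating such traces and counting which tables are persistently retrieved but unused is one concrete way an analysis engine can surface retrieval noise.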

The LLM abstraction (llm/base.py) targets OpenAI-compatible endpoints, covering self-hosted models, rented inference servers, and OpenAI cloud via a configurable base_url. The embedding function is injectable, so schema content can stay on your infrastructure.
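One way the base.py abstraction could look. Everything beyond call(system, messages) is an assumption for illustration, and the HTTP transport is deliberately elided:

```python
from abc import ABC, abstractmethod

class LLM(ABC):
    """Abstract chat interface; concrete backends differ only in transport."""
    @abstractmethod
    def call(self, system: str, messages: list[dict]) -> str: ...

class OpenAICompatible(LLM):
    """Targets any OpenAI-compatible endpoint: base_url selects a self-hosted
    model, a rented inference server, or OpenAI cloud."""
    def __init__(self, model: str, base_url: str, api_key: str):
        self.model, self.base_url, self.api_key = model, base_url, api_key

    def _payload(self, system: str, messages: list[dict]) -> dict:
        # OpenAI chat format: system instruction first, then the conversation.
        return {"model": self.model,
                "messages": [{"role": "system", "content": system}, *messages]}

    def call(self, system: str, messages: list[dict]) -> str:
        # POST f"{self.base_url}/chat/completions" with self._payload(...)
        raise NotImplementedError("transport elided in this sketch")
```

Keeping the payload assembly separate from the transport makes the prompt format testable without a live endpoint.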


Repository Structure

src/
├── pipeline/
│   ├── __init__.py
│   ├── orchestrator.py       # ask() — calls stages 2→3→4 in sequence, returns (sql, df, trace)
│   ├── stage2_schema.py      # Schema linking: entity extraction → filtered DDL + FK annotation
│   ├── stage3_generate.py    # Prompt assembly + LLM call + SQL extraction
│   └── stage4_refine.py      # Execution feedback: run → error → correct (max 3 rounds)
│
├── prompt/
│   ├── __init__.py
│   ├── builder.py            # Assembles message list from stage 2+3 outputs
│   ├── templates.py          # System instruction, FK hints, type-specific hints
│   └── dialects.py           # Per-dialect catalog SQL templates (from text2sql-framework)
│
├── retrieval/
│   ├── __init__.py
│   ├── base.py               # Abstract interface + injectable embed_fn
│   └── <backend>.py          # Concrete implementation: qdrant.py OR pgvector.py
│
├── llm/
│   ├── __init__.py
│   ├── base.py               # Abstract interface: call(system, messages) → str
│   ├── openai.py             # OpenAI implementation (primary — covers self-hosted via base_url)
│   └── anthropic.py          # Anthropic implementation (secondary — if direct API access needed)
│
├── schema/
│   ├── __init__.py
│   └── types.py              # TableDefinition, ColumnDefinition dataclasses (from Vanna)
│
├── tracing/
│   ├── __init__.py
│   ├── tracer.py             # QueryTrace dataclass + Tracer (from text2sql-framework)
│   └── analyze.py            # AnalysisEngine: trace mismatch detection (from text2sql-framework)
│
├── connection/
│   ├── __init__.py
│   └── database.py           # SQLAlchemy wrapper + _is_read_only() guard (from text2sql-framework)
│
├── eval/
│   ├── __init__.py
│   ├── harness.py            # EX/EM scoring
│   └── dataset.py            # Test set loader: list of {question, expected_sql}
│
└── utils/
    ├── __init__.py
    ├── tokens.py             # Token counting (provider-aware)
    └── sql.py                # extract_sql, is_read_only
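The two helpers named in utils/sql.py might be sketched like this. The behavior is an assumption (pull the first fenced SQL block out of an LLM reply, and pass only SELECT/WITH statements), not the actual implementation:

```python
import re

def extract_sql(reply: str) -> str:
    """Return the first fenced ```sql block, or the raw reply stripped."""
    m = re.search(r"```sql\s*(.*?)```", reply, re.DOTALL | re.IGNORECASE)
    return (m.group(1) if m else reply).strip()

def is_read_only(sql: str) -> bool:
    """Crude guard: only statements starting with SELECT or WITH pass."""
    head = sql.lstrip().split(None, 1)[0].upper() if sql.strip() else ""
    return head in {"SELECT", "WITH"}
```

A keyword guard like this is a first line of defense only; the connection layer's _is_read_only() would still be needed at execution time.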

Setup

Requirements: Python ≥ 3.11, uv

# clone and enter the repo
git clone <repo-url>
cd t2sql-framework

# create virtual environment and install dependencies
uv venv
uv sync

# optional extras
uv sync --extra pgvector    # PostgreSQL + pgvector backend
uv sync --extra qdrant      # Qdrant backend
uv sync --extra anthropic   # Anthropic API support
uv sync --extra dev         # pytest

Activate the environment:

source .venv/bin/activate

Roadmap

  • M0 — Scaffold + Baseline
    • M0.1 — repo structure, pyproject.toml, package skeleton
    • M0.2 — port foundation: utils/, schema/, connection/, tracing/, llm/, retrieval/
    • M0.3 — wire orchestrator.ask() (retrieval → prompt → LLM → SQL)
    • M0.4 — connection layer + read-only guard
    • M0.5 — tracing: ask() returns (sql, df, trace)
    • M0.6 — eval harness: 3050 labeled Q/SQL pairs, baseline EX score
  • M1 — Execution Feedback Loop — stage4_refine, retry on error, trace attempt count
  • M2 — Prompt Cleanup (skippable) — k=4, FK hint section, tiktoken token counts
  • M3 — Schema Handling — dynamic discovery (Path A) or static linking (Path B), synonym fixes
  • M4 — Query Type Classifier (validate before building) — only if hardcoded hints move EX
  • M5 — Multi-Candidate Sampling (opt-in) — N candidates, majority-vote on result hash
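The M5 idea of majority-voting on a result hash can be sketched as follows. Helper names are hypothetical, and candidate results are assumed to be lists of row tuples:

```python
import hashlib
from collections import Counter

def result_hash(rows) -> str:
    """Order-insensitive fingerprint of a query result."""
    canon = sorted(repr(tuple(r)) for r in rows)
    return hashlib.sha256("\n".join(canon).encode()).hexdigest()

def majority_vote(candidates):
    """candidates: list of (sql, rows) pairs from N sampled generations.
    Return the SQL whose execution result occurs most often."""
    counts = Counter(result_hash(rows) for _, rows in candidates)
    winner_hash, _ = counts.most_common(1)[0]
    for sql, rows in candidates:
        if result_hash(rows) == winner_hash:
            return sql
```

Hashing the result rather than the SQL text means two syntactically different queries that return the same rows count as agreeing, which is the usual rationale for execution-based voting.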