Falsafa by the numbers

Every claim in the launch writeup deep-links to its source. Audit anything.

Corpus

total works

deep-links to the catalogue

3,087,298

total words across all variants

view raw manifest

Eras & languages

French, German, Kawi, Old English, Sanskrit, Urdu

view languages

eras spanned

view era index

Eval

146

cases run, paper-grade post-patch

view raw eval JSON

94%

mechanical pass rate

view raw eval JSON

917

paragraph citations resolved against the corpus

audit each one

verse-marker hallucinations (target: 0)

why this matters

Sample size is small (146 cases, all run after the anti-cheat patch). Scoring is deterministic: each case's citations are matched against its expected_works, with no LLM judge.
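A minimal sketch of what deterministic citation-match scoring could look like. Only the field name expected_works comes from the text above. The EvalCase shape, the cited_works field, the work IDs, and the pass rule (at least one citation, and every cited work is in expected_works) are assumptions for illustration, not the project's actual harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval record; only `expected_works` is named in the source."""
    case_id: str
    expected_works: frozenset[str]  # work IDs the answer is allowed to cite
    cited_works: frozenset[str]     # work IDs the answer actually cited


def passes(case: EvalCase) -> bool:
    # Assumed pass rule: at least one citation, and no citation
    # outside the expected set. No model is consulted at any point.
    return bool(case.cited_works) and case.cited_works <= case.expected_works


def pass_rate(cases: list[EvalCase]) -> float:
    # Fraction of cases that pass; 0.0 for an empty run.
    if not cases:
        return 0.0
    return sum(passes(c) for c in cases) / len(cases)


cases = [
    EvalCase("c1", frozenset({"work-a", "work-b"}), frozenset({"work-a"})),
    EvalCase("c2", frozenset({"work-a"}), frozenset({"work-c"})),  # cites an unexpected work
    EvalCase("c3", frozenset({"work-a"}), frozenset()),            # cites nothing
]
print(f"{pass_rate(cases):.0%}")
```

Because the check is plain set comparison, rerunning it on the same eval JSON should always give the same pass rate, which is what makes the figure auditable.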

Benchmark vs hybrid RAG

Citation-validity rate

Falsafa: forthcoming

Hybrid: forthcoming

p95 latency

Falsafa: forthcoming

Hybrid: forthcoming

One-time embedding cost

Falsafa: $0

Hybrid: forthcoming

The hybrid RAG baseline (apps/baseline/) is in development. Falsafa reads markdown directly through MCP tools, so there are no embeddings to compute and its one-time embedding cost is $0. Hybrid figures will be published once the baseline has run against the same 1,000-question pool.

Build

818

logical chapters

view manifest

2,053

variant entries (translation, transliteration, original)

view manifest

76,210

cited paragraphs with stable hash IDs

view corpus tree