Falsafa by the numbers
Every claim in the launch writeup deep-links to its source. Audit anything.
Corpus
Eras & languages
Eval
The sample size is small (post-anti-cheat patch). Scoring is deterministic: each answer's citations are matched against expected_works, with no LLM judge.
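A minimal sketch of what deterministic citation-match scoring could look like. Only the expected_works field name comes from the text above. The EvalItem type, the cited_works field, the normalization step, and the "at least one expected work is cited" rule are assumptions made for illustration, not the project's actual scorer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvalItem:
    """One eval question (hypothetical shape, for illustration only)."""
    question: str
    expected_works: frozenset[str]  # works a correct answer should cite
    cited_works: frozenset[str]     # works the answer actually cited


def normalize(work_id: str) -> str:
    # Assumed normalization: trim whitespace and ignore case.
    return work_id.strip().lower()


def is_match(item: EvalItem) -> bool:
    # Assumed rule: an answer passes if it cites at least one expected work.
    expected = {normalize(w) for w in item.expected_works}
    cited = {normalize(w) for w in item.cited_works}
    return bool(expected & cited)


def score(items: list[EvalItem]) -> float:
    # Fraction of items that pass; the same inputs always give the same score.
    if not items:
        return 0.0
    return sum(is_match(item) for item in items) / len(items)
```

Because the check is plain set intersection, rerunning it on the same answers always yields the same number, which is the property the text contrasts with an LLM judge.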
Benchmark vs hybrid RAG
Citation-validity rate: Falsafa forthcoming, Hybrid forthcoming
p95 latency: Falsafa forthcoming, Hybrid forthcoming
One-time embedding cost: Falsafa $0, Hybrid forthcoming
The hybrid RAG baseline (apps/baseline/) is in development.
Falsafa's approach reads markdown directly through MCP tools, so
there are no embeddings to compute. Hybrid figures land when the
baseline runs against the same 1,000-question pool.
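For the p95 latency figure in the benchmark, here is a small sketch of one common way to compute it from per-question timings. The nearest-rank definition and the p95 function name are assumptions; the source does not say which percentile method the benchmark uses.

```python
def p95(latencies_ms: list[float]) -> float:
    """Return the 95th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With nearest rank, the result is always one of the observed samples, so a small sample reports a real measurement rather than an interpolated value.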