Sample Diagnostic Reports

Two sample reports, one per engagement, so you can see exactly what you'll receive before booking.

AI Startups

Retrieval Audit Sprint: sample report

Dataset: BEIR / FiQA-2018 (18-page report)

Walkthrough of a retrieval pipeline audit: chunking and embedding review, judgment-set construction, NDCG@10, MRR, hit-rate@k, and faithfulness with bootstrap 95% confidence intervals, plus three prioritized remediation recommendations.
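For readers who want the metric names made concrete, here is a minimal Python sketch of per-query NDCG@10, MRR, and hit-rate@k with a percentile bootstrap 95% confidence interval over queries. It is an illustration under stated assumptions, not the code behind the sample report: the function names, the toy relevance judgments, the linear-gain form of DCG, and the percentile bootstrap are all choices made here for the example.

```python
import math
import random
from statistics import mean


def dcg(gains):
    # Linear-gain DCG: gain at 0-based rank i is discounted by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_gains, judged_gains, k=10):
    # ranked_gains: relevance grades of returned documents, in rank order.
    # judged_gains: grades of all judged documents for the query (any order).
    ideal = dcg(sorted(judged_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked_gains):
    # 1 / rank of the first relevant result, or 0 if none was returned.
    for i, g in enumerate(ranked_gains):
        if g > 0:
            return 1.0 / (i + 1)
    return 0.0


def hit_at_k(ranked_gains, k):
    # 1 if any relevant result appears in the top k, else 0.
    return 1.0 if any(g > 0 for g in ranked_gains[:k]) else 0.0


def bootstrap_ci(per_query_scores, n_resamples=2000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample queries with replacement, recompute the mean.
    rng = random.Random(seed)
    n = len(per_query_scores)
    means = sorted(
        mean(rng.choices(per_query_scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical toy judgments: (grades in rank order, all judged grades).
queries = [
    ([1, 0, 0], [1]),
    ([0, 1, 0], [1]),
    ([0, 0, 0], [1]),
]

ndcg_scores = [ndcg_at_k(ranked, judged, k=10) for ranked, judged in queries]
mrr = mean(reciprocal_rank(ranked) for ranked, _ in queries)
hit_rate_at_2 = mean(hit_at_k(ranked, 2) for ranked, _ in queries)
ci_low, ci_high = bootstrap_ci(ndcg_scores)

print("mean NDCG@10:", mean(ndcg_scores), "95% CI:", (ci_low, ci_high))
print("MRR:", mrr, "hit-rate@2:", hit_rate_at_2)
```

Resampling whole queries, rather than individual documents, keeps each query's ranking intact, which is why the interval is computed over the list of per-query scores.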

This sample report is built on public benchmark data. Real client engagements follow the same structure, applied to the client's own retrieval pipeline.

See full scope

E-commerce

Search Relevance Diagnostic: sample report

Dataset: WANDS (Wayfair, ECIR 2022) (24-page report)

Walkthrough of an independent relevance diagnostic: query coverage analysis, NDCG@10 and click-position-1 measurement, vendor-metric-vs-independent-metric gap analysis, conversion-correlated relevance findings, and a Q4 readiness appendix.

This sample report is built on public benchmark data. Real client engagements follow the same structure, applied to the client's own retrieval pipeline.

See full scope

Ready to run this against your own pipeline?

Book a 30-minute discovery call. I'll walk through your retrieval pipeline, your current evaluation methodology, and the trigger that brought you here.

Book a 30-min discovery call