Sample Diagnostic Reports

Two sample reports, one per engagement type, so you can see exactly what you'll receive before booking.

AI Startups
Retrieval Audit Sprint: sample report
Dataset: BEIR / FiQA-2018. Length: 18 pages.

Walkthrough of a retrieval pipeline audit: chunking and embedding review, judgment-set construction, NDCG@10, MRR, hit-rate@k, and faithfulness with bootstrap 95% confidence intervals, plus three prioritized remediation recommendations.
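For readers who want a feel for the ranking metrics named above, here is a minimal sketch in plain Python. It is illustrative only and is not the code used in the report. It assumes graded judgments stored as a dict per query, linear gain in the DCG formula (some audits use exponential gain instead), and a simple percentile bootstrap over per-query scores. All function names and the toy data are hypothetical. Faithfulness is left out because it depends on the generation step rather than on ranking alone.

```python
import math
import random
from statistics import mean


def dcg_at_k(gains, k):
    # Discounted cumulative gain with linear gain (an assumption of this sketch).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, judgments, k=10):
    # judgments maps doc id -> graded relevance; unjudged docs count as 0.
    gains = [judgments.get(d, 0) for d in ranked_ids]
    ideal = sorted(judgments.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked_ids, judgments):
    # 1 / rank of the first relevant document, or 0 if none is retrieved.
    for rank, d in enumerate(ranked_ids, start=1):
        if judgments.get(d, 0) > 0:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked_ids, judgments, k=10):
    # 1 if any relevant document appears in the top k, else 0.
    return 1.0 if any(judgments.get(d, 0) > 0 for d in ranked_ids[:k]) else 0.0


def bootstrap_ci(per_query_scores, n_resamples=2000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample queries with replacement, take the mean
    # of each resample, and read the interval off the sorted means.
    rng = random.Random(seed)
    n = len(per_query_scores)
    means = sorted(
        mean(rng.choices(per_query_scores, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical toy data: two queries, each with a ranked list and judgments.
    runs = [
        (["d1", "d7", "d3"], {"d1": 2, "d3": 1}),
        (["d9", "d4", "d2"], {"d2": 1}),
    ]
    ndcg = [ndcg_at_k(r, j) for r, j in runs]
    mrr = mean(reciprocal_rank(r, j) for r, j in runs)
    print("mean NDCG@10:", mean(ndcg), "95% CI:", bootstrap_ci(ndcg))
    print("MRR:", mrr)
```

Averaging `reciprocal_rank` over queries gives MRR, and averaging `hit_at_k` gives hit-rate@k. Resampling whole queries, rather than individual documents, keeps each query's ranking intact, which is the usual unit of resampling for this kind of interval.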

This sample report is built on public benchmark data. Real client engagements follow the same structure, applied to the client's own retrieval pipeline.

We'll email occasionally with new sample reports and methodology updates. Unsubscribe anytime.

E-commerce
Search Relevance Diagnostic: sample report
Dataset: WANDS (Wayfair, ECIR 2022). Length: 24 pages.

Walkthrough of an independent relevance diagnostic: query coverage analysis, NDCG@10 and click-position-1 measurement, vendor-metric-vs-independent-metric gap analysis, conversion-correlated relevance findings, and a Q4 readiness appendix.

This sample report is built on public benchmark data. Real client engagements follow the same structure, applied to the client's own search stack.

Ready to run this against your own pipeline?

Book a 30-minute discovery call. I'll walk through your retrieval pipeline, your current evaluation methodology, and the trigger that brought you here.