Services

Two fixed-fee engagements designed to land in 7-10 business days, then a ladder of remediation and retainer options once I know what your pipeline actually needs.

Start here: the diagnostic engagements

Pick the engagement that matches your buyer. Both are fixed-fee, time-boxed, and produce a written report plus a prioritized remediation roadmap.

$7,500 fixed-fee

Retrieval Audit Sprint

7 business days

Founder, CTO, Head of ML/Eval at 5-30 person AI startups (Seed through Series B)

Independent retrieval pipeline audit with bootstrap-CI metrics and a runnable Python notebook you keep.

See full scope

Book discovery call

$7,500-$9,500 fixed-fee

Search Relevance Diagnostic

10 business days

VP Ecommerce, Director of Digital, Head of Site Search at $20M-$200M GMV Shopify Plus / Adobe Commerce brands

Independent NDCG@10 + conversion-correlated relevance diagnostic against your existing search vendor.

See full scope

Book discovery call

What happens after the diagnostic

When the diagnostic surfaces work worth doing, these are the named remediation SKUs the engagement can extend into.

Tier 2: $20,000

Production-Grade Eval Harness

4 weeks

AI startups that need procurement-grade evaluation infrastructure wired into CI before fundraising or enterprise pilots.

Deliverables

Stratified dataset (500-1,500 query/document pairs by head/torso/tail and query intent)
CI integration (GitHub Actions or GitLab CI) with configurable thresholds that block deploys on regressions
Dashboard: NDCG@k, MRR, hit-rate@k, faithfulness, custom metrics, per-query-class breakdowns, regression alerts
Statistical rigor layer: bootstrap confidence intervals and paired significance tests
Methodology runbook plus 1-2 weeks of Slack/email handoff support
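
To make the statistical rigor layer concrete, here is a minimal sketch of per-query NDCG@k with a percentile bootstrap confidence interval and a threshold check of the kind a CI job could run. It is an illustration under stated assumptions, not the delivered harness: the function names, the linear gain (rather than 2^rel - 1), the ideal ranking computed from the retrieved list only, and the rule "block when the CI lower bound falls under the threshold" are all hypothetical choices.

```python
import math
import random


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k=10):
    """NDCG@k for one query; `gains` are graded labels in ranked order.

    Assumption: the ideal ordering is taken from the retrieved list itself,
    so relevant documents that were never retrieved are not penalized here.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def bootstrap_ci(per_query_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, resampling queries with replacement."""
    rng = random.Random(seed)
    n = len(per_query_scores)
    means = sorted(
        sum(rng.choices(per_query_scores, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def passes_gate(per_query_scores, threshold):
    """Hypothetical deploy gate: pass only if the CI lower bound clears the threshold."""
    lo, _ = bootstrap_ci(per_query_scores)
    return lo >= threshold


if __name__ == "__main__":
    # Toy judgment set: graded labels (0-3) for the ranked results of three queries.
    runs = [[3, 2, 0, 1], [0, 0, 2], [1, 3, 0, 0]]
    scores = [ndcg_at_k(gains) for gains in runs]
    print("mean NDCG@10:", sum(scores) / len(scores))
    print("95% CI:", bootstrap_ci(scores))
    print("passes 0.5 gate:", passes_gate(scores, threshold=0.5))
```

Gating on the lower bound rather than the point estimate is one conservative option; a real threshold would be set per query class.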

Why $20K

The deliverable is the procurement-grade evaluation infrastructure that investors and enterprise procurement increasingly ask for as part of AI due diligence (Andreessen Horowitz, 2025 CIO Survey). Against a Series B round or a first enterprise contract worth six to seven figures, the fee is a rounding error on the outcome the harness defends. Fixed-fee, four weeks, and the deliverable ships on the SOW date.

Discuss Eval Harness

Tier 2: $25,000

Q4 Readiness Audit

4 weeks

Mid-market brands preparing for BFCM with one search vendor in place and an open question about whether to renew or replace.

Deliverables

Multi-vendor benchmark (NDCG@10, click-position-1, conversion-correlated relevance) on top 1,000-2,000 head and torso queries against 1-2 alternative search vendors
Full merchandising review: boost/bury rules, synonyms, redirects, banners, manual curation overlays
Q4 stress test against last year's BFCM query mix, promotion-aware relevance, out-of-stock handling
A/B test design for Q3 with hypothesis, sample-size math, and success criteria
60-minute executive presentation
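
The sample-size math in the A/B test design can be sketched with the standard two-proportion formula. This is a rough illustration, not the delivered design: the function name, the 3% baseline conversion rate, and the 10% relative lift are hypothetical inputs, and the formula assumes a two-sided test with equal traffic per arm.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Sessions needed per arm to detect a relative lift in conversion rate.

    Uses the normal-approximation formula for comparing two proportions
    (two-sided test, equal allocation).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 3% baseline conversion, 10% relative lift.
    print(sample_size_per_arm(baseline_rate=0.03, relative_lift=0.10))
```

With these hypothetical inputs the requirement lands on the order of 50,000 sessions per arm, which is why the test window has to be planned well before peak traffic.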

Why Q4 Readiness

Adobe Analytics reported double-digit YoY growth in BFCM 2025 dollars and a meaningful conversion lift among AI-influenced shoppers. A diagnostic delivered in Q1-Q2 feeds directly into the Q3 testing window before peak.

Discuss Q4 Audit

Tier 2: $25,000 - $75,000

Search Relevance Optimization

4-8 weeks

Teams with an existing production search stack that need a measurable lift in retrieval quality without rebuilding the platform.

Deliverables

Hybrid retrieval implementation (BM25 plus dense vectors with reciprocal rank fusion or learned fusion)
Query understanding layer covering intent classification and entity extraction on the client query log
Cross-encoder reranking pipeline over the top-N candidates from the hybrid stage
Relevance evaluation framework with NDCG@10, MRR, and recall reported with bootstrap confidence intervals (DHSS, statistical-rigor methodology)
A/B testing infrastructure with hypothesis, sample-size math, and stop conditions
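
For readers unfamiliar with reciprocal rank fusion, the sketch below shows the core idea: each document scores the sum of 1 / (k + rank) across the ranked lists it appears in. It is a minimal illustration rather than the production implementation; the function name and document IDs are hypothetical, and k = 60 is the constant commonly used since the original RRF paper, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    `rankings` is a list of lists, each ordered best-first (for example one
    from BM25 and one from a dense retriever). Ties break on document ID so
    the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical lexical results
    dense_hits = ["b", "c", "d"]   # hypothetical vector results
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF needs only ranks, not comparable scores, which is why it is a common first choice before moving to learned fusion.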

Why $25K-$75K

Hybrid retrieval and cross-encoder reranking typically add 5-15 NDCG@10 points over a tuned BM25 baseline on out-of-domain benchmarks (Thakur et al., BEIR, NeurIPS 2021; Santhanam et al., ColBERTv2, NAACL 2022). The engagement delivers production code, an evaluation harness, and a documented handoff, not a notebook proof of concept. The price band tracks scope: query log size, number of indexed corpora, and whether reranking ships behind a feature flag or as default.

Discuss Your Search

Tier 3: $50,000 - $150,000

Custom Embedding Development

6-12 weeks

Teams running generic API embeddings on a specialized corpus where domain terminology, product attributes, or user language is not well covered by general-purpose models.

Deliverables

Domain-adapted embedding model fine-tuned on client query-document pairs (sentence-transformers / SBERT methodology; Reimers & Gurevych, EMNLP 2019)
Training pipeline on proprietary data with held-out evaluation splits stratified by query class
Benchmark of the fine-tuned model against the prior generic baseline on a held-out test set, reported with bootstrap confidence intervals
Deployment artifacts: serving image, embedding-version registry, and rollback procedure
Retraining runbook with cadence and trigger criteria tied to data drift
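
One way to report the fine-tuned-versus-baseline benchmark is a paired bootstrap over per-query score differences, sketched below. This is an assumption about how such a comparison can be run, not a description of the delivered benchmark: the function name and the toy scores are hypothetical, and the per-query metric (for example NDCG@10) is assumed to be computed elsewhere on the same held-out queries for both models.

```python
import random


def paired_bootstrap_diff(baseline, candidate, n_resamples=5000, alpha=0.05, seed=0):
    """Mean per-query improvement with a percentile bootstrap CI.

    `baseline` and `candidate` hold one score per query, aligned by index.
    Resampling whole queries keeps each pair together, so per-query
    difficulty cancels out of the comparison.
    """
    if len(baseline) != len(candidate):
        raise ValueError("score lists must cover the same queries")
    rng = random.Random(seed)
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


if __name__ == "__main__":
    # Toy per-query NDCG@10 scores for a generic and a fine-tuned model.
    generic = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52]
    tuned = [0.50, 0.58, 0.40, 0.59, 0.55, 0.61]
    mean_diff, lo, hi = paired_bootstrap_diff(generic, tuned)
    print(f"mean lift {mean_diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, the lift is unlikely to be resampling noise on this query set; it says nothing about queries the held-out split does not represent.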

Why $50K-$150K

Domain fine-tuning of embedding models is a published lift mechanism for specialized retrieval: SBERT (Reimers & Gurevych, EMNLP 2019) is the canonical reference, and the MTEB leaderboard (Muennighoff et al., EACL 2023) shows domain-specialized encoders consistently outperform general-purpose models on domain tasks. Scope covers training data curation, model selection, fine-tuning runs, evaluation, and a deployable serving path. The price band tracks dataset size, whether contrastive pairs need to be mined from logs or supplied as labels, and whether the model is served self-hosted or behind a managed inference endpoint.

Evaluate Your Embedding Needs

Tier 4: $75,000 - $150,000

RAG Pipeline Development

8-12 weeks

Teams building an AI assistant, knowledge system, or document Q&A product where retrieval quality is the dominant lever on output quality and procurement now asks for the eval methodology.

Deliverables

End-to-end RAG architecture with documented component boundaries (indexing, retrieval, reranking, generation, guardrails)
Hybrid retrieval layer with cross-encoder reranking over the top candidate set
Chunking strategy and metadata schema derived from the client corpus, not a default
Evaluation framework covering retrieval quality (NDCG@k, recall, MRR) and generation quality (faithfulness, answer relevance) with bootstrap confidence intervals
Production deployment with observability hooks and a regression alerting path

Why $75K-$150K

Retrieval is the dominant failure mode in production RAG: a recent due-diligence review of RAG evaluation practice (Martinon et al., arXiv 2507.21753, 2025) frames retrieval rigor as the prerequisite to defensible generation metrics, and DHSS (the hybrid-retrieval methodology underlying this engagement) treats the retrieval layer as the testable contract that downstream generation depends on. Scope covers the full pipeline end-to-end, with the retrieval and evaluation layers built to the same standard as the Production-Grade Eval Harness. The price band tracks corpus complexity, number of document types, and whether generation runs against a hosted API or a self-hosted model.

Plan Your RAG System

Retainers

For clients who completed an audit or diagnostic and want ongoing oversight without hiring a full-time IR engineer.

Tier 5: $7,500/mo, 3-mo min

Fractional IR Advisor Retainer

Ongoing

AI startups that completed a Retrieval Audit Sprint and want ongoing IR oversight without hiring a full-time retrieval engineer.

Deliverables

Monthly retrieval-metric review (NDCG@10, MRR, hit-rate@k, faithfulness) against the eval harness
Async architecture review of pipeline changes (chunking, embeddings, reranking)
Pre-fundraising or pre-enterprise-pilot evaluation memo refresh
Priority Slack / email response during business hours

Discuss Retainer

Tier 5: $5,000/mo, 3-mo min

Relevance Retainer

Ongoing

Mid-market e-commerce brands that completed a Search Relevance Diagnostic and want monthly oversight of vendor relevance quality.

Deliverables

Monthly NDCG@10 + click-position-1 measurement against the existing search vendor
Quarterly judgment-set refresh for the top 500 head and torso queries
Quick-win recommendation review (config, synonyms, merchandising overlays)
Pre-peak-season and pre-vendor-renewal advisory

Discuss Retainer

Compare All Engagements

Engagement	Vertical	Price	Timeline	You Get	Best For
Retrieval Audit Sprint	AI Startups	$7,500	7 business days	Independent retrieval audit + report + judgment set	Pre-fundraise / pre-pilot
Search Relevance Diagnostic	E-commerce	$7.5-9.5K	10 business days	NDCG@10 + conversion-correlated diagnostic	Post-BFCM / vendor renewal
Production-Grade Eval Harness	AI Startups	$20K	4 wks	Stratified dataset, CI integration, statistical-rigor dashboard	Procurement-grade eval before fundraise
Q4 Readiness Audit	E-commerce	$25K	4 wks	Multi-vendor benchmark + merchandising review + Q3 A/B design	Pre-BFCM, vendor renew/replace
Relevance Optimization	Both	$25-75K	4-8 wks	Hybrid retrieval + cross-encoder reranking in production	Existing search needs a measurable lift
Custom Embeddings	Both	$50-150K	6-12 wks	Domain-adapted embedding model + training pipeline	Generic API embeddings miss domain terminology
RAG Pipeline Development	AI Startups	$75-150K	8-12 wks	End-to-end RAG with hybrid retrieval and reranking	Retrieval is the dominant lever on output quality
Fractional IR Advisor	AI Startups	$7,500/mo	3-mo min	Monthly metric review + async advisory	Post-diagnostic IR oversight
Relevance Retainer	E-commerce	$5,000/mo	3-mo min	Monthly NDCG@10 + judgment-set refresh	Vendor-quality oversight

Not sure which engagement fits?

Book a 30-minute discovery call. I'll walk through your pipeline and route you to the right engagement, or to a resource that does more for you than I would.

Book Discovery Call

Frequently Asked Questions

How do I know which tier I need?

Why $7,500? Isn't that low for IR consulting?

Which vertical am I?

Can I see a sample report before booking?

Do you offer fixed-price or time & materials?

How do you handle our data?

Who's the vendor and what law governs the SOW?
