The foundation layer of the intelligence economy.
“Model serving, vector databases, and fine-tuning pipelines are becoming as essential as cloud compute was a decade ago.”
AI firms captured 61% of global venture capital in 2025, pulling in $258.7 billion according to OECD estimates. The top five hyperscalers have committed between $660 and $690 billion in capital expenditure for 2026 alone. These are not speculative bets on what AI might become. They are infrastructure commitments on the scale of national utilities.
The constraint has shifted. Model capability is no longer the bottleneck. The limiting factor is infrastructure throughput: how quickly you can serve a model within latency targets, how reliably you can retrieve context at query time, and how efficiently you can adapt a foundation model to domain-specific tasks. The historical parallel is AWS from 2006 to 2012, when cloud compute transitioned from experimental to essential. AI infrastructure is making that same transition now.
We believe the most durable companies in this cycle will be those that own critical nodes in the AI stack: the serving layer, the retrieval layer, and the adaptation layer. These are not features. They are the plumbing that everything else depends on.
01
The economics of AI are shifting from training to inference. Together AI raised a $305 million Series B, vLLM is seeking $160 million, and Fireworks AI has reached approximately $4 billion in valuation. The companies that can serve models faster and cheaper will capture the margin layer of the AI stack. Groq's LPU architecture delivers 185 tokens per second, rewriting assumptions about what inference hardware should look like.
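The margin argument is easiest to see as back-of-envelope arithmetic: at fixed hardware cost per hour, cost per token falls linearly with throughput. A minimal sketch, where the $3.50/hour accelerator price and the 60 tokens-per-second baseline are illustrative assumptions, not quotes:

```python
# Back-of-envelope inference economics. All dollar figures and the slower
# baseline throughput are illustrative assumptions, not vendor pricing.

def cost_per_million_tokens(cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens on a single accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical accelerator A: $3.50/hr at 60 tokens/sec.
slow = cost_per_million_tokens(3.50, 60)
# Hypothetical accelerator B: same hourly cost at 185 tokens/sec
# (the throughput figure Groq cites for its LPU).
fast = cost_per_million_tokens(3.50, 185)

print(f"A: ${slow:.2f}/M tokens  B: ${fast:.2f}/M tokens")
# At equal hardware cost, a ~3x throughput edge is a ~3x cost edge.
```

This is why serving efficiency, not model quality alone, sets the margin layer: every throughput gain flows straight into cost per token.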
02
Retrieval-augmented generation is a $1.96 billion market growing at 35% CAGR, projected to reach $40 billion by 2035. But simple vector retrieval is already insufficient. Graph RAG structures knowledge relationships. Agentic RAG lets models decide what to retrieve and when. The retrieval layer is becoming an intelligence layer in its own right.
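Graph RAG and agentic RAG both build on the same primitive: embed a query, score it against a corpus, return the top matches as context. A minimal sketch of that primitive, using a toy character-bigram embedding as a stand-in for a real embedding model:

```python
import math

# Toy embedding: character-bigram counts. A stand-in for a real embedding model.
def embed(text: str) -> dict[str, float]:
    vec: dict[str, float] = {}
    for i in range(len(text) - 1):
        gram = text[i : i + 2]
        vec[gram] = vec.get(gram, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Plain vector retrieval: rank the corpus by similarity to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

docs = [
    "vLLM serves large language models with paged attention",
    "vector databases store embeddings for retrieval",
    "reinforcement fine-tuning adapts models to domains",
]
print(retrieve("embedding retrieval database", docs, k=1))
```

The "intelligence layer" claim is about what wraps this loop: graph RAG replaces the flat corpus with structured relationships, and agentic RAG lets the model decide whether and when to call `retrieve` at all.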
03
Anthropic's Model Context Protocol has been adopted by the major IDE vendors. By 2026, an estimated 75% of AI gateway vendors will integrate agent-orchestration primitives. The tools for building, deploying, and monitoring autonomous agents are being built now. This is the equivalent of the container orchestration moment for AI.
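MCP itself is a JSON-RPC protocol; the sketch below is deliberately protocol-agnostic and shows only the orchestration primitive that gateways are converging on: a model proposes a structured tool call, the runtime dispatches it against a registry, and the result feeds back into the loop. Tool names and schemas here are invented for illustration.

```python
import json
from typing import Callable

# Registry of tools the agent runtime may dispatch to.
# Names, arguments, and outputs are illustrative, not any real server's schema.
TOOLS: dict[str, Callable[[dict], str]] = {
    "search_docs": lambda args: f"3 hits for {args['query']!r}",
    "get_metric": lambda args: json.dumps({"p99_ms": 240}),
}

def dispatch(tool_call: dict) -> str:
    """One orchestration step: route a model-proposed call to a registered tool."""
    name = tool_call["name"]
    args = tool_call.get("arguments", {})
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](args)

# A model would emit structured calls like these; here they are hard-coded.
proposed = [
    {"name": "get_metric", "arguments": {}},
    {"name": "search_docs", "arguments": {"query": "latency budget"}},
]
for call in proposed:
    print(dispatch(call))
```

The container-orchestration analogy holds at exactly this seam: once the call format and registry are standardized, the tools become portable across runtimes the way containers became portable across schedulers.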
04
Predibase's reinforcement fine-tuning (RFT) demonstrated that domain-adapted models can outperform general-purpose models at a fraction of the cost. Meanwhile, Lamini's struggles offer a cautionary lesson: fine-tuning infrastructure needs to solve the last-mile problem of data quality and evaluation, not just provide compute access. The winners will own the feedback loop.
The critical path between a trained model and a production response. vLLM has become the de facto open-source serving framework. Ray Serve and Anyscale provide the distributed compute layer. BentoML packages models for deployment. KServe is incubating within CNCF, signaling that model serving is becoming standardized infrastructure.
Purpose-built infrastructure for running models at scale. Together AI, Fireworks AI, Modal, and Replicate are competing to become the inference utility layer. Groq's custom LPU silicon challenges the assumption that inference must run on GPUs.
The memory layer for AI applications. Pinecone at $750 million valuation, Weaviate with $50 million Series C, Qdrant growing rapidly in the open-source community. LanceDB raised $30 million betting on embedded vector search. Milvus offers the most mature open-source option at scale.
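Stripped to its core, a vector store does three things: hold (id, vector) rows, score an incoming vector against them, and return the top-k. A minimal in-process sketch in the spirit of embedded search; the class and its interface are invented for illustration, not any product's API:

```python
import heapq
import math

class TinyVectorStore:
    """In-process vector store sketch: add (id, vector), query top-k by cosine.
    The interface is invented for illustration, not a real product's API."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        norm = math.sqrt(sum(x * x for x in vector))
        self._rows.append((doc_id, [x / norm for x in vector]))

    def query(self, vector: list[float], k: int = 3) -> list[tuple[str, float]]:
        norm = math.sqrt(sum(x * x for x in vector))
        q = [x / norm for x in vector]
        scored = ((sum(a * b for a, b in zip(q, v)), doc_id)
                  for doc_id, v in self._rows)
        top = heapq.nlargest(k, scored)
        return [(doc_id, round(score, 3)) for score, doc_id in top]

store = TinyVectorStore()
store.add("serving", [1.0, 0.0, 0.2])
store.add("retrieval", [0.1, 1.0, 0.0])
store.add("tuning", [0.0, 0.2, 1.0])
print(store.query([0.0, 0.9, 0.1], k=1))
```

Production systems replace the brute-force scan with approximate indexes (HNSW, IVF) and add filtering, persistence, and replication, which is where the products above differentiate.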
Adapting foundation models to domain-specific tasks. Predibase pioneered reinforcement fine-tuning. Modal provides the compute substrate. The winners will own the workflow from data curation through evaluation, not just the training step.
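Why adaptation costs a fraction of training shows up directly in parameter counts: LoRA-style adapters train two small low-rank factors per weight matrix and freeze the rest. A worked sketch with illustrative layer sizes:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA approximates a full d_in x d_out weight update with two
    low-rank factors: A (d_in x r) and B (r x d_out)."""
    return d_in * rank + rank * d_out

# Illustrative transformer projection: 4096 x 4096, adapter rank 8.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 8)

print(f"full update: {full:,} params")
print(f"LoRA r=8:    {lora:,} params ({lora / full:.2%} of full)")
# Training well under 1% of the weights is what makes per-domain
# adaptation economically viable at all.
```

Reinforcement fine-tuning layers a reward signal on top of this machinery, which is why the last-mile problems noted above, data quality and evaluation, dominate the compute cost.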
Monitoring, debugging, and optimizing AI systems in production. Helicone, Langfuse, Weights & Biases, and Braintrust are building the observability stack for a new category of software. As AI moves from demo to production, the demand for operational tooling compounds.
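The core of this tooling is per-request telemetry: latency, token usage, and derived cost attached to every generation call. A minimal decorator sketch; the trace fields, token counting, and $0.50-per-1k price are placeholders, not any vendor's schema or pricing:

```python
import time
from functools import wraps

TRACES: list[dict] = []  # a real system would ship these to an observability backend

def traced(model: str, usd_per_1k_tokens: float):
    """Record latency, token usage, and cost for each call to a generation function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(prompt: str) -> str:
            start = time.perf_counter()
            output = fn(prompt)
            latency_ms = (time.perf_counter() - start) * 1000
            tokens = len(prompt.split()) + len(output.split())  # crude token proxy
            TRACES.append({
                "model": model,
                "latency_ms": round(latency_ms, 2),
                "tokens": tokens,
                "cost_usd": round(tokens / 1000 * usd_per_1k_tokens, 6),
            })
            return output
        return wrapper
    return decorator

@traced(model="demo-model", usd_per_1k_tokens=0.50)  # placeholder name and price
def generate(prompt: str) -> str:
    return "stubbed completion for " + prompt  # stand-in for a real model call

generate("explain paged attention")
print(TRACES[-1])
```

Everything else in the category, evaluation scoring, prompt diffing, regression alerts, is aggregation and analysis over traces shaped like this one.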
The physical layer. NVIDIA Blackwell sets the performance ceiling. AMD MI300X provides competitive alternatives. Groq, Cerebras, and Tenstorrent are building purpose-designed silicon for inference workloads. The hardware layer determines the cost floor for everything above it.
“We back the builders who understand that infrastructure is a long game.”
Founders with hyperscaler or research lab backgrounds who have operated infrastructure at scale. They understand failure modes that only emerge at millions of requests per second.
GitHub stars and Docker pulls are 12-18 month leading indicators of commercial demand. The best infrastructure companies build community before they build sales teams.
Quantifiable, reproducible performance advantages. Not marketing claims. Latency, throughput, and cost-per-token metrics that can be independently verified.
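"Independently verified" has a concrete meaning: anyone can recompute the same statistics from raw per-request measurements. A sketch of the standard nearest-rank percentile reduction, with illustrative latency samples:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative per-request latencies in milliseconds. Note the single outlier:
# averages hide it, tail percentiles surface it.
latencies = [112, 98, 105, 131, 99, 842, 101, 97, 118, 103]

print(f"p50 = {percentile(latencies, 50)} ms")
print(f"p99 = {percentile(latencies, 99)} ms")
```

Publishing raw samples alongside p50/p99 (rather than a cherry-picked mean) is what separates a verifiable benchmark from a marketing claim.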
Systems that connect training, serving, evaluation, and iteration into a single feedback loop. The most defensible infrastructure companies own the cycle, not just a step.
SOC 2 compliance, SLAs, and production workloads. The transition from developer adoption to enterprise procurement is the clearest signal that infrastructure has become essential.
Building in AI infrastructure?