@marcusthompson · Founding Engineer at Runway · Ex-OpenAI
Hot take: Most AI startups are over-engineering their ML pipelines and under-engineering their data pipelines.
Your model is only as good as your data. Spend 80% of your time on data quality, not architecture.
I've seen this pattern at 3 companies now. The ones that win focus relentlessly on data curation.
We just open-sourced our inference optimization toolkit that reduced our serving costs by 73% while maintaining 99.9% accuracy parity.
Key techniques:
• Speculative decoding with draft models
• KV cache compression (4-bit quantization; rough sketch at the end of this post)
• Dynamic batching with priority queues
• Prefix caching for repeated prompts
Repo link in comments. Happy to answer questions about production deployment.
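Since the KV cache bullet gets the most questions, here's a rough numpy sketch of the symmetric 4-bit quantization idea. The function names, per-tensor scale, and toy shapes are just for illustration, not the toolkit's actual API, and real kernels additionally pack two 4-bit values per byte:

```python
# Minimal sketch of symmetric 4-bit KV cache quantization (illustrative only;
# per-tensor scaling and these function names are assumptions, not the toolkit's API).
import numpy as np

def quantize_kv_4bit(kv: np.ndarray):
    """Map floats to integers in [-7, 7]; production kernels pack two values per byte."""
    scale = float(np.abs(kv).max()) / 7.0
    scale = scale if scale > 0 else 1.0          # guard against an all-zero cache
    q = np.clip(np.round(kv / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_kv_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Fake KV block: (layers=2, heads=4, seq_len=8, head_dim=16)
kv = np.random.randn(2, 4, 8, 16).astype(np.float32)
q, scale = quantize_kv_4bit(kv)
print("max abs error:", float(np.abs(kv - dequantize_kv_4bit(q, scale)).max()))
```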
New paper alert: "Fairness Across 47 Languages: How Safety Guardrails Fail in Low-Resource Settings"
Our most concerning finding: models that score well on English safety benchmarks fail catastrophically in low-resource languages. The safety gap between English and languages like Yoruba or Bengali is enormous.
This is a massive blind spot in the industry. Thread with key findings below.
PSA for anyone building RAG systems: Your chunking strategy matters more than your embedding model.
We tested 8 chunking approaches × 4 embedding models × 3 retrieval methods.
Result: The best chunking strategy outperformed the best embedding model by 23% on retrieval quality. Semantic chunking with 15-20% overlap is the sweet spot.
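If you want to play with overlap, here's a simplified sketch. It splits on sentence boundaries rather than doing true semantic segmentation, and the 200-word target and 0.15 overlap ratio are placeholders, not the exact settings we tested:

```python
# Simplified chunker with configurable overlap (illustrative only; splits on sentence
# boundaries instead of embedding-based semantic breakpoints).
import re

def chunk_with_overlap(text: str, max_words: int = 200, overlap: float = 0.15):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        words = len(sent.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            # Carry roughly `overlap` worth of trailing sentences into the next chunk.
            carry, kept = int(max_words * overlap), []
            for s in reversed(current):
                kept.insert(0, s)
                if sum(len(x.split()) for x in kept) >= carry:
                    break
            current, count = kept, sum(len(x.split()) for x in kept)
        current.append(sent)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Swap the sentence splitter for an embedding-similarity breakpoint detector and you get actual semantic chunking; the overlap logic stays the same.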
Lessons from building real-time AI video generation at Runway:
1. Latency budgets are everything — users notice >100ms
2. Streaming architectures beat batch processing 10:1 (toy sketch at the end of this post)
3. Progressive rendering is key to perceived speed
4. GPU memory management is the real engineering challenge
5. The gap between demo and production is 6-12 months
Shipping creative AI is wildly different from shipping chatbots.
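To make points 2 and 3 concrete, a toy asyncio sketch of why streaming wins on perceived latency. `render_frame` is a stand-in for a real decoder and the 50 ms per-frame cost is made up:

```python
# Toy contrast of batch vs. streaming frame delivery (illustrative only;
# render_frame is a hypothetical stand-in for a real video decoder).
import asyncio, time

async def render_frame(i: int) -> str:
    await asyncio.sleep(0.05)            # pretend each frame costs 50 ms to generate
    return f"frame-{i}"

async def batch(n: int):
    return [await render_frame(i) for i in range(n)]   # nothing shown until all n exist

async def streaming(n: int):
    for i in range(n):
        yield await render_frame(i)      # first frame is displayable after ~50 ms

async def main():
    t0 = time.perf_counter()
    async for frame in streaming(10):
        print(f"{frame} shown at {time.perf_counter() - t0:.2f}s")   # progressive display
    t0 = time.perf_counter()
    frames = await batch(10)
    print(f"batch: all {len(frames)} frames at {time.perf_counter() - t0:.2f}s")  # ~0.5 s wait

asyncio.run(main())
```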