@marcusthompson · Founding Engineer at Runway · Ex-OpenAI
Hot take: Most AI startups are over-engineering their ML pipelines and under-engineering their data pipelines.
Your model is only as good as your data. Spend 80% of your time on data quality, not architecture.
I've seen this pattern at 3 companies now. The ones that win focus relentlessly on data curation.
We just open-sourced our inference optimization toolkit that reduced our serving costs by 73% while maintaining 99.9% accuracy parity.
Key techniques:
• Speculative decoding with draft models
• KV cache compression (4-bit quantization)
• Dynamic batching with priority queues
• Prefix caching for repeated prompts
Repo link in comments. Happy to answer questions about production deployment.
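For anyone wondering what "dynamic batching with priority queues" can look like in practice, here is a minimal sketch. It is not taken from the toolkit mentioned above. The names `Request`, `PriorityBatcher`, `submit`, and `next_batch` are hypothetical, and the sketch assumes a lower priority number means "serve sooner". It also leaves out the wait-time cutoff a real dynamic batcher would normally have.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    priority: int                       # lower value is served first (assumption)
    seq: int                            # arrival counter, breaks ties in FIFO order
    prompt: str = field(compare=False)  # payload, ignored when ordering


class PriorityBatcher:
    """Hypothetical batcher: groups pending requests by priority, then arrival."""

    def __init__(self, max_batch_size: int = 4):
        self.max_batch_size = max_batch_size
        self._heap = []
        self._counter = itertools.count()

    def submit(self, prompt: str, priority: int = 1) -> None:
        # Push onto a min-heap keyed by (priority, arrival order).
        heapq.heappush(self._heap, Request(priority, next(self._counter), prompt))

    def next_batch(self) -> list:
        # Pop up to max_batch_size prompts, most urgent first.
        batch = []
        while self._heap and len(batch) < self.max_batch_size:
            batch.append(heapq.heappop(self._heap).prompt)
        return batch


batcher = PriorityBatcher(max_batch_size=2)
batcher.submit("background summarization", priority=5)
batcher.submit("interactive chat turn", priority=0)
batcher.submit("autocomplete", priority=1)
first = batcher.next_batch()   # the two most urgent requests
second = batcher.next_batch()  # whatever is left
```

The arrival counter is there so that two requests with the same priority come out in the order they arrived, and so the heap never has to compare prompt strings.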
@elenarodriguez · Research Director at Hugging Face
Just published our comprehensive benchmark comparing 15 vector databases for production RAG systems.
TL;DR: No single winner. pgvector is surprisingly competitive below 10M vectors. Pinecone is the easiest to run as a managed service. Qdrant has the best price-performance ratio.
Full methodology covers latency, throughput, recall, cost, and operational complexity across 3 scales.
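Since recall is one of the metrics listed, here is a rough sketch of how recall@k is commonly computed for approximate nearest-neighbor search. This is an assumption about the usual definition and not the benchmark's own code. The function name `recall_at_k` and the sample IDs are made up for illustration.

```python
def recall_at_k(approx_ids: list, exact_ids: list, k: int) -> float:
    """Fraction of the true top-k neighbors that the index actually returned.

    approx_ids: IDs returned by the vector database, best match first.
    exact_ids:  IDs from an exact (brute-force) search, best match first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    returned = set(approx_ids[:k])
    truth = set(exact_ids[:k])
    return len(returned & truth) / k


# Illustrative IDs only: the index missed one of the four true neighbors.
score = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4)
```

Averaging this score over a query set gives a single recall number that can be compared against latency and cost at each scale.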
The new FLUX model is genuinely impressive for real-time video understanding. I've been benchmarking it against our production pipeline at Tesla.
Key observations:
• 3x faster inference than previous SOTA
• Better temporal consistency across frames
• Still struggles with fine-grained action recognition
• Impressive zero-shot performance on edge cases
Anyone else running comparisons?
@marcusthompson · Founding Engineer at Runway · Ex-OpenAI
Lessons from building real-time AI video generation at Runway:
1. Latency budgets are everything — users notice >100ms
2. Streaming architectures beat batch processing 10:1
3. Progressive rendering is key to perceived speed
4. GPU memory management is the real engineering challenge
5. The gap between demo and production is 6-12 months
Shipping creative AI is wildly different from shipping chatbo...
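One way to make a latency budget concrete is to add up per-stage timings and compare the total against the threshold. The sketch below is only an illustration. The 100 ms figure comes from the list above, while the stage names, the timings, and the `check_budget` helper are hypothetical.

```python
BUDGET_MS = 100.0  # threshold from the post; everything else here is assumed


def check_budget(stage_ms: dict, budget_ms: float = BUDGET_MS):
    """Return (total latency, whether it exceeds the budget, slowest stage)."""
    total = sum(stage_ms.values())
    slowest = max(stage_ms, key=stage_ms.get)
    return total, total > budget_ms, slowest


# Made-up timings for three made-up pipeline stages.
total, exceeded, slowest = check_budget(
    {"encode": 20.0, "denoise": 70.0, "decode": 25.0}
)
```

Reporting the slowest stage next to the total points at where to spend optimization effort first.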
The EU AI Act is now being enforced. Here's what every AI team needs to know:
🟢 Minimal risk (no mandatory requirements): most AI applications
🟡 Limited risk (transparency obligations): chatbots
🔴 High risk (full compliance): healthcare, hiring, credit
🚫 Unacceptable risk (banned): social scoring, real-time remote biometric identification in public spaces
The compliance deadline for high-risk systems is 6 months away. Start now.