New paper alert: "Fairness Across 47 Languages: How Safety Guardrails Fail in Low-Resource Settings"
Our most concerning finding: models that score well on English safety benchmarks fail catastrophically in low-resource languages. The safety gap between English and languages like Yoruba or Bengali is enormous.
This is a massive blind spot in the industry. Thread with key findings below.
@sarahchen · ML Engineering Lead · Previously DeepMind
Curating the best open-source AI tools released in Q1 2026:
1. Llama 4 Scout — Meta's most capable open model yet
2. Stable Diffusion 4 — Incredible image quality improvements
3. DeepSeek-R1 — Best reasoning in open source
4. Whisper V4 — Near-human transcription quality
5. Moshi by Kyutai — Real-time multimodal conversation
What am I missing? Drop your favorites below. 👇
Controversial: The 'bigger is better' era of foundation models is ending.
Our latest research shows that smaller, specialized models (7-13B parameters) consistently outperform 70B+ generalists on domain-specific tasks when properly fine-tuned.
The future isn't one mega-model. It's an ecosystem of specialized experts.
@weizhang · Head of AI Safety Research at Anthropic
If you're an AI researcher feeling burned out, you're not alone.
The pace of this field is unsustainable. New papers every day, pressure to publish, constant paradigm shifts.
Here's what's helping me:
• Blocking 2 hours daily for deep reading (no Slack)
• Saying no to 80% of speaking invitations
• Accepting that you can't read everything
• Finding 2-3 people you trust for paper summaries
Your m...
@sophiakim · AI Research Scientist at Google DeepMind
Fascinating result from our experiments on in-context learning:
We found that the order of few-shot examples matters dramatically — sometimes more than the examples themselves.
Optimal ordering improved accuracy by 15-30% across 12 benchmarks. We're calling it 'positional priming' and working on a paper.
Has anyone else observed this?
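Ordering sensitivity like this is easy to probe yourself: enumerate permutations of the shot list and score each resulting prompt. A minimal sketch of the setup (the toy examples and the `build_prompt` helper are mine, and the model-scoring step is deliberately left out):

```python
from itertools import permutations

# Hypothetical few-shot examples for a toy sentiment task
EXAMPLES = [
    ("The movie was great", "positive"),
    ("Terrible service", "negative"),
    ("It was okay", "neutral"),
]

def build_prompt(ordering, query):
    """Concatenate the few-shot examples in the given order, then the query."""
    shots = "\n".join(f"Text: {t}\nLabel: {l}" for t, l in ordering)
    return f"{shots}\nText: {query}\nLabel:"

# One prompt per ordering; with a real model you would score each prompt
# on a held-out set and compare accuracy across orderings.
prompts = [build_prompt(p, "Loved it") for p in permutations(EXAMPLES)]
```

With k shots there are k! orderings, so exhaustive search only works for small k; beyond that you'd sample permutations.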
@marcusthompson · Founding Engineer at Runway · Ex-OpenAI
Lessons from building real-time AI video generation at Runway:
1. Latency budgets are everything — users notice >100ms
2. Streaming architectures beat batch processing 10:1
3. Progressive rendering is key to perceived speed
4. GPU memory management is the real engineering challenge
5. The gap between demo and production is 6-12 months
Shipping creative AI is wildly different from shipping chatbo...
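Point 2 above boils down to time-to-first-output: a streaming pipeline shows the first frame after one unit of render work, a batch pipeline only after all of them. A toy sketch (the frame names and work ledger are illustrative, not Runway's code):

```python
def render(i, ledger):
    """Pretend to render one frame, recording the work done in a ledger."""
    ledger["work"] += 1
    return f"frame-{i}"

def batch_generate(n, ledger):
    """Batch: every frame is rendered before anything is returned."""
    return [render(i, ledger) for i in range(n)]

def stream_generate(n, ledger):
    """Streaming: yield each frame as soon as it is ready."""
    for i in range(n):
        yield render(i, ledger)

batch_ledger = {"work": 0}
frames = batch_generate(100, batch_ledger)         # 100 units before first frame

stream_ledger = {"work": 0}
first = next(stream_generate(100, stream_ledger))  # 1 unit before first frame
```

Same total work either way; the difference is how soon the user sees something, which is what the >100ms budget is really about.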
The EU AI Act is now being enforced. Here's what every AI team needs to know:
🟢 Low risk: Most AI applications — no requirements
🟡 Limited risk: Chatbots — transparency obligations
🔴 High risk: Healthcare, hiring, credit — full compliance
🚫 Unacceptable: Social scoring, real-time biometric — banned
The compliance deadline for high-risk systems is 6 months away. Start now.
A practical guide to model monitoring in production:
1. Track output distribution shifts (not just accuracy)
2. Monitor latency at p50, p95, and p99
3. Set up automatic fallbacks to simpler models
4. Log all inputs/outputs (with PII handling)
5. Create canary deployments for model updates
Most teams skip monitoring until something breaks. Don't be most teams.
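Steps 1 and 2 need nothing beyond the standard library. A minimal sketch using nearest-rank percentiles and the Population Stability Index as the drift metric (one common choice; the function names here are mine, not a library's):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample (e.g. training
    scores) and a live sample; larger values mean more distribution shift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(samples):
        counts = [0] * bins
        for x in samples:
            counts[sum(x > e for e in edges)] += 1
        # Smooth counts so empty bins don't produce log(0)
        return [(c + 1e-6) / (len(samples) + bins * 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

latencies_ms = [12, 15, 18, 22, 30, 45, 80, 120, 200, 950]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

Note how one 950ms outlier leaves p50 untouched but dominates p99 — exactly why you track all three, not just the average.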