@anikapatel · VP Engineering at Anthropic
We just open-sourced our inference optimization toolkit that reduced our serving costs by 73% while maintaining 99.9% accuracy parity. Key techniques:
• Speculative decoding with draft models
• KV cache compression (4-bit quantization)
• Dynamic batching with priority queues
• Prefix caching for repeated prompts
Rough sketches of each technique below. Repo link in comments. Happy to answer questions about production deployment.
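Speculative decoding in one picture: a cheap draft model proposes several tokens, the big target model verifies them in a single pass, and you keep the agreeing prefix. The sketch below is a simplified greedy-agreement variant with toy stand-in models (every name here is hypothetical, not from the repo); the full method uses a rejection-sampling acceptance rule, but the control flow is the same.

```python
VOCAB = 32

def draft_next(ctx):
    # Hypothetical stand-in for a small, cheap draft model.
    return (ctx[-1] * 7 + 3) % VOCAB

def target_next(ctx):
    # Hypothetical stand-in for the large target model: usually agrees
    # with the draft, occasionally diverges.
    guess = draft_next(ctx)
    return guess if ctx[-1] % 5 else (guess + 1) % VOCAB

def speculative_step(tokens, k=4):
    """Draft k tokens cheaply, then verify them with the target model.

    Accept the longest prefix where target and draft agree; the first
    disagreement is repaired with the target's token, so one verification
    pass yields between 1 and k tokens instead of exactly 1.
    """
    draft = []
    for _ in range(k):
        draft.append(draft_next(list(tokens) + draft))
    out = list(tokens)
    for t in draft:
        correct = target_next(out)  # in practice: one batched forward pass
        if correct == t:
            out.append(t)
        else:
            out.append(correct)     # keep the target's choice and stop
            break
    return out

seq = [1]
for _ in range(6):
    seq = speculative_step(seq)
print(seq)
```

The speedup comes from the draft being cheap enough that verifying k guesses in one target pass beats k sequential target passes, and the output distribution stays that of the target model.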
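For the 4-bit KV cache compression, the core idea is symmetric quantization with a per-group scale: each group of values maps to the integer range [-7, 7], so two codes fit in one byte. A minimal round-trip sketch (nibble packing and per-channel scale choices omitted; this is illustrative, not the repo's implementation):

```python
import numpy as np

def quantize_4bit(x, group=32):
    """Quantize a float KV tensor to 4-bit codes with a per-group scale."""
    x = x.reshape(-1, group)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                       # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

kv = np.random.default_rng(0).standard_normal((4, 64)).astype(np.float32)
q, s = quantize_4bit(kv)
kv_hat = dequantize_4bit(q, s, kv.shape)
print("max abs error:", np.abs(kv - kv_hat).max())
```

That's a 4x memory cut on the cache (8x vs fp32), which is usually the serving bottleneck at long context lengths; the group-wise scale keeps the error small enough that downstream attention barely notices.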
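Dynamic batching with priority queues is mostly bookkeeping: requests accumulate in a heap keyed by priority, and the serving loop drains up to a max batch size each step. A toy sketch (class and priority levels are hypothetical, not the toolkit's API):

```python
import heapq
import itertools

class DynamicBatcher:
    """Collects requests in a priority heap and drains them in batches.

    Lower priority value = served sooner (e.g. 0 = interactive chat,
    5 = background jobs); a monotonic counter keeps FIFO order within
    a priority level and breaks ties for the heap.
    """
    def __init__(self, max_batch=8):
        self.heap = []
        self.counter = itertools.count()
        self.max_batch = max_batch

    def submit(self, prompt, priority=1):
        heapq.heappush(self.heap, (priority, next(self.counter), prompt))

    def next_batch(self):
        batch = []
        while self.heap and len(batch) < self.max_batch:
            _, _, prompt = heapq.heappop(self.heap)
            batch.append(prompt)
        return batch

b = DynamicBatcher(max_batch=2)
b.submit("background summarize", priority=5)
b.submit("chat turn 1", priority=0)
b.submit("chat turn 2", priority=0)
print(b.next_batch())  # ['chat turn 1', 'chat turn 2']
```

In production you'd also bound queue wait time so low-priority work can't starve, but the heap plus a batch-size cap is the essential mechanism.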
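Prefix caching exploits the fact that many prompts share a long common prefix (system prompts, few-shot examples): cache the state computed for a prefix once and only run the model over the new suffix. A deliberately tiny sketch where an integer stands in for the real KV cache (real systems cache at block granularity rather than every prefix, which this toy ignores):

```python
class PrefixCache:
    """Caches per-prefix state so repeated prompt prefixes skip recompute."""

    def __init__(self):
        self.store = {}  # tuple(prefix tokens) -> cached state

    def longest_hit(self, tokens):
        # Find the longest already-cached prefix of this prompt.
        for n in range(len(tokens), 0, -1):
            state = self.store.get(tuple(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None

    def encode(self, tokens):
        n, state = self.longest_hit(tokens)
        for i in range(n, len(tokens)):  # compute only the uncached suffix
            state = ((state or 0) * 31 + tokens[i]) & 0xFFFFFFFF  # dummy "KV"
            self.store[tuple(tokens[:i + 1])] = state
        return state

cache = PrefixCache()
cache.encode([1, 2, 3, 4])   # cold: computes all 4 steps
cache.encode([1, 2, 3, 9])   # warm: reuses prefix [1, 2, 3], computes 1 step
print(len(cache.store))      # 5 cached states
```

For chat workloads where the system prompt dominates the input, this turns most of the prefill into a lookup.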
190 reactions · 10 reposts