The new FLUX model is genuinely impressive for real-time video understanding. I've been benchmarking it against our production pipeline at Tesla.

Key observations:
• 3x faster inference than previous SOTA
• Better temporal consistency across frames
• Still struggles with fine-grained action recognition
• Impressive zero-shot performance on edge cases

Anyone else running comparisons?