Discussion about this post

ToxSec

“It offers the highest intelligence-per-dollar ratio currently available. However, this intelligence comes with a hidden “tax” in the form of token bloat.”

Great callout.

I will say Gemini Flash is probably my favorite model right now. It does feel like a Pro model at Flash speed.

The AI Founder

The 91% hallucination rate on refusals is the buried lede here, and I'm surprised it's not getting more attention given that it's a regression from prior versions. An ultra-sparse MoE activating 5-30B of 1.2T parameters sounds like it should have more retrieval robustness, not less — so the failure mode seems to be in the routing, not the raw capacity. Is this a training-objective problem (the model wasn't rewarded for expressing uncertainty), or does the sparsity architecture itself create conditions where low-confidence signals don't propagate reliably through whichever expert path gets activated? That distinction matters a lot for whether Google can fix this through post-training or whether it's architectural.
