Vol. I · No. 69 · SAT, JUN 27, 2026

The Archive

Search the full wire by company, model, lab, or keyword: every story we have ever aggregated.

Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]

After ~3 weeks of experimentation in OpenAI's Parameter Golf competition, I wrote up why SSMs are structurally disadvantaged relative to transformers in a time- and size-constrained regime (10 min of training on 8xH100s, a 16MB artifact, 25M parameters): https://mradassaad.github.io/posts/why-ssms-struggle-in-parameter-golf/

Main findings:

1. SSM in_proj weights compress up to 3.26x worse than attention QKV under LZMA, directly taxing the compressed parameter budget.
2. Architectural wins validated at SP4096 flipped sign...
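Finding 1 hinges on how compressible quantized weights are when the artifact size is fixed. The snippet below is a minimal sketch, not the author's method, of one way to measure that: quantize a weight tensor to int8 (an assumed scheme; the post's actual quantization is not stated here), compress it with Python's standard `lzma` module, and report raw size over compressed size. The helper names `quantize_int8`, `lzma_ratio`, and `bits_per_param` are hypothetical, and the Gaussian and uniform tensors are synthetic stand-ins rather than real SSM or attention weights, so the 3.26x figure is not reproduced.

```python
import lzma
import random

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme, not the post's)."""
    peak = max(abs(w) for w in weights)
    scale = peak / 127 if peak > 0 else 1.0
    # Clamp to [-127, 127] and store each code as one two's-complement byte.
    return bytes(max(-127, min(127, round(w / scale))) & 0xFF for w in weights)

def lzma_ratio(weights):
    """Raw int8 bytes divided by LZMA-compressed bytes; higher = more compressible."""
    raw = quantize_int8(weights)
    return len(raw) / len(lzma.compress(raw))

def bits_per_param(artifact_bytes, n_params):
    """Average storage budget per parameter for a fixed artifact size."""
    return artifact_bytes * 8 / n_params

if __name__ == "__main__":
    rng = random.Random(0)
    n = 1 << 16
    # Synthetic tensors only. A peaked distribution concentrates int8 codes
    # near zero (lower entropy); a flat one spreads them over ~255 codes.
    gaussian = [rng.gauss(0.0, 0.02) for _ in range(n)]
    uniform = [rng.uniform(-0.05, 0.05) for _ in range(n)]
    print(f"gaussian-like tensor ratio: {lzma_ratio(gaussian):.2f}")
    print(f"uniform-like tensor ratio:  {lzma_ratio(uniform):.2f}")
    # Assumes "16MB" means 16,000,000 bytes and 25M parameters exactly.
    print(f"budget: {bits_per_param(16_000_000, 25_000_000):.2f} bits/param")
```

Under those assumptions the budget works out to 5.12 bits per parameter, so if weights were stored as int8 (25,000,000 bytes raw), the compressor would need to reach at least 25/16 ≈ 1.56x just to fit. Any layer whose weights compress poorly, as the post reports for in_proj, eats directly into that margin. To check real checkpoints, one would pass a flattened weight matrix in place of the synthetic lists.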

··

If only this was a real game

Reddit post speculating about a hypothetical AI-themed game; lacks substantive technical or industry content.

··
30 stories