Vol. I · No. 53 · THU, JUN 11, 2026
The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

How does Anthropic actually measure over-refusal? (genuine question after watching their safety video)

So I watched the recent Anthropic video on how they test Claude for safety, and it got me thinking. The testing they showed looks solid for catching one specific failure, which is the model helping with something genuinely harmful. Fine, that matters. But the whole time I was watching, I kept thinking about the other side of this that nobody really talks about. What about all the times Claude refuses or gets weirdly cautious about completely normal questions? A nurse asking about medication thresholds. A security person trying to understand how an exploit works so they can defend against it...

··

You can now read Gemma 3's mind

Anthropic & Neuronpedia release Natural Language Autoencoders (NLA) to interpret Gemma 3 27B's internal activations via learned encoder-decoder LLM pairs.

··

Natural Language Autoencoders: Turning Claude’s thoughts into text

This is incredible research. I'm only halfway through the post, but my mind is already racing. Could I, or any average person, use these findings to build a tool that helps ordinary people? Could it be paired with one of Anthropic's earlier tools that identify the "emotions" Claude is feeling when it uses certain language, almost like a lie detector? Could we look at the patterns in its language when it hides misalignment and see whether Claude falls back on certain syntax? It's also an interesting addition to the 10 ft wall, 11 ft ladder problem. We can read its thoughts, but sometimes it hides its th...

··

Anthropic just got 220,000 GPUs from the man who called Claude "misanthropic and evil" three months ago...

The compute is real. The implications are stranger than the headline suggests. Colossus 1, with 220,000 Nvidia GPUs and 300+ megawatts of power, is now running Claude inference. Anthropic moved fast: Claude Code limits doubled overnight, peak-hour caps were removed, and Opus API rate limits went up. For anyone who's been hitting walls, this is immediately tangible. But the deal deserves more scrutiny than it's getting. Musk included a clause reserving SpaceX's right to reclaim the compute if Claude "engages in actions that harm humanity." That's not standard infrastructure boilerplate. That's a kill switch written int...

··

Be aware of this!

A user reports that Anthropic revoked their Claude Code access mid-subscription without explanation, and that support asked them to renew despite remaining credits.

··

Anthropic Just Secured a Reserve.

Anthropic secures a partnership with SpaceX for 300MW+ of compute at Colossus 1, adding 220k+ NVIDIA GPUs within one month.

··

Double limits!!

Following its partnership with SpaceX, Anthropic just doubled the limits. Source: https://x.ai/news/anthropic-compute-partnership

··