Vol. I · No. 68 · FRI, JUN 26, 2026

The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

How does Anthropic actually measure over-refusal? (genuine question after watching their safety video)

So I watched the recent Anthropic video on how they test Claude for safety, and it got me thinking. The testing they showed looks solid for catching one specific failure, which is the model helping with something genuinely harmful. Fine, that matters. But the whole time I was watching, I kept thinking about the other side of this that nobody really talks about. What about all the times Claude refuses or gets weirdly cautious about completely normal questions? A nurse asking about medication thresholds. A security person trying to understand how an exploit works so they can defend against it...

··

Why you can never get your doctor to call you back

Like many AI companies automating work that humans currently do, Basata will eventually face a harder question about where the line is between augmenting workers and displacing them. For now, the founders say the administrative staff they work with aren't worried about that; they're more worried about drowning.

··

I am showing how Claude Code is editing my codebase in real time

I think Claude Code is amazing, but it's very hard to track what exactly has changed without looking through a 10k-line diff in git. My friends and I started this open-source project to visualize software architectures. We realized we were also curious how big an effect each agent change has; this way we can stop Claude Code early as soon as we notice it has messed up, without having to read every line (also saving tokens and time). Our project is based on static analysis alongside LLMs, and you can find it on GitHub: [https://github.com/CodeBoarding/CodeBoarding](h...

··

You can now read Gemma 3's mind

Anthropic & Neuronpedia release Natural Language Autoencoders (NLA) to interpret Gemma 3 27B's internal activations via learned encoder-decoder LLM pairs.

··

Benchmark Qwen 3.6 27B MTP on 2x3090 NVLink

Benchmark shows TP=2 pinned to NVLink GPU pairs yields +25–53% throughput vs PCIe on Qwen 3.6 27B; TP=4 degrades performance due to cross-pair PCIe bottleneck.

··

Collected the infinity stones

Engineer building heterogeneous inference cluster with 2.3TB RAM, 400+ vCores, Blackwell GPUs, and RDMA; seeks Tinygrad driver expertise.

··
30 stories