The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention due to their linear compute and constant memory. While these sub-quadratic token mixing methods, or mixers, achieve promising efficiency gains and competitive results on a wide range of benchmarks, current linear recurrent models still lag behind on tasks that require long-context retrieval or in-context learning. A gr...

Kevin Y. Li · 24 days ago

Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning

SwarmHarness: Skill-Based Task Routing via Decentralized Incentive-Aligned AI Agent Networks

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

LLM Zeroth-Order Fine-Tuning is an Inference Workload

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

Qwen3.6 35B-A3B successfully completed the FoodTruck Bench!

Preference-Shaped Expected Hypervolume and R2 Improvement: Exact Computation and Monotonicity

Stance Detection in Prediction Markets: Addressing Imbalanced Trader Commentary via Counterfactual Augmentation and Market Context

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Reverse Probing: Supervised Token-level Uncertainty Quantification for Large Language Models in Clinical Text

What’s New for Game Developers in NVIDIA RTX: DLSS 4.5 for UE5 and Multilingual AI Characters

BIRDNet: Mining and Encoding Boolean Implication Knowledge Graphs as Interpretable Deep Neural Networks

Claude Code has zero idea what your codebase looks like structurally (Open source with benchmarks)

Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests

Utility-Aware Multimodal Contrastive Learning for Product Image Generation

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

AlphaTransit: Learning to Design City-scale Transit Routes

Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity

How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures

Multi-Adapter Representation Interventions via Energy Calibration

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

OpenURMA: A Clean-Room Open Implementation of the Unified Bus Protocol

I think Anthropic and OpenAI have found product-market fit

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More

AI-generated CUDA kernels silently break training and inference [R]

Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models