The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Assign and Add: A Mechanistic Study of Compositional Arithmetic

Large language models are able to compose skills in order to perform complex tasks, many of which might not have been seen during training. The details of how exactly this composition occurs remain elusive. In this paper, we study a mechanism for compositional generalization in transformers by considering a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers are able to generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the ...

Brady Exoo · 21 days ago

Ars Technica AI · PRESS

Startup offers free home cleaning—if it can record it all for robot training

The latest twist in paying humans to wear head cameras for robot training data.

Jeremy Hsu · 21 days ago

Consolidating Rewarded Perturbations for LLM Post-Training

Cognition’s Scott Wu says AI coding agents shouldn’t replace humans

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

Are Full Rollouts Necessary for On-Policy Distillation?

Graphical einops: bridging tensor networks and computation graphs

Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali

Language Models Can Resolve Reference Compositionally, But It's Not Their Native Strength: The Case of the Personal Relation Task

Knowledge Boundary Probing and Demand-Guided Intervention for LLM-Based Power System Code Generation

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle

How to Automate AI Model Documentation with the NVIDIA MCG Toolkit

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

PithTrain: A Compact and Agent-Native MoE Training System

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows

Fine-grained Verification via Diagnostic Reasoning Supervision for Aspect Sentiment Triplet Extraction

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

Answer-Set-Programming-based Abstractions for Reinforcement Learning

Modeling Covariate Transition for Efficient Estimation of Longitudinal Treatment Effects in Randomized Experiments

Flow map learning in nonlinear vector autoregressive models: influence of the feature-library structure on the training error

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

DG-CoLearn: An Efficient Collaborative Learning Framework for Dynamic Graphs

Today is the last day to apply to speak at TechCrunch Disrupt 2026

Does your CEO have AI psychosis? Aaron Levie thinks most of them do.

DeMaVLA: A Vision-Language-Action Foundation Model for Generalizable Deformable Manipulation

SAM for Robust Mitochondria Instance Segmentation in Fluorescence Microscopy