The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Natural Language Query to Configuration for Retrieval Agents

Modern retrieval agents expose many configuration choices (LLM, retriever, number of documents, number of hops, and synthesis strategy), each shaping both answer quality and serving cost. Today, these pipelines are typically hand-tuned once per workload, leaving substantial per-query optimization untapped. We formulate the problem: given a natural-language query and either an accuracy target or a budget target, select from a predefined pipeline catalog, at inference time, the configuration that minimizes cost subject to the accuracy target or maximizes accuracy subject to the budget. We propose **BRANE**, which uses an LLM to convert each query into ...

Melissa Z. Pan·15 days ago
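
The selection problem stated in the abstract above can be illustrated with a small sketch. This is not BRANE itself (the excerpt is truncated before the method is described); it only shows the two objectives, assuming each catalog entry already carries a per-query accuracy and cost estimate. The class name, function names, catalog entries, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PipelineConfig:
    """One entry of a predefined pipeline catalog (hypothetical fields)."""
    name: str
    predicted_accuracy: float  # assumed per-query estimate in [0, 1]
    predicted_cost: float      # assumed per-query cost, arbitrary units


def cheapest_meeting_accuracy(
    catalog: Sequence[PipelineConfig], accuracy_target: float
) -> Optional[PipelineConfig]:
    """Minimize cost among configurations that meet the accuracy target."""
    feasible = [c for c in catalog if c.predicted_accuracy >= accuracy_target]
    return min(feasible, key=lambda c: c.predicted_cost, default=None)


def most_accurate_within_budget(
    catalog: Sequence[PipelineConfig], budget: float
) -> Optional[PipelineConfig]:
    """Maximize accuracy among configurations that fit the budget."""
    feasible = [c for c in catalog if c.predicted_cost <= budget]
    return max(feasible, key=lambda c: c.predicted_accuracy, default=None)


# Made-up catalog for illustration only; the estimates are not from the paper.
CATALOG = [
    PipelineConfig("small-llm/1-hop/3-docs", 0.62, 1.0),
    PipelineConfig("small-llm/2-hop/5-docs", 0.74, 2.5),
    PipelineConfig("large-llm/1-hop/5-docs", 0.81, 6.0),
    PipelineConfig("large-llm/3-hop/10-docs", 0.88, 14.0),
]

if __name__ == "__main__":
    print(cheapest_meeting_accuracy(CATALOG, accuracy_target=0.80))
    print(most_accurate_within_budget(CATALOG, budget=3.0))
```

Both functions return `None` when no configuration satisfies the constraint, which leaves the fallback policy to the caller. How the per-query estimates are produced is the hard part of the problem and is left open here.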

Natural Language Query to Configuration for Retrieval Agents

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

Turning local agents into self-optimizing agents

FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents

Modeling Agentic Technical Debt and Stochastic Tax: A Standalone Framework for Measurement, Simulation, and Dashboarding

SIA: Self Improving AI with Harness & Weight Updates

ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

Microsoft Copilot Cowork Exfiltrates Files

Rethinking organizational design in the age of agentic AI

Sundar Pichai on AI, the future of search, and what’s happening to the web

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

VeriTrace: Evolving Mental Models for Deep Research Agents

Automated Benchmark Auditing for AI Agents and Large Language Models

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

What ClickUp’s mass layoff tells us about the future of work

When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation

Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents

Has anyone else noticed certain words make AI agents actually listen?

PapersWithCode new features - week 1 [P]

Deterministic multi-subagent orchestration - what's new in CC 2.1.146 (+4,755 tokens)

LLM-driven design of physics-constrained constitutive models: two agents are better than one

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

Goal-Conditioned Agents that Learn Everything All at Once

OpenAI named a Leader in enterprise coding agents by Gartner