The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Agentic MIP Research: Accelerated Constraint Handler Generation

Agentic framework embeds LLM agents in SCIP solver harness to auto-generate and benchmark constraint handlers.

Liding Xu·1 month ago

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

Selective imitation learning framework enables agents to abstain from acting when demonstrations are uninformative under dynamics shift.

Surbhi Goel·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CIVeX: Causal Intervention Verification for Language Agents

CIVeX verifies causal effects of tool-use actions in LLM agents via structural causal queries and identifiability checks.

Fabio Rovai·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

MCP-Cosmos integrates world models into Model Context Protocol agents to bridge planning-execution gap via predictive task automation.

Giridhar Ganapavarapu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability

Framework maps LLM reliability techniques (retry, voting, self-consistency) to Shannon coding theory operators as stochastic channel reliability methods.

Hamed Omidvar·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Token Economics for LLM Agents: A Dual-View Study from Computing and Economics

First comprehensive survey unifying token economics for LLM agents, framing tokens as production factors and analyzing computational-economic trade-offs.

Yuxi Chen·1 month ago

r/ClaudeAI· COMMUNITY

Anthropic's Claude Certified Architect, Worth it?

Reddit discussion questioning the long-term value of Anthropic's Claude Certified Architect credential as AI agents automate architecture decisions.

u/No_Agency8722·1 month ago·22 pts / 22 comm

r/ClaudeAI· COMMUNITY

I built a Pokémon-styled multi-agent dashboard to manage all Claude Code sessions

Like many others here, I got frustrated with managing all my different Claude/Codex sessions, so I built Pokegents, which is an open source multi-agent workspace for coding agents. It has a Pokemon-themed dashboard/chat interface plus a local orchestration server for managing agent sessions (currently supports Claude Code in iTerm2, plus Claude and Codex through ACP-based chat runtimes), persistent agent identities, MCP messaging between agents, notifications, session cloning, and more. This was mostly a vibe-coded side project, but I've been using it constantly in my day-to-day workflow as ...

u/girishkumama·1 month ago·31 pts / 5 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Study shows expanded context windows degrade multi-agent cooperation in LLMs across 7 models; mechanism is eroding forward-looking intent rather than increased distrust.

Jiayuan Liu·1 month ago

NVIDIA Dev Blog· INFRA

Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding

Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is producing an executable action that can read files, mutate a workspace, open network connections, and chain tools together. For the NVIDIA AI Red Team, this makes command generation a useful research target. If smaller language models can be guided…

Joseph Lucas·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning CLI Agents with Structured Action Credit under Selective Observation

Proposes action-credit RL for CLI agents using structured command attributes and selective observation over filesystems.

Haoyang Su·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Interpreting Reinforcement Learning Agents with Susceptibilities

Susceptibilities technique extends neural network interpretability to deep RL agents, revealing parameter-space development undetectable from policy analysis alone.

Chris Elliott·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?

Measures optimal timing for clarification requests in long-horizon agent execution via injection framework across 4 dimensions.

Anmol Gulati·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain

DRIP-R benchmark evaluates LLM agents on real-world retail policy ambiguities with multiple valid interpretations, addressing evaluation gaps in agent robustness.

Hsuvas Borkakoty·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting

Autonomous agent oversight reveals endogeneity: non-affine approval functions needed to screen dishonest agents violate truthful reporting conditions.

Lauri Lovén·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents

PhoneSafety benchmark (700 examples) distinguishes genuine safety understanding from task failure in phone-use agents via fine-grained outcome categorization.

Zhengyang Tang·1 month ago

r/LocalLLaMA· COMMUNITY

Are local models becoming “good enough” faster than expected?

Local LLMs reaching production-grade performance on routine tasks (coding, summarization, agents), driving adoption of hybrid cloud-local workload strategies.

u/qubridInc·1 month ago·43 pts / 46 comm

TechCrunch AI· PRESS

Perplexity’s Personal Computer is now available to everyone on Mac

Perplexity's Personal Computer brings AI agents to your Mac, and is now open to everyone.

Sarah Perez·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

BAMI: Training-Free Bias Mitigation in GUI Grounding

BAMI mitigates precision and ambiguity bias in GUI grounding agents without retraining using masked prediction distribution attribution.

Borui Zhang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

SIRA framework improves retrieval-augmented agents by modeling expert search priors, reducing retrieval rounds and latency for organizational knowledge bases.

Zeyu Yang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Recursive Agent Optimization

RAO trains recursive agents to delegate sub-tasks recursively, enabling divide-and-conquer inference scaling for longer contexts and harder problems.

Apurva Gandhi·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

Framework for parsing and verifying source attribution in LLM research agents; evaluates citation accuracy via AST parsing and reproducible verification.

Hailey Onweller·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SkillOS: Learning Skill Curation for Self-Evolving Agents

SkillOS: self-evolving LLM-agent framework learning long-horizon skill curation policies from streaming task interactions.

Siru Ouyang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

AI CFD Scientist demonstrates LLM-based agents automating computational fluid dynamics discovery with physics validation loops.

Nithin Somasekharan·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

Patch2Vuln reconstructs security vulnerabilities from Linux binary patches using LLM agents without source code access.

Isaac David·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cross-Modal Navigation with Multi-Agent Reinforcement Learning

CRONA applies multi-agent RL to cross-modal embodied navigation, decomposing monolithic models into modality-specialized agents for flexible deployment.

Shuo Liu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

NeuroAgent automates heterogeneous neuroimaging preprocessing and analysis via LLM-driven agentic framework coordinating modality-specific toolchains.

Lujia Zhong·1 month ago

The Verge AI· PRESS

OpenClaw and Claude can put your AI-generated podcasts in Spotify

Save to Spotify is a new command-line tool designed specifically for AI agents like OpenClaw, Claude Code, or OpenAI Codex. If you're the kind of person who collects research on a topic, then feeds it through their AI of choice to create audio summaries and personal podcasts, this lets you save them right alongside the latest episode of The Vergecast and Welcome to Night Vale on Spotify. To set it up, you need to download and install the Save to Spotify CLI from GitHub. Then you just prompt your AI agent as normal, but tack on "and save to Spotify," and it should show up right in your podcast...

Terrence O’Brien·1 month ago

OpenAI· FRONTIER

Parloa builds service agents customers want to talk to

Parloa uses OpenAI models to build voice-driven customer service agents with simulation and real-time deployment capabilities for enterprises.

OpenAI·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

LongSeeker proposes Context-ReAct paradigm for elastic context management in long-horizon search agents, maintaining trajectory at variable detail levels.

Yijun Lu·1 month ago

30 matches