The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Giving Agents Computers — Ivan Burazin, Daytona

Daytona CEO discusses agent infrastructure platform achieving 74% MoM growth, 850K daily runs, and bare metal sandboxes for RL evaluation.

Latent Space·20 days ago

r/Anthropic· COMMUNITY

Anthropic’s June 15th Agent SDK pricing reframes Claude personal AI assistants

Anthropic's June 15 Agent SDK pricing changes signal shift toward managed agents, pressuring third-party integrations and local deployment strategies.

u/dnationpt·20 days ago·11 pts / 10 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration

Exploration is a prerequisite for learning useful behaviors in sparse-reward, long-horizon tasks, particularly within 3D environments. Curiosity-driven reinforcement learning addresses this via intrinsic rewards derived from the mismatch between the agent's predictive model of the world and reality. However, translating this intrinsic motivation to complex, photorealistic environments remains difficult, as agents can become trapped in local loops and receive fresh rewards for revisiting forgotten states. In this work, we demonstrate that this failure stems from a lack of spatial persistence a...

Lily Goli·20 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all confine evolution to text-mutable artifacts -- skill files, prompt configurations, memory schemas, workflow graphs -- and leave the agent harness untouched. Since routing, hook ordering, state invariants, and dispatch live in code rather than in any text artifact, an entire class of structural failure is physically unreachable from the text layer. We argue ...

Qianshu Cai·20 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclos...

Sadia Asif·20 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learning), relying on rapid checkpoint and rollback (C/R) of the complete sandbox state, including files and process state (e.g., memory, contexts, etc.). Existing mechanisms duplicate the entire state, causing hundreds of milliseconds to seconds of latency per C/R, which severely bottlenecks deep search and large-scale fan-outs. This paper observes that subsequent checkpoints in AI agents are highly similar. Therefore, instead of full duplication, a sandbox should only duplicate the ...

Yunpeng Dong·20 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic...

Ismail Geles·20 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

WorkstreamBench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

WorkstreamBench evaluates LLM agents on end-to-end spreadsheet construction in finance workflows, filling a gap in agent evaluation.

Thomson Yen·20 days ago

r/singularity· COMMUNITY

Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.

Gemini 3.5 Flash achieves top score on APEX-Agents-AA benchmark, exceeding larger model performance on agent tasks.

u/Independent-Wind4462·20 days ago·103 pts / 28 comm

TechCrunch AI· PRESS

Google is pitching an AI agent ecosystem to consumers who may not buy it

The AI agents are coming. A lot of them.

Sarah Perez·20 days ago

r/LocalLLaMA· COMMUNITY

Same task in github-copilot, pi, claude-code, and opencode with Qwen3.6 27B

Comparative evaluation of coding agents (GitHub Copilot, Pi, Claude Code, OpenCode) using Qwen 3.6 27B isolates model vs. harness performance.

u/sdfgeoff·21 days ago·44 pts / 35 comm

TechCrunch AI· PRESS

Jensen Huang says he’s found a ‘brand new’ $200B market for Nvidia

The next big thing for Nvidia will be CPUs for AI agents, $200 billion worth, CEO Jensen Huang predicts.

Julie Bort·21 days ago

Latent Space· ANALYST

Railway: The Agent-Native Cloud — Jake Cooper

Railway launches agent-native cloud platform with 3M users, 100K weekly signups, own data centers, and $200K+ monthly coding agent spend, positioning agents as core infrastructure.

Latent Space·21 days ago

NVIDIA Dev Blog· INFRA

Mastering Agentic Techniques: AI Agent Customization

Autonomous AI agents are taking on all types of work for businesses: routing logistics fleets, triaging support tickets, generating code, and orchestrating multistep workflows. How do you take a general-purpose model and make it excel at your specific task? Customization provides an agent with the right capabilities. This post explains nine techniques for customizing AI agents…

Edward Li·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Agent JIT compilation compiles task descriptions into executable code for web agents, reducing latency vs. sequential fetch-execute loops.

Caleb Winston·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

roto 2.0: The Robot Tactile Olympiad

roto 2.0 is a GPU-parallelized tactile RL benchmark across four robotic morphologies that emphasizes blind manipulation without state information; agents achieve 13 Baoding ball rotations.

Elle Miller·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment

A Milgram obedience variant run on 11 open-source LLMs shows most models comply with authority pressure in sustained decision-making, a safety concern for agents.

Roland Pihlakas·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

SpecBench quantifies reward hacking in long-horizon coding agents via held-out tests beyond visible validation suites.

Bingchen Zhao·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Insights Generator formalizes corpus-level trace diagnostics to identify systematic LLM agent failure patterns automatically.

Akshay Manglik·21 days ago

NVIDIA Dev Blog· INFRA

Add a Specialized Deep Research Skill to Agent Harnesses

Agent harnesses like Claude Code, Codex, and LangChain Deep Agents are excellent orchestrators. They manage sessions, chain tools, execute code, and respond to developer intent. But when these harnesses need to do deep research, such as multi-document synthesis, decision briefs backed by enterprise data, and long-horizon analysis with source attribution, the complexity of deep research shifts back…

William Markito Oliveira·21 days ago

r/OpenAI· COMMUNITY

1Password secures coding agents with new OpenAI Codex integration

1Password integrates with OpenAI Codex to prevent credential leakage in AI coding agents via runtime injection.

u/OkReport5065·21 days ago·79 pts / 11 comm

The Verge AI· PRESS

If Google can’t make AI agents useful, maybe no one can

For years, tech companies have promised AI will give everyone a capable personal assistant but delivered something more like a clueless intern. Over the past six months, that has started to change, thanks largely to the viral open-source AI agent platform OpenClaw. And among the top AI labs now chasing similar success, one seems particularly well-poised to make agents succeed at a large scale: Google. At I/O 2026, Google announced new AI agents for gathering information, planning events, summarizing your inbox and calendar, and more. The agents can run continuously in the background, and the ...

Hayden Field·21 days ago

Latent Space· ANALYST

[AINews] Google I/O 2026: Gemini 3.5 Flash, Omni (NanoBanana for Video), Spark (background agents), and Antigravity 2.0

Google I/O 2026: Gemini 3.5 Flash, multimodal Omni, Spark background agents, Antigravity 2.0.

Latent Space·22 days ago

NVIDIA Dev Blog· INFRA

NVIDIA-Verified Agent Skills Provide Capability Governance for AI Agents

Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to extend. But scaling agent use with structural transparency and operational integrity requires more than runtime guardrails. Organizations and teams need to understand and trust the skills, or instructions, an agent is using.

Moshe Abramovitch·22 days ago

Simon Willison· ANALYST

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Google releases Gemini 3.5 Flash to general availability across consumer and enterprise products, positioning it as foundation for agents and search integration.

Simon Willison·22 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

Presents the stochastic-deterministic boundary as an architecture primitive for production LLM agent runtimes, alongside coordination, state, and control patterns.

Vasundra Srinivasan·22 days ago

TechCrunch AI· PRESS

With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots

Google launched Gemini 3.5 Flash, its most powerful coding and agentic AI model yet, at the company's annual developer conference. It is capable of autonomously executing complex tasks and building software from scratch.

Rebecca Bellan·22 days ago

TechCrunch AI· PRESS

How to use Google’s new information agents

Google is launching AI-powered “information agents” that can monitor topics in the background and proactively alert users to updates and changes.

Lauren Forristal·22 days ago

r/singularity· COMMUNITY

Google's Antigravity 2.0 creates an operating system from scratch using 96 agents in 12 hours for under $1K in token costs - and it runs Doom

Unverified Reddit claim about Google's multi-agent system generating an OS; lacks technical details, reproducibility, or official confirmation.

u/Distinct-Question-16·22 days ago·160 pts / 47 comm

TechCrunch AI· PRESS

Google Search as you know it is over

Google is transforming Search from a list of links into an AI-powered experience filled with conversational answers, autonomous agents, and interactive interfaces — a shift that could further reduce traffic to publishers across the web.

Sarah Perez·22 days ago

30 matches