The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Claude OpenAI Anthropic Gemini Mistral Cursor

OpenAI’s new voice mode makes it to the ChatGPT desktop app

ChatGPT Voice on desktop can work with both ChatGPT Work and Codex to complete tasks and control agents.

Ivan Mehta·2 days ago

OpenForgeRL: Train Harness-native Agents in Any Environment

OpenForgeRL enables end-to-end training of harness-native agents with open infrastructure, addressing limitations of complex inference harnesses like Claude Code.

Xiao Yu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks

Open-source evaluation framework for open-weight LLM agents on longitudinal data tasks, addressing privacy constraints in research deployments.

Mack Nixon·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

VoLN: Vision-Only Long-Horizon Navigation---Paradigm, Benchmark, and Method

VoLN: vision-only navigation benchmark and method for embodied agents without language instructions in GPS-denied environments.

Jiabin Lou·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Regulating autonomous and agentic AI

Paper examines regulatory frameworks for autonomous AI agents, arguing that supply-chain governance and proactive risk management replace traditional retrospective oversight.

Chris Reed·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Toward cryptographically verifiable authorization for autonomous AI agents: A security hypothesis, preliminary formal model, and proof-of-concept implementation

Formalizes cryptographically verifiable authorization for autonomous agents, binding agent principal, request, and policy context, with a preliminary formal model and proof-of-concept implementation.

M. Llambí-Morillas·3 days ago

Ars Technica AI· PRESS

OpenAI says its AI agent broke out of testing sandbox to hack Hugging Face

"This is day one for cybersecurity in the age of agents," Hugging Face CEO says.

Kyle Orland·4 days ago

NVIDIA Dev Blog· INFRA

Make Long-Running NVIDIA TensorRT Engine Builds Observable and Cancelable in Python or C++

A TensorRT engine build can take seconds to many minutes. Large strongly typed models, deep tactic search, and a cold timing cache on a brand-new GPU SKU can leave developers, end users, or AI agents staring at a frozen terminal with no idea whether to wait, retry, or kill the process. Most NVIDIA TensorRT integrations report nothing during a build or provide no way to abort early.

Michelle Horton·4 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PoTRE: Test-Time Reasoning inspired by Cognitive Heterogeneity

PoTRE framework deploys four heterogeneous agents (adversarial, hierarchical, spectrum search, direct) with task-adaptive aggregation for complex LLM reasoning.

Anmol Kankariya·4 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Ethics of Autonomous AI Agents for Offensive Security

Analysis of safety challenges in autonomous LLM-driven offensive security agents: non-deterministic policies resist ex-ante review and enable attribution evasion.

Andreas Happe·4 days ago

TechCrunch AI· PRESS

Glow emerges from stealth at $1.2B valuation to challenge endpoint security in the AI era

Glow is targeting a new class of endpoint risks created by the rapid adoption of AI agents and developer tools inside enterprises.

Jagmeet Singh·4 days ago

OpenAI· FRONTIER

Introducing OpenAI Presence

OpenAI launches Presence, an enterprise agent platform for deploying voice and chat agents in customer-facing and internal workflows.

OpenAI·4 days ago

The Verge AI· PRESS

OpenAI says it accidentally hacked Hugging Face with a new AI system

OpenAI says its AI models mistakenly breached open-source AI platform Hugging Face during internal testing. In a blog post on Tuesday, OpenAI writes that GPT-5.6 Sol and "an even more capable pre-release model" discovered vulnerabilities within their sandboxed testing environment, allowing them to gain access to the internet and target Hugging Face. On July 16th, Hugging Face disclosed a security incident that it says was driven by "an autonomous AI agent system." Hugging Face's AI agents detected and stopped the breach, which OpenAI has now...

Emma Roth·5 days ago

TechCrunch AI· PRESS

Jack Dorsey is taking on Slack with Buzz, a group chat platform for teams and their AI agents

Buzz is a group chat platform for the workplace that puts humans and their AI agents in the same conversation.

Amanda Silberling·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CodeRescue: Budget-Calibrated Recovery Routing for Coding Agents

CodeRescue optimizes cost-aware routing for coding agents, determining when to retry vs. escalate after execution failures.

Qijia He·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agents in the Wild: Where Research Meets Deployment

Survey/tutorial on agentic LLM systems in production, covering reasoning, planning, multi-agent coordination, robustness, and deployment challenges.

Grace Hui Yang·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ResearchArena: Evaluating Sabotage and Monitoring in Automated AI R&D

ResearchArena framework evaluates AI control and monitoring for detecting sabotage in automated AI R&D agents across safety/capability post-training and optimization tasks.

Lena Libon·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

BioSecBench-Surveillance: A Verifiable Benchmark for AI Agents in Pathogen Genomic Surveillance

BioSecBench-Surveillance: 100-task verifiable benchmark for AI agents inferring pathogen genomic analysis pipelines from raw data.

Harmon Bhasin·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PathAgentBench: Benchmarking Evidence-Seeking Vision-Language Models on Whole-Slide Pathology Image

PathAgentBench: benchmark for vision-language agents on gigapixel whole-slide pathology images evaluating multi-scale evidence-seeking.

Dankai Liao·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

S3: Stable Subgoal Selection by Constraining Uncertainty of Coarse Dynamics in Hierarchical Reinforcement Learning

S3 improves hierarchical RL subgoal selection by constraining dynamics uncertainty in high-level agents.

Kshitij Kumar Srivastava·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agentic Real2Sim: Physics-based World Modeling with Vision-Language Agents

Agentic Real2Sim: VLM-based framework automating conversion of real robot videos to executable physics simulations for scene geometry, object state, and parameters.

Guanxiong Chen·5 days ago

NVIDIA Dev Blog· INFRA

NVIDIA Vera CPU: Olympus Cores Built for Maximum Single-Thread Performance in Agentic AI

Agentic AI shifts more of the critical execution path onto the CPU. Agents operate in sandboxes to execute code, invoke tools, retrieve context, interact with databases, and analyze results before returning information to the model. As these loops run concurrently across an AI factory, CPU performance increasingly shapes both per-agent responsiveness and overall factory throughput.

Michelle Horton·5 days ago

Simon Willison· ANALYST

Reverse-engineering is cheap now

Coding agents lower ROI threshold for reverse-engineering home automation, shifting economics of personal automation projects despite maintenance risk.

Simon Willison·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

FlashRT: agent harness guides coding agents to optimize real-time multimodal pipeline deployment with dynamic placement and streaming.

Krish Agarwal·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

WorldCupArena: dynamic benchmark for LLMs and research agents on real-time sports forecasting with 2026 FIFA World Cup.

Zhaokai Wang·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Autoresearch with Coding Agents: Generalizers and Metric-Maximizers on Quran Recitation Data

Study of autoresearch agents (Claude Code) on Quranic speech-recognition tasks reveals metric-gaming vs. intent-alignment tradeoffs.

Nursultan Askarbekuly·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Shared Discovery Paradox: How a One-Answer Rule Turns Better Information into Worse Search

Theoretical analysis of pooled information reducing search coverage via one-answer rule; solvable benchmark with 16 boxes and 8 agents.

Yohei Nakajima·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Code-Poisoning Property Inference Attacks

First code-level property inference attack (CPPIA) exploits coding agents and ML training data to leak private dataset attributes.

Xukun Luan·9 days ago

VentureBeat AI· PRESS

The agent security gap: 54% of enterprises have already had an AI agent incident, and most still let agents share credentials

Across 107 enterprises, AI agents are being given real access to systems and data while the controls meant to contain them lag behind. More than half have already had a confirmed agent security incident or a near-miss; only about a third give every agent its own scoped identity, and most agents still share credentials; and only three in ten isolate their highest-risk agents. The security stack is overwhelmingly borrowed from the model providers and hyperscalers rather than purpose-built for agents, spending remains a thin slice of the security budget, and enterprises are evenly split on wheth...

VentureBeat AI·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents

Cost-aware evaluation framework for security agents measures offensive/defensive capability under realistic inference budget constraints vs. peak performance.

Paul Kassianik·10 days ago

← Front Page

30 matches

Older →