Our framework for developing safe and trustworthy agents
Anthropic publishes framework for developing safe and trustworthy autonomous agents with specified governance principles.
Outtake uses GPT-4.1 and OpenAI o3 agents to detect security threats 100x faster.
Model ML CEO discusses AI-native infrastructure and autonomous agents for financial services transformation.
Genspark built $36M ARR no-code agent product in 45 days using GPT-4.1 and OpenAI Realtime API.
Devstral: Mistral AI open-source model optimized for autonomous coding agents and software development.
OpenAI introduces BrowseComp benchmark for evaluating web browsing agent capabilities.
PaperBench: new benchmark measuring AI agents' ability to replicate state-of-the-art research papers.
OpenAI shifts from intent-based bots to proactive AI agents architecture.
Hebbia's AI platform claims to automate 90% of finance and legal work tasks using OpenAI models.
OpenAI released advanced text-to-speech and speech-to-text APIs with customizable voice instructions for voice agents.
xAI unveils early preview of Grok 3, emphasizing advanced reasoning and agentic capabilities.
Google DeepMind presents NeurIPS 2024 research spanning adaptive agents, 3D scene generation, and LLM training safety.
MLE-bench introduces benchmark for evaluating AI agents on machine learning engineering tasks.
MavenAGI launches GPT-4-powered customer service agent; Tripadvisor, ClickUp, Rho deploy it for support automation.
Klarna is using AI to revolutionize personal shopping, customer service, and employee productivity.
We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents.