Agentic MIP Research: Accelerated Constraint Handler Generation
Agentic framework embeds LLM agents in SCIP solver harness to auto-generate and benchmark constraint handlers.
Selective imitation learning framework enables agents to abstain from acting when demonstrations are uninformative under dynamics shift.
CIVeX verifies causal effects of tool-use actions in LLM agents via structural causal queries and identifiability checks.
MCP-Cosmos integrates world models into Model Context Protocol agents to bridge planning-execution gap via predictive task automation.
Framework maps LLM reliability techniques (retry, voting, self-consistency) to Shannon coding theory operators as stochastic channel reliability methods.
First comprehensive survey unifying token economics for LLM agents, framing tokens as production factors and analyzing computational-economic trade-offs.
Reddit discussion questioning the long-term value of Anthropic's Claude Certified Architect credential as AI agents automate architecture decisions.
Like many others here, I got frustrated with managing all my different Claude/Codex sessions, so I built Pokegents, an open-source multi-agent workspace for coding agents. It has a Pokemon-themed dashboard/chat interface plus a local orchestration server for managing agent sessions (currently supports Claude Code in iTerm2, plus Claude and Codex through ACP-based chat runtimes), persistent agent identities, MCP messaging between agents, notifications, session cloning, and more. This was mostly a vibe-coded side project, but I've been using it constantly in my day-to-day workflow as ...
Study shows expanded context windows degrade multi-agent cooperation in LLMs across 7 models; mechanism is eroding forward-looking intent rather than increased distrust.
Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is producing an executable action that can read files, mutate a workspace, open network connections, and chain tools together. For the NVIDIA AI Red Team, this makes command generation a useful research target. If smaller language models can be guided…
Proposes action-credit RL for CLI agents using structured command attributes and selective observation over filesystems.
Susceptibilities technique extends neural network interpretability to deep RL agents, revealing parameter-space development undetectable from policy analysis alone.
Measures optimal timing for clarification requests in long-horizon agent execution via injection framework across 4 dimensions.
DRIP-R benchmark evaluates LLM agents on real-world retail policy ambiguities with multiple valid interpretations, addressing evaluation gaps in agent robustness.
Autonomous agent oversight reveals endogeneity: non-affine approval functions needed to screen dishonest agents violate truthful reporting conditions.
PhoneSafety benchmark (700 examples) distinguishes genuine safety understanding from task failure in phone-use agents via fine-grained outcome categorization.
Local LLMs reaching production-grade performance on routine tasks (coding, summarization, agents), driving adoption of hybrid cloud-local workload strategies.
Perplexity's Personal Computer brings AI agents to your Mac, and is now open to everyone.
BAMI mitigates precision and ambiguity bias in GUI grounding agents without retraining using masked prediction distribution attribution.
SIRA framework improves retrieval-augmented agents by modeling expert search priors, reducing retrieval rounds and latency for organizational knowledge bases.
RAO trains recursive agents to delegate sub-tasks recursively, enabling divide-and-conquer inference scaling for longer contexts and harder problems.
Framework for parsing and verifying source attribution in LLM research agents; evaluates citation accuracy via AST parsing and reproducible verification.
SkillOS: self-evolving LLM-agent framework learning long-horizon skill curation policies from streaming task interactions.
AI CFD Scientist demonstrates LLM-based agents automating computational fluid dynamics discovery with physics validation loops.
Patch2Vuln reconstructs security vulnerabilities from Linux binary patches using LLM agents without source code access.
CRONA applies multi-agent RL to cross-modal embodied navigation, decomposing monolithic models into modality-specialized agents for flexible deployment.
NeuroAgent automates heterogeneous neuroimaging preprocessing and analysis via LLM-driven agentic framework coordinating modality-specific toolchains.
Save to Spotify is a new command-line tool designed specifically for AI agents like OpenClaw, Claude Code, or OpenAI Codex. If you're the kind of person who collects research on a topic, then feeds it through their AI of choice to create audio summaries and personal podcasts, this lets you save them right alongside the latest episode of The Vergecast and Welcome to Night Vale on Spotify. To set it up, you need to download and install the Save to Spotify CLI from GitHub. Then you just prompt your AI agent as normal, but tack on "and save to Spotify," and it should show up right in your podcast...
Parloa uses OpenAI models to build voice-driven customer service agents with simulation and real-time deployment capabilities for enterprises.
LongSeeker proposes Context-ReAct paradigm for elastic context management in long-horizon search agents, maintaining trajectory at variable detail levels.