The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


A Benchmark for Interactive World Models with a Unified Action Generation Framework

iWorld-Bench is a 330k-clip dataset and benchmark for training interactive world models on perception, reasoning, and physical interaction capabilities.

Jianjie Fang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Counterexample Game: Iterated Conceptual Analysis and Repair in Language Models

Study finds LMs can iteratively refine conceptual definitions through counterexample generation, but accept invalid counterexamples at 2× the human acceptance rate.

Daniel Drucker·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards Open World Sound Event Detection

Open-World Sound Event Detection paradigm extends closed-world audio classifiers to detect unknown events and incrementally learn from them.

P. H. Hai·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Magic-Informed Quantum Architecture Search

Magic-informed quantum architecture search uses GNN-guided Monte Carlo Tree Search to control quantum resource utilization in circuit design.

Vincenzo Lipardi·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PHALAR: Phasors for Learned Musical Audio Representations

PHALAR contrastive framework for stem retrieval uses learned spectral pooling and phase-equivariant biases, achieving a 70% relative accuracy gain with fewer parameters.

Davide Marincione·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes

Randomized algorithm for PAC policy identification in MDPs combines posterior sampling with online learning for asymptotically optimal sample complexity.

Cyrille Kone·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Exact ReLU realization of tensor-product refinement iterates

Theoretical proof that dyadic refinement iterates on R² admit exact ReLU realizations with fixed width and O(n) depth for piecewise-linear functions.

Tsogtgerel Gantumur·2 months ago
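The fixed-width, O(n)-depth flavor of this kind of result is easiest to see in one dimension. The sketch below is not the paper's construction on R². It shows the standard hat function written exactly as a width-3 ReLU layer, composed n times to give a dyadic sawtooth with 2^(n-1) teeth (a well-known textbook example), so width stays fixed while depth grows linearly with n. Function names are illustrative.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def hat(x: float) -> float:
    """Hat on [0, 1] peaking at 1/2, written exactly with three ReLUs.

    Equals 2x on [0, 1/2], 2 - 2x on [1/2, 1], and 0 outside [0, 1].
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def sawtooth(x: float, n: int) -> float:
    """Compose the hat n times: one width-3 ReLU layer per composition,
    so depth is O(n) while width never changes."""
    for _ in range(n):
        x = hat(x)
    return x


if __name__ == "__main__":
    # Sample the dyadic grid 0, 1/8, ..., 1 for n = 2 (two teeth on [0, 1]).
    print([sawtooth(k / 8, 2) for k in range(9)])
```

Because every layer has the same three units, doubling the number of linear pieces costs one extra layer rather than wider layers.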

arXiv (cs.AI/CL/LG)· ACADEMIA

Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial

RCT of 356 clinicians shows atomic fact-checking (decomposing LLM recommendations into verifiable claims) increases trust from 27% to 67% vs. traditional explainability methods.

Lisa C. Adams·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Ecologically-Constrained Task Arithmetic for Multi-Taxa Bioacoustic Classifiers Without Shared Data

Task vector arithmetic on BEATs encoders composes a 661-species bioacoustic classifier without data sharing; the task vectors are near-orthogonal, and their geometry aligns with the acoustic niche hypothesis.

Ragib Amin Nihal·2 months ago
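Task arithmetic in its generic form: a task vector is the fine-tuned weights minus the shared base weights, and models are merged by adding a scaled sum of task vectors back onto the base. The sketch below uses tiny hypothetical weight vectors in place of BEATs encoder parameters, plus the cosine-similarity check typically used to call task vectors near-orthogonal. The scaling coefficient `lam` is an assumption, and the paper's ecological constraints are not modeled.

```python
import math

Vector = list[float]


def task_vector(finetuned: Vector, base: Vector) -> Vector:
    """Task vector = fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def merge(base: Vector, task_vectors: list[Vector], lam: float) -> Vector:
    """Add lam times each task vector back onto the base weights."""
    merged = list(base)
    for tv in task_vectors:
        merged = [m + lam * t for m, t in zip(merged, tv)]
    return merged


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; values near 0 mean the task vectors barely interfere."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical 4-parameter "encoder" and two taxon-specific fine-tunes.
    base = [0.5, -0.5, 0.5, -0.5]
    tv_birds = task_vector([1.5, -0.5, 0.5, -0.5], base)
    tv_frogs = task_vector([0.5, -0.5, 1.5, -0.5], base)
    print(cosine(tv_birds, tv_frogs))  # 0.0: these toy task vectors are orthogonal
    print(merge(base, [tv_birds, tv_frogs], lam=0.5))
```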

r/LocalLLaMA· COMMUNITY

Gemma 4 MTP released

Google releases Gemma 4 multi-token prediction drafters in 4 quantized sizes for local deployment.

u/rerri·2 months ago·86 pts / 23 comm

Google AI (Gemma)· FRONTIER

Google is partnering with XPRIZE and Range Media Partners on the $3.5 million Future Vision film competition.

Google partners with XPRIZE and Range Media on $3.5M Future Vision film competition.

Google AI (Gemma)·2 months ago

Anthropic· FRONTIER

Agents for financial services

Anthropic releases ten Cowork and Claude Code plugins plus Microsoft 365 integrations and an MCP app for financial services.

Anthropic·2 months ago

NVIDIA Dev Blog· INFRA

How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car

The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and acting. In most vehicles on the road today, in-vehicle assistants still rely on fixed command-response patterns: interpret a phrase, trigger an action, reset. While effective for well-defined tasks, this approach doesn’t scale to modern…

Felix Friedmann·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Steer Like the LLM: Activation Steering that Mimics Prompting

Framework shows popular activation steering methods misalign with prompt steering mechanics; proposes distilling prompt behavior into interpretable models to close performance gap.

Geert Heyman·2 months ago
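As background on what "activation steering" usually means: a common baseline derives a steering vector as the difference between mean hidden activations on prompts with and without a target behavior, then adds a scaled copy of it to the hidden state at inference. The sketch below shows that difference-of-means baseline on made-up 3-dimensional activations. It is the kind of method the paper critiques, not the prompt-mimicking method it proposes; `alpha` and all data are hypothetical.

```python
Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(with_behavior: list[Vector], without_behavior: list[Vector]) -> Vector:
    """Difference-of-means steering direction."""
    mu_pos, mu_neg = mean(with_behavior), mean(without_behavior)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add alpha * direction to one hidden state (in practice, at a chosen layer)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    pos = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]  # hypothetical activations, behavior present
    neg = [[0.0, 2.0, 0.0], [0.0, 2.0, 2.0]]  # hypothetical activations, behavior absent
    direction = steering_vector(pos, neg)
    print(direction, steer([0.0, 0.0, 0.0], direction, alpha=0.5))
```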

arXiv (cs.AI/CL/LG)· ACADEMIA

CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing

CC-OCR V2 benchmark for real-world enterprise document OCR with LMMs; addresses gap between lab tasks and practical heterogeneous acquisition conditions.

Zhipeng Xu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Graph Neural Networks in the Wilson Loop Representation of Abelian Lattice Gauge Theories

Gauge-invariant GNN architecture for Abelian lattice gauge theories using Wilson loop representations; application to condensed matter and quantum systems.

Ali Rayat·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Contextual Multi-Objective Optimization: Rethinking Objectives in Frontier AI Systems

Argues frontier AI failures in open-ended tasks (scientific assistance, agents, personalization) stem from objective ambiguity rather than capability gaps; proposes contextual multi-objective optimization.

Jie Zhou·2 months ago

r/LocalLLaMA· COMMUNITY

Use Qwen3.6 right way -> send it to pi coding agent and forget

Reddit post on using Qwen3.6 with the pi.dev harness and agent tooling for local coding and admin tasks.

u/Willing-Toe1942·2 months ago·40 pts / 45 comm

NVIDIA Dev Blog· INFRA

Building for the Rising Complexity of Agentic Systems with Extreme Co-Design

Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don’t follow a pre-determined sequence of actions. They call tools, spawn sub-agents with different tasks and models, retain information in memory, manage their own context window, and decide for themselves when they’re finished. In doing so…

Eduardo Alvarez·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From Data Lifting to Continuous Risk Estimation: A Process-Aware Pipeline for Predictive Monitoring of Clinical Pathways

Process-aware pipeline for continuous predictive monitoring of clinical pathways using prefix-based representations, evaluated on COVID-19 ICU admission prediction.

Pasquale Ardimento·2 months ago
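Prefix-based representations are a common device in predictive process monitoring: each completed case (trace) is cut into its successive prefixes, and each prefix is labeled with the case's eventual outcome, so a model can score risk after every new event rather than only at the end. A minimal sketch with hypothetical event names and a hypothetical ICU-admission label; the paper's data-lifting step and feature encoding are not shown.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    case_id: str
    events: list[str]  # activity names in timestamp order
    outcome: bool      # hypothetical case-level label, e.g. admitted to ICU


def prefixes(trace: Trace, min_len: int = 1) -> list[tuple[list[str], bool]]:
    """Return (prefix, outcome) pairs, one per prefix length up to the full trace."""
    return [
        (trace.events[:k], trace.outcome)
        for k in range(min_len, len(trace.events) + 1)
    ]


if __name__ == "__main__":
    case = Trace("c1", ["triage", "lab_test", "xray"], outcome=True)
    for prefix, label in prefixes(case):
        print(prefix, label)
```

A classifier trained on these pairs can then be re-applied each time a running case gains an event, which is what makes the risk estimate continuous.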

r/singularity· COMMUNITY

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA)

Link: x.com

u/Scared_Bluebird_7243·2 months ago·113 pts / 43 comm

r/LocalLLaMA· COMMUNITY

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding- Google Developers Blog

Google demonstrates 3X LLM inference speedup on TPUs using diffusion-style speculative decoding technique.

u/eternviking·2 months ago·41 pts / 11 comm
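For readers new to speculative decoding: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with plus one token of its own, so one target step can emit several tokens. The sketch below is the plain greedy variant with toy deterministic "models" (hypothetical functions over integer tokens). It does not reproduce the diffusion-style drafter from the Google post or the Gemma 4 MTP drafters, and real systems verify all draft positions in a single batched target forward pass rather than a Python loop.

```python
from typing import Callable

Model = Callable[[list[int]], int]  # maps a token prefix to its greedy next token


def speculative_step(prefix: list[int], draft: Model, target: Model, k: int) -> list[int]:
    """One greedy speculative-decoding step; returns the tokens accepted this step."""
    # 1. Draft k tokens autoregressively with the cheap model.
    proposal: list[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    # 2. Verify: keep draft tokens while they match the target's greedy choice.
    accepted: list[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch is replaced by the target's token
            return accepted
        accepted.append(tok)
    # 3. Every draft token was accepted, so the target adds one bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: list[int], draft: Model, target: Model, k: int, max_new: int) -> list[int]:
    """Decode max_new tokens; the result matches greedy decoding with the target alone."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out += speculative_step(out, draft, target, k)  # always adds at least one token
    return out[: len(prompt) + max_new]


if __name__ == "__main__":
    target: Model = lambda p: (p[-1] + 1) % 10
    draft: Model = lambda p: 0 if p[-1] == 4 else (p[-1] + 1) % 10  # errs after token 4
    print(generate([0], draft, target, k=3, max_new=8))
```

The speedup comes from how often the draft is right: when it agrees with the target, each verification pass yields up to k + 1 tokens.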

TechCrunch AI· PRESS

PayPal says it’s ‘becoming a technology company again.’ That means AI.

PayPal is pitching an AI-led turnaround, tying automation and restructuring to $1.5B in savings as it cuts jobs and works to modernize its tech stack.

Sarah Perez·2 months ago

r/ClaudeAI· COMMUNITY

"Stream ended without a final message" in Claude Design

User reports 'Stream ended without a final message' error in Claude Design, a feature for sketching animations.

u/mazthepa·2 months ago·20 pts / 39 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Raising the Ceiling: Better Empirical Fixation Densities for Saliency Benchmarking

Proposes improved empirical fixation density estimation methods beyond fixed-bandwidth Gaussian KDE for saliency benchmarking and per-image model evaluation.

Susmit Agrawal·2 months ago
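The baseline being improved on, in its simplest form: an empirical fixation density built by placing an isotropic Gaussian of one fixed bandwidth on every fixation and normalizing over the image grid. The sketch below assumes pixel-coordinate fixations and a hypothetical bandwidth `sigma`; how to choose that bandwidth per image is the question the paper revisits.

```python
import math


def fixation_density(fixations: list[tuple[float, float]], width: int, height: int,
                     sigma: float) -> list[list[float]]:
    """Fixed-bandwidth Gaussian KDE on a height x width pixel grid, normalized to sum to 1.

    The Gaussian's own normalizing constant is skipped because the grid is
    renormalized at the end.
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]


if __name__ == "__main__":
    density = fixation_density([(2.0, 1.0), (4.0, 2.0)], width=6, height=4, sigma=1.0)
    for row in density:
        print(" ".join(f"{v:.3f}" for v in row))
```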

arXiv (cs.AI/CL/LG)· ACADEMIA

QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs

QKVShare framework for quantized KV-cache handoff between multi-agent LLMs on edge devices; token-level mixed-precision allocation reduces memory vs. full-precision transfer.

Pratik Honavar·2 months ago
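To make "token-level mixed-precision allocation" concrete: the sketch below quantizes each cached token's vector with symmetric per-token scaling at its own bit-width, gives more bits to tokens marked as important, and counts the bits to transfer. The importance scores, the 8/4-bit split, the top-fraction rule and the 16-bit scale are hypothetical placeholders, not QKVShare's allocation policy.

```python
def quantize(vec: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric per-token quantization to integers in [-(2^(b-1) - 1), 2^(b-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in vec) / qmax or 1.0  # all-zero vector: use scale 1.0
    return [round(v / scale) for v in vec], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]


def allocate_bits(importance: list[float], high: int = 8, low: int = 4,
                  top_frac: float = 0.25) -> list[int]:
    """Hypothetical rule: the top `top_frac` most important tokens get `high` bits."""
    n_high = max(1, int(len(importance) * top_frac))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    top = set(ranked[:n_high])
    return [high if i in top else low for i in range(len(importance))]


def payload_bits(bits_per_token: list[int], dim: int, scale_bits: int = 16) -> int:
    """Bits to send: quantized values plus one scale per token."""
    return sum(b * dim + scale_bits for b in bits_per_token)


if __name__ == "__main__":
    bits = allocate_bits([0.9, 0.1, 0.5, 0.2])  # hypothetical per-token importance
    print(bits, payload_bits(bits, dim=64), 16 * 64 * len(bits))
```

With four 64-dimensional tokens, one at 8 bits and three at 4 bits, the payload is 1,344 bits including scales, versus 4,096 bits for plain 16-bit values.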

arXiv (cs.AI/CL/LG)· ACADEMIA

Deco: Extending Personal Physical Objects into Pervasive AI Companion through a Dual-Embodiment Framework

Dual-Embodiment Companion Framework extends AI capabilities to personal physical objects (plush toys); formative study derives design principles for emotional continuity.

Zhihan Jiang·2 months ago

r/LocalLLaMA· COMMUNITY

ProgramBench: Can we really rebuild huge binaries from scratch? (doesn't look like it)

ProgramBench: 200-task evaluation showing agents struggle to rebuild large binaries from scratch without cheating vulnerabilities.

u/klieret·2 months ago·41 pts / 18 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models

DMGD proposes training-free dataset distillation using diffusion models with semantic-distribution matching guidance.

Qichao Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Spatiotemporal Convolutions on EEG signal -- A Representation Learning Perspective on Efficient and Explainable EEG Classification with Convolutional Neural Nets

Study compares 2D spatiotemporal convolutions vs. concatenated 1D convolutions for EEG signal classification with CNNs.

Laurits Dixen·2 months ago

30 stories
