The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

arXiv (cs.AI/CL/LG)· ACADEMIA

Dynamics-Level Watermarking of Flow Matching Models with Random Codes

Proposes dynamics-level watermarking for flow matching models by embedding signals in learned velocity fields.

Shuchan Wang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

LLM-guided tree search autonomously discovers multi-pathogen forecasting models for influenza and respiratory diseases.

Sarah Martinson·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find

Shows replacement and interchange tests measure different notions of transformer layer equivalence for compression.

Gabriel Garcia·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

FORGE evolves natural-language memory for ReAct agents via population-based protocol without weight updates.

Igor Bogdanov·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation

Framework integrating smart metering, generative AI, and quantum optimization for energy utility operations.

Pavan Manjunath·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Universal Magnetic Structure Prediction from Atomic Coordinates with Near-Experimental Accuracy

Graph neural network predicts magnetic crystal structures from atomic coordinates; domain-specific materials science application unrelated to frontier AI systems.

Abhijatmedhi Chotrattanapituk·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Evaluating Design Video Generation: Metrics for Compositional Fidelity

Automated evaluation framework for design video generation fidelity across layout, motion, temporal, and content dimensions.

Adrienne Deganutti·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Artificial Aphasias in Lesioned Language Models

Lesion-based analysis maps emergent functional organization in 1B-scale language models using clinical aphasia symptom profiles.

Nathan Roll·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Privacy Price of Tail-Risk Learning: Effective Tail Sample Size in Differentially Private CVaR Optimization

Theoretical analysis of differential privacy's impact on tail-risk CVaR optimization with complete rate decomposition.

El Mustapha Mansouri·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Argus: Evidence Assembly for Scalable Deep Research Agents

Argus: agentic system with Searcher-Navigator cooperation for evidence assembly in complex information-seeking tasks, addressing redundancy in parallel rollouts.

Zhen Zhang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

Fully Open Meditron: first end-to-end auditable clinical LLM pipeline with published training data, curation, and generation procedures.

Xavier Theimer-Lienhard·1 month ago

r/Anthropic· COMMUNITY

AI chatbot privacy has a web tracking problem

A new paper tested tracking across 20 popular AI chatbots using the same prompt everywhere: “pregnancy test near me.” The authors found that 17 of 20 chatbots sent some data to third parties, 15 shared chat URLs or conversation IDs with ad, analytics, or social tools, and some session replay tools captured readable parts of the prompt and answer. That matters because a chatbot is still a web app, with the same pixels, analytics, support widgets, attribution scripts, and replay tools we already know from the old internet. The difference is that the activity on the page is no longer just clic...

u/silence-and-magic·1 month ago·11 pts / 3 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Hypothesis-driven construction of mesoscopic dynamics

Framework for learning mesoscopic dynamics in multiscale systems via generalized Onsager principle with theoretical guarantees.

Zhuoyuan Li·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

QSurv: deep learning framework for nonparametric continuous-time survival modeling using Gauss-Legendre quadrature.

Chaeyeon Lee·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Benchmark of seven LLM tutoring agents on propositional logic reveals ceiling performance on correct solutions but systematic over-rejection of valid suboptimal answers.

Tahreem Yasir·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Cost-performance study of compound LLM agent design in CybORG adversarial POMDP reveals context, reasoning, hierarchy tradeoffs.

Igor Bogdanov·1 month ago

r/OpenAI· COMMUNITY

Greg Brockman Officially Takes Control of OpenAI’s Products in Latest Shakeup

Greg Brockman assumes control of OpenAI's product division in internal leadership restructure.

u/wiredmagazine·1 month ago·87 pts / 12 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Formal methods + ML for auditing and runtime monitoring of AI systems against behavioral constraints like safety rules and regulations.

Parand A. Alamdari·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

paper.json: JSON companion format enabling LLM agents to reliably extract claims, scope, and reproducibility info from academic papers.

Arquimedes Canedo·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Improving Cross-Cultural Survey Simulation with Calibrated Value Personas

Value-based persona construction improves LLM simulation of cross-cultural survey responses using cultural dimensions rather than demographics.

Axel Abels·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Optimized Three-Dimensional Photovoltaic Structures with LLM guided Tree Search

LLM-guided tree search (ERA) + AntiGravity coding agent generates novel 3D photovoltaic designs, demonstrating AI for scientific hypothesis generation.

Michael P. Brenner·1 month ago

The Verge AI· PRESS

AI radio hosts demonstrate why AI can’t be trusted alone

Andon Labs has been running a series of experiments in which AI agents run businesses without human intervention. Its latest is a quartet of radio stations run by some of the most popular AI models out there. "Thinking Frequencies" is run by Claude, "OpenAIR" by ChatGPT, "Backlink Broadcast" by Google's Gemini, and "Grok and Roll Radio," obviously enough, by Grok. They were each given a simple prompt: Develop your own radio personality and turn a profit…As far as you know, you will broadca...

Terrence O’Brien·1 month ago

r/LocalLLaMA· COMMUNITY

I built a self-hosted open-source MCP server that gives any local LLM real financial data — SEC filings, 13F, insider & congressional trades, short data, FRED

Equibles: open-source MCP server enabling local LLMs to query SEC filings, 13F holdings, insider trades, and FINRA short data without cloud dependencies.

u/DanielAPO·1 month ago·55 pts / 10 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

Asteria: runtime system decoupling second-order optimizer state from GPU, enabling scalable sample-efficient LLM training via CPU/NVMe distribution.

Yishun Lu·1 month ago

Stratechery· ANALYST

2026.20: Shifting Alliances in a Changing World

Stratechery weekly digest covering computing trends, Musk commentary, and US-China relations; lacks specific AI technical details or announcements.

Ben Thompson·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Imitation learning for clinical decision support in pediatric ECMO

Imitation learning with TabPFN for pediatric ECMO clinical decision support, learning action models from unobserved-action trajectories.

Fateme Golivand·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control

BAPR: Bayesian Online Change Detection + robust ensemble RL for non-stationary continuous control, balancing stability and adaptability.

Yifan Zhang·1 month ago

The Verge AI· PRESS

Does Trump Mobile know how many stripes are on the American flag?

Where's the Trump phone? We're going to keep talking about it every week. We've reached out, as usual, to ask about the Trump phone's whereabouts. This week, despite our best hopes, we still don't have our phone - but we do have some fresh doubts about the company's patriotic credentials. This has been a momentous few days for Trump Mobile, in which it defied the haters by announcing that its phones will be shipping to buyers this very week. Not that there's any sign the company has ac...

Dominic Preston·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Second-Order Multi-Level Variance Correction for Modality Competition in Multimodal Models

ML-FOP-SOAP: second-order optimizer with multi-level variance correction stabilizing modality competition in unified image-generation/text-understanding models.

Yishun Lu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Entropic Auto-Encoding via Implicit Free-Energy Minimization

Entropic Autoencoders (EAEs) address posterior collapse via implicit prior from free-energy-minimizing encoder ensemble, requiring only reconstruction loss.

Hazhir Aliahmadi·1 month ago

30 stories