The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution

RoSHAP proposes a distributional framework and a robust metric for keeping feature attribution rankings stable under stochastic variation.

Lanxin Xiang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

Pelican-Unified 1.0 is a unified embodied foundation model that uses a single VLM for understanding, reasoning, and action generation.

Yi Zhang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

Quantization-conditioned attack exploits outlier injection to induce malicious behavior in LLMs via advanced quantization schemes.

Xiaohua Zhan·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

Post-training quantization systematically reverses unlearning in LLMs; per-parameter forgetting updates fail under 4-bit compression.

Saisab Sadhu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Training ML Models with Predictable Failures

Method to predict deployment-scale failure rates from limited evaluation sets using extreme-value extrapolation; quantifies inherent over-prediction bias.

Will Schwarzer·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Causal Foundation Models with Continuous Treatments

First causal foundation model handling continuous treatments via meta-learning; extends beyond binary intervention setting.

Christopher Stith·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

APWA: A Distributed Architecture for Parallelizable Agentic Workflows

APWA architecture parallelizes LLM-based multi-agent workflows to overcome reasoning and computational bottlenecks in complex tasks.

Evan Rose·1 month ago

The Verge AI· PRESS

Use this map to find the data centers in your backyard

An interactive map tracking data center construction and AI policy, built by Isabelle Reksopuro. When Oregon resident Isabelle Reksopuro heard Google was gobbling up public land to fuel its data centers in her home state, she didn't initially know what to believe. "There's a lot of misinformation about data centers," she said. "Google has denied taking that land." Technically, she explains, The Dalles, a city near the Washington state border, sought to reclaim that land, "and Google is just a big, unnamed power user." The city had in fact asked for ownership of a 150-acre portion of Mount Hoo...

Gaby Del Valle·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Natural Synthesis: Outperforming Reactive Synthesis Tools with Large Reasoning Models

Neuro-symbolic approach couples large reasoning models with model checkers for reactive hardware synthesis via iterative Verilog repair.

Frederik Schmitt·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

MemEye benchmark evaluates multimodal agent memory preservation of visual evidence across granularity and temporal-change dimensions.

Minghao Guo·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Understanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptation

Survey of 60 international students on adoption of conversational AI chatbots for cross-cultural adaptation support.

Laleh Nourian·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CoCo-InEKF: State Estimation with Learned Contact Covariances in Dynamic, Contact-Rich Scenarios

CoCo-InEKF uses learned continuous contact covariances instead of binary states for robust legged-robot state estimation in dynamic contact scenarios.

Michael Baumgartner·1 month ago

r/singularity· COMMUNITY

The biggest AI breakthrough in medicine & drug discovery

Reddit post claiming major AI breakthrough in medicine/drug discovery; link-only with no substantive detail provided.

u/sdnr8·1 month ago·166 pts / 20 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

CLOVER: Closed-Loop Value Estimation & Ranking for End-to-End Autonomous Driving Planning

CLOVER closes training-evaluation mismatch in end-to-end autonomous driving by learning value estimates aligned with rule-based planning metrics.

Sining Ang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Taxonomy and audit framework for LLM attack benchmarks using 4×6 Target×Technique matrix; reveals coverage gaps in HarmBench, InjecAgent, AgentDoor.

Karthik Raghu Iyer·1 month ago

Ars Technica AI· PRESS

Your doctor’s AI notetaker may be making things up, Ontario audit finds

Made-up therapy referrals, incorrect prescriptions among the common mistakes.

Kyle Orland·1 month ago

r/LocalLLaMA· COMMUNITY

The RTX 5000 PRO (48GB) arrived and it is better than I expected.

User reports positive experience running local LLMs on NVIDIA RTX 5000 PRO 48GB GPU versus Mac Studio alternative.

u/Valuable-Run2129·1 month ago·88 pts / 63 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning from Language Feedback via Variational Policy Distillation

Variational Policy Distillation addresses sparse rewards in RL by using adaptive language feedback to overcome teacher plateau and improve reasoning task performance.

Yang Li·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Proposal and study of statistical features for string similarity computation and classification

Statistical string similarity features using co-occurrence and run-length matrices generalize across languages without linguistic information.

E. O. Rodrigues·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG

Agentic GraphRAG citations require trajectory-level faithfulness validation that accounts for graph traversal structure and uncited nodes influencing answers.

Riccardo Terrenzi·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Logging Policy Design for Off-Policy Evaluation

Off-policy evaluation logging policy design balances reward concentration and action coverage to minimize estimation error for deployment-free experimentation.

Connor Douglas·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

Dataset-agnostic audio framework converts text-based tool-calling benchmarks (Confetti, When2Call) to voice evaluations using TTS and noise without re-annotation.

Md Tahmid Rahman Laskar·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Improving Multi-turn Dialogue Consistency with Self-Recall Thinking

Self-Recall Thinking framework improves multi-turn dialogue consistency by tracking non-adjacent turn dependencies without full-history context overhead.

Renning Pang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling

Dual-Dimensional Consistency unifies sampling width and depth for inference-time LLM scaling, balancing reasoning quality against budget constraints.

Rongman Xu·1 month ago

r/LocalLLaMA· COMMUNITY

When is Andrej Karpathy going to look at a chicken nugget and tweet that it helped him solve AGI, which in turn inspires 6 random devs to create GitHub projects giving us actual AGI?

Reddit appreciation post for Andrej Karpathy's influence on open-source AI development; no new technical content or announcements.

u/Porespellar·1 month ago·41 pts / 28 comm

r/Anthropic· COMMUNITY

Claude Pro Plan is Finally Usable!

u/JuanjoFuchs·1 month ago·11 pts / 4 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

From Data to Action: Accelerating Refinery Optimization with AI

AI framework interprets petrochemical refinery LP optimization outputs using historical data analysis to validate and improve solver-generated decisions.

Dániel Pfeifer·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction

Dynamic Batch-Sensitive Adam optimizer improves convergence on imbalanced sequential datasets by scaling learning rates via batch difficulty scores.

Daniel Asare Kyei·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

Kernel ridge regression with Average Gradient Outer Product provably recovers low-dimensional central subspaces in multi-index models with sample complexity below prediction threshold.

Libin Zhu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World

ML-Embed introduces 3D Matryoshka Learning framework for efficient, multilingual text embeddings addressing computational cost and linguistic coverage gaps.

Ziyin Zhang·1 month ago

