…

Vol. I · No. 59WED, JUN 17, 2026

Archive

The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Claude OpenAI Anthropic Gemini Mistral Cursor

arXiv (cs.AI/CL/LG)· ACADEMIA

From Prompt to Service: An SLM-Based Agent Orchestration Gateway for AI-Driven Virtual Worlds

As generative AI capabilities expand, AI-driven virtual worlds face a growing architectural challenge. Users interact through in-world interfaces in multimodal ways, yet their requests demand fundamentally different AI backend models and computational resources. Embedding these capabilities directly into virtual world systems reduces extensibility, complicates maintenance, and limits the ability to coordinate services distributed across edge and cloud infrastructure. This paper presents an SLM-based Agent Orchestration Gateway, a lightweight runtime coordination mechanism that decouples a vir...

Louis Nisiotis·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Robust Optimization Approach to Sparse Principal Component Analysis

While principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these are not obvious to tune due to the unsupervised nature of the task. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, ...

David Vävinggren·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration

Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from th...

Vadim Porvatov·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems

Self-improving language agents are typically evaluated in isolation: an agent attempts a task, receives feedback, and iteratively refines its own behavior. Yet agents increasingly operate alongside peers whose strategies and outcomes are publicly visible. This raises an under-studied question: when does shared experience produce improvements that self-improvement alone cannot achieve? We introduce SAGE (Social Agent Group Evolution),an evaluation framework that compares two compute-matched conditions: SocialEvo, where agents from five distinct model families co-evolve with access to all peers...

Linyue Pan·15 days ago

OpenAI· FRONTIER

Travelers deploys AI-powered claims countrywide with OpenAI

Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand.

OpenAI·15 days ago

TechCrunch AI· PRESS

Rocket engine startup Impulse raises $500 million to hire people, not AI

Engineering physical systems still depends on human talent, according to Impulse Space president Eric Romo.

Tim Fernholz·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Can LLM Rerankers Predict Their Own Ranking Performance?

Retrieval effectiveness varies substantially across queries, making it important to estimate ranking quality before relevance judgments are available. Query performance prediction (QPP) addresses this need, but most existing methods rely on external predictors after retrieval or reranking. In this paper, we study \textit{reranker-internal QPP}: can an LLM reranker estimate the quality of the ranking it has just produced? We investigate both training-free and training-based approaches. For training-free estimation, we examine metric-specific self-consistency across sampled rankings and verbali...

Shiyu Ni·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule -- which governs the \emph{temporal coupling} between teacher and student -- has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that \emph{isolation periods}, defined as complete teacher freezing between updates, are the key structural property enabling stable learning, not teacher age. To characterize these underlying training dynamics, we introduce a diagnostic framework of temp...

Haowei Guo·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

High-Precision APT Malware Attribution with Out-of-Scope Resilience

Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups. In operational environments, however, classifiers are likely to encounter samples from groups not represented during training. Closed-set classifiers are then forced to assign such samples...

Peter Williams·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Post-Hoc Robustness for Model-Based Reinforcement Learning

To improve the real-world applicability of reinforcement learning (RL), the field of adversarially robust RL studies how to train agents under adversarial environment perturbations. In this setting, a protagonist agent optimizes a policy under environmental perturbations from an adversary, resulting in a zero-sum Markov game. When adversarially robust RL is combined with model-based RL, the adversary can target a learned transition model instead of the training environment. Extending this idea, this work introduces post-hoc robustification of deep RL agents at inference time. By using the lea...

Siemen Herremans·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

As AI systems evolve from passive models into autonomous active agents capable of initiating actions, collaborating, and delegating tasks, the traditional boundaries of software systems blur. Traditional authorization and delegation frameworks, built around fixed principals, explicit requests, and static scopes, are insufficient to govern agentic systems. Agentic AI demands richer authorization semantics: agents must inherit and delegate permissions, act under time-limited authority, and coordinate through shared protocols. Existing Identity and Access Management (IAM) systems fail to fully c...

Amjad Ibrahim·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Scalable On-Hardware Training of Quantum Neural Networks and Application to Clinical Data Imputation

Training quantum neural networks (QNNs) on quantum hardware is currently bottlenecked by the cost of gradient estimation: standard parameter-shift methods require a number of circuit evaluations that grows quadratically with the number of trainable parameters, making hardware-based optimisation impractical beyond small system sizes. In this work, we introduce a training framework that reduces this cost to logarithmic in the number of qubits, making gradient-based QNN optimisation feasible on near-term hardware at increasing scales. Our framework combines three co-designed ingredients: (i) a s...

Natansh Mathur·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts

Path planning is essential for Autonomous Mobile Robots (AMRs). Conventional methods for incorporating human preferences into planning typically rely on either complex reward engineering or hardware-intensive solutions. Recent state-of-the-art frameworks leverage imitation learning to train behavior-specific path planning models from expert demonstrations. However, these approaches face two key limitations: limited generalization to unseen environments and low robustness in demonstration collection. To address these challenges, this work introduces an enhanced framework that focuses on two ma...

Charbel Abi Hana·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and report a Word Error Rate (WER) of 30.07% on a held-out validation set of 538 utterances, down from a measured zero-shot baseline of 182.18% for Whisper-small on Balti. The dataset, fine-tuned model, and a live transcription demo are publi...

Muhammad Ali·15 days ago

MIT Tech Review· PRESS

Rehumanizing global health care with agentic AI

The global health care sector is under increasing strain. Decades of chronic underinvestment and constraints in recruitment have coincided with a surge in demand for services for aging populations. Gaps in provision are already taking a toll, with fragmented access to care and high rates of stress and burnout among staff. And it’s getting worse.…

MIT Technology Review Insights·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain trial and errors and mainstream RLVR approaches choose outcome-correct CoT trajectories for memorization, the redundant explorations in long CoTs are inevitably reinforced, which results in the over-thinking issues of LRMs. Previous attempts to resolve this issue mainly give more advantage to shorter trajectories, yet their learning signals are still outcome-based and cannot reduce the memorizati...

Ziyan Liu·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Demystifying Pipeline Parallelism: First Theory for PipeDream

Training modern machine learning models increasingly requires computation to be distributed across many accelerators. Data parallelism remains the default choice and is often paired with tensor-parallel sharding, but model parallelism becomes unavoidable once parameters, activations, or optimizer states no longer fit on a single device. This paper studies pipeline model parallelism through the lens of PipeDream (PD) (Harlap et al., 2018). Our first contribution is theoretical: we introduce Randomized PipeDream (RPD), a stale block-SGD abstraction that yields, to our knowledge, the first clean...

Ivan Ilin·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

HiSE: A Lightweight Hierarchical Semantic Explainer for Heterogeneous Graph Neural Networks

Heterogeneous graph neural networks (HGNNs) have demonstrated remarkable performance in modeling complex relational data, however their interpretability in high-stakes applications remains a critical challenge. Existing explanation methods suffer from two major limitations: on the one hand, the generated explanations fail to reflect the inherent semantic hierarchy of HGNNs, resulting in a lack of fidelity to the model's internal decision-making mechanism; on the other hand, feature explanations often rely on complex search or perturbation mechanisms, leading to excessive computational complex...

Zongrui Li·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Low-Frequency Shortcuts in Texture-Driven Visual Learning

Neural networks suffer from shortcut learning, where learned features generalize well to the training set but not to in-distribution (ID) or out-of-distribution (OOD) test sets. Existing studies are all based on a few standard benchmarks, which are shape-driven. Numerous application domains, however, are texture-driven. In this work, we present shortcut learning analysis for texture-driven domains, and compare it with that of a standard benchmark. We show that texture-driven domains suffer from low-frequency shortcuts. They make the majority of their decisions based on a few low-frequency com...

Utku Şirin·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs

While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), typically apply coarse-grained optimization at the sequence level. This approach often fails to address the localized nature of security flaws, where a single incorrect token choice can compromise an entire program. To bridge this gap, we introduce Tree-like Self-Play (TSP), a framework that reframes secure code generation as a fine-gr...

Wenqi Chen·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

NeuroArmor: Safe-Variant-Guided Representation Consistency for Selective Re-Anchoring in Jailbreak Defense

Large language models remain vulnerable to jailbreak attacks that hide harmful intent behind seemingly ordinary requests such as role-play, translation, encoding, adversarial suffixes, and multi-turn buildup. Existing defenses still struggle to handle these attacks without over-blocking benign but sensitive requests, partly because they often apply the same action to every prompt and therefore fail to balance safety and helpfulness. We propose NeuroArmor, a white-box runtime defense that uses prompt-specific safe variants as a local safety reference for deciding when intervention is needed an...

Zhongyang Lin·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation

Hyper-Connections (HC) replace the single Transformer residual stream with multiple streams, introducing a permutation symmetry over stream indices. We study how this symmetry is resolved in practice: whether streams specialize in a balanced way or exhibit dominant-stream usage. Using fine-grained diagnostics for HC-based language models, we trace how multi-stream representations are actually used. We find that after an early seeding stage, residual mixing often remains close to identity, limiting a core HC mechanism for exchanging information between streams. Moreover, both signal and interp...

Ekaterina Alimaskina·15 days ago

Anthropic· FRONTIER

Expanding Project Glasswing

We’re extending Project Glasswing to approximately 150 new organizations in more than fifteen countries.

Anthropic·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A formal definition and meta-model for a machine theory of mind

This paper proposes, for the first time, a rigorous formal definition of the concept of Machine Theory of Mind, based on principles supported by evidence from cognitive psychology, neuroscience and artificial intelligence, and uses the above as a lens to examine state-of-the-art and current efforts in the field, driving a potential agenda for further research there able to "crack" the problem. It also advances a general holistic meta-model for Machine Theory of Mind, and examines the state of the art when it comes to empirically benchmarking such models.

Fabio Cuzzolin·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems

LLM-based multi-agent systems exhibit remarkable collaborative capabilities in complex multi-step tasks. However, these systems are highly sensitive to single-step execution errors that can propagate through agent interactions and lead to cascading failures. To understand the causes of failure and improve system reliability, failure attribution has been introduced as a task that aims to automatically identify the root cause step responsible for a failure. Existing failure attribution methods mainly rely on LLMs to reason over original execution trajectories, which not only incur high inferenc...

Taiyu Zhu·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

Post-training compression is essential for deploying large language models (LLMs) under tight resource constraints. Tensor decompositions have emerged as a promising direction, offering compact parameterizations well suited to Transformer weight structures. However, existing studies evaluate these methods in narrow settings, leaving unclear whether tensorization is effective at large-scale deployment. We systematically evaluate tensor compression across dense and MoE architectures, establishing performance trade-offs grounded in both empirical analysis and theoretical analysis. We identify a ...

Artur Zagitov·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DMF: A Deterministic Memory Framework for Conversational AI Agents

Conversational AI agents require memory systems that are both scalable and semantically coherent across long interaction horizons. Existing approaches rely predominantly on large language model (LLM)-based summarisation at write time, which introduces non-determinism, escalating token costs, and opacity in pruning decisions. We present the Deterministic Memory Framework (DMF), a CPU-first approach that replaces generative memory compression with a fully deterministic pipeline grounded in classical NLP analysis, vector geometry, and mathematical scoring. DMF assigns each conversational interac...

Matteo Stabile·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Topology-Aware Gaussian Graph Repair for Robust Graph Neural Networks

Graph neural networks have achieved strong performance on graph-structured data, but their effectiveness depends heavily on the quality of the observed graph. In real applications, graph topology is often imperfect: noisy edges may connect unrelated nodes, while missing edges may prevent useful information from being propagated. Existing robust graph learning methods mainly address this problem by removing suspicious edges or by learning a new graph structure during training. However, edge removal alone cannot recover missing connections, and graph structure learning may introduce additional ...

Anubha Goel·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

What Makes Interaction Trajectories Effective for Training Terminal Agents?

Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. Surprisingly, standalone performance does not dictate teaching efficacy: while Claude Opus 4.6 achieves higher scores on Terminal-Bench 2.0, students fine-tuned on trajectories from DeepSeek-V3.2, a lower-scoring agent, exhibit significantl...

Sidi Yang·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Tonal parsimony in chord-sequence analysis: combining modulation cost and tonal vocabulary

We study the assignment of local tonalities to chord sequences, a task useful for harmonic analysis, composition, and jazz-oriented improvisation. Standard dynamic-programming approaches minimize modulations but can introduce unnecessarily many tonal centers. We compare this transition-only objective with pure minimum-vocabulary analysis and with tonal parsimony, which minimizes lexicographically the number of modulations and then the number of distinct tonalities. Although this joint objective is combinatorially hard in general, we give exact algorithms exploiting the fixed 24-tonality major...

François Pachet·15 days ago

← Front Page30 stories

← Newer Older →