The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

SIRA framework improves retrieval-augmented agents by modeling expert search priors, reducing retrieval rounds and latency for organizational knowledge bases.

Zeyu Yang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Inductive Venn-Abers and related regressors

Extension of Venn-Abers predictors to unbounded regression using conformal prediction; narrow technical contribution to probabilistic forecasting.

Ivan Petej·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Edge-specific signal propagation on mature chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction

Chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction; a domain-specific protein modeling task of limited relevance to a general AI audience.

Yuchen Xiong·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

Comprehensive benchmark for Multimodal Domain Generalization revealing inconsistent evaluation protocols and fragmented research; validates real-world robustness challenges.

Hao Dong·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

StraTA framework adds explicit strategy sampling to agentic RL, improving credit assignment and exploration over long-horizon LLM decision-making.

Xiangyuan Xue·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

Hybrid concept-based and abductive explanations for vision models using formal causal reasoning; advances interpretability beyond single-concept limits.

Ronaldo Canizales·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

GlazyBench dataset (23K formulations) for ceramic glaze property prediction and image generation; a domain-specific multimodal benchmark of limited relevance to broader AI work.

Ziyu Zhai·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Recursive Agent Optimization

RAO trains agents to delegate sub-tasks recursively, enabling divide-and-conquer inference scaling to longer contexts and harder problems.

Apurva Gandhi·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

ScaleLogic benchmark isolates proof-planning depth and logic expressiveness to systematically study how RL training scales for long-horizon LLM reasoning.

Tianle Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents

Framework for parsing and verifying source attribution in LLM research agents; evaluates citation accuracy via AST parsing and reproducible verification.

Hailey Onweller·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Crafting Reversible SFT Behaviors in Large Language Models

Method to compress SFT-induced LLM behaviors into sparse, causally necessary subnetworks for selective inference-time control.

Yuping Lin·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Hybrid Quantum-Classical GANs for the Generation of Adversarial Network Flows

Hybrid quantum-classical GAN framework for generating adversarial network traffic using variational quantum generators.

Prateek Paudel·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation

Lightweight asymmetric neural codec for compressing multi-modal sensor data on bandwidth-constrained edge devices.

Dan Jacobellis·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PianoCoRe: Combined and Refined Piano MIDI Dataset

PianoCoRe: unified piano MIDI dataset with 250k performances, 5.6k pieces, and note-level alignments for music AI.

Ilya Borovik·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation

Human-in-the-loop morphosyntactic annotation of L2 Korean using parser agreement as a proxy for correctness.

Hakyung Sung·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

MASPO: framework for joint prompt optimization across LLM-based multi-agent systems with unified evaluation.

Zhexuan Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance

Formalization of Algospeak coevolution dynamics between LLM evasion and detection; introduces the Majority Understandable Modulation metric.

Jan Fillies·2 months ago

r/OpenAI· COMMUNITY

We’re introducing three audio models in the API that unlock a new class of voice apps for developers.

OpenAI releases three audio models in its API, enabling developers to build a new class of voice apps.

u/OpenAI·2 months ago·74 pts / 14 comments

arXiv (cs.AI/CL/LG)· ACADEMIA

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Theoretical analysis of why sign-based optimizers (SignSGD, Muon) outperform SGD in foundation model training via ℓ1-norm bounds.

Hongyi Tao·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SkillOS: Learning Skill Curation for Self-Evolving Agents

SkillOS: a self-evolving LLM-agent framework that learns long-horizon skill curation policies from streaming task interactions.

Siru Ouyang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Online Bayesian Calibration under Gradual and Abrupt System Changes

Online Bayesian calibration method for time-evolving digital twins handling gradual drift and abrupt system changes.

Yang Xu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

Mechanistic analysis traces the attention sink phenomenon to variance discrepancy in value aggregation and to FFN super-neuron activation.

Siquan Li·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

SoftSAE introduces dynamic sparsity for Sparse Autoencoders, adapting feature activation count per input for improved mechanistic interpretability.

Jakub Stępień·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Proves Transformers implement normalized gradient descent for in-context logistic regression, formalizing implicit algorithmic execution.

Chenyang Zhang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DARTS: Targeting Prognostic Covariates in Budget-Constrained Sequential Experiments

DARTS optimizes covariate selection in budget-constrained randomized trials using Thompson sampling for causal inference.

Kateryna Husar·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

AI CFD Scientist demonstrates LLM-based agents automating computational fluid dynamics discovery with physics validation loops.

Nithin Somasekharan·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

Dynamic budget allocation for multi-turn LLM evaluation, using a conformal survival framework to predict rare jailbreak events.

Shai Feldman·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

Patch2Vuln reconstructs security vulnerabilities from Linux binary patches using LLM agents without source code access.

Isaac David·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization

Functional-analytic proof that L² weight decay makes the Transformer cross-entropy loss satisfy Villani's coercivity criteria, yielding optimization guarantees.

Abhijit Das·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

UniSD unifies self-distillation design choices for LLM adaptation without external teachers, systematizing complementary techniques.

Yiqiao Jin·2 months ago

30 stories