Topic

§ Research

Every story tagged with this topic, ordered by date.

Quoting Boris Cherny

Claude Opus 5 achieves lowest prompt injection vulnerability rate across evals and red team testing, per Anthropic's system card.

Simon Willison·1 day ago

arXiv (cs.AI/CL/LG)· ACADEMIA

3D-Aware VLMs with Implicit and Explicit Geometries

VLM-IE3D framework enhances vision-language models with implicit and explicit 3D geometry tokens from RGB video for improved spatial reasoning.

Wenhao Li·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Expanding Flow Maps

Expanding Flow Maps (EFMs) enable flow-based generative models to handle variable-dimensionality distributions via expanding interpolants with conditional noise.

Sophia Tang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GraphVid: Interactive Graph-Controllable Video Generation

GraphVid enables precise multi-object video generation control via graph-structured representations instead of trajectory or text constraints.

Vedant Shah·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Barzilai-Borwein Fails Superlinear Convergence on an Open Set of Quadratics for Every Dimension $n\geq 4$

Theoretical analysis proves the Barzilai-Borwein method fails to converge superlinearly on an open set of quadratics in every dimension n≥4.

Dawei Li·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Synthetic data generation framework for quality control automation in gravure printing

Synthetic data generation framework using deep learning to automate surface defect detection in rotogravure printing quality control.

Korota Arsène Coulibaly·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Surprisal Theory is Tautological (without Rational Grounding)

Philosophical critique: surprisal theory's linguistic difficulty predictions are tautological without constraints on language model specification.

Ryan Cotterell·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Sufficiency: Time Series Explanation with Counterfactual Necessity

TimePNS framework for time-series model explanation using counterfactual necessity to identify essential (not spurious) decision factors.

Hongnan Ma·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MedGame: Storytelling Gamification Empowered by Large Language Models for Medical Education

MedGame transforms static clinical cases into interactive decision-driven learning games using LLMs and dual narrative/director engines.

Qian Wu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Graph Learning on Ensembles of Cyclic Peptides: An Investigation of Molecular Ensemble Modeling

EnsembleEGNN molecular foundation model encodes conformational ensembles of cyclic peptides using equivariant GNNs with set attention pooling.

Aaron Feller·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Unsupervised Consensus-Based Anomaly Detection for Spatiotemporal Malaria Incidence in Ghana

Consensus anomaly detection applied to Ghana malaria surveillance data identifies spatiotemporal hotspots in the Ashanti and Northern regions, 2014-2023.

T. Ansah-Narh·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Sycophancy: Structured Resistance and Compliance in LLM Moral Reasoning

Study reveals LLM moral reasoning involves structured resistance-compliance dynamics paralleling human social psychology, beyond simple sycophancy reduction.

Baihui Wang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Visual Contrastive Self-Distillation

VCSD proposes visual contrastive self-distillation removing need for privileged information in on-policy distillation via pure input conditioning.

Yijun Liang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MIRROR: Learning from the Other View for Multi-Modal Reasoning

MIRROR framework exploits complementary reasoning paths across text, diagram, and combined modalities to improve vision-language model reasoning on geometry problems.

Wen Ye·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

X$^3$-OPD: Distilling Reasoning into Large Audio-Language Models via On-Policy Alignment

X³-OPD cross-modal distillation framework transfers reasoning from text LLM teacher to audio-language student via on-policy alignment and acoustic perception.

Dongjie Fu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Neural solutions of coupled ghost and gluon Dyson–Schwinger equations in Landau gauge

Neural networks solve coupled Dyson-Schwinger equations for Yang-Mills gauge theory with percent-level agreement to fixed-point solutions.

Rodrigo Carmo Terin·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Zero-Flow Two-Sample Tests

Zero-Flow Two-Sample Test uses learned directional misalignment patterns for distribution testing, separating witness learning from hypothesis evaluation.

Yakun Wang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Windowed-MTP: Removing the Full-Context Draft-KV Tax at Million-Token Context

Windowed-MTP optimizes speculative decoding at million-token context by eliminating full-KV attention overhead in multi-token prediction draft heads.

Alagappan Valliappan·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From Resource Flow to Executable Tests: Petri-Net-Guided LLM Test Generation for Concurrent Stateful Rust APIs

Petri-net-guided LLM test generation for concurrent Rust APIs addresses shallow test synthesis by integrating formal models with executable test concretization.

Kaiwen Zhang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ElasticTTT: Prior-Preserving Test-Time Tuning for Video Editing

ElasticTTT framework prevents prior collapse in test-time tuning of diffusion models for video editing by preserving distribution-mapping during optimization.

Yueyi Liu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GS-Agent: Creating 4D Physical Worlds With Generative Simulation

GS-Agent generates physically plausible 4D worlds from natural language by combining foundation models with agentic simulation and physics constraints.

Hongxin Zhang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Artificial Epanorthosis: Why large language models overuse a classical rhetorical figure, and how to mitigate it

LLMs systematically overuse epanorthosis (classical self-correction rhetoric) due to promotional training distributions and RLHF preference for emphatic phrasing.

Federico Boggia·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Toward Generalizable Cognitive Impairment Detection with Speech-Based Multimodal Large Language Models

Speech-based multimodal LLMs detect cognitive impairment across diverse speakers and devices by leveraging linguistic and acoustic biomarkers with improved generalization.

Yingchao Huang·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

What, Where, and How: Disentangling the Roles of Task, Language, and Model in Code Model Representations

Analysis of code model representations shows Qwen2.5-Coder and DeepSeek-Coder align on grammatical concepts across Python/Rust, with task-driven specialization.

Piotr Wilam·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Compact Latent Coordination for Autonomous Vehicles at Unsignalized Intersections

MAPS: hierarchical MARL system using centralized proto-plan embeddings for decentralized AV coordination at unsignalized intersections.

Gil Lifshits·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Finite-Sample Coverage Audits for High-Recall Candidate Generation: Certification and Learning-Theoretic Design

Label complexity bounds for auditing high-recall candidate generation pipelines with finite-sample validity guarantees.

Martin Anthony·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Error Certificates for KV-Cache Eviction via Randomized Design

Randomized KV-cache eviction with error certification via Hájek correction, proving deterministic eviction hides information loss.

Peng Xie·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AREX: Towards a Recursively Self-Improving Agent for Deep Research

AREX: recursively self-improving research agent exploiting discovery-verification asymmetry to refine multi-constraint answers.

Shuqi Lu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Detecting LLM-Generated Tokens in Human–LLM Coauthored Text

Token-level detection method for LLM-generated content in human-AI coauthored text using score smoothing.

Yangjun Lu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

RUMBA: Russian User Memory Benchmark

RUMBA: Russian benchmark for long-term LLM conversational memory with fine-grained taxonomy across temporal reasoning dimensions.

Elizaveta Shevtsova·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

KroQuant: Kronecker-Structured Block Transforms for Efficient Post-Training Quantization of Diffusion Transformers

KroQuant: Kronecker-structured block transforms for W4A4 post-training quantization of diffusion transformers with efficient inference.

Yann Bouquet·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

When Trivia Is Not Trivial: Everyday Knowledge Failures in Multilingual LLMs

TriviaRoomQA benchmark evaluates multilingual LLM performance on 3,300 culturally-grounded trivia questions across 6 European languages and long-tail knowledge.

Anna Mosolova·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Climate-resilient electric vehicle charging infrastructure for sustainable cities: An interpretable causal-ensemble framework for preventive maintenance and low-carbon mobility

FGDSE framework applies causal-ensemble methods to predict EV charging infrastructure faults under climate stress for preventive maintenance.

Cande Lian·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agent-Guided Relational Concept Discovery: Toward Interpretable Surgical Margin Assessment

Concept-based agent-guided learning improves interpretability and generalization of deep learning models for surgical margin assessment via REIMS spectroscopy.

Nooshin Maghsoodi·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Adaptive Identity Anchoring: Closed-Loop Keyframe Placement for Synthetic Paired Supervision in Video Face Swapping

Adaptive Identity Anchoring improves video face swapping by optimizing keyframe placement for synthetic paired supervision in identity transfer.

Logan Robbins·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Token Budget Saturation and Mechanistic Early Detection of Reasoning Non-Convergence in Chain-of-Thought Models

Linear probes on hidden states detect early non-convergence in chain-of-thought reasoning; DeepSeek-R1-Distill-Qwen-7B reaches 90.3% AIME accuracy when converged vs 6.6% when non-converged.

Renuka Oladri·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Context-weighted Discrete Flow Matching

Context-weighted Discrete Flow Matching modifies CTMC to weight training targets by local context density, improving generative modeling on discrete structures.

Daniil Cherniavskii·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Semantic-Aware Task Clustering for Constructive and Cooperative Multi-Tasking

Semantic-aware task clustering for Cooperative Multi-Task Semantic Communication (CMT-SemCom) ensures constructive multi-tasking by aligning tasks post-initialization.

Ahmad Halimi Razlighi·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

An Evaluation Framework for Structured Audio Captions Validated by Controlled Perturbations

Multi-axis evaluation framework for structured audio captions on AudioCards dataset validates five orthogonal dimensions beyond flat text metrics.

Liang-Yuan Wu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Bridging the Gap Between Plausibility and Admissibility: Constraint-Aware Flow Maps for Dynamic Graph Systems

Constraint-aware flow maps apply symbolic filtering, weighting, and repair to conditional diffusion models for dynamically feasible graph trajectory generation.

Michael Romei de Socio·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

PATS reframes skills as dynamic training scaffolds for LLM agent reinforcement learning, converting rollout groups to reduce failure repetition in long-horizon tasks.

Yipeng Shi·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Logical Regression for Planning with Axioms

Approximation method for logical regression in automated planning domains with axioms, enabling robust plan execution.

Connor Little·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cautious optimism for deep parameterized quantum circuits

Theoretical analysis of generalization in parameterized quantum circuits, showing double descent phenomenon in quantum ML.

Marie Kempkes·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cycle-Consistent and Uncertainty-Aware Neural Surrogates for Tokamak Edge Plasmas

Cycle-consistent neural surrogate for tokamak edge plasma prediction with uncertainty quantification for real-time control.

Abdourahmane Diaw·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Anti-Periodic Positional Encoding: Möbius Boundary Conditions Make In-Context Retrieval Reliable

Möbius RoPE: anti-periodic positional encoding improving in-context retrieval reliability in 160M–410M-class language models.

Ji Ho Bae·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Diffusion-Model Subpopulation Digital Twin for Mobile Health Deployment: A Case Study on the HeartSteps Intervention

Diffusion-model digital twins for vetting just-in-time adaptive intervention algorithms before mobile-health deployment.

Ziping Xu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MSBraM: A Multi-scale Self-supervised Brain Foundation Model for Hierarchical EEG Dynamics Learning

MSBraM: self-supervised foundation model for EEG capturing multi-scale temporal brain dynamics across downstream tasks.

Tao Zhou·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Word meaning co-determines vowel-inherent spectral change. A corpus-based investigation of conversational Mandarin

Corpus study finds word semantics correlate with vowel spectral trajectories in Mandarin speech, using embeddings and GAM.

Xiaoyun Jin·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Multimodal Pretraining for Generalizable EEG Representation Learning

Multimodal foundation model for EEG combines Mamba raw-signal encoder and ViT for time-frequency data to improve generalization.

Targol Bakhtiarvand·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards Faithful Graph Explanations with Synergistic Edge Effects via Granular Balls

SeeExplainer interprets GNNs by capturing synergistic edge effects via granular balls, addressing limitations of perturbation-based methods.

Jiancu Chen·3 days ago

← Front Page · 50 stories