The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

The crypto weed vape found me on 4/20, the high holiday of cannabis enthusiasts everywhere. It arrived over Slack with the thumbnail of a man exhaling a plume of vapor, the words "every hit delivers Bitcoin" emblazoned across it. It claimed to be advertising a device called Gudtrip, and I thought everything about it sounded fake. So I went looking for it. What I eventually found, after weeks of searching, dozens of emails, and a reporting effort that spanned continents, was somehow even dumber than I'd imagined. My first port of call was Gudtrip's website, which only made the vape seem more l...

Robert Hart·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MedGym:A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning

Medical treatment recommendation poses several challenges to reinforcement learning (RL): patient physiology evolves in continuous time, measurements and interventions are performed at irregular intervals, and treatment effects vary substantially across individuals. Existing RL formulations and simulated environments, however, are based on discrete-time MDP or POMDP abstractions with fixed or pre-specified decision intervals. Thus, it remains difficult to evaluate whether RL methods can handle time-interval-dependent disease progression, personalized treatment response, and safety between con...

Yuepeng Wang·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

Masked diffusion language models (MDLMs) re-predict every position at each denoising step, but standard samplers commit tokens once revealed, leaving this revision capability unused. Existing approaches either add heuristic or learned mechanisms to revise committed tokens, or remask them back to [MASK] before re-predicting; a principled sampler that directly revises visible tokens without auxiliary modules remains underexplored. We introduce D3IM, a parameter-free sampler derived as a corrector-style reverse update that permits direct visible-to-visible revision without additional modules or ...

Longxuan Yu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

Discrete Masked diffusion language models generate text by iterative parallel decoding, but few-step decoding suffers from a tradeoff between length and quality: with a fixed step budget, standard methods can generate a short, high-quality output, or they can produce long but repetitive text. Continuous denoising can sidestep this tradeoff by evolving all positions jointly in embedding space, but building such a model from scratch at scale remains an open problem. We show that a pretrained masked DLM can instead be lightly adapted to support continuous embedding-space denoising. Starting from...

Longxuan Yu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Data Collection for Training Quality-Control AI in Carpet Manufacturing

Visual inspection remains the dominant quality-control practice in woven and tufted carpet production, yet it is slow, subjective, and inconsistent at the line speeds and widths of modern looms. We present a design proposal for an in-line machine-vision system whose primary purpose is twofold: to inspect the carpet web in real time and, equally importantly, to systematically collect and label images of defect patterns so that increasingly capable quality-control models can be trained over the life of the installation.The proposal is grounded in a concrete industrial setting: a Six Sigma (DMAI...

Akbar Erkinov·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

Crafting a product display webpage from a source product image, along with layout and visual content instructions, holds significant practical value for domains such as marketing, advertising, and E-commerce. Intuitively, this task demands strict visual consistency across product displays and high-fidelity instruction following to jointly generate renderable HTML code. These requirements on controllability and instruction-following are closely aligned with the core features of advanced multimodal generative models, such as image editing models and unified models. To this end, this paper intro...

Zhihong Liu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Tackling the Root of Misinformation by Teaching Laypeople about Logical Fallacies via Socratic Questioning and Critical Argumentation

Identifying logical fallacies in everyday discourse is challenging for many people. This challenge is amplified in the era of Large Language Models (LLMs), where malicious agents can deploy fallacious arguments to disseminate misinformation at scale. In this work, we explore the potential of LLMs as part of the solution. We introduce LFTutor, an intelligent tutoring system which uses LLMs to tutor laypeople and help them learn about logical fallacies. LFTutor integrates intent-driven Socratic questioning and critical argumentation principles to actively engage learners to reflect on their rea...

Minjing Shi·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

Large Language Model (LLM) generation remains expensive because autoregressive decoding calls the model once for each new token. Speculative decoding reduces this cost by drafting multiple tokens and verifying them with the target model in one step, but its speedup depends on how many drafted tokens are accepted. Parameter-free draft sources can propose long continuations at low cost in structured and agentic workloads, yet a cache match that looks promising at one generation step may have low payoff at the next. We propose Hybrid Verified Decoding, which predicts the accepted length of a cac...

Xin Su·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

While End-to-End (E2E) Speech-Large Language Models (Speech-LLMs) are rapidly evolving, their evaluation methodologies remain limited to the era of simple transcription. Existing benchmarks suffer from three critical limitations: a pronounced bias towards high-resource languages, a focus on low-level recognition (ASR) rather than semantic reasoning, and a neglect of regional dialects. To bridge this gap, we introduce PolySpeech-100, a massive-scale benchmark designed to assess `native-level' speech comprehension across 110 linguistic variants. We employ a novel hybrid construction pipeline th...

Sicheng Yang·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AI-IoT-Robotics Integration: Survey of Frameworks, Emerging Trends, and the Path Toward Connected Robotics

The convergence of Artificial Intelligence, the Internet of Things, and Robotics is no longer a futuristic vision; it is rapidly becoming the foundation of real-time, intelligent, and context-aware systems. AI enables perception and reasoning, IoT provides scalable sensing and communication, and robotics delivers embodied actuation. Despite significant progress in pairwise combinations such as AIoT and the Internet of Robotic Things (IoRT), there remains a lack of unified design frameworks that fully integrate all three. This survey synthesizes the state-of-the-art across these domains, empha...

Ranulfo Bezerra·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

We address text-based 3D human motion editing, where the goal is to preserve the style and structure of a source motion while applying edits described in natural language. The release of the MotionFix dataset has spurred active research into training-based diffusion models that directly generate an edited motion from a source motion and a text instruction. While previous works have focused primarily on learning when an edit should occur temporally, our goal is to create a model that understands not only this temporal aspect but also which specific joints are responsible for the change. Target...

Gyojin Han·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Can AI Review Improve Paper Drafting? An Empirical Study on 20 Computer Architecture Submissions

Research is advancing faster than ever with artificial intelligence (AI); and so are the corresponding research papers. The exploding volume of AI-generated papers have put a strain to peer review, leading to the usage of AI-generated review, potentially wide yet sneaky. However, relevant ethical concerns about confidentiality, quality, and fairness are raised and no consensus has been reached in the broad research community. We expect the debate to continue for a while, but in the meantime, we ask an alternative, practical question: \textit{can AI review improve paper drafting?} We study 20 ...

Di Wu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Property Prediction of Stacked Bilayer Materials: A Multimodal Learning Approach

AI for materials science is a critical topic within AI for science, aiming to accelerate materials discovery and produce accurate property predictions. Bilayer 2D material stacking is essential for exploring new materials with novel functions and inherent phenomena, enabling the creation of new 2D bilayers for diverse real-world applications. Research on bilayer vdWs materials has made significant progress from experimental and computational perspectives. Various bilayer materials have been successfully synthe sized experimentally and the increasing utilization of high-throughput computing te...

An Vuong·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FVSpec: Real-World Property-Based Tests as Lean Challenges

We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, then automatically translate 2,772 of them (25%) into 9,415 Lean 4 specifications with sorry placeholders (about 3 formalizations/PBT; we retain multiple attempts when none dominates on quality metrics). Translating PBTs into Lean specifications is challenging: it requires modeling Python semantics in Lean, inferring the logical property encoded in an imperative PBT, and handling the inherent diffic...

Quinn Dougherty·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Sparsely activated Mixture-of-Experts (MoE) models scale capacity via conditional computation, but distributed inference suffers from cross-GPU expert communication and routing-induced load imbalance. Existing placement methods reduce this cost by co-locating frequently co-activated experts; however, they derive a single deployment plan from globally aggregated routing traces, thereby averaging away the heterogeneous, task-specific co-activation patterns that actually drive communication in multi-task serving. We observe that expert co-activation is strongly task-conditioned: pairs tightly co...

Zhiyao Xu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Theoretical Analysis of Engression and Reverse Markov Engression

Engression is a recently proposed and effective framework for conditional distribution learning. Its multi-step Reverse Markov extension further improves generative flexibility by decomposing complex conditional sampling into sequential reverse transitions. Despite their strong empirical performance, rigorous finite-sample statistical guarantees for these methods remain unavailable. In this paper, under deep neural network parameterizations, we establish nonasymptotic convergence bounds for Engression by directly controlling the Energy Distance between the learned and target conditional distr...

Jiaqi Huang·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

Weak-to-strong generalization studies how to improve a strong student using supervision from a weaker teacher when reliable labels are scarce. We view this primarily as a data selection problem, where the key challenge is to identify which weak labels are reliable enough to serve as a training signal. To address this, we introduce trust functions that assign each weak label a scalar trust score and use these scores to filter weak supervision. Across several domains, including world knowledge, quantitative reasoning, and strategy games, trust filtering yields students that match and sometimes ...

Arda Uzunoglu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Decoding in Order-Agnostic Language Models: Chain-Rule Deviation and Uniform Spreading

Order-agnostic language models (OALMs), including discrete diffusion language models (dLLMs), are trained to predict masked tokens under arbitrary conditioning sets, allowing sequences to be generated or scored under arbitrary reveal orders at inference time. In LLaDA-2.1, we report three findings. First, the learned conditionals are not exact factorizations of a coherent joint distribution: changing only the reveal order shifts target log-likelihood by up to 0.49 nats/token, so likelihood alone mixes content difficulty with path-dependent artifacts. Second, although confidence-first (CF) dec...

Lin Yao·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Subliminal Learning Is Steering Vector Distillation

Subliminal learning refers to a student language model acquiring a teacher's traits (e.g. a system-prompted preference for owls) when fine-tuned on the teacher's outputs, despite the outputs being semantically unrelated to those traits. It remains poorly understood how data without semantic meaning can transfer specific semantic traits. In this work, we show that subliminal learning is mediated by a single steering vector, i.e. a vector added to the model's activations. Across two open-source models, we find that the teacher's system prompt is well approximated by a steering vector, and that ...

Camila Blank·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Registry-Bound LLM Pipeline for Evidence-Grounded Trait Extraction across Tropical Plants, Aquatic Species, and Exotic Pets

We describe a registry-bound large-language-model extraction pipeline producing evidence-grounded structured trait records at scale, on cultivated tropical plant, aquatic, and pet species. Four mechanisms render LLM-derived rows auditable: a versioned 39-key closed-vocabulary trait registry constraining every admitted value to a typed schema; a per-row verbatim evidence quote tying each value to source text; a per-row confidence label (high or medium; low dropped pre-persist); and multi-version preservation. Applied to 409,880 publishable species from the Tropical Species Encyclopedia, the pi...

Jeff Wang·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Large Language Models in Transportation Systems Management and Operations: From Text Reasoning to Multi-modal Decision Support

Transportation systems management and operations (TSMO) increasingly depends on timely interpretation of heterogeneous data, from various sensor streams, incident reports, traveler feedback, and visual observations. Large language models (LLMs), including emerging multi-modal large language models (MM-LLMs), provide a new mechanism for integrating these structured and unstructured inputs into operator-facing decision support. This survey paper reviews LLM- and MM-LLM-based applications in TSMO across three domains: transportation operations & services (supply), mobility & fleet services (dema...

Siyan Li·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Data Enrichment for Symbolic Regression Using Diffusion Models

Symbolic regression (SR) offers a route to scientific discovery by converting observations into interpretable governing equations. However, despite its promise, its reliability degrades sharply when spatiotemporal measurements are sparse, noisy, or physically incomplete, as commonly occurring in practice. Data enrichment (DE) has been shown to be able to mitigate this limitation, yet additional samples can mislead equation discovery unless they preserve the physical structure of the target system. Such implication of DE requires narrow domain expertise as well as technical fluidity, highly li...

Simon De Reuver·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

Large Vision-Language Models (LVLMs) have shown strong visual understanding and language-guided grounding abilities, yet their capacity for multi-temporal visual reasoning remains underexplored. To bridge this gap, we introduce \textbf{Multi-temporal Referring Segmentation (MTRS)}, a new task that aims to segment language-described temporal changes from multi-temporal images. MTRS extends conventional referring segmentation and change detection by jointly requiring temporal correspondence reasoning, language grounding, and pixel-level mask prediction. We propose \textbf{CRAFT-Agent}, an autom...

Bingyu Li·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Profiling Privacy Preservation Against Gradient Inversion Attacks in Tabular Federated Learning

Federated learning (FL) enables multiple data holders to train machine learning models collaboratively without centralizing raw data, making it useful in privacy sensitive domains such as healthcare and institutional data sharing. FL keeps data local to clients while communicating only model updates, such as gradients or model deltas. Nevertheless, these updates can expose private client data through gradient inversion attacks (GIAs). We study this risk for tabular FL under an honest-but-curious server threat model across FL protocols, client batch sizes, training stages, attacker assumptions...

Ivo Osterberg Nilsson·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods additionally restrict within-interval context adaptivity, meaning that the action rule inside an interval cannot depend on the sequence of realized contexts/actions in that interval (beyond the current round's context). For li...

Sanghoon Yu·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Robust Asynchronous Planning via Auto-Formalization

LLMs can plan by either generating action sequences directly as a Planner or translating tasks into domain specific language for an external solver as a Formalizer. While most real-world tasks are asynchronous with non-uniform durations, concurrency, and execution-time constraints, existing benchmarks hardly cover them. We unify these asynchronous planning challenges under a single formulation and introduce the first three benchmarks that address each at scale. We conclude that the choice of formal representation primarily determines whether planning scales: as dependency graphs grow from 5 t...

Jiayi Zhang·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

UME: A Unified Meta-Generalization Framework for Cross-Domain ETA

Accurate Estimated Time of Arrival (ETA) prediction on checkout page is crucial in instant logistics for enhancing user satisfaction, optimizing dispatching, and controlling operational costs. In international on-demand delivery platforms, where ETA data originates from diverse countries or regions with different patterns, multi-domain modeling is of great importance and has been widely adopted. However, existing methods still face three critical challenges in real-world deployment. First, current multi-domain models struggle to generalize to completely unseen domains, failing to achieve zero...

Duo Wang·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

LLM chatbots increasingly serve as a first source of support for people in psychological distress, including those whose distress is entangled with delusional beliefs. Prior work on LLM mental-health safety largely evaluates general therapeutic quality or single-turn crisis detection, leaving unclear how models behave when distress is intertwined with delusion over sustained conversations. We address this gap with matched multi-turn simulations, across clinically grounded personas and six LLMs, that pair each delusional conversation with a distress-only control to isolate the effect of delusi...

Andrew Aquilina·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering

Biomedical question answering with large language models is commonly evaluated using answer accuracy, but answer accuracy alone does not indicate whether a model can produce parseable outputs, follow structured reliability instructions, recognize weak answer spaces, or avoid confident incorrect commitments. This paper presents HypothesisMed, an inference-time reliability pipeline for biomedical multiple-choice question answering. It combines direct, chain-of-thought, HypothesisMed-v3 prompting, and answer fusion. The final answer is selected by fusion, while HypothesisMed-v3 supplies SPACE la...

Md Motaleb Hossen Manik·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States

We study risk-neutral control in Markov decision processes with an absorbing catastrophic state. Even though rewards are linear and the agent has no utility curvature, probability weighting, or framing dependence, standard Bellman optimality produces three prospect-theory-like signatures: an S-shaped value-function profile (convex near catastrophe, concave in the far field), an endogenous loss-sensitivity coefficient $λ^*(S) > 1$, and a reflection-effect policy reversal. Across 495 configurations, the optimal policy plays safe near catastrophe in positive-drift (growth) regimes despite the ri...

Yujiao Chen·19 days ago

← Front Page30 stories

← Newer Older →

The Archive

I went looking for the AI weed vape that gives you Bitcoin for smoking

MedGym:A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning

Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

Data Collection for Training Quality-Control AI in Carpet Manufacturing

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

Tackling the Root of Misinformation by Teaching Laypeople about Logical Fallacies via Socratic Questioning and Critical Argumentation

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

AI-IoT-Robotics Integration: Survey of Frameworks, Emerging Trends, and the Path Toward Connected Robotics

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

Can AI Review Improve Paper Drafting? An Empirical Study on 20 Computer Architecture Submissions

Property Prediction of Stacked Bilayer Materials: A Multimodal Learning Approach

FVSpec: Real-World Property-Based Tests as Lean Challenges

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Theoretical Analysis of Engression and Reverse Markov Engression

Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

Decoding in Order-Agnostic Language Models: Chain-Rule Deviation and Uniform Spreading

Subliminal Learning Is Steering Vector Distillation

A Registry-Bound LLM Pipeline for Evidence-Grounded Trait Extraction across Tropical Plants, Aquatic Species, and Exotic Pets

Large Language Models in Transportation Systems Management and Operations: From Text Reasoning to Multi-modal Decision Support

Data Enrichment for Symbolic Regression Using Diffusion Models

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

Profiling Privacy Preservation Against Gradient Inversion Attacks in Tabular Federated Learning

Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

Robust Asynchronous Planning via Auto-Formalization

UME: A Unified Meta-Generalization Framework for Cross-Domain ETA

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering

Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States