The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

Training-free diagnostic reveals when on-policy distillation helps vs. harms reasoning models at per-token granularity.

Mohammadreza Armandpour·1 month ago

Shields to Guarantee Probabilistic Safety in MDPs

Probabilistic shielding framework extends classical shields for MDPs; trade-offs between safety guarantees and permissiveness.

Linus Heck·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

LoKA applies FP8 precision to large recommendation models via kernel-level optimization, avoiding quality degradation.

Liang Luo·1 month ago

r/ClaudeAI· COMMUNITY

Claude finds out there are fanfics about him

u/IntergalacticCiv·1 month ago·27 pts / 6 comm

r/LocalLLaMA· COMMUNITY

PowerColor launches Radeon AI PRO R9600D with 32GB GDDR6 memory

PowerColor releases Radeon AI PRO R9600D GPU with 32GB GDDR6 memory in single-slot and passive cooling variants for inference workloads.

u/MundanePercentage674·1 month ago·40 pts / 28 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Neural Weight Norm = Kolmogorov Complexity

Proof that neural weight norms equal Kolmogorov complexity in fixed precision, explaining why weight decay induces Solomonoff's universal prior.

Tiberiu Musat·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Neural at ArchEHR-QA 2026: One Method Fits All: Unified Prompt Optimization for Clinical QA over EHRs

Neural1.5 method for clinical QA over EHRs using DSPy MIPROv2 optimizer for automated prompt tuning across four modular subtasks.

Abrar Majeedi·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents

AssayBench: benchmark for LLMs and agents on virtual cell phenotypic screening combining textual inputs with diverse cellular outputs.

Edward De Brouwer·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Compute Where it Counts: Self Optimizing Language Models

Self-Optimizing Language Models (SOL): dynamic per-token compute allocation via lightweight policy network paired with frozen LLM.

Yash Akhauri·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation

CADBench: unified multimodal benchmark for CAD program generation with 18k samples across six modalities and design datasets.

Anna C. Doris·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography

Attractor-Vascular Coupling Theory: mathematical framework for cuffless blood pressure estimation from smartphone photoplethysmography.

Timothy Oladunni·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

Decision-centric rate-distortion framework for agent memory compression prioritizing decision quality over descriptive faithfulness.

Mingxi Zou·1 month ago

r/MachineLearning· COMMUNITY

Where are small Models like Qwen3 0.6B and Qwen3.5 0.8B used ? Huggingface shows 2.88 million downloads this month.[D]

Qwen3.5 0.8B sees 2.88M monthly downloads; user reports semantic understanding, JSON parsing, and latency challenges in production workflows.

u/adssidhu86·1 month ago·31 pts / 26 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data

BEACON: 430GB multimodal dataset of Valorant gameplay for behavioral authentication and continuous monitoring across skill tiers.

Ishpuneet Singh·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

BenchCAD: comprehensive benchmark for programmatic CAD generation from visual/textual inputs in realistic industrial settings.

Haozhe Zhang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

Directional Groupwise Preference Optimization (DGPO): group-level margin-based framework for LLM alignment with directional consistency.

Mengyi Deng·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

RUBEN uses rule extraction and pruning to explain RAG-LLM outputs and test safety robustness against adversarial prompts.

Joel Rorseth·1 month ago

r/LocalLLaMA· COMMUNITY

MiniCPM 4.6

MiniCPM 4.6 released on Hugging Face; open-weights efficient model variant with updated capabilities.

u/themrzmaster·1 month ago·87 pts / 11 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Masked Generative Transformer Is What You Need for Image Editing

EditMGT applies Masked Generative Transformers to localized image editing, outperforming diffusion-based approaches.

Wei Chow·1 month ago

TechCrunch AI· PRESS

Digg tries again, this time as an AI news aggregator

Digg returns (again) as another place to read AI news.

Sarah Perez·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding

Counterfactual data augmentation improves Vision-Language Models' chart understanding efficiency without scaling synthetic datasets.

Jianzhu Bao·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Grounded Satirical Generation with RAG

RAG-based satirical definition generator for Finnish news context with human-annotated evaluation framework.

Oona Itkonen·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Generalized Turing Test: A Foundation for Comparing Intelligence

Generalized Turing Test formalizes agent intelligence comparison via indistinguishability, independent of tasks or datasets.

Daniel Mitropolsky·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

Pi-Serini evaluates BM25 lexical retrieval sufficiency in agentic research loops paired with frontier LLMs.

Tz-Huan Hsu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Conditional anomaly detection methods for patient-management alert systems

Distance-metric-based instance methods detect conditional anomalies in patient management alerts.

Michal Valko·1 month ago

r/singularity· COMMUNITY

Upcoming Leaked Gemini Omni VS Nearly Shutting Down Sora 2

Reddit user compares leaked Gemini Omni video model against Sora 2, which OpenAI is reportedly discontinuing.

u/Able-Line2683·1 month ago·106 pts / 51 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation

BabelDOC preserves PDF layout during cross-lingual translation via intermediate representation decoupling structure from text.

Qi Yang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

DISCA steers LLM cultural preferences via sociodemographic disagreement signals without fine-tuning or white-box access.

Huynh Trung Kiet·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Clin-JEPA extends joint-embedding predictive pretraining to EHR trajectories for multi-task patient risk prediction.

Yixuan Yang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training

Transcoda applies synthetic data and Humdrum kern encoding to optical music recognition without large labeled datasets.

Daniel Dratschuk·1 month ago

30 stories