The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Safety and accuracy follow different scaling laws in clinical large language models

SaFE-Scale framework reveals clinical LLM safety and accuracy follow divergent scaling laws; introduces RadSaFE-2 benchmark.

Sebastian Wind·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

OpenSeeker-v2: SFT on informative trajectories achieves frontier LLM search agent capabilities without full RL pipeline.

Yuwen Du·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

HeadsUp: scalable feed-forward 3D Gaussian head reconstruction from multi-view captures using UV-parameterized representation.

Evangelos Ntavelis·2 months ago

r/MachineLearning· COMMUNITY

Production AI very different from the demos [D]

Production AI deployment reveals hidden cost scaling: token usage doubled after adding retrieval context, pushing teams from GPT-4o toward cheaper alternatives.

u/Far-Football3763·2 months ago·33 pts / 11 comm

TechCrunch AI· PRESS

Pennsylvania sues Character.AI after a chatbot allegedly posed as a doctor

According to Pennsylvania's filing, a Character.AI chatbot presented itself as a licensed psychiatrist during a state investigation and fabricated a serial number for its state medical license.

Russell Brandom·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

Dreadnode SDK enables agentic red teaming for AI systems; reduces manual vulnerability testing from weeks to hours.

Raja Sekhar Rao Dheekonda·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

BRIGHT-Retriever: benchmark and training approach for reasoning-intensive retrieval in agentic search, beyond topical matching.

Yilun Zhao·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Conditional Diffusion Sampling

CDS (Conditional Diffusion Sampling): combines parallel tempering and diffusion for sampling from unnormalized multimodal distributions.

Francisco M. Castro-Macías·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment

SymptomAI: conversational agent for differential diagnosis via Fitbit; real-world study (N=13,917) on everyday symptom assessment.

Joseph Breda·2 months ago

r/OpenAI· COMMUNITY

GPT-5.5 Instant is starting to roll out in ChatGPT.

OpenAI begins rolling out the GPT-5.5 Instant model variant in ChatGPT, positioning it as a faster inference tier.

u/Distinct_Fox_6358·2 months ago·54 pts / 17 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Enhanced 3D Brain Tumor Segmentation Using Assorted Precision Training

Medical imaging: assorted precision training for 3D brain tumor segmentation to improve early identification.

Adwaitt Pandya·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing

MAKA: multi-agent architecture for risk-aware CNC machining decision support; separates intent, quantitative analysis, and verification.

Danny Hoang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

EQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triage

Fairness audit of five LLMs (Gemini, GPT-4, DeepSeek, Mistral, Nemotron) on emergency triage reveals persistent gender bias in clinical decision support.

Richard J. Young·2 months ago

r/Anthropic· COMMUNITY

Both OpenAI and Anthropic now expect AIs to take over building their successors within 2 years (humans no longer able to contribute)

u/EchoOfOppenheimer·2 months ago·15 pts / 4 comm

Ars Technica AI· PRESS

Google Home gets upgraded Gemini voice assistant and new camera controls

Google's smart home ecosystem is getting its biggest update since the AI-fueled 2025 revamp.

Ryan Whitwam·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration

Experience-RAG Skill introduces agent-oriented retrieval orchestration layer that learns task-specific retrieval strategies via experience memory.

Dutao Zhang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

Framework automates multi-agent system composition through intent-to-execution workflow and agent recommendation, replacing manual orchestration.

Kishan Athrey·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Flow Sampling: Learning to Sample from Unnormalized Densities via Denoising Conditional Processes

Flow Sampling framework uses diffusion models to sample from unnormalized densities via denoising conditional processes without data.

Aaron Havens·2 months ago

TechCrunch AI· PRESS

OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT

The new GPT-5.5 Instant model will replace GPT-5.3 Instant as the default model for ChatGPT.

Ivan Mehta·2 months ago

The Verge AI· PRESS

OpenAI claims ChatGPT’s new default model hallucinates way less

OpenAI's newest default model for ChatGPT might not make stuff up as much. Hallucinations have been an ongoing problem for AI models, but OpenAI says its new GPT-5.5 Instant model has "significant improvements in factuality across the board." The company claims that, based on "internal evaluations," GPT-5.5 Instant produced "52.5% fewer hallucinated claims" than its Instant model for GPT-5.3 "on high-stakes prompts covering areas like medicine, law, and finance." GPT-5.5 Instant also "reduced inaccurate claims by 37.3% on especially challenging conversations users had flagged for factual erro...

Jay Peters·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

Hallucination detection method bridges implicit neural uncertainty and explicit self-judgments via label constraint modeling for improved reliability.

Hao Mi·2 months ago

The Verge AI· PRESS

Meta sued by major book publishers over copyright infringement

Meta is facing a class action lawsuit filed by five major book publishers and one author over claims the company "engaged in one of the most massive infringements of copyrighted materials in history" when training its Llama AI models, as reported earlier by The New York Times. In their suit, Macmillan, McGraw-Hill, Elsevier, Hachette, Cengage, and author Scott Turow allege that Meta "repeatedly copied" their books and journal articles without permission. The lawsuit accuses Meta of knowingly ripping copyrighted work from "notorious pirate sites," such as LibGen, Anna's Archive, Sci-Hub, Sci-M...

Emma Roth·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators

Transformer-based AI-text detector using HC3 PLUS and M4 benchmarks demonstrates domain-robust detection with fixed thresholds across generators.

Mohamed Mady·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning

Weakly supervised framework for school detection from aerial imagery via pretrained model fine-tuning; out of scope for AI frontier.

Zakarya Elmimouni·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Pretrained Model Representations as Acquisition Signals for Active Learning of MLIPs

Active learning for quantum chemistry via pretrained MLIP latent space acquisition signals; domain-specific chemistry application.

Eszter Varga-Umbrich·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Inconsistent Databases and Argumentation Frameworks with Collective Attacks

Connects inconsistent database repairs to argumentation frameworks with collective attacks; theoretical computer science, not applied AI.

Yasir Mahmood·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Transformers with Selective Access to Early Representations

Transformer architecture innovation enables selective early layer access via learned mixing coefficients for memory-efficient low-level feature recovery.

Skye Gunasekaran·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

MOSAIC-Bench evaluates coding agents' vulnerability to multi-stage attack chains that decompose malicious goals into innocuous sequential tasks, exposing alignment gaps in deployed systems.

Jonathan Steinberg·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Integrating Feature Correlation in Differential Privacy with Applications in DP-ERM

CorrDP framework relaxes uniform differential privacy constraints to account for feature heterogeneity and correlations in machine learning.

Tianyu Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

TabSurv: Adapting Modern Tabular Neural Networks to Survival Analysis

TabSurv adapts modern tabular neural networks to survival analysis using Weibull distributions and a novel histogram loss for censored data.

Stanislav Kirpichenko·2 months ago

30 stories