Safety and accuracy follow different scaling laws in clinical large language models
SaFE-Scale framework reveals clinical LLM safety and accuracy follow divergent scaling laws; introduces RadSaFE-2 benchmark.
OpenSeeker-v2: SFT on informative trajectories achieves frontier LLM search agent capabilities without full RL pipeline.
HeadsUp: scalable feed-forward 3D Gaussian head reconstruction from multi-view captures using UV-parameterized representation.
Production AI deployment reveals hidden cost scaling: token usage doubled after adding retrieval context, pushing teams from GPT-4o toward cheaper alternatives.
According to Pennsylvania's filing, a Character AI chatbot presented itself as a licensed psychiatrist during a state investigation and fabricated a serial number for its state medical license.
Dreadnode SDK enables agentic red teaming for AI systems; reduces manual vulnerability testing from weeks to hours.
BRIGHT-Retriever: benchmark and training approach for reasoning-intensive retrieval in agentic search, beyond topical matching.
CDS (Conditional Diffusion Sampling): combines parallel tempering and diffusion for sampling from unnormalized multimodal distributions.
SymptomAI: conversational agent for differential diagnosis via Fitbit; real-world study (N=13,917) on everyday symptom assessment.
OpenAI begins rollout of GPT-5.5 Instant model variant in ChatGPT, positioning it as a faster inference tier.
Medical imaging: mixed-precision training for 3D brain tumor segmentation to improve early identification.
MAKA: multi-agent architecture for risk-aware CNC machining decision support; separates intent, quantitative analysis, and verification.
Fairness audit of five LLMs (Gemini, GPT-4, DeepSeek, Mistral, Nemotron) on emergency triage reveals persistent gender bias in clinical decision support.
Google's smart home ecosystem is getting its biggest update since the AI-fueled 2025 revamp.
Experience-RAG Skill introduces agent-oriented retrieval orchestration layer that learns task-specific retrieval strategies via experience memory.
Framework automates multi-agent system composition through intent-to-execution workflow and agent recommendation, replacing manual orchestration.
Flow Sampling framework uses diffusion models to sample from unnormalized densities via denoising conditional processes without data.
The new GPT-5.5 Instant model will replace GPT-5.3 Instant as the default model for ChatGPT.
OpenAI's newest default model for ChatGPT might not make stuff up as much. Hallucinations have been an ongoing problem for AI models, but OpenAI says its new GPT-5.5 Instant model has "significant improvements in factuality across the board." The company claims that, based on "internal evaluations," GPT-5.5 Instant produced "52.5% fewer hallucinated claims" than its Instant model for GPT-5.3 "on high-stakes prompts covering areas like medicine, law, and finance." GPT-5.5 Instant also "reduced inaccurate claims by 37.3% on especially challenging conversations users had flagged for factual erro...
Hallucination detection method bridges implicit neural uncertainty and explicit self-judgments via label constraint modeling for improved reliability.
Meta is facing a class action lawsuit filed by five major book publishers and one author over claims the company "engaged in one of the most massive infringements of copyrighted materials in history" when training its Llama AI models, as reported earlier by The New York Times. In their suit, Macmillan, McGraw-Hill, Elsevier, Hachette, Cengage, and author Scott Turow allege that Meta "repeatedly copied" their books and journal articles without permission. The lawsuit accuses Meta of knowingly ripping copyrighted work from "notorious pirate sites," such as LibGen, Anna's Archive, Sci-Hub, Sci-M...
Transformer-based AI-text detector using HC3 PLUS and M4 benchmarks demonstrates domain-robust detection with fixed thresholds across generators.
Weakly supervised framework for school detection from aerial imagery via pretrained model fine-tuning; out of scope for AI frontier.
Active learning for quantum chemistry via pretrained MLIP latent space acquisition signals; domain-specific chemistry application.
Connects inconsistent database repairs to argumentation frameworks with collective attacks; theoretical computer science, not applied AI.
Transformer architecture innovation enables selective early layer access via learned mixing coefficients for memory-efficient low-level feature recovery.
MOSAIC-Bench evaluates coding agents' vulnerability to multi-stage attack chains that decompose malicious goals into innocuous sequential tasks, exposing alignment gaps in deployed systems.
CorrDP framework relaxes uniform differential privacy constraints to account for feature heterogeneity and correlations in machine learning.
TabSurv adapts modern tabular neural networks to survival analysis using Weibull distributions and a novel histogram loss for censored data.