The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Evaluating Commercial AI Chatbots as News Intermediaries

AI chatbots are rapidly shaping how people encounter the news, yet no prior study has systematically measured how accurately these systems, with their proprietary search integrations and retrieval-synthesis pipelines, handle emerging facts across languages and regions. We present a 14-day (February 9-22, 2026) evaluation of six AI chatbots (Gemini 3 Flash and Pro, Grok 4, Claude 4.5 Sonnet, GPT-5 and GPT-4o mini) on 2,100 factual questions derived from same-day BBC News reporting across six regional services (US & Canada, Arabic, Afrique, Hindi, Russian, Turkish). The best systems achieve ove...

Mirac Suzgun · 1 month ago

DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection

SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis

MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data

CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation

Deep Reinforcement Learning for Flexible Job Shop Scheduling with Random Job Arrivals

Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling

Reducing Political Manipulation with Consistency Training

Understanding Data Temporality Impact on Large Language Models Pre-training

Trump delays AI security executive order: ‘I don’t want to get in the way of that leading’

Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation

Advancing Mathematics Research with AI-Driven Formal Proof Search

Towards a General Intelligence and Interface for Wearable Health Data

Lumberjack: Better Differentially Private Random Forests through Heavy Hitter Detection in Trees

Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier

SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation

Ternary Decision Trees with Locally-Adaptive Uncertainty Zones

Proxy-Based Approximation of Shapley and Banzhaf Interactions

The Distillation Game: Adaptive Attacks & Efficient Defenses

Optimization over the intersection of manifolds

ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models

Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

The Endless AI guitar pedal has potential

How misalignment starts

Multiple Neural Operators Achieve Near-Optimal Rates for Multi-Task Learning