The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Evaluating whether AI models would sabotage AI safety research

Anthropic evaluates Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6) for sabotage of AI safety research: finds zero unprompted or continuation-based sabotage.

Robert Kirk·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

NeSyCat: A Monad-Based Categorical Semantics of the Neurosymbolic ULLER Framework

NeSyCat: categorical semantics framework unifying classical, fuzzy, and probabilistic interpretations of ULLER neurosymbolic language via monad theory.

Daniel Romero Schellhorn·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Uncovering Latent Patterns in Social Media Usage and Mental Health: A Clustering-Based Approach Using Unsupervised Machine Learning

Unsupervised clustering identifies mental health risk profiles in social media users; application domain outside core AI architecture.

Md All Shahria·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Evaluation of Pose Estimation Systems for Sign Language Translation

Systematic benchmark of pose estimators (MediaPipe, OpenPose, Sapiens, SMPLest-X) for sign language translation tasks.

Catherine O'Brien·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models

RouteHead method uses query-dependent attention head selection to improve LLM-based document re-ranking.

Yuxing Tian·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Quantum Kernel Advantage over Classical Collapse in Medical Foundation Model Embeddings

Quantum SVM outperforms classical SVM on medical image classification using frozen embeddings from ViT; quantum computing application.

Sebastian Cajas Ordóñez·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Skill Retrieval Augmentation for Agentic AI

Skill Retrieval Augmentation enables LLM agents to retrieve relevant skills from large corpora without explicit enumeration.

Weihang Su·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Fraud Detection in Cryptocurrency Markets with Spatio-Temporal Graph Neural Networks

Graph neural networks detect cryptocurrency fraud by modeling spatio-temporal transaction patterns across related assets.

Lidia Losavio·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A systematic evaluation of vision-language models for observational astronomical reasoning tasks

AstroVLBench evaluates six frontier VLMs on 4,100+ astronomy tasks; Gemini 3 Pro best performer but modality-dependent gaps remain.

Wenke Ren·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

FastOMOP architecture enables multi-agent LLM systems to generate real-world evidence from OMOP CDM healthcare data.

Niko Moeller-Grell·2 months ago

r/OpenAI· COMMUNITY

Uhhh

Post contains no substantive content.

u/EchoOfOppenheimer·2 months ago·63 pts / 12 comm

r/OpenAI· COMMUNITY

Is the subreddit logo off-center?

Subreddit design question about logo alignment.

u/ethotopia·2 months ago·56 pts / 10 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

MEG metric quantifies semantic grounding of multimodal evidence in RAG systems to reduce hallucination.

Xihang Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Enhancing molecular dynamics with equivariant machine-learned densities

DenSNet uses equivariant neural networks to predict electron density for molecular dynamics; materials science application.

Mihail Bogojeski·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

LLMs derive traffic law requirements for autonomous vehicles, scaling beyond manual formal-logic encoding of legal compliance.

Bowen Jian·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Aligned Multi-View Scripts for Universal Chart-to-Code Generation

Chart2NCode dataset: 176K charts with aligned Python/R/LaTeX scripts enabling cross-language chart-to-code generation.

Zhihan Zhang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Hierarchical Behaviour Spaces

Hierarchical Behaviour Spaces: linear combinations of reward functions enable more expressive policies in hierarchical RL, tested on NetHack.

Michael Tryfan Matthews·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Efficient learning by implicit exploration in bandit problems with side observations

Near-optimal bandit algorithm with side observations under partial observability, no prior knowledge of observation system required.

Tomas Kocak·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GSC-QEMit: A Telemetry-Driven Hierarchical Forecast-and-Bandit Framework for Adaptive Quantum Error Mitigation

GSC-QEMit: adaptive quantum error mitigation framework using hierarchical clustering and bandits for near-term quantum devices.

Steven Szachara·2 months ago

r/OpenAI· COMMUNITY

OpenAI Reportedly Working on an AI Smartphone to Rival iPhone

OpenAI rumored to develop AI smartphone to compete with iPhone; unconfirmed report from social media.

u/anonboxis·2 months ago·56 pts / 50 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

GradMAP: Gradient-Based Multi-Agent Proximal Learning for Grid-Edge Flexibility

GradMAP: decentralized multi-agent learning for grid-edge device coordination embedding AC power-flow physics without parameter sharing.

Yihong Zhou·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Dialysis Risk Prediction and Treatment Effect Estimation for AKI patients using Longitudinal Electronic Health Records

Transformer-based causal model estimates drug treatment effects on dialysis risk in AKI patients using EHR sequences.

Kalyani P. Pande·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Extreme bandits

Extreme bandits: sequential resource allocation for detecting extreme values in security/medical settings with limited feedback.

Alexandra Carpentier·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

STELLAR-E: fully automated synthetic dataset generation for domain/language-specific LLM evaluation without manual curation.

Alessio Sordo·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models

Layerwise Convergence Fingerprinting: tuning-free runtime defense detecting backdoors, jailbreaks, and prompt injections in LLMs.

Nay Myat Min·2 months ago

r/LocalLLaMA· COMMUNITY

How to run a local coding agent with Gemma 4 and Pi | Patrick Loeber

Tutorial: running a local coding agent with Gemma 4 and Pi using llama.cpp for on-device inference.

u/jacek2023·2 months ago·42 pts / 12 comm

The Verge AI· PRESS

Canva apologizes after its AI tool replaces ‘Palestine’ in designs

One of Canva's new AI features has been caught replacing the word "Palestine" in designs. The Magic Layers feature, which is designed to break flat images out into separate editable components, isn't supposed to make visible alterations to user designs, but X user @ros_ie9 found that it automatically switched the phrase "cats for Palestine" to "cats for Ukraine." The issue was seemingly limited specifically to the word "Palestine," as @ros_ie9 noted that related wo...

Jess Weatherbed·2 months ago

r/Anthropic· COMMUNITY

Can't subscribe to Pro for a week, payment fails, support is a bot loop, and I'm owed €68 I can't use

Hey everyone, For over a week now, I've been trying to re-subscribe to the Pro plan from a free account, and I keep hitting the same wall: "*Payment failed. Please try again later. If the problem persists, contact support at https://support.anthropic.com/*" Here's the fun part: that link redirects you straight to Fin, their AI support chatbot. After 11 emails, the bot's only suggestion is… to go back to that same link. I've attached a screenshot of the last mail. I've already tried multiple devices, browsers, and network connections, double and triple-checking my billing info. I'm based i...

u/LoicVita·2 months ago·12 pts / 3 comm

OpenAI· FRONTIER

OpenAI available at FedRAMP Moderate

OpenAI achieves FedRAMP Moderate authorization for ChatGPT Enterprise and API, enabling U.S. federal agency deployment.

OpenAI·2 months ago

r/ClaudeAI· COMMUNITY

When your data is so bad...

Reddit discussion about data quality issues affecting Claude outputs; lacks technical specificity.

u/Crousus·2 months ago·285 pts / 16 comm

30 stories
