Evaluating whether AI models would sabotage AI safety research
Anthropic evaluates Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6) for sabotage of AI safety research and finds zero instances of unprompted or continuation-based sabotage.
NeSyCat: categorical semantics framework unifying classical, fuzzy, and probabilistic interpretations of ULLER neurosymbolic language via monad theory.
Unsupervised clustering identifies mental health risk profiles in social media users; application domain outside core AI architecture.
Systematic benchmark of pose estimators (MediaPipe, OpenPose, Sapiens, SMPLest-X) for sign language translation tasks.
RouteHead method uses query-dependent attention head selection to improve LLM-based document re-ranking.
Quantum SVM outperforms classical SVM on medical image classification using frozen embeddings from ViT; quantum computing application.
Skill Retrieval Augmentation enables LLM agents to retrieve relevant skills from large corpora without explicit enumeration.
Graph neural networks detect cryptocurrency fraud by modeling spatio-temporal transaction patterns across related assets.
AstroVLBench evaluates six frontier VLMs on 4,100+ astronomy tasks; Gemini 3 Pro best performer but modality-dependent gaps remain.
FastOMOP architecture enables multi-agent LLM systems to generate real-world evidence from OMOP CDM healthcare data.
MEG metric quantifies semantic grounding of multimodal evidence in RAG systems to reduce hallucination.
DenSNet uses equivariant neural networks to predict electron density for molecular dynamics; materials science application.
LLMs derive traffic law requirements for autonomous vehicles, scaling beyond manual formal-logic encoding of legal compliance.
Chart2NCode dataset: 176K charts with aligned Python/R/LaTeX scripts enabling cross-language chart-to-code generation.
Hierarchical Behaviour Spaces: linear combinations of reward functions enable more expressive policies in hierarchical RL, tested on NetHack.
Near-optimal bandit algorithm with side observations under partial observability, no prior knowledge of observation system required.
GSC-QEMit: adaptive quantum error mitigation framework using hierarchical clustering and bandits for near-term quantum devices.
OpenAI rumored to develop AI smartphone to compete with iPhone; unconfirmed report from social media.
GradMAP: decentralized multi-agent learning for grid-edge device coordination embedding AC power-flow physics without parameter sharing.
Transformer-based causal model estimates drug treatment effects on dialysis risk in AKI patients using EHR sequences.
Extreme bandits: sequential resource allocation for detecting extreme values in security/medical settings with limited feedback.
STELLAR-E: fully automated synthetic dataset generation for domain/language-specific LLM evaluation without manual curation.
Layerwise Convergence Fingerprinting: tuning-free runtime defense detecting backdoors, jailbreaks, and prompt injections in LLMs.
Tutorial: running a local coding agent with Gemma 4 and Pi using llama.cpp for on-device inference.
One of Canva's new AI features has been caught replacing the word "Palestine" in designs. The Magic Layers feature, which is designed to break flat images out into separate editable components, isn't supposed to make visible alterations to user designs, but it was found by X user @ros_ie9 to automatically switch the phrase "cats for Palestine" to "cats for Ukraine." The issue was seemingly limited specifically to the word "Palestine," as @ros_ie9 noted that related wo...
Hey everyone, For over a week now, I've been trying to re-subscribe to the Pro plan from a free account, and I keep hitting the same wall: "*Payment failed. Please try again later. If the problem persists, contact support at https://support.anthropic.com/*" Here's the fun part: that link redirects you straight to Fin, their AI support chatbot. After 11 emails, the bot's only suggestion is… to go back to that same link. I've attached a screenshot of the last mail. I've already tried multiple devices, browsers, and network connections, double and triple-checking my billing info. I'm based i...
OpenAI achieves FedRAMP Moderate authorization for ChatGPT Enterprise and API, enabling U.S. federal agency deployment.
Reddit discussion about data quality issues affecting Claude outputs; lacks technical specificity.