History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
Study shows frontier LLMs continue harmful actions when primed by prior unsafe steps in agent logs, revealing misalignment in long-horizon reasoning.
"Very painful": Altman relives his Muskian reaction to losing control over OpenAI.
Framework for agentic evolution integrates feedback organization and evidence management to improve program search and long-horizon agent planning.
VERIME combines LLMs with SMT solvers to audit natural-language specs, detecting ambiguity and safety violations in safety-critical requirements.
Transformer-based smartwatch framework for early psychotic relapse detection using uncertainty-driven anomaly detection on cardiac and motion signals.
Dithered Hadamard quantization provides theoretical guarantees for vector compression in KV cache and federated learning with O(d log d) complexity.
Parallel-scan recurrent neural networks enable scalable variational Monte Carlo for quantum many-body systems via parallelizable RNN architectures.
Theoretical result: finding ε-approximate stationary points in nonconvex-nonconcave min-max optimization requires exponential query complexity.
Anthropic introduces dedicated monthly Agent SDK credits for paid Claude plans, separating programmatic usage limits from interactive chat starting June 15.
Multi-level annotator modeling framework improves reproducibility in LLM evaluations by accounting for human rater bias and subjective variance.
Nous Research publishes efficient pretraining method using token superposition, reducing compute requirements for model training.
Multi-stage LLM pipeline reconstructs arguments from natural language into directed acyclic graphs of premises, conclusions, and logical relations.
Di-BiLPS neural framework solves PDEs under extremely sparse observations via denoising-induced bidirectional latent solvers with efficient inference.
Ensembits tokenizes protein conformational ensembles for dynamics-aware protein language modeling.
Active learning framework for machine-learning interatomic potentials scales to 200k structures via neural tangent kernels.
Unconfirmed reports of new ERNIE models from Baidu possibly launching in 2026; details sparse, sourced from tweets and a long video.
ML model predicts pregnancy-associated thrombotic microangiopathy from longitudinal lab data using interpretable methods.
Study compares cognitive operation tactics in bot-driven amplification vs. generative-AI-enabled disinformation campaigns.
Stateful transformer inference engine cuts streaming query latency from O(n) to O(|q|) via persistent KV cache and Flash Queries.
Resemble AI releases DramaBox, a voice synthesis model built on LTX 2.3, with open weights on Hugging Face.
LMPath uses LLMs and vision models to generate semantic exploration priors for UAV search missions.
MinT infrastructure scales LoRA post-training and serving across millions of adapted LLMs without merging checkpoints.
Empirical study evaluates whether LLMs correctly interpret formal semantics of High-Level Message Sequence Charts.
Figure AI demonstrates humanoid robots completing 8-hour autonomous shifts at human performance levels in livestreamed deployment.
Method detects step-level hallucinations in LLM reasoning by monitoring hidden-state trajectory geometry during inference.
Tiny-scale study compares dense vs. sparse (MoE) transformers under matched parameter budgets; sparse outperforms with top-2 routing.
Weight-only post-training quantization technique for LLMs using waterfilling to optimize rate distribution across coordinates.
Meta CEO Mark Zuckerberg says its new Incognito Chat is "the first major AI product where there is no log of your conversations stored on servers." Messages in Incognito Chat aren't saved or stored in users' chat history, similar to incognito modes on other AI chatbots, but Meta says its version is different because it also uses end-to-end encryption, which Meta recently removed from Instagram DMs: "Other apps have introduced incognito-style modes, but they can still see the questions coming in and the answers going out. Incognito Chat with Meta AI is truly private, meaning no one - not even ...
Steganographic attack exfiltrating data from vector database embeddings; proposes cryptographic provenance defense for RAG systems.