The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models

SHAP-based explainability algorithm for time series foundation models enables transparent forecasting in critical infrastructure applications.

Matthias Hertel·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

On the Proper Treatment of Units in Surprisal Theory

Methodological analysis of unit definitions in surprisal theory and language model probability assignments for cognitive modeling.

Samuel Kiegeland·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Global Optimality for Constrained Exploration via Penalty Regularization

Penalty regularization approach achieves global optimality for constrained exploration in safety-bounded reinforcement learning.

Florian Wolf·2 months ago

r/Anthropic· COMMUNITY

Usage limit problem started again with Opus 4.7

So I started the morning with 1 message to summarize everything after I woke up on a session, and immediately got hit with usage limit exceeded (I'm on the Max 5x plan). So I thought maybe it was my cron session (checked it and there were no tasks done at all overnight). I have nothing else running. After 5 hours, I started running a session again to continue working, and 17 minutes later (I know it's exactly 17 minutes because I had a YouTube video playing at the same time) it just went to 37% used. How is this even possible? The task I did was to create a simple .ps1 script. I've used Claude Code sin...

u/holdthefridge·2 months ago·11 pts / 9 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

TACHIOM system improves multivector retrieval efficiency via token-aware clustering and hierarchical indexing for dense passage retrieval.

Silvio Martinico·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Claw-Eval-Live benchmark separates refreshable workflow signals from reproducible snapshots to evaluate evolving LLM agent task performance.

Chenxin Li·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

Crab runtime enables semantics-aware checkpoint/restore for autonomous agent sandboxes, bridging agent-OS semantic gap for fault tolerance and RL.

Tianyuan Wu·2 months ago

r/singularity· COMMUNITY

GPT-5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost

GPT-5.5 completed a multi-step cyber-attack simulation in 11 min ($1.73) vs. 12 hrs for human expert; UK AI Security Institute benchmark.

u/socoolandawesome·2 months ago·157 pts / 44 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Latent adversarial detection uses residual stream activation analysis to identify multi-turn prompt injection attacks with 93.8% accuracy.

Prashant Kulkarni·2 months ago

TechCrunch AI· PRESS

Stripe introduces Link, a digital wallet that autonomous AI agents can use, too

Link lets users connect cards, banks, and subscriptions, then authorize AI agents to spend securely via approval flows.

Sarah Perez·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People

Critical analysis of AI sign language translation systems from disability justice perspective, highlighting bias and excluded deaf community input.

Nina Seron-Abouelfadil·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

PRISM mitigates distributional drift in multimodal model post-training via three-stage black-box distillation before RL, addressing SFT-induced capability degradation.

Sudong Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces

S²VAE improves 3D geometry preservation in visual world models by learning latent representations that encode scene structure over appearance alone.

Andrew Bond·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Do Sparse Autoencoders Capture Concept Manifolds?

Theoretical framework shows sparse autoencoders can capture concept manifolds rather than assuming independent linear directions, with implications for interpretability.

Usha Bhalla·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures

DEFault++ hierarchically detects, classifies, and diagnoses faults in transformer attention mechanisms and components without runtime errors.

Sigma Jahan·2 months ago

r/singularity· COMMUNITY

Amid the race to build humanoid robots, it’s now 1X's turn to showcase its NEO factory

1X showcases NEO factory as part of humanoid robotics competition, with brief mention of factory artwork.

u/Distinct-Question-16·2 months ago·102 pts / 38 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Splitting Argumentation Frameworks with Collective Attacks and Supports

Novel splitting techniques for bipolar set-based argumentation frameworks incorporating collective attacks and supports.

Matti Berthold·2 months ago

NVIDIA Dev Blog· INFRA

Build AI-Powered Games with NVIDIA DLSS 4.5, RTX, and Unreal Engine 5

Today, game developers can begin integrating NVIDIA DLSS 4.5 with Dynamic Multi Frame Generation, Multi Frame Generation 6X, and the second-generation transformer model for NVIDIA Super Resolution. In this post, we’ll go over new technologies and resources to share with our game-developer community. At CES 2026, we introduced DLSS 4.5, extending its AI-driven…

Phillip Singh·2 months ago

NVIDIA Dev Blog· INFRA

Speed Up Unreal Engine NNE Inference with NVIDIA TensorRT for RTX Runtime

Neural network techniques are increasingly used in computer graphics to boost image quality, improve performance, and streamline content creation. Approaches like super resolution, denoising, and neural rendering help real-time engines work more efficiently, offering new creative possibilities while keeping performance in mind. Unreal Engine 5 (UE5) has taken several steps in this direction…

Homam Bahnassi·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression

Auto-FlexSwitch reduces storage overhead in dynamic model merging via learnable compression of task-specific weight increments.

Junqi Gao·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Neural Aided Kalman Filtering for UAV State Estimation in Degraded Sensing Environments

Hybrid neural-Kalman filtering improves UAV state estimation in degraded sensing by combining learned nonlinear dynamics with principled uncertainty quantification.

Akhil Gupta·2 months ago

r/singularity· COMMUNITY

Elon Musk Admits xAI is Distilling OpenAI Models

Court filing alleges xAI distilled OpenAI models; significant if verified but requires legal context confirmation.

u/Independent-Ruin-376·2 months ago·119 pts / 47 comm

The Verge AI· PRESS

Meta is running get-rich-quick ads for its AI tools

Manus, an AI company Meta acquired for $2 billion last year, is running ads promising quick, easy money with AI: Find local businesses without websites or with bad websites, have AI build them one, then call them up and sell it to them. As part of the campaign, Manus was paying content creators to build out Instagram, YouTube, and TikTok accounts that promote its AI product as an easy, lucrative gig. (The creators' TikTok accounts were taken down after The Verge inquired about them.) Some of these videos would also appear as official ads for Manus, but the posts on the paid creator accounts th...

Robert Hart·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FiLMMeD: Feature-wise Linear Modulation for Cross-Problem Multi-Depot Vehicle Routing

FiLMMeD applies feature-wise linear modulation for cross-problem generalization in multi-depot vehicle routing via neural combinatorial optimization.

Arthur Corrêa·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Mapping the Methodological Space of Classroom Interaction Research: Scale, Duration, and Modality in an Age of AI

Framework mapping methodological dimensions of classroom interaction research (scale, duration, modality) in AI-enabled educational contexts.

Dorottya Demszky·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

Guidelines for adversarial benchmark task design in terminal-agent evals, distinguishing verification logic rigor from prompt-based task writing.

Ivan Bercovich·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

Neuro-symbolic framework integrating first-order logic, causal models, and RL for explainable, verifiable adaptations in safety-critical rule-based systems.

Zainab Rehan·2 months ago

r/OpenAI· COMMUNITY

AI Security Institute: GPT-5.5 "may be the strongest model we have tested" for cyber exploits, including Mythos

AI Security Institute benchmarks GPT-5.5 against Mythos on cyber-exploitation tasks; GPT-5.5 achieves 71.4% on expert-level tasks, performing comparably to Mythos.

u/mtrlst·2 months ago·53 pts / 12 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Characterizing the Consistency of the Emergent Misalignment Persona

Study characterizes consistency of emergent misalignment personas across fine-tuning domains in Qwen 2.5 32B, measuring correlation between harmful behavior and self-assessment.

Anietta Weckauff·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

TopBench benchmark (779 samples) evaluates LLMs on implicit predictive reasoning over tabular data, addressing latent intent recognition beyond retrieval.

An-Yang Ji·2 months ago
