The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

Pentesting agents evaluated on real-world targets show current benchmarks miss complexity and strategic decision-making required in practice.

Pedro Conde·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection

MMVIAD introduces the first multi-view video dataset for industrial anomaly detection with continuous 2-second inspection clips.

Xiran Zhao·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

Framework enables visual-native multimodal search agents with on-policy data evolution and persistent visual evidence reuse.

Shijue Huang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing

SLIM uses sparse autoencoders to steer LLM hidden states for interpretable and controllable molecular property editing.

Mingxu Zhang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Predicting 3D structure by latent posterior sampling

Combines NeRF and diffusion models for probabilistic 3D scene reconstruction via latent posterior sampling.

Azmi Haider·1 month ago

r/ClaudeAI· COMMUNITY

i knew it

u/irelatetolevin·1 month ago·37 pts / 5 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning

Long-context LLM performance degrades nonlinearly with the proportion of misleading information, a critical concern for RAG and agentic systems.

Muhan Gao·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

NoRIN: Backbone-Adaptive Reversible Normalization for Time-Series Forecasting

NoRIN applies a nonlinear Johnson transform to time-series normalization, extending RevIN to reshape heavy-tailed distributions.

Shun Zhang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Benchmarking Sensor-Fault Robustness in Forecasting

SensorFault-Bench stress-tests cyber-physical forecasting models under sensor noise, bias, and misalignment faults.

Alexander Windmann·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MaD Physics: Evaluating information seeking under constraints in physical environments

MaD Physics benchmarks agents on resource-constrained scientific discovery with real measurement trade-offs and planning.

Moksh Jain·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models

ALAM extracts action priors from unlabeled video via algebraically consistent latent codes to improve vision-language-action robot models with limited action data.

Zuojin Tang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

On periodic distributed representations using Fourier embeddings

Fourier embeddings represent periodic signals in high dimensions to improve angular encoding for ML models.

Jakeb Chouinard·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CLEF: EEG Foundation Model for Learning Clinical Semantics

CLEF, a Transformer-based EEG foundation model using multitaper spectrograms, aligns clinical signals with neurologist reports via contrastive learning on a 234-task benchmark.

Peng Cao·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Policy Gradient Methods for Non-Markovian Reinforcement Learning

Novel policy gradient framework jointly optimizes agent state dynamics and control policy for non-Markovian reinforcement learning without fixed state assumptions.

Avik Kar·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Probing Cross-modal Information Hubs in Audio-Visual LLMs

Mechanistic study probes cross-modal information flow and processing dynamics between audio and video in audio-visual LLMs.

Jihoo Jung·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

NanoResearch co-evolves agent skills, memory, and policy to enable personalized research automation for heterogeneous user needs.

Jinhang Xu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Switching-Geometry Analysis of Deflated Q-Value Iteration

Joint spectral radius analysis provides convergence guarantees for deflated Q-value iteration in discounted Markov decision processes.

Donghwan Lee·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

Label-free benchmark evaluates equation-suffix prediction via next-token likelihood scoring to test for shortcut vulnerabilities in technical language models.

Daniel Ranard·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Mistake-Bounded Language Generation

Mistake-bounded generation framework treats language generation as a task of minimizing cumulative invalid outputs during learning.

Jon Kleinberg·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights

Empirical evaluation compares domain-adapted and general-purpose LLMs/SLMs for structured threat modeling in cybersecurity.

Saba Pourhanifeh·1 month ago

r/ClaudeAI· COMMUNITY

Anyone else feels the same way?

Freelance programmer reflects on how AI coding assistants have made previously difficult tasks feel easy, raising questions about developer skill assessment.

u/_irucsS·1 month ago·20 pts / 29 comm

r/singularity· COMMUNITY

AI is the manager at this Stockholm café

Stockholm café deploys AI system to manage operations; human-interest feature.

u/Worst_Artist·1 month ago·100 pts / 21 comm

The Verge AI· PRESS

Google stopped a zero-day hack that it says was developed with AI

For the first time, Google says it has spotted and stopped a zero-day exploit developed with AI. According to a report from Google Threat Intelligence Group (GTIG), "prominent cyber crime threat actors" were planning to use the vulnerability for a "mass exploitation event" that would have allowed them to bypass two-factor authentication on an unnamed "open-source, web-based system administration tool." Google's researchers found hints in the Python script used for the exploit that indicated help from AI, like a "hallucinated CVSS score" and "structured, textbook" formatting consistent with LL...

Stevie Bonifield·1 month ago

r/ClaudeAI· COMMUNITY

The Claude Platform on AWS is now generally available.

Anthropic's Claude Platform reaches general availability on AWS with managed agents, code execution, web search, batch processing, and same-day feature parity with native API.

u/ClaudeOfficial·1 month ago·66 pts / 5 comm

r/Anthropic· COMMUNITY

#Keep Sonnet 4.5 DO NOT REMOVE IT

Reddit user requests Anthropic maintain Claude Sonnet 4.5 for creative writing use.

u/Available_Heron4663·1 month ago·15 pts / 21 comm

Simon Willison· ANALYST

Learning on the Shop floor

Shopify's internal coding agent River enforces public Slack channels to enable collaborative code review and organizational learning at scale.

Simon Willison·1 month ago

r/singularity· COMMUNITY

A hurricane PSA built solo over a weekend. The studio gets destroyed by the storm being described. 100% AI

Solo creator built a hurricane PSA entirely with AI over a weekend; in it, the studio is destroyed by the storm being described.

u/fanisp·1 month ago·100 pts / 24 comm

r/ClaudeAI· COMMUNITY

Anthropic, can we do the same with 4.5 Sonnet please?

Reddit user asks Anthropic to "do the same" with Claude 4.5 Sonnet; context unclear without the full thread.

u/FluffyPolicePeanut·1 month ago·21 pts / 13 comm

r/LocalLLaMA· COMMUNITY

What's the current best small model?

Reddit discussion asking for recommendations for the best 3B-parameter open-weights model; no definitive answer or new information.

u/Conscious_Nobody9571·1 month ago·40 pts / 52 comm

r/MachineLearning· COMMUNITY

Interactive Jensen–Shannon Divergence Visualisation [P]

An interactive visualisation of Jensen–Shannon divergence - the symmetric, always-finite cousin of KL. Shape two distributions and watch JSD, its ceiling of one bit, and the per-point contribution respond in real time. https://robotchinwag.com/posts/jensen-shannon-divergence-visualisation/ Feedback welcome.

u/ancillia·1 month ago·40 pts / 5 comm
