The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection

MM-StanceDet uses a retrieval-augmented multi-agent framework for multimodal stance detection with cross-modal conflict resolution.

Weihai Lu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models

DPN-LE framework identifies minimally necessary neurons for LLM personality representation to reduce editing overhead and degradation.

Lifan Zheng·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

TunnelMIND applies training-free visual recalibration to foundation models for precise tunnel defect localization and engineering documentation.

Shipeng Liu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances

LAPITHS framework challenges the CENTAUR model's claims of human-like cognition via theoretical and empirical critique of transformer interpretations.

Matteo Da Pelo·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future

Survey synthesizes LLM-assisted peer review methods: generation, rebuttal/meta-review automation, and evaluation across pipeline stages.

Sihong Wu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation

Evaluates EuroLLM, Aya Expanse, and Gemma on emotion preservation in machine translation across 28 emotion categories in 5 languages.

Dawid Wisniewski·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Geometry-Calibrated Conformal Abstention for Language Models

Conformal Abstention framework provides finite-sample guarantees for LM uncertainty quantification and abstention from hallucination-prone queries.

Rui Xu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Physical Foundation Models: Fixed hardware implementations of large-scale neural networks

Physical Foundation Models proposes fixed hardware implementations for trillion-parameter models to amortize deployment infrastructure costs.

Logan G Wright·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

Schema-grounded external memory for agents outperforms text-retrieval approaches by enabling exact fact tracking, state updates, and structured queries.

Alex Petrov·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Simulating clinical interventions with a generative multimodal model of human physiology

HealthFormer decoder-only transformer models human physiological trajectories across 667 measurements from 15K+ patients to simulate intervention responses.

Guy Lutsker·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Graph World Models: Concepts, Taxonomy, and Future Directions

Survey formalizing graph-based world models for agents, decomposing environments into entity nodes and edges to improve robustness vs. flat-tensor approaches.

Jiawei Liu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Prediction-powered Inference by Mixture of Experts

Mixture-of-Experts framework for semi-supervised inference combining diverse predictors with limited labeled data via prediction-powered inference.

Yanwu Gu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks

System-prompt self-orchestration outperforms external agent frameworks (LangGraph, CrewAI, OpenAI SDK) on procedural tasks; 200-conversation comparison.

Simon Dennis·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

Decoupled Descent algorithm enforces train-test error identity in gradient descent via approximate message passing, addressing generalization gap.

Max Lovig·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

On-demand persona-based agent generation framework enabling dynamic multi-agent workflow customization without hard-coded architectures.

Giuseppe Arbore·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Modeling Clinical Concern Trajectories in Language Model Agents

Lightweight clinical agent architecture using integrated state dynamics to surface pre-escalation risk signals in LLM clinical deployment.

Sukesh Subaharan·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

KellyBench: A Benchmark for Long-Horizon Sequential Decision Making

KellyBench: long-horizon sequential decision benchmark using 2023-24 Premier League sports betting; evaluates agents on non-stationary open-ended optimization.

Thomas Grady·2 months ago

r/LocalLLaMA· COMMUNITY

PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together

llama-swap adds matrix grouping feature for multi-model orchestration and intelligent VRAM swap scheduling.

u/walden42·2 months ago·41 pts / 14 comm

r/OpenAI· COMMUNITY

Sure, Deepseek… sure.

Reddit discussion expressing skepticism toward DeepSeek claims; lacks substantive technical content or reporting.

u/DoctaMonsta·2 months ago·111 pts / 31 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

TwinGate defense against decompositional jailbreaks in untraceable, anonymized request streams using stateful asymmetric contrastive learning.

Bowen Sun·2 months ago

r/LocalLLaMA· COMMUNITY

DeepSeek released 'Thinking-with-Visual-Primitives' framework

DeepSeek & Peking/Tsinghua introduce 'Thinking with Visual Primitives', a multimodal reasoning framework using spatial tokens as chain-of-thought units.

u/External_Mood4719·2 months ago·115 pts / 11 comm

The Verge AI· PRESS

OpenAI talks about not talking about goblins

OpenAI is opening up about its goblin problem. After a report from Wired revealed instructions to OpenAI's coding model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures," the AI startup published an explanation on its website, calling references to the creatures a "strange habit" its models developed as a result of their training. As outlined in the blog post, OpenAI began noticing metaphors referencing goblins and other creatures starting with its GPT-5.1 model, specifically when using the "Nerdy" personality option. OpenAI says the pro...

Emma Roth·2 months ago

r/OpenAI· COMMUNITY

Welcome to the future

Vague post title with no substantive content; insufficient information to assess.

u/imfrom_mars_·2 months ago·78 pts / 10 comm

r/ClaudeAI· COMMUNITY

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read

Working on large codebases with Claude Code, we kept running into the same issue: when Claude looks for relevant code, it falls back to grep, reading full files, or launching multiple subagents. This burns through tokens, and often misses the relevant code. There are some existing solutions (that we also benchmarked against), but they all had issues (too slow, needs API keys, quality not good enough, etc). We built [Semble](https://github.com/MinishLab/semble) to fix this. It's a local MCP server that gives Claude Code high quality code search: instead of reading files to find what's relevan...

u/Pringled101·2 months ago·38 pts / 5 comm

r/Anthropic· COMMUNITY

Looks like Pro accounts are getting squeezed now

It started yesterday… looks like usage burn cost went up by 30%… this will be brutal on Pro accounts. If you're on Pro and your 5h usage burns out in two Opus prompts, you're not imagining it anymore.

u/_k33bs_·2 months ago·11 pts / 5 comm

The Verge AI· PRESS

Verified by Spotify badge lets you know this artist isn’t AI

Spotify is launching a new verification program to combat spam, fakes, and AI. Some artists will now have a "Verified by Spotify" badge and a green checkmark on their profile, indicating that the company has confirmed a real person is behind the music and the profile. At least at launch, Spotify says that AI personas or profiles that primarily upload AI-generated music are not eligible for the verification program. It did leave the door open to the possibility in the future, though, saying, "the concept of artist authenticity is complex and quickly evolving." Not just anyone can be verified, ...

Terrence O’Brien·2 months ago

r/ClaudeAI· COMMUNITY

Me clicking "accept all" on 22,469 Claude Code changes without reading a single one

Reddit humor post about blindly accepting 22k+ Claude Code suggestions without review.

u/Technical-Relation-9·2 months ago·475 pts / 20 comm

r/LocalLLaMA· COMMUNITY

Actual comparison between locally run Qwen-3.6-27B and proprietary models

User reports empirical comparison of Qwen-3.6-27B running locally vs. proprietary cloud models on coding/hard reasoning tasks.

u/netikas·2 months ago·46 pts / 20 comm

r/ClaudeAI· COMMUNITY

I made a Blender character animation from scratch with Claude

I created a character and animation from scratch in Blender using Claude. As a game developer, this was such a fascinating experience. It’s hard to believe how far AI has come in just a year. I’m excited to keep building this game idea with AI and share the journey along the way. Stay tuned.

u/flopydisk·2 months ago·21 pts / 7 comm

r/OpenAI· COMMUNITY

Got 6 months of ChatGPT Pro for free — thanks OpenAI and opensource community

Reddit user reports receiving 6 months free ChatGPT Pro subscription; personal anecdote about developer productivity.

u/1996fanrui·2 months ago·70 pts / 10 comm
