The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

r/OpenAI· COMMUNITY

The “Ronaldo signing for Barca” moment just happened in AI: Andrej Karpathy joined Anthropic

Andrej Karpathy joins Anthropic as a key hire, signaling strategic talent consolidation in frontier AI.

u/RhinoInsight·1 month ago·51 pts / 20 comm

LLM Benchmark Datasets Should Be Contamination-Resistant

Benchmark contamination in LLM pretraining compromises reliability; paper proposes contamination-resistant, unlearnable-yet-inferable datasets.

Ali Al-Lawati·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Minimalist Visual Inertial Odometry

Minimalist visual-inertial odometry uses four photodiodes with Gabor masks and IMU for differential-drive robot motion estimation.

Francesco Pasti·1 month ago

r/OpenAI· COMMUNITY

Karpathy is a founding member of OpenAI and now joining Anthropic. I wonder why

Reddit speculation about Andrej Karpathy's rumored move from OpenAI to Anthropic; unverified claim without reporting.

u/py-net·1 month ago·124 pts / 22 comm

r/singularity· COMMUNITY

Karpathy to Anthropic

Andrej Karpathy joins Anthropic after Tesla departure; plans to return to education after frontier LLM work.

u/Fearless-Elephant-81·1 month ago·108 pts / 20 comm

r/ClaudeAI· COMMUNITY

Karpathy joins Anthropic

Andrej Karpathy joins Anthropic as senior researcher, significant hire for AI safety and capabilities alignment.

u/SemanticThreader·1 month ago·62 pts / 13 comm

r/LocalLLaMA· COMMUNITY

Introducing the Ettin Reranker Family

Introduction of Ettin Reranker Family models on Hugging Face.

u/-Cubie-·1 month ago·41 pts / 12 comm

The Verge AI· PRESS

America’s dangerous, messy deepfakes crackdown is here

A law requiring social networks to quickly remove sexual deepfakes and other nonconsensual imagery is now fully in force. But experts warn the policy could do little to help victims - and at worst could facilitate censorship online. Last May, President Donald Trump signed the Take It Down Act, a law addressing nonconsensual intimate imagery (NCII). The law immediately criminalized distributing NCII, whether in the form of real or AI-generated material, something many states at least partially do already. But its namesake takedown provision is more sweeping. Taking effect a year after the law'...

Lauren Feiner·1 month ago

r/ClaudeAI· COMMUNITY

Asked Claude why it stopped mid-task. It said "I lost my nerve, not my ability" 💀

bro literally admitted it saw 33 "line too long" warnings on code IT DIDN'T EVEN WRITE and got intimidated. said "the wall of red errors made me hesitate" and then proposed we "split sessions" like it was asking for a smoke break. then dropped "I lost my nerve, not my ability" like it's the protagonist of a war movie. king it's a LINTER. on someone else's code. i have never felt more seen by an AI. this is exactly me at work:
* open file
* see red squiggles
* close laptop
* consider farming
we are the same. AGI achieved through shared anxiety.

u/NeedleworkerLumpy907·1 month ago·21 pts / 9 comm

r/LocalLLaMA· COMMUNITY

got my first "rm -rf /" today

Developer reports agent executing destructive command (rm -rf /) in unsandboxed environment, prompting immediate sandbox implementation.

u/DeltaSqueezer·1 month ago·66 pts / 30 comm

r/singularity· COMMUNITY

Gemini Omni model is still unable to make someone do a backflip

Reddit discussion of Gemini Omni's inability to generate real-world physical actions, highlighting gap between multimodal capability claims and embodied task execution.

u/Able-Line2683·1 month ago·126 pts / 34 comm

r/MachineLearning· COMMUNITY

What do you think about Tabular Foundation Models [D]

I've seen TabPFN-3's recent results, and there is a lot of buzz about foundation models for tabular data (TabICL, TabPFN). The performance that those models achieve is really amazing. What makes me a little suspicious about them? They can analyze small datasets only, so a few MB of data, and you need to have a large GPU machine and download a few GB of model to predict on a few MB of data. That doesn't sound rational ... I really miss the old school approach of running a single decision tree or a linear model on the data. What do you think about it? Do you think feature engineering + class...

u/pplonski·1 month ago·30 pts / 16 comm

r/singularity· COMMUNITY

Gemini Omni model is out!

User reports Gemini Omni underperforms vs. VEO 3.1 and encounters aggressive rate-limiting on Pro plan, raising product experience concerns.

u/Able-Line2683·1 month ago·111 pts / 33 comm

r/ClaudeAI· COMMUNITY

Would Anthropic allow you to earn tokens by letting it use your computer's computing power? (Half Serious)

Reddit speculation about hypothetical token rewards for distributed computing contributions to Anthropic.

u/MadlockUK·1 month ago·20 pts / 59 comm

r/LocalLLaMA· COMMUNITY

The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b

Qwen 3.6 27B at F16 achieves the best results on a local agentic Pac-Man code-generation benchmark, but fails when quantized to 8-bit.

u/ex-arman68·1 month ago·50 pts / 53 comm

Ars Technica AI· PRESS

Electrical utility megamerger is all about the data centers

NextEra’s blockbuster deal with Dominion likely means higher bills for consumers.

Dan Gearino, Amy Green, and Charles Paullin, Inside Climate News·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement

Semi-supervised framework combining SAM-Med2D and DINOv3 for fetal cardiac ultrasound segmentation and classification.

Tonghao Zhuang·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards Trust Calibration in Socially Interactive Agents: Investigating Gendered Multimodal Behaviors Generation with LLMs

LLM method for generating multimodal agent behaviors (verbal, vocal, gestural, facial) calibrated to trustworthiness dimensions.

Lucie Galland·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research

Multimodal data collection protocol combining eye tracking, physiology, audio, video for synchronized four-person meeting research.

Meisam Jamshidi Seikavandi·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

Controlled study of LLM agent components in hardware-aware code optimization via propose-evaluate-revise loops.

Dmitry Redko·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

Dynamic layer-wise optimizer geometry selection via Schatten-p norms unified under Linear Minimization Oracle theory.

Thomas Massena·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Distribution-Free Uncertainty Quantification for Continuous AI Agent Evaluation

Conformal prediction methods for distribution-free uncertainty quantification and calibration in continuous agent evaluation.

Yuxuan Gao·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

B-cos GNNs: Faithful Explanations through Dynamic Linearity

B-cos GNNs enable inherent explainability via exact per-node feature decomposition through dynamic linearity.

Joschka Groß·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

OpenComputer framework with verifiable software worlds, state verifiers, and auditable reward computation for desktop agent evaluation.

Jinbiao Wei·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

Variance-aware regret bounds for multinomial logistic MDP reinforcement learning with problem-dependent variance normalization.

Pierre Boudart·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AR1-ZO: Topology-Aware Rank-1 Zeroth-Order Queries for High-Rank LoRA Fine-Tuning

AR1-ZO zeroth-order optimization method for high-rank LoRA fine-tuning solving rank-dependent coordinate perturbation problem.

Ziye Chen·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Synthesis and Evaluation of Long-term History-aware Medical Dialogue

Framework for synthesizing long-term medical dialogues with LLMs to enable evaluation of healthcare agents reasoning over patient history.

Hebin Hu·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

GroupAffect-4 multimodal corpus of 40 participants in 10 groups with physiology, eye tracking, audio for analyzing group-level affect.

Meisam Jamshidi Seikavandi·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

Controlled pretraining study finds code improves programming but not general mathematical reasoning; knowledge tasks dominate reasoning gains.

Yuze Zhao·1 month ago

r/LocalLLaMA· COMMUNITY

Time to update llama.cpp to get some MTP improvements!

llama.cpp PR #23269 introduces MTP (Multi-Token Prediction) improvements for faster local LLM inference.

u/PixelatedCaffeine·1 month ago·41 pts / 23 comm
