The Archive
Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.
TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows
TabPFN-3 was released today, the next iteration of the tabular foundation model originally published in Nature. Quick recap for anyone new to TabPFN: TabPFN predicts on tabular data in a single forward pass, with no training, no hyperparameter search, and no tuning. It builds on TabPFN-2.5 (Nov 2025) and TabPFNv2 (Nature, Jan 2025), which together have crossed 3M downloads and 200+ published applications. What's new: * Scale: 1M rows on a single H100 (10x larger than 2.5). A reduced KV cache (~8 GB per million rows per estimator) and row-chunked inference make this practical on a single GPU. * Speed: 10x-10...
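The quoted KV-cache figure supports a quick back-of-envelope check. The sketch below assumes memory scales linearly with both row count and number of estimators, which the announcement implies but does not state outright; the helper name is hypothetical and not part of the TabPFN API.

```python
# Back-of-envelope KV-cache sizing for TabPFN-3.
# Assumption: the quoted ~8 GB per million rows per estimator scales
# linearly in both rows and number of estimators.
GB_PER_MILLION_ROWS_PER_ESTIMATOR = 8.0


def estimate_kv_cache_gb(n_rows: int, n_estimators: int = 1) -> float:
    """Hypothetical helper: approximate KV-cache footprint in GB."""
    return GB_PER_MILLION_ROWS_PER_ESTIMATOR * (n_rows / 1_000_000) * n_estimators


if __name__ == "__main__":
    for rows, estimators in [(1_000_000, 1), (500_000, 4), (1_000_000, 8)]:
        gb = estimate_kv_cache_gb(rows, estimators)
        print(f"{rows:,} rows x {estimators} estimator(s): ~{gb:.0f} GB")
```

Under this assumption, one million rows with eight estimators held at once would need about 64 GB for the cache alone; the post does not say whether estimators are kept resident simultaneously or run one after another.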
Keep losing great answers in long Claude chats
Reddit user describes friction in Claude's UI when retrieving specific answers from long conversations; suggests manually copying answers out as a workaround.
oh lovely anthropic
Reddit user complains about Claude API plan limits and how rate caps are implemented, claiming marketing misrepresented compute capacity gains.
Amazon employees are "tokenmaxxing" due to pressure to use AI tools
Workers are using an internal AI tool to automate non-essential tasks.
Gemma 4 MTP vs DFlash on 1x H100: dense vs MoE results
Benchmark comparing Gemma 4 multi-token prediction vs. DFlash speculative decoding on H100 using vLLM and SPEED-Bench dataset.
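Both MTP and DFlash are forms of speculative decoding: a cheap drafter proposes several tokens and the full model verifies them. The sketch below shows only the generic greedy draft-and-verify loop, not Gemma 4's MTP heads or DFlash's actual drafter; the toy "models" are plain functions mapping a token list to the next token, an assumption made so the loop runs without any ML library.

```python
from typing import Callable, List

# A "model" here is any function returning the greedy next token for a sequence.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify loop; output equals greedy decoding with `target`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position (in a real system this
        #    is one batched forward pass over all k positions).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # target's token replaces the first mismatch
                break
        else:
            # Every draft token matched, so the target contributes one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + max_new]
```

Each iteration emits at least one token, and every emitted token is the target's own greedy choice, so draft quality affects only speed, never output. That is why benchmarks like this one compare throughput and acceptance rather than accuracy.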
Dessn raises $6M for its production-focused design tool
A new startup called Dessn has raised $6M to build AI-powered design tools that work directly with production codebases.
examples : add llama-eval by ggerganov · Pull Request #21152 · ggml-org/llama.cpp
llama.cpp adds the llama-eval benchmarking tool, supporting AIME, GSM8K, and GPQA, for evaluating quantized models locally.
I built a Claude Code plugin that actually enforces your rules instead of hoping the model follows them
Been using Claude Code heavily and kept running into the same thing everyone here talks about: the model ignores your rules. You tell it to write tests first, it writes the implementation. You give it coding standards, it cherry-picks which ones to follow. And as your rulebook grows, you're burning more and more tokens stuffing everything into context when only a handful of rules are relevant to what you're working on. So I built Writ. Two pieces: A retrieval engine that picks only the relevant rules and skills for the current task. It runs a five stage pipeline over a Neo4j knowledge graph...
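The core idea of loading only the relevant rules, rather than the whole rulebook, can be illustrated with a much simpler stand-in. The sketch below is not Writ's five-stage Neo4j pipeline; it swaps in plain keyword overlap between the task description and each rule, and the `Rule` type, stopword list, and example rules are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical stopword list so filler words don't count as matches.
STOPWORDS = {"a", "an", "the", "for", "with", "to", "of", "and", "all", "use", "in", "on"}


def tokenize(text: str) -> Set[str]:
    """Lowercase word set, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


@dataclass
class Rule:
    name: str
    text: str
    tags: Set[str] = field(default_factory=set)


def select_rules(task: str, rules: List[Rule], k: int = 2) -> List[Rule]:
    """Return up to k rules sharing the most terms with the task (ties by name)."""
    task_terms = tokenize(task)
    scored = []
    for rule in rules:
        overlap = len(task_terms & (tokenize(rule.text) | rule.tags))
        if overlap:  # rules with no overlap never enter the context
            scored.append((overlap, rule.name, rule))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [rule for _, _, rule in scored[:k]]


RULES = [
    Rule("tests-first", "Write a failing test before the implementation", {"tdd"}),
    Rule("sql-params", "Use parameterized queries for all database access", {"sql"}),
    Rule("logging", "Use structured logging, never print", {"logging"}),
]

if __name__ == "__main__":
    chosen = select_rules("Add a database lookup for users, with a test", RULES)
    print([r.name for r in chosen])
```

Only the selected rules would be injected into context, which is where the token savings come from; a real retriever would need something better than exact word overlap (for instance, "query" and "queries" do not match here).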
A Transfer Learning Evaluation of Deep Neural Networks for Image Classification
Transfer learning evaluation across 11 pre-trained models for image classification; standard practice study without novel methodology.
Random-Set Graph Neural Networks
Random-Set GNNs introduce uncertainty quantification for graph neural networks via epistemic and aleatoric uncertainty modeling.
On the Limitations of Large Language Models for Conceptual Database Modeling
Evaluates LLM ability to generate Entity-Relationship diagrams from natural language; identifies systematic limitations in database modeling tasks.
QDSB: Quantized Diffusion Schrödinger Bridges
QDSB combines quantization with Schrödinger bridges for generative modeling from unpaired samples; highly specialized technical approach.
High-lift Wing Separation Control via Bayesian Optimization and Deep Reinforcement Learning
Applies Bayesian optimization and DRL to aerodynamic control of high-lift wings; domain-specific application outside AI research scope.
On Predicting the Post-training Potential of Pre-trained LLMs
RuDE framework predicts post-training potential of LLMs before fine-tuning; addresses model selection gap using rubric-based evaluation.
Stop wasting electricity
RTX 4090 power optimization for llama.cpp: cutting power consumption by 40% via power limits without loss of performance.
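If throughput really is unchanged, a 40% power cut means 40% less energy per generated token. The arithmetic below makes that explicit. The 450 W figure is the RTX 4090's stock power limit, the 270 W figure is simply 40% below it, and the 100 tokens/s throughput is a hypothetical placeholder, not a measured llama.cpp result.

```python
def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: power (J/s) divided by throughput (tokens/s)."""
    return watts / tokens_per_second


def kwh_per_million_tokens(watts: float, tokens_per_second: float) -> float:
    """Energy for one million tokens, converted from joules to kWh (1 kWh = 3.6e6 J)."""
    return joules_per_token(watts, tokens_per_second) * 1_000_000 / 3_600_000


if __name__ == "__main__":
    tps = 100.0  # hypothetical throughput, assumed identical at both power limits
    for watts in (450.0, 270.0):
        print(f"{watts:.0f} W: {joules_per_token(watts, tps):.2f} J/token, "
              f"{kwh_per_million_tokens(watts, tps):.2f} kWh per 1M tokens")
```

The saving only holds if throughput stays flat; if the lower limit also slows generation, energy per token falls by less than the power reduction.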
Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
Reach-Avoid Probability Certificates (RAPCs) enforce probabilistic safety constraints in stochastic RL while minimizing cost.
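The quantity behind a reach-avoid constraint is the probability of hitting a goal set before an unsafe set. The sketch below computes that probability exactly by value iteration on a toy 1-D random walk; it illustrates only the quantity, not the paper's RAPC construction or its cost minimization, and the chain, `p_right`, and tolerances are assumptions.

```python
from typing import List


def reach_avoid_probability(n: int, p_right: float = 0.5,
                            tol: float = 1e-12, max_iters: int = 100_000) -> List[float]:
    """Probability of reaching state n before state 0 on a 1-D random walk.

    State 0 is the unsafe ("avoid") set and state n is the goal ("reach") set.
    From each interior state the walker steps right with probability p_right,
    otherwise left.
    """
    v = [0.0] * (n + 1)  # v[0] stays 0: hitting the unsafe state means failure
    v[n] = 1.0           # already at the goal
    for _ in range(max_iters):
        new_v = v[:]
        for s in range(1, n):
            # Bellman backup: expected success probability after one step.
            new_v[s] = p_right * v[s + 1] + (1.0 - p_right) * v[s - 1]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


if __name__ == "__main__":
    print(reach_avoid_probability(4))  # symmetric walk: approximately s / n
```

For the symmetric walk the result approaches s/n (the gambler's-ruin answer), which makes the sketch easy to check; a safety constraint of the kind summarized above would require this probability to stay above some threshold.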
Towards Order Fairness: Mitigating LLMs Order Sensitivity through Dual Group Advantage Optimization
Dual Group Advantage Optimization mitigates order bias in LLMs to improve RAG and in-context learning fairness.
AI voice startup Vapi hits $500M valuation after winning Amazon Ring over 40 rivals
Vapi says its enterprise business has grown 10-fold since early 2025 as companies shift customer support and sales calls to AI agents.
Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation
Humanoid robot as V2X complement for intersection collision avoidance; autonomous systems application, not frontier AI.
NOFE -- Neural Operator Function Embedding
NOFE enables continuous dimensionality reduction via function-to-function mappings using Graph Kernel Operators.
Assessment of cloud and associated radiation fields from a GAN stochastic cloud subcolumn generator
GAN-based stochastic cloud subcolumn generator for Earth System Models improves representation of subgrid cloud variability.
Can we acknowledge that Anthropic watches open sourcers and copies them?
Reddit discussion alleging Anthropic incorporates open-source features (MCPs, memory, goals) without attribution to original developers.
Enhancing Target-Guided Proactive Dialogue Systems via Conversational Scenario Modeling and Intent-Keyword Bridging
Target-guided dialogue system uses scenario modeling and intent-keyword bridging to steer conversations toward predefined topics.
‘It’s here’: Google issues dire warning after catching hackers using AI to break into computers
Google reports hackers using AI to automate computer intrusions, escalating security concerns for enterprise infrastructure.
Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
ClipSum leverages frozen CLIP features with temporal modeling for multimodal abstractive summarization of instructional videos.
Assessing and Mitigating Miscalibration in LLM-Based Social Science Measurement
Study reveals LLM miscalibration in social science measurement tasks; confidence filtering can bias downstream empirical estimates.
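Confidence filtering can bias estimates even when every kept label is correct, as long as confidence differs by class. The toy dataset below is constructed by hand to show that mechanism; it is an illustration of selection bias, not the study's data or method.

```python
from typing import List, Tuple

# Each item: (predicted label, model confidence). Labels here are all correct;
# the only problem is that positives tend to get lower confidence.
Item = Tuple[int, float]


def prevalence(items: List[Item], threshold: float = 0.0) -> float:
    """Share of positive labels among items whose confidence meets the threshold."""
    kept = [label for label, conf in items if conf >= threshold]
    return sum(kept) / len(kept) if kept else float("nan")


# Hypothetical data: 40 positives (30 at low confidence) and 60 negatives.
items: List[Item] = [(1, 0.6)] * 30 + [(1, 0.9)] * 10 + [(0, 0.9)] * 60

if __name__ == "__main__":
    print(prevalence(items))       # 0.4
    print(prevalence(items, 0.8))  # about 0.143: filtering dropped most positives
```

Keeping only "confident" labels cuts the estimated prevalence from 40% to about 14% here, which is the kind of downstream distortion the summary warns about.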
Counterfactual Trace Auditing of LLM Agent Skills
Counterfactual Trace Auditing framework measures how agent skills change LLM behavior via structured Skill Influence Pattern annotations.
From Noise to Diversity: Random Embedding Injection in LLM Reasoning
Untrained random soft prompts reach reasoning accuracy comparable to optimized prompts, suggesting the injection itself aids LLM reasoning.
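A soft prompt is a block of vectors prepended to the token embeddings before the model's first layer; the finding above is that random, untrained vectors can work about as well as optimized ones. The sketch below shows only that prepend step on plain Python lists. The Gaussian distribution, `scale`, `seed`, and function name are assumptions, since the summary does not specify how the paper samples or places its embeddings.

```python
import random
from typing import List

Vector = List[float]


def inject_random_prefix(token_embeddings: List[Vector], n_prefix: int,
                         scale: float = 1.0, seed: int = 0) -> List[Vector]:
    """Prepend n_prefix untrained Gaussian vectors to a sequence of embeddings."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    dim = len(token_embeddings[0])
    rng = random.Random(seed)  # fixed seed: the same random prefix on every call
    prefix = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_prefix)]
    # Copy the original rows so the caller's embeddings are not aliased.
    return prefix + [list(v) for v in token_embeddings]


if __name__ == "__main__":
    emb = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    out = inject_random_prefix(emb, n_prefix=3)
    print(len(out), "vectors of dim", len(out[0]))
```

In a real model the combined sequence would then be fed to the transformer in place of the plain token embeddings; nothing in the prefix is trained.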