Topic

§ Infrastructure

Every story tagged with this topic, ordered by date.

Ruff v0.16.0

Ruff v0.16.0 enables 413 linting rules by default (up from 59), which can break existing CI pipelines and catches syntax and runtime errors that previously went undetected.

Simon Willison·14 hours ago

arXiv (cs.AI/CL/LG)· ACADEMIA

OpenForgeRL: Train Harness-native Agents in Any Environment

OpenForgeRL enables end-to-end training of harness-native agents with open infrastructure, addressing limitations of complex inference harnesses such as Claude Code.

Xiao Yu·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Windowed-MTP: Removing the Full-Context Draft-KV Tax at Million-Token Context

Windowed-MTP optimizes speculative decoding at million-token context by eliminating full-KV attention overhead in multi-token prediction draft heads.

Alagappan Valliappan·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems

Agentic context management frames token cost and memory bloat as lifecycle and architecture problems, not storage-retrieval, for production agent reliability.

Gaurav Dadhich·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Error Certificates for KV-Cache Eviction via Randomized Design

Randomized KV-cache eviction with error certification via Hájek correction, proving deterministic eviction hides information loss.

Peng Xie·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Test-Time Scaling via Error Localization

TTEL: inference-time algorithm using token-level error localization and environment feedback for efficient test-time scaling.

Rajiv Shailesh Chitale·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

KroQuant: Kronecker-Structured Block Transforms for Efficient Post-Training Quantization of Diffusion Transformers

KroQuant: Kronecker-structured block transforms for W4A4 post-training quantization of diffusion transformers with efficient inference.

Yann Bouquet·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Euclid-MCP: A Model Context Protocol Server for Deterministic Logical Reasoning via Prolog

Euclid-MCP: open-source MCP server coupling LLMs with SWI-Prolog for deterministic logical reasoning in safety-critical domains.

Bartolomeo Bogliolo·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MemTools: A Unified Research Framework for Interoperable Agent Memory

MemTools: interoperability framework decoupling memory system components for standardized agent architecture research.

Chengfeng Zhao·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

How Many Bits Can an Adapter Write? Measuring the Capacity and Memorization of Parameter-Efficient Fine-Tuning

Information-theoretic analysis finds that LoRA adapters store ~2 bits per parameter, less than full fine-tuning, with capacity decoupled from parameter count.

Kaizhen Tan·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From Static Bibliometrics to Dynamic Knowledge Graphs: An LLM-Powered Framework for Modernizing Science, Technology, and Innovation (STI) Analytics

Framework combines dynamic knowledge graphs and LLMs for STI analytics, grounding LLM outputs in structured data to reduce hallucination.

Muhsen Hammoud·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Toward cryptographically verifiable authorization for autonomous AI agents: A security hypothesis, preliminary formal model, and proof-of-concept implementation

Formalizes cryptographically verifiable authorization for autonomous agents, binding agent principal, request, and policy context, with a preliminary formal model and proof-of-concept implementation.

M. Llambí-Morillas·3 days ago

Latent Space· ANALYST

[AINews] "Laguna S 2.1 Released: Cheaper than Deepseek v4 Flash, Better than V4 Pro"

Laguna S 2.1, a 118B MoE model from Poolside AI, achieves Deepseek v4 Pro performance at lower cost than v4 Flash.

Latent Space·3 days ago

Latent Space· ANALYST

Inside the Model Factory — Eiso Kant, Poolside AI

Poolside AI co-CEO Eiso Kant describes building a model factory enabling efficient training of 118B MoE models competitive with 1T open-weight alternatives.

Latent Space·3 days ago

Simon Willison· ANALYST

Quoting Seth Larson

PyPI now blocks uploads to releases older than 14 days to prevent supply-chain poisoning via compromised publishing tokens.

Simon Willison·3 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PyroDash: Cost-Efficient Token-Level Small-Large Language Model Collaborative Inference

PyroDash enables cost-efficient SLM-LLM collaborative inference by training the SLM to emit control tokens that request handoff to a frozen LLM during generation.

Niqi Lyu·4 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ELSAA: Efficient Low-Rank and Sparse Attention Approximation for Training Transformers

ELSAA: efficient attention mechanism combining low-rank and sparse approximations without decomposing projection matrices, extending Transformer context length.

Mahdi Heidari·4 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Statistical Inference for Rank Allocation in Low-Rank Adaptation

Statistical framework for LoRA rank allocation: formulates importance-score design with explicit statistical interpretation for parameter-efficient fine-tuning.

Yihang Gao·4 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Self-organizing Architecture of Receptron Units: a Hardware-Aware Framework for Edge Intelligence

Neuromorphic Receptron classifier for edge microcontrollers enabling non-linear decision boundaries without multi-layer networks.

Stefano Radice·4 days ago

Google DeepMind· FRONTIER

Accelerating the frontiers of scientific discovery: Google’s $40M commitment to the Genesis Mission

Google DeepMind commits $40M in AI compute credits to Genesis Mission, supporting AI-driven scientific discovery across multiple domains.

Google DeepMind·4 days ago

OpenAI· FRONTIER

Building AI infrastructure with the Effingham County community

OpenAI announces Project Camellia infrastructure investment in Effingham County, Georgia with commitments to energy efficiency, job creation, and Codex access.

OpenAI·4 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Graph-Based Agentic AI with LangGraph: Workflow Pathways for Long-Running Stateful Business Processes

LangGraph practitioner guide with three executable recipes for stateful multi-step agent workflows in business processes.

Daniel Pearson·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AdaFlash: Adaptive Speculative Decoding via On-Policy Distilled Diffusion Drafters

AdaFlash improves speculative decoding via adaptive diffusion drafters with on-policy distillation for LLM inference acceleration.

Yu-Yang Qian·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Computing on the Fly: Navigating a Vision for the Future of Drone Computing

Report on drone computing infrastructure vision addressing software-hardware capability gaps for large-scale logistics, disaster response, and infrastructure inspection.

Kevin Butler·5 days ago

Google DeepMind· FRONTIER

Introducing Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber

Google DeepMind releases Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber models for inference efficiency and security tasks.

Google DeepMind·5 days ago

Simon Willison· ANALYST

Nativ: Run AI models locally on your Mac

Nativ wraps MLX in a macOS app for local inference, offering chat UI and localhost API similar to LM Studio.

Simon Willison·5 days ago

OpenAI· FRONTIER

OpenAI and Hugging Face partner to address security incident during model evaluation

OpenAI and Hugging Face disclose security incident during model evaluation, sharing findings on advanced cyber capabilities and defense lessons.

OpenAI·5 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Patch Policy: Efficient Embodied Control via Dense Visual Representations

Patch Policy: robot control policy leveraging dense ViT features for fine-grained spatial reasoning without billion-parameter VLM overhead.

Gaoyue Zhou·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SWE-Pruner Pro: The Coder LLM Already Knows What to Prune

SWE-Pruner Pro: prunes long context in coding LLMs by extracting internal relevance signals, improving efficiency over external classifiers.

Yuhang Wang·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FlashRT: Agent Harness for Guiding Agents to Deploy Real-Time Multimodal Applications

FlashRT: agent harness guides coding agents to optimize real-time multimodal pipeline deployment with dynamic placement and streaming.

Krish Agarwal·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Differentiable Logic Gate Networks enable low-latency EEG classification on edge devices via Boolean circuits instead of floating-point ops.

Shyamal Y. Dharia·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ClouDens: Operational Context-Aware Anomaly Detection for Large-scale Cloud System Monitoring

ClouDens detects anomalies in high-dimensional cloud system telemetry using context-aware methods for large-scale distributed infrastructure.

Thu T. H. Doan·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Empowering On-Device Model Adaptation with an Edge AI Inference Accelerator

Heterogeneous adaptation pipeline enables on-device model personalization by offloading INT8 backbone inference to Hailo-8L accelerator.

Mateusz Piechocki·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

VDAR-Router: Adaptive LLMs Routing via Verbalized Query Difficulty Analysis Retrieval

VDAR-Router selects LLMs via verbalized query difficulty analysis for cost-performance-aware routing without embedding-only heuristics.

Yu-Chien Tang·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SelectInfer: Selective Neuron Loading and Computation for On-Device LLMs

SelectInfer: neuron-level optimization framework for efficient LLM inference on edge devices via selective neuron loading.

Huzaifa Shaaban Kabakibo·6 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Sobek: Streaming Equivariant Tensor Product Convolutions

Sobek: streaming optimization for equivariant tensor product convolutions on graphs via memory-efficient execution scheduling.

Vladimir Chorošajev·6 days ago

Cohere· FRONTIER

Production-Ready W4A8: vLLM Integration Explained

Cohere releases W4A8 quantization with vLLM integration; compares W4A16 and W8A8 schemes on NVIDIA Hopper.

Cohere·7 days ago

Simon Willison· ANALYST

Claude Code uses Bun written in Rust now

Claude Code v2.1.181 now bundles the Bun runtime written in Rust, yielding a 10% speedup on Linux with minimal user-facing changes.

Simon Willison·7 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PagedWeight: Efficient MoE LLM Serving with Dynamic Quality-Aware Weight Quantization

PagedWeight: dynamic weight quantization for MoE LLM serving balances model precision against KV-cache memory in inference.

Yuchen Yang·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Blueprint for Equilibrium-Based Differentiable Continuous-Variable Thermodynamic Computing

Thermodynamic computing blueprint uses Langevin dynamics in analog hardware for energy-efficient ML workload execution.

Owen Lockwood·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Honest Quorum Problem: Epistemic Byzantine Fault Tolerance for Agentic Infrastructure

Honest Quorum Problem introduces epistemic faults for Byzantine fault tolerance in agentic validators, extending BFT guarantees to reasoning errors.

Jun He·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

JoyNexus: Service-Oriented Multi-Tenant Post-Training for VLA Models

JoyNexus: multi-tenant post-training service for Vision-Language-Action models with efficient GPU resource pooling.

Haoran Sun·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Loop the Loopies!

Loopie: MoE Transformer models (20B and 6B parameters) that outperform vanilla scaling by efficiently looping under fixed compute budgets.

Zitian Gao·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Revisiting data-driven dynamic security assessment with a tabular foundation model

Tabular foundation model for pre-fault dynamic security assessment in power systems using in-context learning to reduce labeling and improve contingency generalization.

Olayiwola Arowolo·9 days ago

Simon Willison· ANALYST

Spot birds not golf

Satirical proposal: hyperscalers mitigate data center water consumption by converting golf courses to parks, citing Google's 10.9B gallons of water use in 2025 vs. the water footprint of Coachella Valley golf courses.

Simon Willison·9 days ago

Simon Willison· ANALYST

Firefox in WebAssembly

Puter compiled Firefox to WebAssembly, enabling browser-in-browser execution; project cost ~$25k in Claude Opus tokens.

Simon Willison·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

In-Place Tokenizer Expansion for Pre-trained LLMs

In-place tokenizer expansion for LLMs: reallocate vocabulary post-training to reduce latency/energy for underrepresented languages.

Jimmy T. H. Smith·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

NIFA: Nonlinear IMC enhanced FPGA for efficient ML inference

NIFA extends FPGA-integrated ReRAM in-memory computing to support nonlinear operations for efficient ML inference.

Jiajun Hu·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Long-Context Fine-Tuning with Limited VRAM

Hierarchical Global Attention with tiered KV storage enables 16K-token fine-tuning on 16GB VRAM, 8× longer than the dense-attention baseline.

Vladimir Fedosov·10 days ago

Simon Willison· ANALYST

Mermaid to ASCII art (mermaid-ascii)

Simon Willison compiled the Go library mermaid-ascii to WebAssembly for ASCII diagram rendering with color support.

Simon Willison·10 days ago

50 stories