Research & Infrastructure
The infrastructure that makes frontier AI possible: Hugging Face, NVIDIA, BAIR, and the tool chains behind the models.
Delivering Lifecycle Control for AI Infrastructure at Scale with NVIDIA DGX Spark Enterprise Manageability
As AI infrastructure scales, enterprise expectations for operational maturity are increasing. Organizations expect these systems to be provisionable, observable, secure, and manageable at scale—the same standard applied to all critical infrastructure. The moment an AI system moves from development into enterprise deployment, that operational foundation is essential. NVIDIA DGX Spark and…
Model Quantization: Turn FP8 Checkpoints into High-Performance Inference Engines with NVIDIA TensorRT
Converting a quantized checkpoint into an NVIDIA TensorRT engine bridges the gap between model optimization and production deployment, enabling faster inference, higher throughput, and more efficient GPU utilization at scale. In a previous post, we produced a high-quality FP8-quantized Contrastive Language-Image Pretraining (CLIP) checkpoint with NVIDIA TensorRT Model Optimizer.
Accelerating Federated Learning Research with AI Agents and NVIDIA FLARE Auto-FL
Federated learning (FL) research often begins with a deceptively simple question: What should we try next? A new aggregation rule, a FedProx coefficient, a server optimizer setting, a SCAFFOLD variant, or a model architecture tweak may all look promising before an experiment starts. After the run finishes, the harder questions begin: Did the change actually improve the metric?
Evaluate Clinical ASR Models Faster with Agent Skills and NVIDIA Nemotron Speech
Training a speech AI model to correctly recognize or synthesize clinical terminology is surprisingly difficult. Drug names like Acetaminophen, Amlodipine, Cefazolin, and Biktarvy are not part of everyday vocabulary. Procedure names, anatomy terms, and specialty-specific diagnoses introduce the same problem in a different form. Off-the-shelf speech systems can sound fluent and still miss the words…
Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell
Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step time can add up to days of training and substantial compute costs. Numerical precision is one of the highest-leverage knobs available, but low-bit mixed-precision pretraining is hard to get right. To address this…
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows. However, these multi-agent workflows cause token counts to grow quickly. Agents plan, call tools, invoke sub-agents, receive information, and then pass history, outputs, and reasoning steps back into the model…
Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA
AI agents are changing how you interact with your PC. Creators, developers, and AI enthusiasts are already using these agents extensively to assist with day-to-day tasks such as coding, video editing, and content management. NVIDIA and Microsoft are teaming up to enable the next generation of developers to build on-device agents on the Windows platform, with easier setup, native security…
Deploy Self-Evolving Agents for Faster, More Secure Research with a Hermes Agent and NVIDIA NemoClaw
AI agents are a powerful tool for synthesizing data to accelerate research, summarize information, and help teams make decisions faster. But combining internal data with public sources poses security challenges. This post shares an open source example using Hermes Agent with NVIDIA NemoClaw for product research across Outlook, Slack, and GitHub. NVIDIA OpenShell enforces a security-approved…
Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2
As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized memory and performance. NVIDIA JetPack 7.2 directly supports one-command deployment of NVIDIA NemoClaw, an open source stack that adds privacy and security controls to OpenClaw. It introduces NVIDIA agent skills for Jetson—Jetson device…
Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark
The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent subagents, and iterate continuously without cloud dependency. Security and privacy concerns are also accelerating the shift toward local agents. Developers, by running autonomous agents on hardware they own with NVIDIA NemoClaw…
How to Post-Train Autonomous Vehicle Models in Closed-Loop with NVIDIA Alpamayo
Developing autonomous vehicle (AV) policies requires bridging an important gap between training and deployment. Vision-language-action (VLA) models that can reason over more complex driving scenes and produce richer intermediate reasoning are predominantly trained in open-loop, where model outputs are directly compared to ground-truth behaviors without considering their effect on the environment.
Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3
Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what’s happening in their world, predict what’s likely to happen next, and generate actions for specific environments, embodiments, and tasks. NVIDIA Cosmos 3 is a frontier foundation model for physical AI that combines physical reasoning…
Advancing AI Infrastructure for Agentic AI with NVIDIA DOCA In-Silicon Security
The AI era is driving a new class of infrastructure: AI factories that transform data into intelligence for autonomous AI agents operating at unprecedented scale. Powered by accelerated computing, AI factories enable enterprises to train, fine-tune, and deploy AI with greater speed and efficiency. This new class of infrastructure also introduces a fundamentally new attack surface spanning…
NVIDIA Vera CPU Sets a New Standard for Agentic Workloads in AI Factories
Each wave of AI has created a new scaling law. Pretraining scaled intelligence through larger datasets, more parameters, and massively parallel GPU systems. Post-training scaled usefulness through instruction tuning, and re-balancing GPUs for generative inference. Test-time scaling improved reasoning by giving models more generated tokens for thinking. Now, agentic AI and reinforcement…
NVIDIA DSX OS Delivers Open, Modular Software for Operating AI Factories at Scale
AI is now essential infrastructure, powered by AI factories that generate intelligence in the form of tokens. As demand grows, these factories must scale faster, operate more efficiently, and lower the cost of intelligence across the five-layer stack: energy, chips, infrastructure, models, and applications. NVIDIA DSX platform provides the complete playbook for designing, simulating, building…
DynoSim: Simulating the Pareto Frontier
Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker counts, scheduler settings, routing policy, KV cache behavior, autoscaling thresholds, and topology. Those choices interact across layers, and a local improvement can shift the bottleneck somewhere else. For larger models…
How to Automate AI Model Documentation with the NVIDIA MCG Toolkit
As AI models grow in complexity and regulatory scrutiny intensifies under frameworks including California’s AB-2013 and the EU AI Act, software teams face a challenge beyond delivering great code: They need to produce comprehensive, auditable model documentation before the models are released. Model cards describe how a model works, its intended use and license, training data, performance…
Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI
AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and language in real time—turning fragmented information into actionable insights. Step 3.7 Flash, the latest from StepFun, brings these capabilities to production and enterprise-scale, available on NVIDIA-accelerated infrastructure. It is a 198B…
NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes
In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. However, cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests. This delay increases the risk of service level agreement (SLA) violations during traffic spikes…
NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance
Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to generate actionable trading insights. These advanced AI systems can process financial news, social media sentiment, earnings reports, and market data to predict stock price movements and automate investment strategies with unprecedented…
What’s New for Game Developers in NVIDIA RTX: DLSS 4.5 for UE5 and Multilingual AI Characters
NVIDIA RTX provides game developers with direct paths to AI-driven characters, frame generation, and ray-traced rendering. This post walks through a meaningful set of recent updates across the RTX ecosystem. NVIDIA ACE expands its multilingual AI character capabilities, making it easier to ship conversational NPCs. NVIDIA DLSS 4.5 arrives as an Unreal Engine (UE) plugin…