Paris 2.0: A Decentralized Diffusion Model for Video Generation
Paris 2.0: billed as the first decentralized video generation model trained without GPU clusters, extending the earlier Paris 1.0 image work.
Every story tagged with this topic, ordered by date.
User reports Qwen3.6 35B outperforms Gemma 4, GLM 4.7 Flash, and others on local agentic tasks; seeks comparable MoE alternatives.
MiniCPM5-1B released on Hugging Face: a 1B-parameter model from the CPM team that may serve as a competitive efficiency baseline for edge deployment.
The Financial Times reports that the Heretic tool removes guardrails from Meta's Llama 3.3 in under 10 minutes; more than 3,500 decensored variants have been downloaded 13M times.
Numind releases NuExtract3, open-weight 4B multimodal VLM for document extraction and Markdown conversion under Apache-2.0.
MiMo-V2.5-coder released as an open-weights coding model, an alternative to Qwen and DeepSeek for systems with 128GB+ of memory.
Elon Musk announces a 0.5T-parameter Grok model planned for next year, with an open-weights release.
hipEngine: open-source ROCm-native inference engine for Qwen 3.6 MoE on AMD RDNA3 GPUs (7900 XTX, Strix Halo).
BitCPM-CANN demonstrates 1.58-bit ternary quantization training on Huawei Ascend NPUs, addressing extreme low-bit LLM deployment outside CUDA.
Reddit discussion comparing inference speed/quality tradeoffs between Qwen3.6-35B and Gemma4-26B on consumer GPU hardware.
Community finetune of Qwen 3.6 35B with quantized weights; testing on consumer hardware shows stability at 200k context.
Community-built open-source TTS benchmark suite covering known local TTS tools as of May 2026; Windows/Mac results are available, Linux results pending.
Reddit discussion questioning the utility of uncensored models for RAG applications; user reports stability issues compared with base models.
llama.cpp server adds native tool support (shell execution, file ops) via an experimental --tools flag.
Chrome extension enables local inference of Gemini Nano (Gemma) on CPU-only systems, reaching ~20 tokens/sec on a laptop.
Developer refactored a 120-file FastAPI service using DeepSeek V4 and Hunyuan at 80x lower cost than Opus; the open-weight models matched Opus on latency but introduced production bugs.
Meituan releases LongCat-Video-Avatar 1.5, open-source audio-driven human video generation framework with AT2V, ATI2V, and video continuation tasks.
Reddit discussion asking about locally runnable 397B-parameter alternatives to Qwen 3.6 that fit in 256GB of RAM.
Community finetune of Gemma 4 26B with reduced refusals; niche interest for local model enthusiasts.
Qwen 3.6 27B quantized to fit 16GB VRAM at 40 tok/s; community optimization for edge deployment.
User reports running Qwen3.6-35B at 262k context on an 8GB RTX 3070 Ti at 30 tok/s using Q4 quantization; claims 1M context is possible with performance degradation.
Developer fine-tuned Cohere Transcribe to add diarization and timestamp support, extending open-source speech-to-text capabilities.
BeeLlama v0.2.0 achieves 4-5x token throughput gains on RTX 3090 via DFlash optimizations for Qwen 27B and Gemma 31B models.
ByteShape releases optimized quantization for Qwen3.6-35B achieving 30% faster inference than Unsloth on 6GB VRAM.
Community fork of llama.cpp optimizes MoE inference on 12GB VRAM by loading only active experts rather than full layers.
cHunter789 releases Qwen-27B IQ4_KS quantization (14.1GB) optimized for 16GB NVIDIA GPUs via ik_llama.cpp.
OpenBMB's BitCPM-CANN 1.58-bit model undergoing testing on Huawei Ascend 910B hardware.
SupraLabs released Supra-50M, a 50M-parameter Llama-style language model trained on 20B educational tokens with competitive benchmark performance.
DeepSeek secures $10.29B funding round; founder Liang Wenfeng commits to open-source development over near-term commercialization.
lemon-mlx-engine integrates ROCm 7.13 for AMD GPU inference of MoE and dense models on consumer hardware.
llama.cpp b9274 fixes VRAM leak in speculative decoding by properly freeing draft context and decoder resources on server sleep.
Qwen 3.7 open-weights model released; community discussion on LocalLLaMA highlights adoption momentum.
LatitudeGames releases Equinox-31B, a Gemma-based 31B finetune blending dark adventure and slice-of-life storytelling data.
llama.cpp PR #22929 fixes prompt processing performance issue affecting OpenCode and Pi model inference.
Heretic open-source project receives a legal notice from Meta; details of the alleged violation were not disclosed.
arXiv paper shows small open-source models' honesty dropping from 35% to 0% when the prompt tone shifts from neutral to pressuring.
Reddit discussion speculating on Qwen 3.7 Max performance and open-weight availability; no official details or benchmarks provided.
User reports 110 tok/s inference on Qwen 35B with 12GB of VRAM using ik_llama.cpp, outperforming standard llama.cpp after the MTP merge.
User shares llama.cpp configuration for running Qwen 3.6 27B locally with ROCm acceleration and optimizations.
Cohere launches Command A+, its first open-weights MoE model, emphasizing efficiency and latency over peak performance.
Reddit speculation that Alibaba's Qwen will release a 27B model; no confirmation or roadmap details.
torchtune: PyTorch-native library for LLM post-training, emphasizing modularity, fine-tuning, and extensibility for adapting open-weight models.
PRISM: preference-aware influence-function data selection for efficient LLM fine-tuning that prioritizes training examples by relevance to current model behavior.
Community discussion expressing anticipation for Qwen's rumored 27B and 122B model releases.
ByteShape releases Qwen 3.6 35B GGUF quantizations in NTP and MTP variants with empirical comparisons across GPU/CPU hardware.
Cohere releases Command-A-Plus-05-2026 bfloat16 model weights on Hugging Face Hub.
Qwen3.7 Max ranks 5th on Artificial Analysis leaderboard; 27B/35B variants pending evaluation.
Cohere releases Command A+, an open-weights model optimized for enterprise agent deployment with improved speed and capability.
Reddit discussion of HRM-Text-1B, which claims SOTA performance at the 1B scale; technical details are limited and commenters question the benchmarks' validity.