Source · Community

r/LocalLLaMA

Reddit · COMMUNITY

Last updated May 28, 2026, 6:00 PM

LiquidAI/LFM2.5-8B-A1B · Hugging Face

looks like you can run it on any potato (A1B)!

[https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF)

from LiquidAI:

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

* **On-device personal assistant**: Designed to power real-life applications, chaining tool calls, and following complex instructions on all devices.
* **Compressed performance**: Competitive with much larger dense and MoE models on instruction following and agen...

u/jacek2023·2 months ago·49 pts / 15 comm


Reachy Mini goes fully local!

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

HF models page now has a "Base only" toggle to filter out finetunes/quants/etc

My new home office radiator 🥵

Qwen/Qwen-Image-Bench · Hugging Face

The frontier reasoning race is starting to look like a crowded subway station

Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals!

Vulnerability found in framework used by VLLM, many MCP servers, and other LLM tools

CrankGPT by Squeez Labs - hand-cranked edge AI - talk about local AI!!!

I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful.

Qwen3.6 huge quality gain from Q4 to Q6 for coding agent

Behold! Probably the most ghetto local AI server:

260K-param LLM running on an emulated 90s CPU inside an 18-year-old RTOS

Qwen3.6 35B-A3B successfully completed the FoodTruck Bench!

SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche

Why are the AI Companies spreading F.U.D. about AI?

I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned

Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models?

AI is not for everyone

Info: Nvidia Cuda 13.3 landed

Looks like MiniMax-M3 is just around the corner

New DeepSWE benchmark finds Claude Opus cheats

Folks running qwen 3.6 27b for agentic work. Do you dare to use q4_k_m?

Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything)

$400 Qwen 3.6-27B Setup - Dual RTX 3060 - 30-50 t/s

A rare look inside Qwen 3.7’s open source model release approval process:

PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.

Turning local agents into self-optimizing agents

OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

Okay 27B made me a believer

Tencent Hy-MT2 is now under Apache License 2.0

China Clamps Down on Overseas Travel for AI Talent at Alibaba, DeepSeek

Not sure if this was posted. But I think it's highly relevant to us.

SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery

Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs.

Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats

CXMT started selling ram to corsair

One letter to appease them all

Update on 12x32gb sxm v100 cluster / local AI for legal drafting

Using Local LLMs for Generating Custom Interactive Recursive Textbooks on the Fly

AI content detector based on Qwen 0.8b fine-tuned on Pangram dataset

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

Is Qwen3.6 current king for local agentic use?

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

MiniCPM5-1B

The Financial Times has published an article about Heretic

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable)

Old Mac Pro still proving its worth
