The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

r/LocalLLaMA· COMMUNITY

Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post

Performance test: Qwen 3.6 27B with speculative decoding achieves 25.53 tokens/sec with 2x speedup on local hardware.

u/Then-Topic8766·2 months ago·222 pts / 74 comm

r/LocalLLaMA· COMMUNITY

When are we getting consumer inference chips?

User questions absence of consumer inference chips ($200 devices running Llama 3 locally) despite industry investment.

u/SnooStories2864·2 months ago·70 pts / 145 comm

r/LocalLLaMA· COMMUNITY

Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks

llama.cpp Vulkan and SYCL benchmarks comparing Nvidia RTX 3090 vs Intel Arc Pro B70 on prompt processing and token generation.

u/tovidagaming·2 months ago·60 pts / 39 comm

r/LocalLLaMA· COMMUNITY

Qwen 3.6 is actually useful for vibe-coding, and way cheaper than Claude

User demonstrates Qwen3.6-27B running locally via llama-server with 200k context on dual RTX 3090s, reporting useful coding performance at a lower cost than Claude.

u/sdfgeoff·2 months ago·333 pts / 97 comm

r/LocalLLaMA· COMMUNITY

Llama.cpp's auto fit works much better than I expected

User shows that llama.cpp auto-fit reaches 57 t/s on Qwen3.6 Q8 with 256k context even though the weights exceed 32GB of VRAM.

u/a9udn9u·2 months ago·78 pts / 36 comm

r/LocalLLaMA· COMMUNITY

235M param LLM from scratch on a single RTX 5080

Plasma 1.0: 235M-param LLaMA-style model trained from scratch on single RTX 5080 GPU.

u/ExcellentTip9926·2 months ago·63 pts / 10 comm

r/LocalLLaMA· COMMUNITY

Open WebUI Desktop Released!

Open WebUI Desktop released with local llama.cpp support and remote server connectivity options.

u/My_Unbiased_Opinion·2 months ago·246 pts / 90 comm

r/LocalLLaMA· COMMUNITY

llama.cpp is the linux of llm

Commentary comparing llama.cpp's infrastructure dominance in the LLM ecosystem to that of Linux.

u/DevelopmentBorn3978·2 months ago·174 pts / 84 comm

r/LocalLLaMA· COMMUNITY

Why doesn't any OSS tool treat llama.cpp as a first class citizen?

Community discussion on why OSS AI tools prioritize Ollama over llama.cpp despite engineering parity.

u/rm-rf-rm·2 months ago·293 pts / 104 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Six Llamas: Comparative Religious Ethics Through LoRA-Adapted Language Models

Six Llama-3.1-8B variants fine-tuned on Christian, Islamic, Jewish, Hindu, Buddhist texts reveal systematic differences in ethical reasoning patterns.

Chad Coleman·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus

Cross-linguistic study of politeness effects on 5 LLMs (Gemini-Pro, GPT-4o Mini, Claude 3 Sonnet, DeepSeek-Chat, Llama 3) via 22,500 English/Hindi/Spanish prompts.

Hitesh Mehta·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Optimizing Korean-Centric LLMs via Token Pruning

Benchmark compares token pruning compression across Qwen3, Gemma-3, Llama-3, Aya for Korean-centric NLP with English-Korean vocabulary optimization.

Hoyeol Kim·2 months ago

Hugging Face· INFRA

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

New in llama.cpp: Model Management

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

Welcoming Llama Guard 4 on Hugging Face Hub

Welcome Llama 4 Maverick & Scout on Hugging Face

Llama 3.2 in Keras

Llama can now see and run on your device - welcome Llama 3.2

Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Welcome Llama 3 - Meta's new open LLM

Make your llama generation time fly with AWS Inferentia2

Comparing the Performance of LLMs: A Deep Dive into Roberta, Llama 2, and Mistral for Disaster Tweets Analysis with Lora

Non-engineers guide: Train a LLaMA 2 chatbot

Llama 2 on Amazon SageMaker a Benchmark

Fine-tuning Llama 2 70B using PyTorch FSDP

Code Llama: Llama 2 learns to code

Fine-tune Llama 2 with DPO