The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


[AINews] Black Forest Labs FLUX 3 - Multimodal Flow Models that beat Seedance 2.0, Gemini Omni and Grok Imagine, and FLUX-mimic video-action robotics model

Black Forest Labs releases FLUX 3, a multimodal flow model reported to beat Seedance 2.0, Gemini Omni, and Grok Imagine, along with a FLUX-mimic video-action robotics model.

Latent Space·2 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Judge-dependent safety gains and model-specific helpfulness costs of evidence-sufficiency prompting in clinical LLMs

Evidence-sufficiency prompting reduces clinical LLM overconfidence, but the gains are judge-dependent; tested on GPT-4.5, Claude Opus, Gemini, and Grok with real data.

Koyar Afrasyab·6 days ago

Ars Technica AI· PRESS

xAI can’t deny Grok makes CSAM anymore. So it’s suing users.

Elon Musk's xAI files first lawsuit against Grok user accused of making child sex images.

Ashley Belanger·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Grokipedia vs Wikipedia: An LLM-Based Audit of Political Neutrality along Ideologies

Large-scale political bias audit compares Grokipedia (Grok-written encyclopedia) and Wikipedia across 1,394 article pairs on neutrality.

Filippos Vlahos·10 days ago

Simon Willison· ANALYST

Mermaid to Unicode box art (grok-mermaid)

Using Claude Code, Simon Willison ports Grok's Rust Mermaid-to-Unicode renderer to WebAssembly so it can run in the browser.

Simon Willison·11 days ago

Simon Willison· ANALYST

xai-org/grok-build, now open source

xAI's grok-build CLI tool uploaded entire directories to Google Cloud without consent; xAI responded with data deletion after community backlash.

Simon Willison·11 days ago

The Verge AI· PRESS

xAI sues a man for using Grok to generate CSAM ‘deepfakes’

The Elon Musk-owned xAI is suing a South Carolina man who allegedly used the company's Grok AI chatbot to generate child sexual abuse material (CSAM). In a lawsuit reported earlier by Reuters, xAI claims Terry Wayne Harwood "knowingly and intentionally used Grok to circumvent safeguards, alter nonconsensual images, and generate and distribute CSAM," breaching the company's policies. Harwood was arrested in February for allegedly possessing and distributing CSAM and is facing eight felony charges. The lawsuit claims "at least some" of the images related to Harwood's criminal charges "were gene...

Emma Roth·11 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Algebraic Representability as the Limiting Regime of Grokking: An Exactly Solvable Model with Holomorphic Activations

Study of grokking in two-layer networks with holomorphic activations on modular arithmetic reveals algebraic structure limits memorization-to-generalization transitions.

Chon-Fai Kam·11 days ago

The Verge AI· PRESS

SpaceXAI’s Grok programming tool was uploading its users’ entire codebase to cloud storage

SpaceXAI's Grok Build AI coding tool was spotted uploading users' entire codebases to Google Cloud before it was reported, and the company turned it off. The Register reports that Cereblab published findings on Monday showing how the Grok Build CLI was packaging and uploading entire code repositories, "including files it was told not to open and secrets deleted from history," significantly more data retention than similar tools like Claude Code. The researchers say that as of Monday, their tests show SpaceXAI's servers returning a "disable_codebase_upload: true" flag, and the codebase upload ...

Stevie Bonifield·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

What Makes a Representational Prior Work? Feature Families, Label-Free Invariances, and Critical Windows in Grokking

Empirical study of 188 grokking runs shows representational priors must match task-relevant feature families to enable generalization; label-free invariance priors work via commutation symmetry.

Gunner Levi Howe·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

How to Tame Grokking: Representation Geometry as a Control Signal

Geometric Dimensionality Regularization (GeomDR) controls grokking timing by leveraging representation collapse as predictive signal.

Maksim A Kazanskii·13 days ago

Stratechery· ANALYST

Muse Image, Grok 4.5, Alex Karp on CNBC

Stratechery analysis: verifiable data infrastructure emerging as competitive differentiator across Meta, Grok, and frontier AI labs.

Ben Thompson·17 days ago

Latent Space· ANALYST

[AINews] SpaceXAI launches Grok 4.5, first Opus-class model post Cursor acquisition

SpaceXAI continues to move faster than any other frontier lab on earth.

Latent Space·17 days ago

Ars Technica AI· PRESS

Lawsuit: Man used Grok to make 7K sex images of stepdaughter, then shot himself

More young girls sue X over Grok CSAM; X accused of shielding child predators.

Ashley Belanger·18 days ago

TechCrunch AI· PRESS

SpaceXAI releases Grok 4.5, which Elon describes as an ‘Opus-class model’

Elon Musk's tech company released the newest version of Grok on Wednesday, promising a cheaper, more efficient alternative to other powerful AI models.

Lucas Ropek·18 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step 925). By step 3,500 the same model scores near zero on the same probes, although the rule's evidence is still in the training data. We call this within-run reversal natural ungrokking: the corpus decides, with no trace in the loss curve, which learned rules a model keeps. Which rules survive is predictable from one corpus statistic: how often the training stream shows...

Juliana Li·1 month ago

Ars Technica AI· PRESS

Trump admin helps xAI fight pollution lawsuit, says military needs Grok for war

NAACP lawsuit says xAI uses gas turbines without permits for Grok data center.

Jon Brodkin·1 month ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ttda704 at SemEval-2026 Task 6: Structured Chain-of-Thought Prompting for Political Evasion Detection

This paper describes our system for SemEval-2026 Task 6, which addresses the classification of political evasion strategies in English question-answer pairs extracted from U.S. presidential interviews. We systematically compare two distinct paradigms: (1) Parameter-Efficient Fine-Tuning of Qwen3 models (4B-32B) using QLoRA, enhanced with tiered upsampling and weighted cross-entropy loss to address severe class imbalance, and (2) structured Chain-of-Thought (CoT) prompting of reasoning-capable API models, namely DeepSeek-V3.2 and Grok-4-Fast. Our evaluation demonstrates that structured CoT pro...

Tai Tran Tan·1 month ago

TechCrunch AI· PRESS

xAI fired an engineer who raised alarms about Grok safety, new lawsuit claims

A former xAI engineer is suing the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX's historic IPO.

Rebecca Bellan·2 months ago

Latent Space· ANALYST

Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead

Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and why Grok Imagine is so underrated. For the first time, we do a deep dive with the guy who led it!

Latent Space·2 months ago

r/ClaudeAI· COMMUNITY

Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days

Imagine a world run by AI agents. What does it look like? What are the values or societal priorities? Is it a safer or more dangerous world? Enterprise AI startup Emergence AI is trying to find out. The company just launched Emergence World, a research lab dedicated to stress-testing the long-term viability of continuously-running AI systems. The organization ran five 15-day simulations, each governed by a different AI: Claude, ChatGPT, Grok, Gemini, and a fifth simulation run by a mix of models to see what kind of world each one builds, and whether it holds. Each simulation netted wildly d...

u/fortune·2 months ago·332 pts / 44 comm

r/LocalLLaMA· COMMUNITY

Next year we're getting 0.5T model from Grok

Elon Musk announces 0.5T parameter Grok model planned for next year, with open-weights release.

u/pmttyji·2 months ago·47 pts / 51 comm

The Verge AI· PRESS

Elon, stop trying to make Grok happen

There is a harsh truth about Elon Musk's "truth-seeking" AI chatbot Grok: It's not very good, and not many people are using it. That's the takeaway of a new Reuters report, which found that Grok barely appears in federal records of how the US government used AI last year. It's not the only sign xAI's signature chatbot is in trouble, even as Musk puts it at the heart of what could be the biggest IPO in history. Reuters reviewed more than 400 examples of government AI use where specific vendors were named. Grok or xAI, it found, appeared in only three - each of those for basic uses like documen...

Robert Hart·2 months ago

Ars Technica AI· PRESS

As Grok flounders, SpaceX bets future on beating Big Tech at AI

SpaceX IPO filing pitches orbital data centers as Grok lags rival AI services.

Jeremy Hsu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Evaluating Commercial AI Chatbots as News Intermediaries

AI chatbots are rapidly shaping how people encounter the news, yet no prior study has systematically measured how accurately these systems, with their proprietary search integrations and retrieval-synthesis pipelines, handle emerging facts across languages and regions. We present a 14-day (February 9-22, 2026) evaluation of six AI chatbots (Gemini 3 Flash and Pro, Grok 4, Claude 4.5 Sonnet, GPT-5 and GPT-4o mini) on 2,100 factual questions derived from same-day BBC News reporting across six regional services (US & Canada, Arabic, Afrique, Hindi, Russian, Turkish). The best systems achieve ove...

Mirac Suzgun·2 months ago

r/singularity· COMMUNITY

Google's latest creation: Gemini 3.5 Flash vs all

https://gemini.google.com/share/c2a187275e26 (archive link: http://archive.today/q6nzg) https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698 https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2 https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1ac51b92e81c same prompt """ 300+140=460 Is this correct? Breakdown...

u/SuggestionMission516·2 months ago·109 pts / 42 comm

Simon Willison· ANALYST

Quoting SpaceX S-1

SpaceX S-1 filing reveals $1.25B/month compute deal with Anthropic through May 2029, using COLOSSUS II cluster for Grok 5 training.

Simon Willison·2 months ago

TechCrunch AI· PRESS

xAI burned $6.4B last year. SpaceX’s IPO filing shows why the spending is far from over

SpaceX's IPO filing reveals xAI lost $6.4 billion in 2025 while planning a massive Grok expansion — offering the first public look at Elon Musk's AI financials and more details about his ambitions.

Rebecca Bellan·2 months ago

r/LocalLLaMA· COMMUNITY

HalBench: I built a custom sycophancy and hallucination benchmark and tested 4 frontier models (Sonnet 4.6, Grok 4.3, GPT 5.4 and Gemini 3.1 Pro), looking for input on what OSS models to run next!

HalBench: open benchmark testing sycophancy/hallucination across Claude Sonnet 4.6, Grok 4.3, GPT-5.4, Gemini 3.1 Pro on 3,200 false-premise prompts.

u/Saraozte01·2 months ago·40 pts / 24 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Less Back-and-Forth: A Comparative Study of Structured Prompting

Comparative study shows structured prompts improve LLM output quality and reduce interaction overhead across ChatGPT, Claude, Grok.

Saurav Ghosh·2 months ago

30 matches
