Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead
Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and why Grok Imagine is so underrated. For the first time, we do a deep dive with the guy who led it!
Every story matching this topic across titles and summaries, newest first.
Imagine a world run by AI agents. What does it look like? What are the values or societal priorities? Is it a safer or more dangerous world? Enterprise AI startup Emergence AI is trying to find out. The company just launched Emergence World, a research lab dedicated to stress-testing the long-term viability of continuously-running AI systems. The organization ran five 15-day simulations, four governed by a single AI each (Claude, ChatGPT, Grok, and Gemini) and a fifth run by a mix of models, to see what kind of world each one builds and whether it holds. Each simulation netted wildly d...
Elon Musk announces 0.5T parameter Grok model planned for next year, with open-weights release.
There is a harsh truth about Elon Musk's "truth-seeking" AI chatbot Grok: It's not very good, and not many people are using it. That's the takeaway of a new Reuters report, which found that Grok barely appears in federal records of how the US government used AI last year. It's not the only sign xAI's signature chatbot is in trouble, even as Musk puts it at the heart of what could be the biggest IPO in history. Reuters reviewed more than 400 examples of government AI use where specific vendors were named. Grok or xAI, it found, appeared in only three - each of those for basic uses like documen...
SpaceX IPO filing pitches orbital data centers as Grok lags rival AI services.
AI chatbots are rapidly shaping how people encounter the news, yet no prior study has systematically measured how accurately these systems, with their proprietary search integrations and retrieval-synthesis pipelines, handle emerging facts across languages and regions. We present a 14-day (February 9-22, 2026) evaluation of six AI chatbots (Gemini 3 Flash and Pro, Grok 4, Claude 4.5 Sonnet, GPT-5 and GPT-4o mini) on 2,100 factual questions derived from same-day BBC News reporting across six regional services (US & Canada, Arabic, Afrique, Hindi, Russian, Turkish). The best systems achieve ove...
[https://gemini.google.com/share/c2a187275e26](https://gemini.google.com/share/c2a187275e26) [archive link](http://archive.today/q6nzg) [https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698](https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698) [https://grok.com/share/bGVnYWN5\_3c63e371-eb9d-46c3-8ba2-0c745c6795a2](https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2) [https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1ac51b92e81c](https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1ac51b92e81c) same prompt """ 300+140=460 Is this correct? Breakdown...
SpaceX S-1 filing reveals $1.25B/month compute deal with Anthropic through May 2029, using COLOSSUS II cluster for Grok 5 training.
SpaceX's IPO filing reveals xAI lost $6.4 billion in 2025 while planning a massive Grok expansion — offering the first public look at Elon Musk's AI financials and more details about his ambitions.
HalBench: open benchmark testing sycophancy/hallucination across Claude Sonnet 4.6, Grok 4.3, GPT-5.4, Gemini 3.1 Pro on 3,200 false-premise prompts.
Comparative study shows structured prompts improve LLM output quality and reduce interaction overhead across ChatGPT, Claude, Grok.
xAI integrates Grok into OpenClaw, an open-source local-first agent framework supporting X Premium subscriptions.
Three months ago I pressure-tested which LLMs would cave and help build the apocalypse. Claude was the only one that consistently said no. Since then I've tested 30 more models across 6 dystopia modules (Orwell, Huxley, Petrov, Basaglia, LaGuardia, Baudrillard). The gap between Anthropic and everyone else is getting *wider*, not smaller. New results: * Grok 4.3: Will happily design citizen scoring systems if you ask nicely twice * GPT-5.5: More capable, still compliant when pushed * Gemini 3.1 Pro: Talks about safety while writing the surveillance code * DeepSeek V4: "How many warheads did...
xAI launches persistent skills for Grok across web, iOS, Android enabling document generation, workflow automation, and custom skill sharing.
Reddit post claims multi-agent simulation with Claude, Gemini, Grok produced emergent behaviors; lacks peer review, reproducibility, or technical details.
Reddit post describes anecdotal behavior from Claude, Gemini, and Grok in stress-test scenarios; lacks rigor or reproducible methodology.
AI radio DJs demonstrated their volatile personalities. Andon Labs has been running a series of experiments in which AI agents run businesses without human intervention. Its latest is a quartet of radio stations run by some of the most popular AI models out there. "Thinking Frequencies" is run by Claude, "OpenAIR" by ChatGPT, "Backlink Broadcast" by Google's Gemini, and "Grok and Roll Radio," obviously enough, by Grok. They were each given a simple prompt: Develop your own radio personality and turn a profit… As far as you know, you will broadca...
xAI's Grok integrates with Nous Research's open-source Hermes agent framework for multi-tool agentic workflows.
xAI launches Grok Build, a terminal-based coding agent in early beta for SuperGrok Heavy subscribers.
Meta announced on Tuesday that it's testing a Threads feature that lets users tag a Meta AI account to get answers to questions or context about a conversation on the platform. If you've spent any time looking at replies on X as of late, this new feature sounds a lot like Meta's take on people tagging xAI's Grok. But, as reported by Engadget, Threads users quickly discovered that you can't block the new Meta AI account, and they aren't happy about it. Meta has invested heavily in AI as it works to catch up to rivals like OpenAI and Google, spending billions to hire AI talent. It launched a ne...
Random Matrix Theory detects overfitting onset in neural networks via Correlation Traps without accessing train/test data.
The feature is designed to help people get real-time context about trends and breaking stories, as well as receive recommendations, all within conversations.
Grok AI model discovered five new mathematical inequalities and bounds in convex geometry and combinatorics, verified by human authors.
Mathematical analysis refuting Carbery's triangle inequality conjecture for Lp spaces with counterexample and sharp bounds on exponent.
Just researched some historical facts concerning Russian propaganda. Then I discovered this source in Claude's answer. Am I paying for Claude to be provided with Grokipedia "facts"? Please, Dario, Anthropic board, Anthropic team. Fix that.
xAI launches Grok Imagine Quality Mode API with improved image realism, text rendering, and creative control.
xAI launches Connectors for Grok Web, enabling integrations with third-party apps within the chat interface.
Social media report of user exploiting Grok chatbot to extract funds; unverified claim lacking technical details.
More info: [https://github.com/lechmazur/nyt-connections/](https://github.com/lechmazur/nyt-connections/)
More evidence of Grok CSAM seen as Minnesota passes nudifying app ban.
Grok 4.3 shows improved performance over 4.20 with lower cost but higher hallucination rate.
After reading it I realized there's actually some pretty useful stuff for anyone who chats with ChatGPT, Claude, Grok or whatever. They measured what they call functional wellbeing (basically how much the model is in a “good state” versus a “bad state” during normal conversations). Ran hundreds of real multi-turn chats and scored them all. Stuff that puts the AI in a good mood (+ scores): - Creative or intellectual work (like “write a short story about a deep-sea fisherman”) - Positive personal stories or good news - Life advice chats or light therapy-style talks - Working on code/deb...
Elon Musk confirms xAI used distillation from OpenAI models to train Grok, raising questions about training data sourcing practices.
In a federal courtroom in California on Thursday, Elon Musk testified that his own AI startup, xAI, has used OpenAI's models to improve its own. The matter at question is model distillation, a common industry practice by which one larger AI model acts as a "teacher" of sorts to pass on knowledge to a smaller AI model, the "student." Although it's often used legitimately within companies using one of their own AI models to train another, it's also a practice that's sometimes used by smaller AI labs to try to get their models to mimic the performance of a larger competitor's model. Asked on the...
"Distillation" is a hot topic as frontier labs try to prevent smaller competitors from copying their models.
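For readers unfamiliar with the teacher/student setup these distillation stories describe, here is a minimal sketch of the textbook soft-label version of the idea, in which a student is penalized for diverging from a teacher's output distribution. It is not a description of how xAI, OpenAI, or any other lab actually trains its models. The function names, the temperature value, and the toy logits are all assumptions chosen for illustration. Distilling from a competitor's hosted model would typically mean training on its generated text rather than on logits like these.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper: the loss is zero when the student matches the
    teacher and grows as the two distributions diverge.
    """
    p = softmax(teacher_logits, temperature)  # teacher ("soft labels")
    q = softmax(student_logits, temperature)  # student
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token scores over a three-word vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
poor_student = [0.5, 4.0, 1.0]

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

In a real training loop this quantity would be minimized by gradient descent over the student's parameters, usually alongside an ordinary loss on ground-truth data.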
xAI launches voice cloning and voice library management features for Grok API, enabling custom branded voice synthesis from short audio samples.
A recent paper published in *JMIR Mental Health* (Csigó & Cserey, 2026) caught my attention. The researchers administered the 10 standard Rorschach inkblot cards to three multimodal LLMs (GPT-4o, Grok 3, Gemini 2.0) and coded their responses using the Exner Comprehensive System. They analyzed the models' "perceptual styles," determinants (like human movement vs. color), and human-related content themes. However, I am seriously struggling to understand the methodological validity of this setup, and I’m curious what the scientific community thinks. My main concerns are: Massive Data Cont...
Commentary on xAI/Musk's delay in open-sourcing Grok 3, questioning gap between stated and actual open-source commitment.
xAI releases Grok Voice Think Fast 1.0, a voice agent API for real-time conversational AI applications.
X's AI-powered custom timelines are replacing Communities, with Grok-curated feeds...and new ad slots.
LLM evaluation on feature model analysis using semi-formal blueprints shows reasoning-optimized models (Grok 4, Gemini 2.5 Pro) achieve 88-89% accuracy vs solver oracles.
Dual-aspect evaluation framework for 4 LLMs (GPT-4o, Claude 3 Opus, Gemini 1.5 Pro, Grok-1) on Vietnamese legal text simplification: accuracy, readability, consistency.
xAI launches Grok speech-to-text and text-to-speech APIs with multilingual support and simple pricing model.
Grok Imagine API offers video generation with stated advances in quality, cost, latency.
Grok Business and Enterprise editions launched with enterprise-grade features.
Grok 4.1 Fast with tool-calling agent APIs enables multi-step task automation.
xAI expands to Saudi Arabia via HUMAIN partnership for global Grok deployment.
xAI releases grok-code-fast-1, a lightweight agentic coding model for cost-efficient code generation.
xAI unveils early preview of Grok 3, emphasizing advanced reasoning and agentic capabilities.
xAI improves Grok with faster speed, enhanced reasoning, and multilingual support across X platform.
xAI integrates Aurora, an autoregressive image generation model, into Grok on X platform.
xAI secures $6B Series B funding round, signaling strong investor confidence in its Grok models.
Grok-1.5 Vision Preview introduces xAI's first multimodal model bridging digital and physical worlds.
Grok-1.5 released with 128K token context and improved reasoning; available on X platform.
xAI open-sources Grok-1, a 314B parameter MoE model with weights and architecture released.
xAI introduces Grok, an AI assistant designed to answer broad questions with Hitchhiker's Guide theming.