What happens when AI starts building itself?
Richard Socher's new $650 million startup wants to build an AI that can research and improve itself indefinitely — and he insists it will actually ship products.
Anthropic partners with the Gates Foundation on a $200M initiative; funding and strategic alignment remain unclear.
Mythos Preview LLM helped develop a macOS kernel exploit for Apple's M5 in five days, demonstrating AI-assisted security vulnerability discovery.
And it doesn't show until two steps in. Went through 80% of my usage overnight. I don't know if it was happening before 2.1.139, but that's what I was running. FFS, Dario. Just put me on the payroll. And in the meantime, SORT YOURSELVES OUT.
Agentic inference has fundamentally changed the runtime dynamics of inference workloads by introducing non-deterministic trajectories: actions, observations, and decisions that an AI agent produces while working through a task. These trajectories compound end-to-end latency across hundreds of inference requests per session. NVIDIA Vera Rubin NVL72 handles the bulk of that inference load as…
A California town's 49,000 residents compete with Nevada data centers for energy.
According to Bloomberg, OpenAI has enlisted an outside law firm to work through its options.
A new open source gadget called Clawdmeter turns Claude Code usage stats into a tiny desktop dashboard for AI coding power users.
Microsoft first started opening up access to Claude Code in December, inviting thousands of its own developers to use Anthropic's AI coding tool daily. It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time, and sources tell me that Claude Code has proved very popular inside Microsoft over the past six months. Perhaps a little too popular, as Microsoft is now preparing to walk back its Claude Code push. I understand that Microsoft is planning to remove most of its Claude Code licenses and push many of its developers to use...
User asks about quantization quality differences (Q4 vs Q6) for Qwen 3.6 on consumer GPUs; anecdotal discussion, no new data.
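For readers weighing Q4 against Q6 on a fixed amount of VRAM, the first-order arithmetic is parameter count times bits per weight. The sketch below assumes nominal bit widths; real quantization formats also store scale factors, so their effective bits per weight run somewhat higher, and KV cache and runtime overhead come on top. The function name and the 14-billion-parameter figure are hypothetical illustrations, not Qwen 3.6's actual size.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (weights only).

    Assumes nominal bits per weight; ignores KV cache, activations,
    quantization scale metadata, and runtime overhead.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    params = 14e9  # hypothetical parameter count, for illustration only
    for label, bits in [("Q4", 4), ("Q6", 6)]:
        print(f"{label}: ~{weight_memory_gib(params, bits):.1f} GiB of weights")
```

The ratio is the robust takeaway: at the same parameter count, nominal Q6 weights need 1.5 times the memory of Q4, which is the budget being traded against the quality difference the thread debates.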
Reddit user argues a humanoid robot video shows autonomous behavior rather than teleoperation, and speculates that the market is in denial about robotics disruption.
EU GPU price tracking over 50 days shows the RTX 5090 up 3% while mid-range cards fell 7-9%, relevant for planning local inference hardware.
EntityBench: 140-episode benchmark for evaluating entity consistency in multi-shot video generation across characters, objects, and locations.
ATLAS: Framework comparing agentic reasoning (code/tools) vs. latent reasoning (embeddings) for visual reasoning tasks with trade-off analysis.
RefDecoder: Conditional video VAE decoder improving structural detail preservation in latent diffusion by injecting reference image conditioning.
FutureSim: Benchmark measuring frontier agents' ability to adapt and forecast beyond knowledge cutoff using chronological world event replay.
PDI-Bench: Quantitative framework for auditing geometric coherence in generated video via perspective distortion and point-tracking metrics.
VGGT-Edit: Feed-forward 3D scene editing via native diffusion instead of 2D-lifting, enabling interactive instruction-following in generalized models.
Systematic comparison of retrieval strategies in LLM agent RAG systems, analyzing tool output presentation and performance under information-seeking tasks.
Tensor similarity: Weight-based metric for mechanistic interpretability that is invariant to symmetries and captures cross-layer functional equivalence.
Shodh-MoE: Sparse mixture-of-experts architecture addressing negative transfer in multi-physics foundation models via selective routing.
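Shodh-MoE's specific router is not described in this blurb. As background only, here is a minimal sketch of generic top-k sparse routing as used in many mixture-of-experts layers: each input activates just a few experts, so unrelated tasks need not share every parameter. It assumes the gate logits are already computed (in a real layer they come from a learned projection of the input); `top_k_route` and `moe_forward` are hypothetical names.

```python
import math
from typing import Callable, List, Tuple


def top_k_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize over only those k."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    experts: List[Callable[[float], float]],
    gate_logits: List[float],
    k: int = 2,
) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))
```

Only k expert evaluations happen per input regardless of how many experts exist, which is one intuition for why selective routing can keep conflicting domains from interfering.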
OpenDeepThink: Population-based test-time compute framework using Bradley-Terry pairwise comparison to select best LLM reasoning candidates.
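To make the Bradley-Terry selection step concrete, here is a minimal sketch that fits strengths from a pairwise win matrix with Hunter's MM updates and returns the strongest candidate. How OpenDeepThink actually generates the comparisons (for example, which judge it uses or how pairs are sampled) is not shown; the win matrix, candidate names, and function names are hypothetical, and the fit assumes the comparison graph is strongly connected so finite strengths exist.

```python
from typing import List


def bradley_terry(wins: List[List[int]], iters: int = 500) -> List[float]:
    """Fit Bradley-Terry strengths via minorize-maximize (MM) updates.

    wins[i][j] is the number of pairwise comparisons candidate i won against j.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Sum over opponents actually compared with i: n_ij / (p_i + p_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i and wins[i][j] + wins[j][i] > 0
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        # Strengths are only identified up to scale; normalize so they sum to n.
        scale = n / sum(new_p)
        p = [x * scale for x in new_p]
    return p


def select_best(candidates: List[str], wins: List[List[int]]) -> str:
    """Return the candidate with the highest fitted strength."""
    strengths = bradley_terry(wins)
    best = max(range(len(candidates)), key=lambda i: strengths[i])
    return candidates[best]


if __name__ == "__main__":
    # Hypothetical results: draft_a beat draft_b 3-1 and draft_c 2-0; draft_b beat draft_c 2-1.
    wins = [[0, 3, 2], [1, 0, 2], [0, 1, 0]]
    print(select_best(["draft_a", "draft_b", "draft_c"], wins))  # prints draft_a
```

Fitting a strength model rather than counting raw wins lets every comparison inform the ranking even when not every pair of candidates was compared the same number of times.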
MetaBackdoor demonstrates backdoor attacks on LLMs via positional encoding manipulation without textual trigger modification.
EviScreen applies evidential reasoning with historical case retrieval for interpretable disease screening in medical images.
Retrieval-augmented multimodal framework aligns unstructured clinical narratives with structured EHR data for precise timeline reconstruction.
Position paper argues behavioral assurance methods cannot verify latent safety properties required by current AI governance frameworks.
Hand-in-the-Loop improves VLA dexterous manipulation through seamless human intervention with gesture-jump mitigation via command alignment.
MeMo framework encodes new knowledge into modular memory model while freezing LLM parameters, enabling efficient domain-specific updates.
Self-Distilled Agentic RL extends on-policy self-distillation to multi-turn agents with token-level guidance and skill conditioning.