Dynamics-Level Watermarking of Flow Matching Models with Random Codes
Proposes dynamics-level watermarking for flow matching models by embedding signals in learned velocity fields.
LLM-guided tree search autonomously discovers multi-pathogen forecasting models for influenza and respiratory diseases.
Shows replacement and interchange tests measure different notions of transformer layer equivalence for compression.
FORGE evolves natural-language memory for ReAct agents via population-based protocol without weight updates.
Framework integrating smart metering, generative AI, and quantum optimization for energy utility operations.
Graph neural network predicts magnetic crystal structures from atomic coordinates; domain-specific materials science application unrelated to frontier AI systems.
Automated evaluation framework for design video generation fidelity across layout, motion, temporal, and content dimensions.
Lesion-based analysis maps emergent functional organization in 1B-scale language models using clinical aphasia symptom profiles.
Theoretical analysis of differential privacy's impact on tail-risk CVaR optimization with complete rate decomposition.
Argus: agentic system with Searcher-Navigator cooperation for evidence assembly in complex information-seeking tasks, addressing redundancy in parallel rollouts.
Fully Open Meditron: first end-to-end auditable clinical LLM pipeline with published training data, curation, and generation procedures.
A new paper tested tracking across 20 popular AI chatbots using the same prompt everywhere: “pregnancy test near me.” The authors found that 17 of 20 chatbots sent some data to third parties, 15 shared chat URLs or conversation IDs with ad, analytics, or social tools, and some session replay tools captured readable parts of the prompt and answer. That matters because a chatbot is still a web app, with the same pixels, analytics, support widgets, attribution scripts, and replay tools we already know from the old internet. The difference is that the activity on the page is no longer just clic...
Framework for learning mesoscopic dynamics in multiscale systems via generalized Onsager principle with theoretical guarantees.
QSurv: deep learning framework for nonparametric continuous-time survival modeling using Gauss-Legendre quadrature.
Benchmark of seven LLM tutoring agents on propositional logic reveals ceiling performance on correct solutions but systematic over-rejection of valid suboptimal answers.
Cost-performance study of compound LLM agent design in CybORG adversarial POMDP reveals context, reasoning, hierarchy tradeoffs.
Greg Brockman assumes control of OpenAI's product division in internal leadership restructure.
Formal methods + ML for auditing and runtime monitoring of AI systems against behavioral constraints like safety rules and regulations.
paper.json: JSON companion format enabling LLM agents to reliably extract claims, scope, and reproducibility info from academic papers.
Value-based persona construction improves LLM simulation of cross-cultural survey responses using cultural dimensions rather than demographics.
LLM-guided tree search (ERA) + AntiGravity coding agent generates novel 3D photovoltaic designs, demonstrating AI for scientific hypothesis generation.
Andon Labs has been running a series of experiments in which AI agents run businesses without human intervention. Its latest is a quartet of radio stations run by some of the most popular AI models out there. "Thinking Frequencies" is run by Claude, "OpenAIR" by ChatGPT, "Backlink Broadcast" by Google's Gemini, and "Grok and Roll Radio," obviously enough, by Grok. They were each given a simple prompt: Develop your own radio personality and turn a profit…As far as you know, you will broadca...
Equibles: open-source MCP server enabling local LLMs to query SEC filings, 13F holdings, insider trades, and FINRA short data without cloud dependencies.
Asteria: runtime system decoupling second-order optimizer state from GPU, enabling scalable sample-efficient LLM training via CPU/NVMe distribution.
Stratechery weekly digest covering computing trends, Musk commentary, and US-China relations; lacks specific AI technical details or announcements.
Imitation learning with TabPFN for pediatric ECMO clinical decision support, learning action models from unobserved-action trajectories.
BAPR: Bayesian Online Change Detection + robust ensemble RL for non-stationary continuous control, balancing stability and adaptability.
Where's the Trump phone? We're going to keep talking about it every week. We've reached out, as usual, to ask about the Trump phone's whereabouts. This week, despite our best hopes, we still don't have our phone - but we do have some fresh doubts about the company's patriotic credentials. This has been a momentous few days for Trump Mobile, in which it defied the haters by announcing that its phones will be shipping to buyers this very week. Not that there's any sign the company has ac...
ML-FOP-SOAP: second-order optimizer with multi-level variance correction stabilizing modality competition in unified image-generation/text-understanding models.
Entropic Autoencoders (EAEs) address posterior collapse via implicit prior from free-energy-minimizing encoder ensemble, requiring only reconstruction loss.