PFlash: 10x prefill speedup over llama.cpp at 128K on an RTX 3090
PFlash: speculative prefill technique achieves a 10x speedup at 128K context with quantized 27B models on an RTX 3090; open-source C++/CUDA implementation.
Deep kernel learning with transformer embeddings stratifies glaucoma patient risk from sparse EHR data; medical ML application without LLM/frontier AI component.
FinSafetyBench: bilingual red-teaming benchmark (14 subcategories) for evaluating LLM refusal of financial crimes and ethics violations grounded in real cases.
MemCoE: cognition-inspired two-stage memory optimization for LLM agents to learn personalized long-term user preferences within context windows.
FedKPer addresses generalization/personalization in medical federated learning via knowledge personalization; healthcare ML infrastructure without LLM focus.
Persona-induced latent variable model for adaptive user querying under budget constraints; ML methodology tangential to frontier LLM research.
ML-Bench&Guard: policy-grounded multilingual safety benchmark (14 languages) aligning LLMs with region-specific regulations and cultural context.
Reddit user reports severe hallucinations and task non-compliance in Claude Opus 4.7 on May 1st; anecdotal complaint without reproduction details.
Developer demo of generative game engine using Gemini 3 for spell generation with 6-player multiplayer physics simulation.
The Pentagon has struck deals with OpenAI, Google, Microsoft, Amazon, Nvidia, Elon Musk's xAI, and the startup Reflection, allowing the agency to use their AI tools in classified settings, according to an announcement on Friday. At the same time, the Defense Department has left out Anthropic - which it previously used for classified information - after declaring it a supply-chain risk. This builds upon deals with OpenAI and xAI, which have already reached agreements with the Pentagon for the "lawful" use of their AI systems. A report from The Information suggests Google has struck a similar a...
Intel releases AutoRound, a low-bit quantization algorithm optimized for CPU/XPU/CUDA with vLLM and Transformers compatibility.
Obfuscated Natural Number Game benchmarks LLM prover architectural reasoning vs. pattern matching; evaluates formal theorem-proving capabilities beyond saturation.
Elon Musk spent the better part of three days on the witness stand this week in his lawsuit against OpenAI, and it’s already getting messy. Emails, texts, and his own tweets are surfacing in court, and there are plenty more witnesses to come. Musk’s argument against OpenAI? By converting the company to a for-profit model, Sam Altman betrayed the “nonprofit for the […]
MathArena: continuously-maintained evaluation platform aggregating mathematics benchmarks to track LLM progress; successor to static math benchmarks.
Augmented Lagrangian Multiplier Network stabilizes state-wise constraint enforcement in RL; safety optimization methodology without LLM specificity.
InpaintSLat: training-free 3D inpainting via initial noise optimization in latent diffusion; computer vision task orthogonal to LLM/frontier AI focus.
Formalizes Phase-Latency Isomorphism showing spiking sparse distributed memory and transformers share five functional operations with cosine similarity retrieval.
Introduces mini-batch Markov risk measures and multipattern Q-learning with regret bounds for risk-averse finite-horizon MDPs.
Elon Musk is the one who wanted this trial. He has spent months claiming OpenAI "stole a nonprofit," and saying he was the actual driving force behind one of the most important companies currently in tech. All indications are that he won't win his case against the company, but he's fighting it anyway. So you'd think he'd have done better when it was his time to take the stand. Instead, Musk spent much of the week arguing with lawyers ...
AdaMeZO enables Adam-style zeroth-order LLM fine-tuning without storing moment estimates, reducing GPU memory while maintaining convergence.
Casts budget-constrained group assignment as Riemannian manifold optimization for mixed-precision quantization and expert selection.
PEACE framework uses cross-modal alignment and curriculum learning for transfer of adult ECG models to pediatric diagnosis.
I just went through one of the most infuriating support experiences with Claude / Anthropic, and I need to get this off my chest. I paid extra for Claude Design credits, about €80 worth, and used them to create actual designs I needed for work. Then those designs just vanished. Not “hard to find,” not “moved somewhere else”: gone. Completely disappeared after I paid for the service. I opened support and immediately asked for a refund or, at the very least, to speak to a human. What I got instead was Fin, the AI “agent,” which looped me endlessly through the same bullshit: “Try clearing cac...
Task-aware evaluation framework for blood glucose forecasting with event-level metrics addressing high-risk regimes in clinical decision support.
In the beginning, platforms like Fiverr were places where people could hire freelancers to do specialized creative labor using skills that took years to develop. In the age of generative AI, though, many of these gig workers have embraced the technology in order to meet clients' demands. These workers' profiles emphasize that they can quickly (and cheaply) whip up images and videos of just about anything. But often, what their clients are looking for are dramatic animations inspired by the Christian Bible. On TikTok, YouTube, Instagram, and Facebook it is very easy to stumble across AI-genera...
Combines multimodal energy-based models with VAE refinement via MCMC to improve inter-modal dependency capture in generative modeling.
On-policy self-distillation for GUI grounding provides dense token-level supervision from single rollouts in autonomous agent GUI interaction.
Adapts stochastic stress optimization from graph drawing to dimensionality reduction, replacing SMACOF with SGD-based methods.
PROBE recasts MLIP uncertainty quantification as selective classification using frozen backbone embeddings for interatomic potential reliability.
Framework for autonomous materials discovery embedding manufacturability constraints to bridge lab-to-deployment gap.