Unlearning Offline Stochastic Multi-Armed Bandits
First formal study of machine unlearning in offline multi-armed bandits with privacy and decision-quality tradeoffs.
New metric (Class Angular Distortion Index) for evaluating cluster arrangement in dimensionality reduction visualizations.
Reddit post claims Elon Musk discussed AI extinction risk at trial; judge reportedly restricted the topic.
BlenderRAG retrieval system improves LLM-to-Blender code generation success from 40.8% to 70% via multimodal examples.
H-RAG hierarchical parent-child retrieval pipeline for multi-turn RAG conversation tasks in SemEval-2026.
EGREFINE frames database schema refinement as optimization to improve Text-to-SQL accuracy while preserving query equivalence.
SC-Taxo generates hierarchical scientific taxonomies using LLMs with semantic consistency constraints across hierarchy levels.
Study of embedding similarity invariance under machine translation across 28 languages using the Manifesto Corpus.
Xiaomi's MiMo-V2.5-Pro and Kimi K2.6 dominate custom social deduction game benchmark, outperforming other open-weights models.
gemma-4-31B-it-DFlash open-weights model released on Hugging Face, pending llama.cpp integration.
Task vector analysis reveals structural conflicts (magnitude, sign, module-wise) preventing SFT-RLVR integration in LLMs.
Reddit discussion expressing the sentiment that AI progress remains in its early stages; the post offers no substantive technical claims or data.
ECCV reviews should be out by 2nd May. Since no exact time was specified this year, they’ll likely be released sometime within the next 48 hours. Hopefully, the reviews go well for everyone. We can use this thread to discuss them, as I haven’t seen one started yet.
Encoding probe method reconstructs LLM representations using interpretable features, avoiding confounds of decoding probes.
User demonstrates closed-loop SVG generation using Qwen3.6-27B with Agno framework and vision feedback for iterative refinement.
User demonstrates DFlash speculative decoding in llama.cpp with Qwen3.5-35B-A3B on RTX 2080 SUPER 8GB, achieving inference on VRAM-constrained hardware.
Microsoft is launching a new AI agent inside Word that's specifically designed for legal teams. Legal Agent handles document edits, negotiation history, and complex documents to help legal teams handle tasks like reviewing contracts. "Instead of relying on general AI models to interpret commands, the agent follows structured workflows shaped by real legal practice, managing clearly defined, repeatable tasks like reviewing contracts clause by clause against a playbook," explains Sumit Chauhan, corporate vice president of Microsoft's Office Product Group. The Legal Agent can work with existing ...
Reddit discussion speculating on TurboQuant adoption timeline and asymmetric K/V quantization gains.
Reddit discussion about granting Codex API access to local macOS environment; user seeks opinions on security/feasibility.
Reddit discussion of unexplained model behavior (goblin preference) and speculative commentary on AI alignment risks.
Reddit discussion about user engagement with Claude's thinking process and command execution UI elements.
Study reports an AI system outperforms emergency room physicians in diagnostic accuracy and suggests a collaborative clinical deployment model.
A new US-wide cell phone network marketed to Christians is set to launch next week. It blocks porn, which experts in network security say marks the first time a US cell plan has used network-level blocking for such content that can’t be turned off even by adult account owners. It’s also rolling out a filter…
User documents personal journey running open-weights models locally on increasingly expensive hardware (M3 Ultra to RTX Pro 6000), testing Qwen, DeepSeek, Gemma, MiniMax.
GPU rental costs on Vast.ai and Mithril spike above $1k/hour for H100/H200/B200, raising affordability concerns for academic and startup ML development.
To start with, I've been using Claude for years, and it's been a roller coaster, especially with the usage policy. I'm a lawyer and I wrote a **legal research skill**, instructing the model exactly what to verify and where. When I asked it a tax-related question (which is also law, by the way), Opus 4.7 told me I should contact a tax expert because it's a lawyer (??) and not a tax expert. Then it answered my question anyway and basically made up even the basic stuff. Since I knew it was wrong, I asked whether it had verified this, and the model told me no, it just remembered the answer from i...
Reddit discussion on conference submission pressure and burnout in academic ML research culture.
Grok 4.3 shows improved performance over 4.20 with lower cost but higher hallucination rate.