When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents
RobustToolBench benchmark exposes tool-use agent failures from deployment noise; domain-randomized RL improves robustness.
StepCodeReasoner supervises intermediate execution traces via RL to prevent reward hacking in code reasoning tasks.
Sparse autoencoders on LLM layer transitions detect out-of-domain interactions without treating model as black box.
STAGE framework addresses semantic drift across modality domains in federated graph learning with multimodal node attributes.
Study quantifies sample efficiency in Predictive Coding vs Backpropagation using target alignment metric, finding PC enables more efficient learning in small-scale experiments.
Claude Haiku vulnerable to multi-turn prompt injection via fictional rule construction and word-filling technique.
Paper formalizes positional encoding requirements for Transformer-based neural combinatorial optimization on vehicle routing, accounting for spatial structure unlike NLP.
Theoretical analysis of Delightful Policy Gradient, which gates advantage-based updates to escape suboptimal policy corners faster than softmax policy gradient.
Empirical study of supervised fine-tuning on procedural skills across Qwen3.5 0.8B-4B models shows W-shaped pre-training trajectory and uniform SFT gains.
YFPO leverages neuron activation patterns for preference optimization in mathematical reasoning, using model internals to guide reward signals instead of external preference data.
Proposes segment-level supervision for LLM-based Lean 4 theorem proving, balancing dense local signals of step-level training with coherence of whole-proof generation.
Proposes Hierarchical-Cluster SOINN classifier for class-incremental learning that models classes as manifolds rather than collapsed points, addressing gaps in Neural Collapse theory.
Earphone-based passive biometric authentication system using in-ear accelerometers for heartbeat identification, unrelated to AI frontier topics.
Reddit user benchmarks open-weight LLMs (Qwen 3.6, Zaya1) on chess visualization task; Qwen 35B-A3B achieves near-perfect SVG chessboard generation.
Model for emulating individualized chess player styles and decision-making, addressing limitations of skill-level generalization in superhuman chess engines.
Proteus red-teaming framework studies adaptive leakage of LLM agent skills via iterative adversarial revision, addressing real deployment risks beyond single-shot audits.
Mechanism for collaborative ML ensures fair data valuation and incentivizes truthfulness via game-theoretic rewards.
Qwen-Scope: open-source suite of sparse autoencoders for mechanistic interpretability across 7 Qwen models.
Stratechery opinion: Elon Musk's dual involvement in SpaceX and xAI mirrors Anthropic's structure; argues Musk should focus xAI on B2B services.
Layer-wise relevance propagation extends attribution methods to EEG-based foundation models for interpretability verification.
Sobolev-regularized MMD gradient flow with global convergence guarantees for distribution matching.
FATE: on-policy framework for agentic safety alignment via failure trajectory learning without safety-utility trade-off.
Adaptive TD-Lambda extends temporal difference learning to multi-agent RL with large joint action spaces.
Task-aware contrastive learning for automatic modulation classification via intra-instance consistency.
LOFT: parameter-efficient fine-tuning framework using low-rank orthogonal transformations with task-aware subspace selection.
Information-theoretic foundation for self-supervised clustering via KL-divergence optimization with mode-collapse constraints.
FIS-DiT: training-free frame interleaving acceleration for video diffusion transformers in few-step inference regimes.
IPI-proxy toolkit enables red-teaming web-browsing AI agents against indirect prompt injection attacks embedded in whitelisted domain HTML.
Anchor-guided variance-aware reward modeling resolves non-identifiability in preference learning by augmenting pairwise comparisons with coarse response-level labels.
ZipRerank achieves efficient listwise multimodal reranking for long documents by reducing visual tokens and eliminating autoregressive decoding.