Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
Training-free diagnostic reveals when on-policy distillation helps vs. harms reasoning models at per-token granularity.
Probabilistic shielding framework extends classical shields for MDPs and examines the trade-offs between safety guarantees and permissiveness.
LoKA applies FP8 precision to large recommendation models via kernel-level optimization, avoiding quality degradation.
PowerColor releases Radeon AI PRO R9600D GPU with 32GB GDDR6 memory in single-slot and passive cooling variants for inference workloads.
Proof that neural weight norms equal Kolmogorov complexity in fixed precision, explaining why weight decay induces Solomonoff's universal prior.
Neural1.5 method for clinical QA over EHRs using DSPy MIPROv2 optimizer for automated prompt tuning across four modular subtasks.
AssayBench: benchmark for LLMs and agents on virtual cell phenotypic screening combining textual inputs with diverse cellular outputs.
Self-Optimizing Language Models (SOL): dynamic per-token compute allocation via lightweight policy network paired with frozen LLM.
CADBench: unified multimodal benchmark for CAD program generation with 18k samples across six modalities and design datasets.
Attractor-Vascular Coupling Theory: mathematical framework for cuffless blood pressure estimation from smartphone photoplethysmography.
Decision-centric rate-distortion framework for agent memory compression prioritizing decision quality over descriptive faithfulness.
Qwen3.5 0.8B sees 2.88M monthly downloads; user reports semantic understanding, JSON parsing, and latency challenges in production workflows.
BEACON: 430GB multimodal dataset of Valorant gameplay for behavioral authentication and continuous monitoring across skill tiers.
BenchCAD: comprehensive benchmark for programmatic CAD generation from visual/textual inputs in realistic industrial settings.
Directional Groupwise Preference Optimization (DGPO): group-level margin-based framework for LLM alignment with directional consistency.
RUBEN uses rule extraction and pruning to explain RAG-LLM outputs and test safety robustness against adversarial prompts.
MiniCPM 4.6 released on Hugging Face; open-weights efficient model variant with updated capabilities.
EditMGT applies Masked Generative Transformers to localized image editing, outperforming diffusion-based approaches.
Digg returns (again) as another place to read AI news.
Counterfactual data augmentation improves Vision-Language Models' chart understanding efficiency without scaling synthetic datasets.
RAG-based generator of satirical definitions grounded in Finnish news, with a human-annotated evaluation framework.
Generalized Turing Test formalizes agent intelligence comparison via indistinguishability, independent of tasks or datasets.
Pi-Serini evaluates BM25 lexical retrieval sufficiency in agentic research loops paired with frontier LLMs.
Distance-metric-based instance methods detect conditional anomalies in patient management alerts.
Reddit user compares leaked Gemini Omni video model against Sora 2, which OpenAI is reportedly discontinuing.
BabelDOC preserves PDF layout during cross-lingual translation via intermediate representation decoupling structure from text.
DISCA steers LLM cultural preferences via sociodemographic disagreement signals without fine-tuning or white-box access.
Clin-JEPA extends joint-embedding predictive pretraining to EHR trajectories for multi-task patient risk prediction.
Transcoda applies synthetic data and Humdrum kern encoding to optical music recognition without large labeled datasets.