Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
SIRA framework improves retrieval-augmented agents by modeling expert search priors, reducing retrieval rounds and latency for organizational knowledge bases.
Extension of Venn-Abers predictors to unbounded regression using conformal prediction; narrow technical contribution to probabilistic forecasting.
Chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction; domain-specific protein modeling task, limited AI audience relevance.
Comprehensive benchmark for Multimodal Domain Generalization revealing inconsistent evaluation protocols and fragmented research; validates real-world robustness challenges.
StraTA framework adds explicit strategy sampling to agentic RL, improving credit assignment and exploration over long-horizon LLM decision-making.
Hybrid concept-based and abductive explanations for vision models using formal causal reasoning; advances interpretability beyond single-concept limits.
GlazyBench dataset (23K formulations) for ceramic glaze property prediction and generation; domain-specific multimodal benchmark with limited broad AI relevance.
RAO trains agents to delegate sub-tasks recursively, enabling divide-and-conquer inference scaling for longer contexts and harder problems.
ScaleLogic benchmark isolates proof-planning depth and logic expressiveness to systematically study RL training scaling for LLM long-horizon reasoning.
Framework for parsing and verifying source attribution in LLM research agents; evaluates citation accuracy via AST parsing and reproducible verification.
Method to compress SFT-induced LLM behaviors into sparse, causally necessary subnetworks for selective inference-time control.
Hybrid quantum-classical GAN framework for generating adversarial network traffic using variational quantum generators.
Lightweight asymmetric neural codec for compressing multi-modal sensor data on bandwidth-constrained edge devices.
PianoCoRe: unified piano MIDI dataset with 250k performances, 5.6k pieces, and note-level alignments for music AI.
Human-in-the-loop L2 Korean morphosyntactic annotation using parser agreement as proxy for correctness.
MASPO: framework for joint prompt optimization across LLM-based multi-agent systems with unified evaluation.
Formalization of Algospeak coevolution dynamics between LLM evasion and detection; introduces Majority Understandable Modulation metric.
OpenAI releases three audio models via its API for developers building voice apps.
Theoretical analysis of why sign-based optimizers (SignSGD, Muon) outperform SGD in foundation model training via ℓ1-norm bounds.
SkillOS: self-evolving LLM-agent framework learning long-horizon skill curation policies from streaming task interactions.
Online Bayesian calibration method for time-evolving digital twins handling gradual drift and abrupt system changes.
Mechanistic analysis traces attention sink phenomenon to variance discrepancy in value aggregation and FFN super-neuron activation.
SoftSAE introduces dynamic sparsity for Sparse Autoencoders, adapting feature activation count per input for improved mechanistic interpretability.
Proves Transformers implement normalized gradient descent for in-context logistic regression, formalizing implicit algorithmic execution.
DARTS optimizes covariate selection in budget-constrained randomized trials using Thompson sampling for causal inference.
AI CFD Scientist demonstrates LLM-based agents automating computational fluid dynamics discovery with physics validation loops.
Dynamic budget allocation for multi-turn LLM evaluation under conformal survival frameworks to predict rare jailbreak events.
Patch2Vuln reconstructs security vulnerabilities from Linux binary patches using LLM agents without source code access.
Functional-analytic proof that L² weight decay on Transformer cross-entropy satisfies Villani coercivity criteria for optimization guarantees.
UniSD unifies self-distillation design choices for LLM adaptation without external teachers, systematizing complementary techniques.