CogScale: Scalable Benchmark for Sequence Processing
CogScale: 14-task synthetic benchmark for evaluating sequence processing in novel architectures at reduced computational cost.
KPMG deploys Claude across 276,000+ employees via Digital Gateway platform in multi-year strategic partnership with Anthropic.
MSAlign aligns molecule and mass spectra foundation models for improved metabolite identification in drug discovery and clinical research.
Memory-augmented RL framework for CAD generation agents handling long operation sequences and geometric constraints with error correction.
EngiAI benchmark suite for multi-agent LLM engineering design with workflow, RAG, and simulation evaluation across seven prompt styles.
TERGAD detects graph anomalies by combining text and structure-aware representations to identify inconsistencies between node content and topology.
ContextRAG constructs graph topology for RAG without LLM-based extraction using k-means and formal concept analysis for multi-hop QA.
Survey of GNN architectures for community detection in graphs, reviewing clustering performance on large high-dimensional networks.
ByteDance released Lance, a 3B-parameter open-weight multimodal model supporting image/video understanding, generation, and editing.
LIFT and PLACE: coarse-to-fine knowledge distillation framework for lightweight diffusion models via linear fitting and adaptive coefficient estimation.
Survey of 120+ studies on mathematical reasoning in LLMs: datasets, architectures, training strategies, evaluation protocols for AI benchmarking.
Trace-based benchmark measuring safety-aligned LLM behavior (Gemma 4) as autonomous security agents vs. uncensored derivatives on 30 vulnerability-analysis tasks.
Projection agents: RL-GNN approach for graph combinatorial optimization with improved generalization and scalability across diverse problem instances.
CAIT: dependency parser and POS tagger for CHILDES child-adult interaction data, outperforming spaCy and Stanza on syntactic structure.
Anthropic acquires Stainless for $300M+, gaining control of major MCP server generation platform serving OpenAI, Google, Meta, and Cloudflare SDKs.
Arabic NLP framework for financial sentiment analysis using Transformer-based NER on Saudi market news and social media data.
LLM-based generative error correction for low-resource West Frisian ASR with data contamination analysis and offline dataset construction.
Home lab setup post showcasing multi-GPU infrastructure for running 35B+ parameter models locally.
Multi-concept backdoor injection vulnerability in text-to-image diffusion models: semantic conflicts from sequential fine-tuning and redistributed checkpoints.
Diffusion-Copula framework decouples marginal distributions from dependence structures for multivariate financial time-series forecasting with tail-risk calibration.
Closed-loop AI workflow (Gaussian process, Bayesian optimization) for cryomicroneedle cryoprotectant discovery from 198 mesenchymal stem-cell formulations.
Strategic classification framework incorporating behavioral biases and cognitive deviations from rational agent assumptions.
Automated neighborhood generation for local search via constraint symmetry analysis in the IDP system.
Convergence analysis for consensus-based particle optimization in nonconvex bi-level problems.
Cross-View Attention Fusion network for cardiac output estimation from photoplethysmography signals.
CriterAlign: criterion-centric LLM judge for pairwise code preference evaluation with task-specific rubric alignment.
Pseudocode-guided structured reasoning framework reducing hallucinations in vision-language models for robotic automation.
Prior alignment approach enabling tabular foundation models to generalize under strategic feature manipulation post-deployment.
OScaR: KV cache quantization technique addressing token norm imbalance for extreme compression in long-context LLMs.
Reddit post expressing opinion about 'Jarvis' with minimal substantive content or claims.