The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Shared Doubt: Zero-shot Cross-Lingual Confidence Estimation for Language Models

Confidence estimation (CE), i.e. quantifying the reliability of a model's prediction, has attracted great interest in the context of large language models (LLMs). However, most studies focus on English, ignoring the multilingual reality of LLM usage, while many CE methods degrade or require retraining across languages. To address this gap, we investigate whether multilingual LLMs encode shared, language-transferable confidence features. We use a lightweight linear probe that predicts answer correctness directly from intermediate representations. Trained monolingually, the probe generalizes ze...

Athina Kyriakou · 21 days ago

Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks

Jony Ive’s funky Ferrari

Fixed-Point Masked Generative Modeling

Benchmarking and Enhancing Text-to-Image Models for Generating Visual Representations in Early Arithmetic Education

Simulation of collision avoidance behavior in crowd movement by data-driven approach

Learning Whom to Trust: Market-Feedback Adaptive Retrieval for Frozen LLMs in Event-Driven Financial RAG

Beyond Additive Decompositions: Interpretability Through Separability

MAECO-Lite: Modular Ontology for Dynamic Malware Analysis

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration

Geometry-based Schrödinger Bridges for Trustworthy Multimodal Fusion

Boston Children’s uses AI to unlock new diagnoses

How Braintrust turns customer requests into code with Codex

Check out real-life AI prototypes from the Futures Lab.

This chip startup just raised $135M on a bet that AI’s biggest bottleneck isn’t compute — it’s memory

This AI startup will clean your home for free to train future robots

Student Capacity Moderates Knowledge Distillation Effectiveness: A Systematic Study Across ResNet Teacher-Student Pairs on CIFAR-10

FlagGAM: Rule-Based Generalized Additive Modeling for Explainable Tabular Prediction

From Local Geometry to Global Pseudo Labeling for Robust Positive Unlabeled Learning under Covariate Shift

How well does Classification Accuracy capture Concept Drift Detection Quality? An overview of Concept Drift Detection evaluation

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

Retriever Portfolios: A Principled Approach to Adaptive RAG

Towards Efficient LLMs Annealing with Principled Sample Selection

Detect in Any Scene: An Agentic Framework for Object Detection with Experience-Aware Reasoning

MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors

Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning

MIMO: Multilingual Information Retrieval via Monolingual Objectives

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training