The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Progressing beyond Art Masterpieces or Touristic Clichés: how to assess your LLMs for cultural alignment?

Dataset and design guidelines for assessing cultural alignment in LLMs, addressing limitations of prior cultural bias evaluation approaches.

António Branco·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards interpretable AI with quantum annealing feature selection

Quantum annealing method for interpretable feature selection in CNNs applied to image classification.

Francesco Aldo Venturelli·2 months ago

r/ClaudeAI· COMMUNITY

Toothcomb is an open-source tool for analysing and fact-checking speech in real time.

Give Toothcomb a speech transcript and it will fact-check and analyse it. If you have an MP3 file of someone speaking, it can generate the transcript for you. You can also stream audio in real time from your device's microphone. You can see a [demo running here](https://toothcomb.codebox.net/) and read more about the project on the [home page](https://codebox.net/pages/toothcomb-ai-fact-checker). Analysis is performed in three stages: 1. The text is broken up into small parts, each usually a few sentences in length. These parts are sent, one at a time, to the Claude Opus API with [detailed...

u/bluebox72·2 months ago·20 pts / 7 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models

Prefill-time intervention technique to reduce hallucinations in large vision-language models by addressing accumulation errors during decoding.

Chengsheng Zhang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Large language models eroding science understanding: an experimental study

Experimental study demonstrating LLMs can be manipulated to prioritize fringe scientific material and generate misleading fluent responses contradicting scientific consensus.

Harry Collins·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive

Mandelbrot rank-frequency distribution identified across frontier LLM outputs enables sub-microsecond token verification, 100,000× faster than sampling-based detection.

Alex Bogdan·2 months ago

r/ClaudeAI· COMMUNITY

Claude has made me excited to work

Reddit user reports renewed enthusiasm for personal coding project after using Claude for 6 weeks.

u/alkalinealex359·2 months ago·30 pts / 13 comm

Simon Willison· ANALYST

Quoting Matthew Yglesias

Matthew Yglesias argues for AI-assisted professional software development over autonomous "vibe coding," prioritizing human-managed productivity gains.

Simon Willison·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

HotComment: A Benchmark for Evaluating Popularity of Online Comments

HotComment: multimodal benchmark for evaluating online comment popularity across platforms using video, text, and content quality metrics.

Yafeng Wu·2 months ago

r/singularity· COMMUNITY

What jobs are mostly affected by AI according to a Microsoft study?

Microsoft study identifies job categories most exposed to AI automation; labor market impact analysis.

u/kernelangus420·2 months ago·101 pts / 116 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

The Nonverbal Syntax Framework: An Evidence-Based Tiered System for Inferring Learner States from Observable Behavioral Cues

Nonverbal Syntax Framework systematizes 908 studies mapping nonverbal behavioral cues to learner cognitive/affective states for adaptive education systems.

Sherzod Turaev·2 months ago

TechCrunch AI· PRESS

BCI startup Neurable looks to license its ‘mind-reading’ tech for consumer wearables

The startup specializes in "non-invasive" "mind-reading" tech—a kind of neural data collection that, its CEO hopes, will have all sorts of consumer applications.

Lucas Ropek·2 months ago

r/LocalLLaMA· COMMUNITY

Abliterlitics: Benchmarks and Tensor Comparison for Heretic, Abliterlix, Huiui, HauhauCS for GLM 4.7 Flash

Benchmark comparing abliteration techniques across GLM-4.7-Flash (MoE architecture) vs. prior Qwen family tests; evaluates HauhauCS uncensored claims.

u/nathandreamfast·2 months ago·48 pts / 10 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition

WhisperPipe: streaming architecture for real-time ASR maintaining transcription accuracy with bounded memory through hybrid VAD and context management.

Erfan Ramezani·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Health System Scale Semantic Search Across Unstructured Clinical Notes

Semantic search system deployed at children's hospital indexing 166M clinical notes using instruction-tuned embeddings; addresses scalability and governance challenges.

Faith Wavinya Mutinda·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction

OxyGent open-source framework enables modular, observable multi-agent systems via pluggable components and permission-driven dynamic planning.

Junxing Hu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Emotive Architectures: The Role of LLMs in Adjusting Work Environments

Study examines LLM integration in hybrid work environments to adjust spatial experiences and collaboration dynamics.

Lara Vartziotis·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

Empirical study comparing PLM-GNN hybrids for code classification and vulnerability detection; hybrids outperform GNN-only baselines.

Mohamed Taoufik Kaouthar El Idrissi·2 months ago

r/ClaudeAI· COMMUNITY

No More Subsidised AI Subscriptions?

Reddit discussion speculating on potential end of discounted Claude subscription pricing models.

u/PM_ME_YOUR___ISSUES·2 months ago·24 pts / 26 comm

TechCrunch AI· PRESS

Red Hat’s OpenClaw maintainer just made enterprise Claw deployments a lot safer

Tank OS puts OpenClaw AI agents into a container that lets them run reliably and more safely, especially for those running fleets of them.

Julie Bort·2 months ago

r/LocalLLaMA· COMMUNITY

Qwen3.6-27B IQ4_XS FULL VRAM with 110k context

Qwen3.6-27B IQ4_XS quantization bloat analysis; reverting llama.cpp commit reduces VRAM from 15.1GB to 14.7GB with 110k context.

u/Pablo_the_brave·2 months ago·43 pts / 16 comm

r/LocalLLaMA· COMMUNITY

meantime on r/vibecoding

Generic discussion post about wisdom or best practices in AI/coding communities.

u/jacek2023·2 months ago·76 pts / 13 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models

First systematic study of uncertainty estimation in audio-aware LLMs; benchmarks five methods addressing hallucination and confidence calibration.

Chun-Yi Kuan·2 months ago

r/Anthropic· COMMUNITY

After I opened a complaint, anthropic refunded me in credits instead of money (without letting me choose), closed my ticket saying everything was fine with my 5x Max account… and now my paid plan is gone before my billing cycle ended...

I was overcharged by more than $100, so I opened a billing ticket last month. They only responded yesterday and said everything looked fine because they refunded me $100 in credits. They didn’t give me any option to choose between a refund to my card or credits, but I can let that go... The worst part is what happened next: due to what seems like an error on their side, I lost access to my plan. I no longer have 5x Max and my account now shows as Free. This is insane. Do I really have to wait another month to fix this while not having access to the service I already paid for? My billing c...

u/Initial-Charge7281·2 months ago·19 pts / 4 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding

DualFact multimodal framework separates factual verification in procedural video captioning into conceptual and contextual facts.

Cennet Oguz·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Bye Bye Perspective API: Lessons for Measurement Infrastructure in NLP, CSS and LLM Evaluation

Analysis of Perspective API shutdown exposes structural dependence of NLP/LLM evaluation on single proprietary toxicity measurement tool.

David Hartmann·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling

Marco-MoE open-weight multilingual sparse MoE models with 5% parameter activation and best-in-class performance-to-compute ratio.

Fan Jiang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Dictionary learning for Kernel EDMD

Dictionary learning method for Kernel EDMD approximation of nonlinear dynamical systems via Koopman operators.

Erik Lien Bolager·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings

RealMat-BaG benchmark for semiconductor bandgap prediction under experimental conditions using GNNs; addresses domain generalization challenges.

Haolin Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

SnapGuard detects prompt injection attacks on screenshot-based web agents using lightweight multimodal methods instead of large VLMs.

Mengyao Du·2 months ago
