r/MachineLearning

Reddit · COMMUNITY

Last updated May 28, 2026, 6:00 PM

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

Hello everyone. The new dataset is named MONET. It is Apache 2.0-licensed and available on HF: [https://huggingface.co/datasets/jasperai/monet](https://huggingface.co/datasets/jasperai/monet)

**MONET is an open, Apache 2.0-licensed image–text dataset. It was built from 2.9 billion images and refined to 104.9 million high-quality samples.**

We are also publishing [a paper](https://arxiv.org/abs/2605.21272) that explains how the dataset was created, if you are curious, along with 3 companion projects:

* [A UMAP to visualize the distribution](https://huggingface.co/spaces/jasperai/monet-umap)
* [A retrieval tool ...

u/dh7net·2 months ago·46 pts / 7 comm

r/MachineLearning

AI-generated CUDA kernels silently break training and inference [R]

Where do you go for serious AI research discussion online? [D]

Already 11 000 submissions for EMNLP? [D]

The famous METR AI time horizons graph contains numerous severe errors [D]

How do ML practitioners select hyperparameters, architectures, etc. for self-supervised representation learning when the loss is non-monotonic? [D]

PapersWithCode new features - week 1 [P]

COLM 2026 Reviews Discussion [D]

NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P]

Novel Problems in VLA [R]

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D]

How competitive are PhD admissions currently [D]

Machine Learning on Spherical Manifold [R]

What do you think about Tabular Foundation Models [D]

A Simple Solution to Improve Broken Peer Review System at AI Conferences [R]

How to get rejected by IEEE T-PAMI with 'Excellent' scores? [D]

Released a free 9.8M doc Indic multilingual corpus — Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P]

Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P]

Reviving PapersWithCode (by Hugging Face) [P]

Slop is making me feel disconnected from AI Research [D]

Program misleading high school students into paying to perform academic misconduct in ML Research [D]

Do you agree with Judea that learning from data is not everything? [D]

Backlash against arXiv's proposed 1 year ban is genuinely perplexing. [D]

arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]

Would a 2000-2021 ML paper even get accepted today? [D]

Human-level performance via ML was *not* proven impossible with complexity theory [D]

Built Support Vector Machine (SVM) from scratch in Rust [P]

Elastic Attention Cores for Scalable Vision Transformers [R]

How do you create a memorable poster for top-tier conferences (ICML/ICLR/NeurIPS, etc.) [D]

Steam Recommender using similarity! (Undergraduate Student Project) [P]

TabPFN-3 just released: a pre-trained tabular foundation model for up to 1M rows [R][N]

ICML Author Removal [D]

Where are small models like Qwen3 0.6B and Qwen3.5 0.8B used? Hugging Face shows 2.88 million downloads this month. [D]

Interactive Jensen–Shannon Divergence Visualisation [P]

Is reproducing or implementing a paper considered research? [R]

PhD students in ML, how many hours on average do you work? [D]

Signals: finding the most informative agent traces without LLM judges [R]

What is an average publication outcome for an ML PhD? [D]

We are hitting a wall trying to force transformers to do actual logic [D]

My experience interviewing with Huawei Vancouver for an ML research role: strong mismatch between how it was pitched and how it was evaluated [D]

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

Interactive KL Divergence Visualisation [P]

People Interested in Continual Learning Research [R]

Disillusionment with mechanistic interpretability research [D]

Getting harassed by an aggressive “independent researcher” demanding very specific citations and phrasing in my paper [D]

ECCV reviewer wants me to compare and contrast to my own paper. [D]

META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs (ffmpeg, SQLite, ripgrep) From Scratch Without The Internet?

Stop letting LLMs edit your .bib [D]

NeurIPS Submission Number [D]

Human-level performance via ML was not proven impossible with complexity theory [D]