UPDATED MONTHLY · LIVE EVAL DATA

AI Generator Leaderboard

Detection accuracy of PixelPrism's 16-detector forensic ensemble against 11 major AI image generators, measured on a held-out canary set that the model never sees during training.

Real-photo accuracy: 96.0%
Fresh-AI average: 96.6%
Per-generator min: 91.0%
Drift gap: −2.7%

Generator	Developer	Type	Accuracy	Avg P(AI)	First detected	Last retrain
DALL-E 3	OpenAI	KNOWN	100.0%	0.990	Oct 2023	Jul 1, 2026
HunyuanDiT	Tencent	FRESH	100.0%	0.990	May 2024	Jul 1, 2026
PixArt-Sigma	PixArt-alpha	FRESH	100.0%	0.990	Jun 2024	Jul 1, 2026
Recraft V3	Recraft	FRESH	99.6%	0.990	Jan 2025	Jul 1, 2026
FLUX 1.1	Black Forest Labs	FRESH	97.8%	0.970	Aug 2024	Jul 1, 2026
ai_overall	Unknown	FRESH	97.4%	0.000	n/a	Jul 1, 2026
Nano Banana	Google Gemini Image	FRESH	97.4%	0.960	Mar 2026	Jul 1, 2026
Midjourney v6	Midjourney	FRESH	96.8%	0.960	Apr 2024	Jul 1, 2026
real	Unknown	FRESH	96.0%	0.000	n/a	Jul 1, 2026
Ideogram 2.0	Ideogram	FRESH	94.5%	0.940	Aug 2024	Jul 1, 2026
Imagen 3	Google Vertex AI	FRESH	93.3%	0.920	Apr 2024	Jul 1, 2026
Stability SD 3.5	Stability AI	FRESH	92.3%	0.910	Mar 2024	Jul 1, 2026
Grok Imagine	xAI	FRESH	91.0%	0.890	Aug 2024	Jul 1, 2026

Accuracy bands: ≥ 95% · 80% to under 95% · under 80%
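As a rough illustration, the accuracy bands can be read as half-open intervals. The sketch below assumes the middle band runs from 80% up to but not including 95%, so a value like Ideogram 2.0's 94.5% falls in exactly one band. The function name `accuracy_band`, the band labels, and the `ACCURACY` dict are hypothetical; the accuracy values are copied from the leaderboard table, with the `ai_overall` and `real` rows left out because they are not generators.

```python
def accuracy_band(accuracy_pct: float) -> str:
    """Map a per-generator accuracy (in percent) to a legend band.

    Assumes half-open bands: [95, 100], [80, 95), [0, 80).
    """
    if accuracy_pct >= 95.0:
        return ">= 95%"
    if accuracy_pct >= 80.0:
        return "80-95%"
    return "< 80%"


# Per-generator accuracies from the leaderboard table (aggregate rows excluded).
ACCURACY = {
    "DALL-E 3": 100.0,
    "HunyuanDiT": 100.0,
    "PixArt-Sigma": 100.0,
    "Recraft V3": 99.6,
    "FLUX 1.1": 97.8,
    "Nano Banana": 97.4,
    "Midjourney v6": 96.8,
    "Ideogram 2.0": 94.5,
    "Imagen 3": 93.3,
    "Stability SD 3.5": 92.3,
    "Grok Imagine": 91.0,
}

if __name__ == "__main__":
    for name, acc in ACCURACY.items():
        print(f"{name:18s} {acc:5.1f}%  {accuracy_band(acc)}")
    # The headline "Per-generator min" is the lowest value in this dict.
    worst = min(ACCURACY, key=ACCURACY.get)
    print("per-generator min:", worst, ACCURACY[worst])
```

With these values, seven generators land in the top band and four in the middle band; none fall under 80%.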

Methodology

Each month we generate 250–500 fresh images per generator through its published API, using a curated prompt set that covers 165 diverse subjects.
10% of each generator's output is held out as a canary set the detector never sees during training.
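The canary set is run through PixelPrism's 16-detector ensemble and HGBM meta-classifier, using the live production weights.
Accuracy is the percentage of canary images the meta-classifier labels correctly: an AI-generated image counts as correct when P(AI) ≥ 0.5, and a real photo counts as correct when P(AI) < 0.5.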
Real-photo accuracy is measured on a separate corpus of 8,500 photos from CelebA, ImageNet, Food101, Hemg, Pexels, COCO, country211, and Cat_and_Dog datasets.
A "FRESH" tag means the generator first appeared in our test corpus in 2024 or later; "KNOWN" means it has been present since the project's earliest training distribution.
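The evaluation steps above can be sketched in a few small functions. This is a minimal illustration under stated assumptions, not PixelPrism's code: the hash-based split in `is_canary`, the function names, and the data shapes are all hypothetical. Only the 10% hold-out fraction, the 0.5 threshold, and the 2024 FRESH cutoff come from the methodology. Hashing each image ID, rather than sampling at random, is one way to keep an image's split stable from one monthly run to the next.

```python
import hashlib
from datetime import date

CANARY_FRACTION = 0.10    # 10% of each generator's output is held out
THRESHOLD = 0.5           # P(AI) >= 0.5 counts as an "AI" verdict
FRESH_CUTOFF_YEAR = 2024  # first seen in 2024 or later -> FRESH


def is_canary(image_id: str, fraction: float = CANARY_FRACTION) -> bool:
    """Deterministically assign an image to the held-out canary set.

    Hypothetical scheme: hash the ID to a number in [0, 1) and hold the
    image out if that number falls below `fraction`. The same ID always
    lands in the same split.
    """
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def accuracy(p_ai_scores: list[float], is_ai: bool, threshold: float = THRESHOLD) -> float:
    """Percentage of images the meta-classifier labels correctly.

    AI images are correct when P(AI) >= threshold; real photos are
    correct when P(AI) < threshold.
    """
    if not p_ai_scores:
        raise ValueError("no scores to evaluate")
    if is_ai:
        correct = sum(p >= threshold for p in p_ai_scores)
    else:
        correct = sum(p < threshold for p in p_ai_scores)
    return 100.0 * correct / len(p_ai_scores)


def freshness_tag(first_seen: date) -> str:
    """FRESH if the generator entered the test corpus in 2024 or later."""
    return "FRESH" if first_seen.year >= FRESH_CUTOFF_YEAR else "KNOWN"


if __name__ == "__main__":
    ids = [f"gen-a/{i:04d}.png" for i in range(5000)]
    canary = [i for i in ids if is_canary(i)]
    print(f"canary share: {len(canary) / len(ids):.3f}")  # roughly 0.10

    print(accuracy([0.99, 0.97, 0.42, 0.88], is_ai=True))   # 75.0
    print(accuracy([0.03, 0.61, 0.10, 0.20], is_ai=False))  # 75.0
    print(freshness_tag(date(2023, 10, 1)))  # KNOWN
    print(freshness_tag(date(2024, 8, 1)))   # FRESH
```

Treating real photos with the inverted comparison is what lets a single threshold produce both the per-generator accuracies and the real-photo accuracy.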

These numbers are auto-generated from data/eval_post_retrain_v8_2026-05.json on lilliemae. Data and methodology are open by design: when our detector misses something, you'll see it here. We do not cherry-pick.