UPDATED MONTHLY · LIVE EVAL DATA

AI Generator Leaderboard

Detection accuracy of PixelPrism's 16-detector forensic ensemble against 11 major AI image generators, measured on a held-out canary set that the detector never sees during training.

Last retrain: May 5, 2026
Detectors: 16
Canary set: 232 images
Model version: v0.10.0
Page updated: May 5 2026 22:55 UTC
Real-photo accuracy: 96.1%
Fresh-AI average: 96.6%
Per-generator min: 91.0%
Drift gap: −2.7%
Generator (provider)               Type   Accuracy  Avg P(AI)  First detected  Last retrain
DALL-E 3 (OpenAI)                  KNOWN  100.0%    0.990      Oct 2023        May 5 2026
HunyuanDiT (Tencent)               FRESH  100.0%    0.990      May 2024        May 5 2026
PixArt-Sigma (PixArt-alpha)        FRESH  100.0%    0.990      Jun 2024        May 5 2026
Recraft V3 (Recraft)               FRESH  99.6%     0.990      Jan 2025        May 5 2026
FLUX 1.1 (Black Forest Labs)       FRESH  97.8%     0.970      Aug 2024        May 5 2026
Nano Banana (Google Gemini Image)  FRESH  97.4%     0.960      Mar 2026        May 5 2026
Midjourney v6 (Midjourney)         FRESH  96.8%     0.960      Apr 2024        May 5 2026
Ideogram 2.0 (Ideogram)            FRESH  94.5%     0.940      Aug 2024        May 5 2026
Imagen 3 (Google Vertex AI)        FRESH  93.3%     0.920      Apr 2024        May 5 2026
Stability SD 3.5 (Stability AI)    FRESH  92.3%     0.910      Mar 2024        May 5 2026
Grok Imagine (xAI)                 FRESH  91.0%     0.890      Aug 2024        May 5 2026

Accuracy bands: ≥ 95% · 80–94% · < 80%
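The three accuracy bands can be expressed as a small lookup. This is only a sketch: the function name `accuracy_band` is hypothetical, and because the legend does not say where a value such as 94.5% belongs, the code assumes the middle band covers everything from 80% up to (but not including) 95%.

```python
def accuracy_band(accuracy_pct: float) -> str:
    """Map an accuracy percentage to one of the three legend bands.

    Assumption: the middle band is the half-open range [80, 95),
    so a value like 94.5 lands in the middle band.
    """
    if accuracy_pct >= 95.0:
        return ">= 95%"
    if accuracy_pct >= 80.0:
        return "80-94%"
    return "< 80%"


if __name__ == "__main__":
    # A few accuracies taken from the leaderboard rows.
    for value in (100.0, 94.5, 91.0):
        print(value, accuracy_band(value))
```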

Methodology

  1. Each month we generate 250–500 fresh images per generator using its published API and a curated prompt set covering 165 diverse subjects.
  2. 10% of each generator's output is held out as a canary set the detector never sees during training.
  3. The canary set is run through PixelPrism's 16-detector ensemble and HGBM meta-classifier with the live production weights.
  4. Accuracy = % of canary images where the meta-classifier returns P(AI) ≥ 0.5.
  5. Real-photo accuracy is measured on a separate corpus of 8,500 photos from CelebA, ImageNet, Food101, Hemg, Pexels, COCO, country211, and Cat_and_Dog datasets.
  6. "FRESH" tag = generator first appeared in our test corpus in 2024 or later. "KNOWN" = present since the project's earliest training distribution.
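Steps 2, 4, and 6 above can be sketched in a few lines of Python. This is a minimal illustration, not PixelPrism's actual pipeline: the function names (`split_canary`, `canary_accuracy`, `generator_tag`), the fixed random seed, and the use of a simple shuffle for the hold-out are all assumptions. Only the 10% fraction, the 0.5 threshold, and the 2024 cut-off come from the steps themselves.

```python
import random

CANARY_FRACTION = 0.10  # step 2: share of each generator's output held out
THRESHOLD = 0.5         # step 4: P(AI) >= 0.5 counts as a detection
FRESH_FROM_YEAR = 2024  # step 6: first seen in 2024 or later means FRESH


def split_canary(image_ids, fraction=CANARY_FRACTION, seed=0):
    """Shuffle the ids and return (training_ids, canary_ids).

    Assumption: a seeded shuffle stands in for whatever hold-out
    procedure the real pipeline uses.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_canary = round(len(ids) * fraction)
    return ids[n_canary:], ids[:n_canary]


def canary_accuracy(p_ai_scores, threshold=THRESHOLD):
    """Percentage of canary images whose P(AI) is at or above the threshold."""
    scores = list(p_ai_scores)
    if not scores:
        raise ValueError("canary set is empty")
    detected = sum(1 for p in scores if p >= threshold)
    return 100.0 * detected / len(scores)


def generator_tag(first_seen_year):
    """Return 'FRESH' or 'KNOWN' from the year a generator entered the corpus."""
    return "FRESH" if first_seen_year >= FRESH_FROM_YEAR else "KNOWN"


if __name__ == "__main__":
    train, canary = split_canary(range(300))
    print(len(train), len(canary))
    print(canary_accuracy([0.99, 0.97, 0.51, 0.42]))
    print(generator_tag(2023), generator_tag(2025))
```

Keeping the threshold and hold-out fraction as named constants makes it easy to see how the reported accuracy would shift if either were changed.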

These numbers are auto-generated from data/eval_post_retrain_v8_2026-05.json on lilliemae. Data and methodology are open by design — when our detector misses something, you'll see it here. We do not cherry-pick.
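As a rough cross-check, the per-generator accuracies can be re-aggregated by hand. The schema of the JSON file is not shown on this page, so the sketch below does not try to load it. Instead, the rows are transcribed from the leaderboard table, and the `summarize` helper is hypothetical. The page also does not state how its headline average is weighted, so the unweighted means here are an assumption and may not match the page's own formula.

```python
from statistics import mean

# (generator, tag, accuracy in %) transcribed from the leaderboard table.
ROWS = [
    ("DALL-E 3", "KNOWN", 100.0),
    ("HunyuanDiT", "FRESH", 100.0),
    ("PixArt-Sigma", "FRESH", 100.0),
    ("Recraft V3", "FRESH", 99.6),
    ("FLUX 1.1", "FRESH", 97.8),
    ("Nano Banana", "FRESH", 97.4),
    ("Midjourney v6", "FRESH", 96.8),
    ("Ideogram 2.0", "FRESH", 94.5),
    ("Imagen 3", "FRESH", 93.3),
    ("Stability SD 3.5", "FRESH", 92.3),
    ("Grok Imagine", "FRESH", 91.0),
]


def summarize(rows):
    """Return unweighted summary statistics over per-generator accuracies."""
    all_acc = [acc for _, _, acc in rows]
    fresh_acc = [acc for _, tag, acc in rows if tag == "FRESH"]
    return {
        "mean_all": round(mean(all_acc), 1),      # every row, equal weight
        "mean_fresh": round(mean(fresh_acc), 1),  # FRESH rows only, equal weight
        "min": min(all_acc),                      # weakest generator
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```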