The Image Edit Detection and Localization Challenge
Benchmarking detection and localization of real-world image edits.
06 Apr 2026 ⋅ Jesse Hostetler, Ph.D. ⋅ 3 min ⋅ #Image Forensics #AI-Edited Media Detection
Manipulated images pose serious risks, and AI tools make them easier to create than ever. The risk comes not only from fully AI-generated images but also from subtle, hard-to-spot edits to real images, whether those edits are made with AI tools or traditional splicing methods.
That’s why we designed the Image Edit Detection and Localization Challenge: an evaluation that challenges participants to create image manipulation analytics that not only detect whether an image is manipulated, but also identify what kind of manipulation was used and where in the image the edits occurred.
Why This Matters Now
Generative models evolve fast, making image tampering cheaper, easier, and more convincing. The prevalence of manipulated images poses a growing threat to information security and societal trust.
Image manipulation has shifted from full-image synthesis toward AI-assisted editing of real photos, and mixed-authenticity images containing subtle, hard-to-detect manipulations have become more common.
The blind spot: despite advances in detecting fully AI-generated synthetic images, current methods fall short in detecting AI-edited images and localizing the edits.
What We Built
To push the research forward, UL Research Institute’s Digital Safety Research Institute (DSRI) constructed the Image Edit Detection and Localization Challenge using an entirely novel dataset of manipulated images sourced from our data engine.
The dataset includes:
- Splicing
- Traditional editing
- Fully synthetic images
- AI-edited images
- Real originals
Participants were tasked with delivering image manipulation analytics that DSRI evaluated against a private evaluation dataset. Participants were given:
- No training data
- No information about the evaluation data
We hosted the competition on Dyff, DSRI’s AI verification platform, and evaluated entries across three core capabilities.
Evaluation Metrics
- Detection: Real vs. manipulated (balanced accuracy)
- Classification: Type of manipulation (multiclass balanced accuracy / average recall)
- Localization: Where the manipulation occurred (intersection over union, IoU)
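To make the three metrics concrete, here is a minimal Python sketch of how they can be computed. It is an illustration under stated assumptions, not the challenge's actual scoring code: the function names, the label strings, and the convention that two empty masks score an IoU of 1.0 are all hypothetical choices made for this example.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; works for binary and multiclass labels."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption for this sketch: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Detection: real vs. manipulated (label strings are illustrative).
det_true = ["real", "real", "real", "real", "manipulated", "manipulated"]
det_pred = ["real", "real", "real", "manipulated", "manipulated", "real"]
detection_score = balanced_accuracy(det_true, det_pred)

# Classification: type of manipulation (label strings are illustrative).
cls_true = ["splice", "splice", "ai_edit", "ai_edit", "synthetic", "synthetic"]
cls_pred = ["splice", "ai_edit", "ai_edit", "ai_edit", "synthetic", "splice"]
classification_score = balanced_accuracy(cls_true, cls_pred)

# Localization: predicted vs. ground-truth edit mask on a 3x3 image.
pred_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
true_mask = [[0, 0, 1],
             [0, 0, 1],
             [0, 0, 1]]
localization_score = iou(pred_mask, true_mask)

print(detection_score, classification_score, localization_score)
```

Because balanced accuracy averages recall over classes rather than over samples, a class with few examples counts as much as a class with many, which matters when real and manipulated images are not evenly represented.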
Who Participated
Seven teams from six universities and labs across the U.S., Europe, and Asia participated:
- The Catholic University of America (USA)
- Sungkyunkwan University (South Korea)
- University of Science, VNU-HCM (Ho Chi Minh City, Vietnam)
- University of Catania (Italy)
- Binghamton University (USA)
- Politecnico di Milano (Italy)
What We Learned
DSRI’s Image Edit Detection and Localization Challenge provided the first large-scale blind evaluation of image manipulation detection, classification, and localization on private, purpose-built datasets.
The results reveal both progress and critical gaps.
Key Results
- The top team achieved:
  - 0.95 balanced accuracy for detection
  - 0.88 balanced accuracy for classification
- Most teams performed well on fully synthetic images (≥ 0.80 balanced accuracy)
- Locally synthesized content remains nearly invisible:
  - 6 of 9 teams achieved < 0.10 recall on this class
- Localization remains the frontier challenge:
  - Even the best systems overlapped only ~15% of actual edited regions on average
  - Distinguishing locally edited from locally synthesized content proved fundamentally difficult, with most teams near chance performance
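The per-class recall and "near chance" figures above can be illustrated with a small Python sketch. The class names and sample counts are hypothetical, chosen only to show how a system that never predicts one class scores; they are not the challenge's label set or data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical labels for two hard-to-separate classes (imbalanced on purpose).
y_true = ["locally_edited"] * 8 + ["locally_synthesized"] * 2
# A degenerate system that never predicts "locally_synthesized".
y_pred = ["locally_edited"] * 10

recall = per_class_recall(y_true, y_pred)
balanced = sum(recall.values()) / len(recall)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(recall)          # recall is 1.0 for one class and 0.0 for the other
print(balanced)        # chance level for two classes
print(plain_accuracy)  # looks much better than the system really is
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.5, the chance level for a two-class problem. Reporting recall per class is what exposes a class that a system effectively never recognizes.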
Additional Observations
- Spliced edits are easier to localize than AI-generated edits
- Smaller manipulations are often easier to localize than larger ones
- A simple baseline was surprisingly competitive on localization despite weaker detection, suggesting detection and localization require fundamentally different approaches
The challenge establishes a clear research roadmap: the field must move beyond binary detection toward fine-grained classification and precise localization of AI-generated local edits — the manipulation type most relevant to real-world scenarios.
Results at a Glance
- Detection and classification are feasible: Top performers reached 0.95 balanced accuracy (detection) and 0.88 average recall (classification)
- Localization is fragile: Best IoU on local manipulations reached only ~0.38, with most systems far lower
- Edit type matters: Hand-edited images (splices and traditional edits) were easier to localize; AI edits were the hardest
- Edit size matters: Small edits (<15%) were often easier to localize than large ones, as large edits dilute boundary signals
What’s Next
We are convening a participant roundtable to exchange technical insights and gather feedback on the challenge.
For future research challenges, visit:
https://dsri.org/challenges