The Image Edit Detection and Localization Challenge

Benchmarking detection and localization of real-world image edits.

06 Apr 2026 ⋅ Jesse Hostetler, Ph.D. ⋅ 3 min ⋅ #Image Forensics #AI-Edited Media Detection

Manipulated images pose serious risks, and AI tools make them easier to create than ever. But the risk comes not only from fully AI-generated images. It also comes from subtle, hard-to-spot edits to real images, whether those edits are made with AI tools or traditional splicing methods.

That’s why we designed the Image Edit Detection and Localization Challenge: an evaluation that challenges participants to create image manipulation analytics that not only detect whether an image is manipulated, but also identify what kind of manipulation was used and where in the image the edits occurred.

Why This Matters Now

Generative models evolve fast, making image tampering cheaper, easier, and more convincing. The prevalence of manipulated images poses a growing threat to information security and societal trust.

The threat has shifted from full-image synthesis to AI-assisted editing of real photos, and mixed-authenticity images containing subtle, hard-to-detect manipulations are increasingly prevalent.

The blind spot: despite advances in detecting fully AI-generated synthetic images, current methods fall short in detecting AI-edited images and localizing the edits.

What We Built

To push the research forward, UL Research Institutes’ Digital Safety Research Institute (DSRI) constructed the Image Edit Detection and Localization Challenge using an entirely novel dataset of manipulated images sourced from our data engine.

The dataset includes:

Spliced images
Traditionally edited images
Fully synthetic images
AI-edited images
Real originals

Participants were tasked with delivering image manipulation analytics that DSRI evaluated against a private evaluation dataset. Participants were given:

No training data
No information about the evaluation data

We hosted the competition on Dyff, DSRI’s AI verification platform, and evaluated entries across three core capabilities.

Evaluation Metrics

Detection: Real vs. manipulated (balanced accuracy)
Classification: Type of manipulation (multiclass balanced accuracy / average recall)
Localization: Where the manipulation occurred (intersection over union, IoU)
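To make the three metrics concrete, here is a minimal sketch of how they are commonly computed. It is not DSRI's scoring code: the function names, the class labels, and the convention for two empty masks are assumptions made for illustration, and the toy inputs are not challenge data. Balanced accuracy is the mean of per-class recall, which is why the multiclass version is also called average recall; IoU is the number of pixels flagged in both masks divided by the number flagged in either.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; works for two or more classes."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def mask_iou(pred_mask, true_mask):
    """Intersection over union of two equal-sized binary masks (2D lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


# Toy detection example: plain accuracy is 5/6, but only half of the
# manipulated images are caught, so balanced accuracy is (1.0 + 0.5) / 2.
y_true = ["real", "real", "real", "real", "manipulated", "manipulated"]
y_pred = ["real", "real", "real", "real", "manipulated", "real"]
print(balanced_accuracy(y_true, y_pred))  # 0.75

# Toy localization example: 2 pixels flagged in both masks, 4 flagged in either.
pred = [[0, 1, 1],
        [0, 1, 1]]
true = [[0, 0, 1],
        [0, 0, 1]]
print(mask_iou(pred, true))  # 0.5
```

Balanced accuracy keeps a class-imbalanced evaluation set from rewarding a system that simply predicts the majority class. In practice, scikit-learn's `balanced_accuracy_score` computes the same quantity for the detection and classification tasks.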

Who Participated

Seven teams from six universities and labs across the U.S., Europe, and Asia participated:

The Catholic University of America (USA)
Sungkyunkwan University (South Korea)
University of Science, VNU-HCM (Ho Chi Minh City, Vietnam)
University of Catania (Italy)
Binghamton University (USA)
Politecnico di Milano (Italy)

What We Learned

DSRI’s Image Edit Detection and Localization Challenge provided the first large-scale blind evaluation of image manipulation detection, classification, and localization on private, purpose-built datasets.

The results reveal both progress and critical gaps.

Key Results

The top team achieved:
- 0.95 balanced accuracy for detection
- 0.88 balanced accuracy for classification
Most teams performed well on fully synthetic images (≥ 0.80 balanced accuracy)
Locally synthesized content remains nearly invisible:
- 6 of 9 teams achieved < 0.10 recall on this class
Localization remains the frontier challenge:
- Even the best systems overlapped only ~15% of actual edited regions on average
- Distinguishing locally edited from locally synthesized content proved fundamentally difficult, with most teams near chance performance

Additional Observations

Spliced edits are easier to localize than AI-generated edits
Smaller manipulations are often easier to localize than larger ones
A simple baseline was surprisingly competitive on localization despite weaker detection, suggesting detection and localization require fundamentally different approaches

The challenge establishes a clear research roadmap: the field must move beyond binary detection toward fine-grained classification and precise localization of AI-generated local edits — the manipulation type most relevant to real-world scenarios.

Results at a Glance

Detection and classification are feasible: Top performers reached 0.95 balanced accuracy (detection) and 0.88 average recall (classification)
Localization is fragile: Best IoU on local manipulations reached only ~0.38, with most systems far lower
Edit type matters: Hand-edited images (splices and traditional edits) were easier to localize; AI edits were the hardest
Edit size matters: Small edits (<15%) were often easier to localize than large ones, as large edits dilute boundary signals
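One way to examine the edit-size effect on your own data is to group per-image IoU scores by how much of the image was edited. The sketch below assumes that "<15%" refers to the fraction of image pixels covered by the ground-truth edit mask, which the post does not state explicitly. The function names and the toy numbers are hypothetical and are not challenge results.

```python
def edit_fraction(true_mask):
    """Fraction of pixels marked as edited in a binary ground-truth mask (2D list)."""
    pixels = [value for row in true_mask for value in row]
    return sum(1 for value in pixels if value) / len(pixels)


def mean_iou_by_size(samples, threshold=0.15):
    """Average IoU for small and large edits.

    samples: iterable of (true_mask, iou) pairs, one per image.
    threshold: assumed cutoff on the edited-pixel fraction (0.15 = 15%).
    """
    buckets = {"small": [], "large": []}
    for true_mask, iou in samples:
        key = "small" if edit_fraction(true_mask) < threshold else "large"
        buckets[key].append(iou)
    # A bucket with no samples is reported as None rather than 0.
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}


# Illustrative inputs only: one 10x10 mask with a single edited pixel (1%)
# and one 2x2 mask with two edited pixels (50%).
small_edit = [[1 if (r == 0 and c == 0) else 0 for c in range(10)] for r in range(10)]
large_edit = [[1, 1],
              [0, 0]]
samples = [(small_edit, 0.5), (large_edit, 0.25), (large_edit, 0.75)]
print(mean_iou_by_size(samples))  # {'small': 0.5, 'large': 0.5}
```

Reporting the two buckets separately avoids averaging away a size effect when the evaluation set mixes small and large edits.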

What’s Next

We are convening a participant roundtable to exchange technical insights and gather feedback on the challenge.

For future research challenges, visit:
https://dsri.org/challenges