ShelfApi

Automatic shelf bay detection for retail photos
Performance Report — 10-03-2026
Blind-tested photos: 279
Mean deviation (validation set): 1.33%
Verified training photos: 880
Inference time (GPU): ~10ms

What does ShelfApi do?

ShelfApi analyses photos of retail shelves, endcap displays (kopstellingen), and checkout displays (kassameubels). It automatically detects the boundaries of the product area and masks the surroundings. This speeds up manual review and enables automated shelf analysis.

Pipeline

Step 1: Photo Upload. Shelf photo from the store.
Step 2: AI Boundary Detection. CNN detects 8 boundary points.
Step 3: Perspective Correction. Straighten angled photos.
Step 4: Masking. Blur or remove surroundings.

Technical detail: The model is an EfficientNet-B0 (CNN) that predicts 8 boundary points: 4 for left/right boundaries (x-axis, as % of width) and 4 for top/bottom boundaries (y-axis, as % of height). This supports angled boundaries in all directions. Inference takes ~10ms on GPU, ~50ms on CPU.
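To make the 8-point output concrete, here is a minimal sketch of how the percentages could be turned into the four pixel corners of the product area. It rests on assumptions the report does not confirm: that each x value is measured where the left or right boundary meets the top or bottom image edge, that each y value is measured where the top or bottom boundary meets the left or right image edge, and that the field names and order of the hypothetical `BoundaryOutputs` container match the model's outputs.

```python
from dataclasses import dataclass


@dataclass
class BoundaryOutputs:
    # Hypothetical container; field names and order are assumptions.
    left_top: float      # x of the left boundary at the top image edge, % of width
    left_bottom: float   # x of the left boundary at the bottom image edge, % of width
    right_top: float     # x of the right boundary at the top image edge, % of width
    right_bottom: float  # x of the right boundary at the bottom image edge, % of width
    top_left: float      # y of the top boundary at the left image edge, % of height
    top_right: float     # y of the top boundary at the right image edge, % of height
    bottom_left: float   # y of the bottom boundary at the left image edge, % of height
    bottom_right: float  # y of the bottom boundary at the right image edge, % of height


def intersect(x_top, x_bottom, y_left, y_right, width, height):
    """Intersect a side boundary with a top/bottom boundary (all values in pixels).

    Side line:  x = x_top + (x_bottom - x_top) * y / height
    Cross line: y = y_left + (y_right - y_left) * x / width
    """
    dx = x_bottom - x_top
    dy = y_right - y_left
    denom = 1.0 - (dx * dy) / (width * height)
    if abs(denom) < 1e-12:
        raise ValueError("boundary lines are parallel; no unique corner")
    x = (x_top + dx * y_left / height) / denom
    y = y_left + dy * x / width
    return (x, y)


def corners_in_pixels(out, width, height):
    """Convert percentage outputs to the four corner points of the product area."""
    def px(pct):
        return pct * width / 100.0

    def py(pct):
        return pct * height / 100.0

    lt, lb = px(out.left_top), px(out.left_bottom)
    rt, rb = px(out.right_top), px(out.right_bottom)
    tl, tr = py(out.top_left), py(out.top_right)
    bl, br = py(out.bottom_left), py(out.bottom_right)
    return {
        "top_left": intersect(lt, lb, tl, tr, width, height),
        "top_right": intersect(rt, rb, tl, tr, width, height),
        "bottom_right": intersect(rt, rb, bl, br, width, height),
        "bottom_left": intersect(lt, lb, bl, br, width, height),
    }
```

Under these assumptions, the four corners are the natural input for the perspective-correction and masking steps, for example as the source quadrilateral of a perspective transform.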

Dataset

Metric | Count
Total photos in dataset | 3,814
Human-verified annotations | 880
Skipped (unusable) | 29
Available for future verification | 2,905

Photos are sourced from Roamler jobs across multiple categories: regular shelf bays, endcap displays (kopstellingen), and checkout displays (kassameubels).

Model Performance

Result: The latest model (v0.5, 8 outputs) achieves 1.33% mean deviation on the validation set. X-boundaries average ~2.0% error, Y-boundaries ~0.65% error. On 279 blind-tested photos, the mean deviation is 0.44%.
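The report does not spell out how mean deviation is computed. A minimal sketch, assuming it is the mean absolute difference, in percentage points, between predicted and annotated boundary positions across the 8 points of a photo. The function name and the sample values are hypothetical.

```python
def mean_deviation(predicted, annotated):
    """Mean absolute difference in percentage points over paired boundary points."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("at least one boundary point is required")
    return sum(abs(p - a) for p, a in zip(predicted, annotated)) / len(predicted)


# Hypothetical photo: 4 x-values (% of width) followed by 4 y-values (% of height).
annotated = [10.0, 12.0, 90.0, 88.0, 5.0, 5.0, 95.0, 95.0]
predicted = [11.0, 12.0, 89.0, 88.0, 5.0, 6.0, 95.0, 95.0]
photo_deviation = mean_deviation(predicted, annotated)

# With 4 x-points and 4 y-points per photo, the reported per-axis averages
# combine into the overall figure as a simple weighted mean.
overall_from_axes = (4 * 2.0 + 4 * 0.65) / 8
```

Under this assumption, the rounded per-axis averages (~2.0% and ~0.65%) combine to about 1.33%, which agrees with the validation figure quoted above.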

Error distribution

Per boundary point (8 per photo) — how far does the CNN prediction deviate from the human annotation?

< 1%: 90%
1-2%: 3%
2-3%: 2%
3-5%: 2%
5-10%: 2%
> 10%: 1%
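A distribution like this can be produced by bucketing the per-point deviations. The sketch below is one way to do it. How the report treats a value that falls exactly on a bucket edge is not stated, so the choice here (edge values go to the higher bucket) is an assumption, as are the function name and sample data.

```python
from bisect import bisect_right

# Upper edges of all buckets except the last, in percentage points.
EDGES = [1.0, 2.0, 3.0, 5.0, 10.0]
LABELS = ["< 1%", "1-2%", "2-3%", "3-5%", "5-10%", "> 10%"]


def bucket_shares(deviations):
    """Return the share of boundary points (in %) that falls in each error bucket."""
    if not deviations:
        raise ValueError("no deviations given")
    counts = [0] * len(LABELS)
    for d in deviations:
        # bisect_right sends a value equal to an edge into the higher bucket.
        counts[bisect_right(EDGES, d)] += 1
    total = len(deviations)
    return {label: 100.0 * count / total for label, count in zip(LABELS, counts)}


# Hypothetical per-point deviations, not taken from the report.
shares = bucket_shares([0.0, 0.5, 0.2, 1.5, 2.5, 4.0, 7.0, 12.0])
```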

Per boundary point

Boundary | Mean deviation | Median
Left Top | 1.04% | 0.00%
Left Bottom | 0.79% | 0.00%
Right Top | 0.89% | 0.00%
Right Bottom | 0.80% | 0.00%
Top Left | 0.00% | 0.00%
Top Right | 0.00% | 0.00%
Bottom Left | 0.00% | 0.00%
Bottom Right | 0.00% | 0.00%
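As a quick consistency check, the per-point means in this table can be averaged to recover the overall blind-test figure. The sketch uses only the published values and assumes every boundary point carries equal weight.

```python
from statistics import mean

# Per-point mean deviations (in %) as published in the table above.
per_point_mean = {
    "Left Top": 1.04,
    "Left Bottom": 0.79,
    "Right Top": 0.89,
    "Right Bottom": 0.80,
    "Top Left": 0.00,
    "Top Right": 0.00,
    "Bottom Left": 0.00,
    "Bottom Right": 0.00,
}

# The first word of each name identifies the boundary the point belongs to.
x_points = [v for k, v in per_point_mean.items() if k.startswith(("Left", "Right"))]
y_points = [v for k, v in per_point_mean.items() if k.startswith(("Top", "Bottom"))]

overall = round(mean(per_point_mean.values()), 2)  # average over all 8 points
x_average = round(mean(x_points), 2)               # left/right boundaries only
y_average = round(mean(y_points), 2)               # top/bottom boundaries only
```

The overall value comes out at 0.44%, matching the blind-test mean deviation reported under Model Performance. All of the error sits in the left/right boundaries.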

Training progression

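Label quality has a dramatic impact. More data with noisy labels performs worse than less data with clean, human-verified labels: 2,814 automatically labelled photos gave a mean deviation of 5.07%, while just 226 human-verified photos gave 2.87%.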

Round | Training photos | Labels | Mean deviation | Improvement
Round 1 | 2,814 (Gemini) | Automatic | 5.07% | Baseline
Round 2 | 226 | Human-verified | 2.87% | -43%
Round 3 | 401 | Human-verified | 1.65% | -68%
Round 4 (eval) | 496 | Human-verified | 1.42% | -72%
Round 5 | 596 + endcaps | Human-verified | 2.12% | New category
Round 6 | 780 + endcaps | Human-verified | 1.60% | -68%
Round 7 (8-out) | 880 + checkout | Human-verified | 1.33% | -74%
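The Improvement column reads as the relative change in mean deviation against the Round 1 baseline. A minimal sketch of that calculation, assuming this interpretation is right. It uses the rounded values from the table, so a recomputed figure can differ from the published one by about one percentage point (Round 3 comes out near -67.5% here versus the published -68%).

```python
BASELINE = 5.07  # Round 1 mean deviation in %, automatic labels


def relative_change(baseline, value):
    """Relative change versus the baseline, in percent (negative means lower error)."""
    return (value - baseline) / baseline * 100.0


# Mean deviation per round (in %), taken from the table above.
# Round 5 is left out because the table marks it "New category" instead of a percentage.
rounds = {
    "Round 2": 2.87,
    "Round 3": 1.65,
    "Round 4 (eval)": 1.42,
    "Round 6": 1.60,
    "Round 7 (8-out)": 1.33,
}

improvement = {name: relative_change(BASELINE, value) for name, value in rounds.items()}
```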

Examples

Green lines = human annotation (ground truth). Red lines = CNN prediction. Darkened areas are masked out.

Perfect prediction (0% error)

31992991_FabricEnhancers_FullBay_2
Mean deviation: 0.0%
32000540_FabricCleaning_FullBay_2
Mean deviation: 0.0%

Good prediction (< 1% error)

30522176_Q56810027_3
Mean deviation: 0.2%
30531317_Q56810027_3
Mean deviation: 0.2%

Moderate prediction (2-5% error)

31069640_Q56810027_2
Mean deviation: 2.1%
kop_174851_10105636_foto9_kopstelling9
Mean deviation: 2.1%

Worst predictions

30385719_Q56810027_2
Mean deviation: 6.7%
30387876_Q56810027
Mean deviation: 7.6%

Conclusions & Next Steps