link: https://arxiv.org/pdf/2305.03706
github: https://github.com/ladwigd/Leaflet-Product-Classification/tree/main?tab=readme-ov-file
What is the paper about?
The paper introduces a new dataset of product snippets cut out of retail leaflets. It also trains baseline image-based and text-based models that predict each snippet's product class/category. A ‘snippet’ is a single product-information box from a leaflet, like each of the boxes in the image below:

Dataset
41.6k product images in 832 classes obtained from leaflets.
The source leaflets come from 132 different retailers and span 2016 to 2022.
Classes cover food, beverages, household goods, cosmetics, pet food, etc.
Each class has 50 images: 40 for training and 10 for testing.
No text parsed from the images is included.
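The numbers above are consistent: 832 classes × (40 + 10) images = 41,600, which matches the 41.6k total. The sketch below reproduces that fixed per-class layout on synthetic data as a sanity check. `per_class_split`, the `(image_id, label)` record format and the made-up image IDs are hypothetical. This is not the repository's loader, and the released dataset presumably already fixes which images are train and which are test.

```python
import random
from collections import defaultdict

# Figures taken from the notes: 832 classes, 40 train + 10 test images each.
NUM_CLASSES = 832
TRAIN_PER_CLASS = 40
TEST_PER_CLASS = 10


def per_class_split(samples, n_train=TRAIN_PER_CLASS, n_test=TEST_PER_CLASS, seed=0):
    """Split (image_id, label) pairs into fixed-size train/test sets per class.

    Hypothetical helper for illustration only; it mimics the 40/10 layout.
    """
    by_class = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append(image_id)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for label, ids in sorted(by_class.items()):
        if len(ids) < n_train + n_test:
            raise ValueError(f"class {label!r} has only {len(ids)} images")
        ids = ids[:]  # copy before shuffling so the input is not mutated
        rng.shuffle(ids)
        train += [(i, label) for i in ids[:n_train]]
        test += [(i, label) for i in ids[n_train:n_train + n_test]]
    return train, test


# Synthetic stand-in data with the same shape as the dataset (IDs are invented).
samples = [(f"img_{c}_{k}", c) for c in range(NUM_CLASSES)
           for k in range(TRAIN_PER_CLASS + TEST_PER_CLASS)]
train, test = per_class_split(samples)
print(len(samples), len(train), len(test))  # 41600 33280 8320
```

A fixed number of images per class keeps both splits perfectly balanced. That means plain top-1 accuracy is not skewed toward frequent classes.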
Why is this dataset interesting?
Baseline Models