Web acquired image datasets need curation: an examplar pipeline evaluated on Greek food images
Main Authors: | Vasileios Sevetlidis, George Pavlidis, Vasileios Arampatzakis, Chairi Kiourt, Spyridon Mouroutsos, Antonios Gasteratos |
---|---|
Format: | Proceeding Journal |
Language: | eng |
Published: | 2021 |
Online Access: | https://zenodo.org/record/5553705 |
Table of Contents:
- Mining Web data to create AI-usable datasets is still non-trivial. Despite free access to data, a dataset useful for machine learning applications cannot be formed through a data mining phase alone. For any given query, the retrieved sample may include duplicated, misclassified or completely irrelevant content. The consequence of not “cleaning” those datasets is faulty, noisy and imbalanced datasets. Curation is therefore necessary to tackle the varying degrees of inconsistency found in the retrieved samples. This paper suggests a pipeline consisting of state-of-the-art and off-the-shelf methods for curating an image dataset retrieved from the Web. As a case study, the pipeline is applied to expanding food datasets with currently uncategorized Greek dishes, leveraging information found in a specialized ontology, with the aim of increasing accuracy in food recognition applications.