Web acquired image datasets need curation: an examplar pipeline evaluated on Greek food images

Main Authors: Vasileios Sevetlidis, George Pavlidis, Vasileios Arampatzakis, Chairi Kiourt, Spyridon Mouroutsos, Antonios Gasteratos
Format: Proceeding Journal
Language: eng
Published: 2021
Subjects:
Online Access: https://zenodo.org/record/5553705
Contents:
  • Mining Web data to create AI-usable datasets is still non-trivial. Despite free data access, forming a dataset useful for machine learning applications cannot rely solely on a data mining phase. For any given query, the retrieved sample may include duplicated, misclassified or completely irrelevant content. Failing to “clean” such datasets results in faulty, noisy and imbalanced datasets. Curation is therefore necessary to tackle the varying degrees of inconsistency found in the retrieved samples. This paper suggests a pipeline consisting of state-of-the-art and off-the-shelf methods for curating an image dataset retrieved from the Web. As a case study, the pipeline is applied to expanding food datasets with currently uncategorized Greek dishes, leveraging information found in a specialized ontology, with the aim of increasing accuracy in food recognition applications.
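
The three problems named in the abstract (duplicated, misclassified and irrelevant content) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record does not say which methods they used. The `Sample` fields, the `curate` function, the `min_score` threshold and the idea that some classifier has already produced a label and a confidence score are all assumptions made for illustration. Exact-byte hashing only catches identical files, so near-duplicates (resized or re-encoded images) would need a perceptual method instead.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One retrieved image (all fields are hypothetical, for illustration)."""
    name: str     # file name of the retrieved image
    data: bytes   # raw image bytes
    query: str    # dish name used as the Web search query
    label: str    # label assigned by some assumed, pre-existing classifier
    score: float  # that classifier's confidence, in [0, 1]


def curate(samples, min_score=0.5):
    """Return the samples that survive three simple curation checks.

    1. Exact duplicates are dropped (same SHA-256 digest of the bytes).
    2. Misclassified items are dropped (label differs from the query).
    3. Likely irrelevant items are dropped (confidence below min_score).
    """
    seen_digests = set()
    kept = []
    for sample in samples:
        digest = hashlib.sha256(sample.data).hexdigest()
        if digest in seen_digests:
            continue  # byte-identical to an image already seen
        seen_digests.add(digest)
        if sample.label != sample.query:
            continue  # classifier disagrees with the query term
        if sample.score < min_score:
            continue  # too uncertain to count as relevant
        kept.append(sample)
    return kept


if __name__ == "__main__":
    retrieved = [
        Sample("a.jpg", b"image-1", "moussaka", "moussaka", 0.93),
        Sample("b.jpg", b"image-1", "moussaka", "moussaka", 0.93),  # duplicate
        Sample("c.jpg", b"image-2", "moussaka", "pizza", 0.88),     # misclassified
        Sample("d.jpg", b"image-3", "moussaka", "moussaka", 0.12),  # irrelevant
    ]
    for sample in curate(retrieved):
        print(sample.name)
```

In a real setting the label and score would come from whatever recognition model is available, and the threshold would have to be tuned per dataset rather than fixed at 0.5.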