Content-based image classification is a wide research field addressing the problem of categorizing images according to their content. A common approach is learning from examples: a given class of images is described by means of a suitable training set. The main drawback of this approach is that collecting data to build homogeneous training and validation sets is a tedious and time-consuming task, even though the Web offers a potentially inexhaustible source of images. In this paper we present a system that automatically downloads images from the Web, together with a selection of techniques for pruning the downloaded images according to given criteria. These techniques work as filters of varying complexity: some are simple measurements, others are image classifiers themselves. We focus on two critical filters (monochrome vs. color images and photos vs. graphics) and show their effectiveness on a manually labeled validation set. We conclude with an a posteriori analysis of the overall performance of the system, based on the results obtained in a few runs.
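
As an illustration of what a "simple measurement" filter might look like, the following is a minimal sketch of a monochrome vs. color test and of a filter chain. It is not the method used in the paper: the per-pixel channel-spread rule, the names `is_monochrome` and `prune`, and the `tolerance` and `max_color_fraction` parameters are all assumptions made for this example, and images are represented simply as lists of RGB tuples.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Image = Sequence[Pixel]       # assumed representation: a flat sequence of pixels


def is_monochrome(pixels: Iterable[Pixel], tolerance: int = 8,
                  max_color_fraction: float = 0.01) -> bool:
    """Return True if almost all pixels are (near) gray.

    A pixel counts as colored when the spread between its largest and
    smallest channel exceeds `tolerance`. Both thresholds are illustrative
    assumptions, not values taken from the paper.
    """
    total = 0
    colored = 0
    for r, g, b in pixels:
        total += 1
        if max(r, g, b) - min(r, g, b) > tolerance:
            colored += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return colored / total <= max_color_fraction


def prune(images: Iterable[Image],
          filters: Sequence[Callable[[Image], bool]]) -> List[Image]:
    """Keep only the images accepted by every filter, applied in order."""
    return [img for img in images if all(f(img) for f in filters)]
```

A chain that keeps only color images could then be written as `prune(images, [lambda im: not is_monochrome(im)])`; more expensive classifier-based filters would naturally be placed after cheap measurements such as this one.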