SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features

Author
García Pedrajas, Nicolás
Romero-del-Castillo, Juan A.
Cerruela García, Gonzalo
Publisher
Elsevier
Date
2021
Subject
Instance selection
Feature selection
Evolutionary algorithms
K nearest neighbor rule
Abstract
Data reduction is becoming increasingly relevant due to the enormous amounts of data that are constantly being produced in many fields of research. Instance selection is one of the most widely used methods for this task. At the same time, most recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly hinders classification and recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is greatly improved when the complexity of the data is reduced. Thus, feature selection is also a widely used method for data reduction and for gaining an understanding of feature information. Although most methods address instance and feature selection separately, the two problems are interwoven, and benefits are expected from performing these two tasks jointly. However, few algorithms have been proposed for simultaneously addressing the tasks of instance and feature selection. Furthermore, most of those methods are based on complex heuristics that are very difficult to scale up even to moderately large datasets. This paper proposes a new algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection using a simple heuristic search and several scaling-up mechanisms that can be successfully applied to datasets with millions of features and instances. In the proposed method, a forward selection search is performed in the feature space jointly with the application of standard instance selection in a constructive subspace built stepwise. Several simplifications are adopted in the search to obtain a scalable method. An extensive comparison using 95 large datasets shows the usefulness of our method and its ability to deal with millions of instances and features simultaneously. 
The method is able to obtain better classification performance results than state-of-the-art approaches while achieving considerable data reduction.
URI
http://hdl.handle.net/10396/26740
Source
García-Pedrajas, N., Romero del Castillo, J. A., & Cerruela García, G. (2021). SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features. Pattern Recognition, 111, 107723. https://doi.org/10.1016/j.patcog.2020.107723
Publisher's version
https://doi.org/10.1016/j.patcog.2020.107723
Collections
  • Artículos, capítulos, libros...UCO
  • DIAN-Artículos, capítulos, libros...

DSpace software copyright © 2002-2015  DuraSpace
Contact Us | Send Feedback
© Biblioteca Universidad de Córdoba
Biblioteca  UCODigital
 

 
