SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features

García Pedrajas, Nicolás; Romero-del-Castillo, Juan A.; Cerruela García, Gonzalo

dc.contributor.author	García Pedrajas, Nicolás
dc.contributor.author	Romero-del-Castillo, Juan A.
dc.contributor.author	Cerruela García, Gonzalo
dc.date.accessioned	2024-01-24T17:09:58Z
dc.date.available	2024-01-24T17:09:58Z
dc.date.issued	2021
dc.identifier.issn	0031-3203
dc.identifier.uri	http://hdl.handle.net/10396/26740
dc.description.abstract	Data reduction is becoming increasingly relevant due to the enormous amounts of data that are constantly being produced in many fields of research. Instance selection is one of the most widely used methods for this task. At the same time, most recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly hinders classification and recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is greatly improved when the complexity of the data is reduced. Thus, feature selection is also a widely used method for data reduction and for gaining an understanding of feature information. Although most methods address instance and feature selection separately, the two problems are interwoven, and benefits are expected from performing these two tasks jointly. However, few algorithms have been proposed for simultaneously addressing the tasks of instance and feature selection. Furthermore, most of those methods are based on complex heuristics that are very difficult to scale up even to moderately large datasets. This paper proposes a new algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection using a simple heuristic search and several scaling-up mechanisms that can be successfully applied to datasets with millions of features and instances. In the proposed method, a forward selection search is performed in the feature space jointly with the application of standard instance selection in a constructive subspace built stepwise. Several simplifications are adopted in the search to obtain a scalable method. An extensive comparison using 95 large datasets shows the usefulness of our method and its ability to deal with millions of instances and features simultaneously. 
The method is able to obtain better classification performance results than state-of-the-art approaches while achieving considerable data reduction.	es_ES
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	https://creativecommons.org/licenses/by-nc-nd/4.0/	es_ES
dc.source	García-Pedrajas, N., Romero Del Castillo, J. A., & García, G. C. (2021). SI(FS)2: fast simultaneous instance and feature selection for datasets with many features. Pattern Recognition, 111, 107723. https://doi.org/10.1016/j.patcog.2020.107723	es_ES
dc.subject	Instance selection	es_ES
dc.subject	Feature selection	es_ES
dc.subject	Evolutionary algorithms	es_ES
dc.subject	K nearest neighbor rule	es_ES
dc.title	SI(FS)2: Fast simultaneous instance and feature selection for datasets with many features	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.patcog.2020.107723	es_ES
dc.relation.projectID	Gobierno de España. PID2019-109481GB-I00	es_ES
dc.relation.projectID	Junta de Andalucía. UCO-1264182	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
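
The abstract describes a forward selection search over features performed jointly with standard instance selection in a subspace that grows one feature at a time. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes Hart's condensed nearest neighbor rule as the instance selector, a 1-NN classifier scored on a held-out split as the search criterion, and a small synthetic dataset. All function names (`one_nn_predict`, `condensed_nn`, `joint_forward_selection`) are hypothetical, and none of the paper's scaling-up mechanisms are reproduced.

```python
import numpy as np


def one_nn_predict(X_train, y_train, X_test):
    """Label each test point with the class of its nearest training point."""
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = (diff ** 2).sum(axis=2)  # squared Euclidean distances
    return y_train[dist.argmin(axis=1)]


def condensed_nn(X, y):
    """Hart's condensed nearest neighbor (assumed here as the instance selector).

    Starts from the first instance and adds every instance that the
    currently kept subset misclassifies, until a full pass adds nothing.
    """
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            pred = one_nn_predict(X[keep], y[keep], X[i:i + 1])[0]
            if pred != y[i]:
                keep.append(i)
                changed = True
    return np.array(sorted(keep))


def joint_forward_selection(X_tr, y_tr, X_val, y_val, max_features=5):
    """Greedy forward feature search with instance selection in each subspace."""
    selected = []                      # features chosen so far
    instances = np.arange(len(X_tr))   # instances kept for the chosen subspace
    best_acc = -1.0
    remaining = list(range(X_tr.shape[1]))
    while remaining and len(selected) < max_features:
        best = None
        for f in remaining:
            cols = selected + [f]
            # Instance selection runs in the candidate subspace only.
            keep = condensed_nn(X_tr[:, cols], y_tr)
            pred = one_nn_predict(X_tr[np.ix_(keep, cols)], y_tr[keep],
                                  X_val[:, cols])
            acc = (pred == y_val).mean()
            if best is None or acc > best[0]:
                best = (acc, f, keep)
        if best[0] <= best_acc:        # stop when no candidate improves accuracy
            break
        best_acc, f, instances = best
        selected.append(f)
        remaining.remove(f)
    return selected, instances, best_acc


# Synthetic data (an assumption for illustration): only feature 2 carries
# class information; the other five features are Gaussian noise.
rng = np.random.default_rng(0)
n = 160
y = np.arange(n) % 2
X = rng.normal(size=(n, 6))
X[:, 2] = np.where(y == 1, 2.0, -2.0) + rng.uniform(-0.5, 0.5, size=n)

X_tr, y_tr, X_val, y_val = X[:120], y[:120], X[120:], y[120:]
selected, instances, acc = joint_forward_selection(X_tr, y_tr, X_val, y_val)
print(selected, len(instances), acc)
```

Each candidate feature triggers a fresh instance selection pass, which is the expensive part of a naive joint search; the abstract notes that the proposed method adopts several simplifications in the search precisely to stay scalable.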

Files in this item

Name: si_fs.pdf
Size: 920.8Kb
Format: PDF
