Distributed multi-label feature selection using individual mutual information measures

Author
González López, Jorge
Ventura Soto, S.
Cano, Alberto
Publisher
Elsevier
Date
2020
Subject
Multi-label learning
Feature selection
Mutual information
Distributed computing
Apache Spark
Abstract
Multi-label learning generalizes traditional learning by allowing an instance to belong to multiple labels simultaneously. This causes multi-label data to be characterized by its large label space dimensionality and the dependencies among labels. These challenges have been addressed by feature selection techniques which improve the final model accuracy. However, the large number of features along with a large number of labels calls for new approaches to manage data effectively and efficiently in distributed computing environments. This paper proposes a distributed model to compute a score that measures the quality of each feature with respect to multiple labels on Apache Spark. We propose two different approaches that study how to aggregate the mutual information of multiple labels: Euclidean Norm Maximization (ENM) and Geometric Mean Maximization (GMM). The former selects the features with the largest L2-norm whereas the latter selects the features with the largest geometric mean. Experiments compare 9 distributed multi-label feature selection methods on 12 datasets and 12 metrics. Results validated through statistical analysis indicate that ENM is able to outperform the reference methods by maximizing the relevance while minimizing the redundancy of the selected features in constant selection time.
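The two aggregation strategies named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation: it assumes discrete features and binary labels, estimates mutual information from raw counts, and ignores distribution and any redundancy handling. The function names (mutual_information, enm_score, gmm_score, select_features) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def enm_score(mi_per_label):
    """Euclidean (L2) norm of a feature's per-label mutual information."""
    return math.sqrt(sum(v * v for v in mi_per_label))


def gmm_score(mi_per_label):
    """Geometric mean of a feature's per-label mutual information.

    Assumption of this sketch: the score is 0 if any label has no
    (or numerically non-positive) mutual information with the feature.
    """
    if any(v <= 0.0 for v in mi_per_label):
        return 0.0
    return math.exp(sum(math.log(v) for v in mi_per_label) / len(mi_per_label))


def select_features(features, labels, k, score):
    """Rank features by the aggregated score and keep the top k names."""
    scores = {
        name: score([mutual_information(col, lab) for lab in labels.values()])
        for name, col in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: two independent binary labels and three candidate features.
labels = {
    "y1": [0, 0, 0, 0, 1, 1, 1, 1],
    "y2": [0, 0, 1, 1, 0, 0, 1, 1],
}
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],  # copy of y1, independent of y2
    "f2": [0, 0, 0, 0, 0, 0, 1, 1],  # y1 AND y2, informative for both labels
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of both labels
}

print(select_features(features, labels, 1, enm_score))  # ['f1']
print(select_features(features, labels, 1, gmm_score))  # ['f2']
```

On this toy data the two criteria disagree: the L2-norm favors the feature that is strongly informative for a single label, while the geometric mean favors the feature that carries some information about every label.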
URI
http://hdl.handle.net/10396/33731
Source
Gonzalez-Lopez, J., Ventura, S., & Cano, A. (2019). Distributed multi-label feature selection using individual mutual information measures. Knowledge-Based Systems, 188, 105052. https://doi.org/10.1016/j.knosys.2019.105052
Publisher's version
https://doi.org/10.1016/j.knosys.2019.105052
Collections
  • DIAN-Artículos, capítulos, libros...
  • Artículos, capítulos, libros...UCO

DSpace software copyright © 2002-2015  DuraSpace
Contact Us | Send Feedback
© Biblioteca Universidad de Córdoba
Biblioteca  UCODigital
 

 
