Reproducibility Crossroads: Impact of Statistical Choices on Proteomics Functional Enrichment

Author
Biełło, Karolina A.
Die, José V.
Amil-Ruiz, Francisco
Fuentes-Almagro, Carlos
Pérez Rodríguez, Javier
Olaya-Abril, Alfonso
Publisher
MDPI
Date
2025
Subject
Proteomics
Meta-analysis
Functional enrichment
Statistical methods
Reproducibility
Abstract
Quantitative proteomics relies on robust statistical methods for differential expression analysis, and the choice of method critically affects downstream functional enrichment. This meta-analysis systematically investigated how statistical hypothesis testing approaches and the criteria used to define biological relevance influence the concordance of functional enrichment results. We reanalyzed five independent label-free quantitative proteomics datasets using diverse frequentist (t-test, Limma, DEqMS, MSstats) and Bayesian (rstanarm) approaches. Concordance of Gene Ontology (GO) terms and KEGG pathways was assessed using Jaccard indices and correlation metrics, with comparisons grouped by whether the statistical test and the biological relevance criterion were held constant. The similarity distributions differed with high statistical significance among the comparison groups. Comparisons that varied only the hypothesis testing method, with the relevance criterion held constant (fold change (FC) or Bayesian), showed the highest consistency. Conversely, comparisons with differing biological relevance criteria (or varied methodological choices) yielded significantly lower consistency, showing that the definition of biological relevance strongly affects GO term overlap. KEGG pathways displayed more uniform, method-insensitive concordance. Sensitivity analysis confirmed the robustness of these findings, underscoring that methodological choices profoundly influence functional enrichment outcomes. This work emphasizes the critical need for transparency and careful consideration of analytical decisions in proteomics research to ensure reproducible and biologically sound interpretations.
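
The Jaccard index named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `jaccard_index`, the two pipeline labels, and the GO term sets below are hypothetical and chosen only for illustration. Returning 0.0 when both sets are empty is an assumed convention. The index is the number of enriched terms shared by two analyses divided by the number of terms found by either.

```python
def jaccard_index(terms_a, terms_b):
    """Jaccard similarity between two collections of enriched term IDs."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        # Assumed convention: two empty enrichment results score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical enriched GO terms from two analyses of the same dataset
# (illustrative values only, not results from the study).
ttest_fc = {"GO:0006412", "GO:0006096", "GO:0006099", "GO:0009060"}
limma_fc = {"GO:0006412", "GO:0006096", "GO:0006099", "GO:0015986"}

similarity = jaccard_index(ttest_fc, limma_fc)
print(similarity)  # 3 shared terms / 5 distinct terms = 0.6
```

A value of 1.0 means the two analyses report identical term sets, and 0.0 means they share none. Computing this index for every pair of analyses gives the kind of similarity distribution that the abstract compares across groups.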
