CS Events
Research Talk: Generalizability and Machine Learning in Biomedicine
Tuesday, November 04, 2025, 10:30am - 12:00pm
Speaker: Dr. Lanjing Zhang, Rutgers Department of Chemical Biology
Location: CoRE 301
Event Type: Research Talk
Abstract: In machine learning (ML) modeling of biomedical data, performance often differs between intra-dataset and cross-dataset tests, and between applying ML to all samples and to a subset of them (e.g., all patients versus older patients). However, reducing these differences may reduce ML performance. Developing models that excel in intra-dataset testing and also generalize to subset-sample or cross-dataset testing is thus a challenging dilemma. We therefore propose a multi-criteria framework to 1) improve ML fairness in classifying multicategory causes of death in cancer patients; and 2) understand and improve the performance and generalizability of ML in intra-dataset and cross-dataset testing. Among the colorectal cancer patients (n=515) of various age, sex, and racial groups in the TCGA data, all ML models exhibited biases for these sociodemographic groups. Methods to optimize model performance, including testing the model on merged groups and others, show the potential to reduce disparities in model performance for different groups. Importantly, both robust Analysis of Variance (ANOVA) and Kruskal–Wallis tests consistently identified differentially expressed genes as one of the most influential factors in both cancer types. The proposed multi-criteria framework successfully identified the model that achieved both the best cross-dataset performance and similar intra-dataset performance. In summary, generalizing ML performance is challenging, as evidenced by ML biases in classifying a subset of samples and lower performance in cross-dataset testing. It appears possible to develop methods that improve ML fairness and the generalizability of ML performance. Specifically, ML performance distributions deviated significantly from normality, which motivates using both robust parametric and non-parametric statistical tests.
We also quantified and provided possible exploitability on the factors associated with cross-dataset performance and generalizability of ML models in two cancer types. A multi-criteria framework was developed and validated to identify the models that are accurate and consistently robust across datasets.
Contact: Professor Jie Gao
Zoom link:
https://rutgers.zoom.us/j/96284815623?pwd=vNqxiPxmeOLj9vQO3Bn8yaM0GdBGaM.1