CS Events


Research Talk

Generalizability and Machine Learning in Biomedicine

 


Tuesday, November 04, 2025, 10:30am - 12:00pm

 

Speaker: Dr. Lanjing Zhang, Rutgers Department of Chemical Biology

Location: CoRE 301

Event Type: Research Talk

Abstract: Machine learning (ML) models of biomedical data often perform differently in intra-dataset and cross-dataset tests, and differently when applied to all samples versus a subset of them (e.g., all patients versus older patients). However, reducing these differences may lower overall ML performance. Developing models that excel in intra-dataset testing and also generalize to subset-sample or cross-dataset testing is therefore a challenging dilemma. We propose a multi-criteria framework to 1) improve ML fairness in classifying multicategory causes of death in cancer patients, and 2) understand and improve the performance and generalizability of ML in intra-dataset and cross-dataset testing. Among the colorectal cancer patients (n=515) of various age, sex, and racial groups in the TCGA data, all ML models exhibited biases across these sociodemographic groups. Methods to optimize model performance, including testing the model on merged groups, among others, show the potential to reduce disparities in model performance between groups. Importantly, both robust Analysis of Variance (ANOVA) and Kruskal–Wallis tests consistently identified differentially expressed genes as one of the most influential factors in both cancer types. The proposed multi-criteria framework successfully identified the model that achieved both the best cross-dataset performance and similar intra-dataset performance. In summary, generalizing ML performance is challenging, as evidenced by ML biases in classifying a subset of samples and lower performance in cross-dataset testing. It nonetheless appears possible to develop methods that improve ML fairness and generalizable ML performance. Specifically, ML performance distributions deviated significantly from normality, which motivates using both robust parametric and non-parametric statistical tests.
We also quantified, and provided possible explanations of, the factors associated with cross-dataset performance and generalizability of ML models in two cancer types. A multi-criteria framework was developed and validated to identify models that are accurate and consistently robust across datasets.

Contact: Professor Jie Gao

Zoom link:
https://rutgers.zoom.us/j/96284815623?pwd=vNqxiPxmeOLj9vQO3Bn8yaM0GdBGaM.1