The goal of HistomicsML is to create tools that enable end users to train machine learning algorithms for whole-slide image (WSI) analysis. Labeled data is the essential ingredient for building powerful machine learning algorithms, and HistomicsML helps users identify the most valuable samples for labeling within massive WSI datasets using a combination of visualization techniques and active learning. To achieve these goals, the project combines fundamental research in machine learning with high-performance computing and human-computer interaction.

Active learning iteratively improves classifier performance by asking the user to label the samples the classifier finds most difficult. This human-in-the-loop training has been shown to significantly reduce the amount of labeled data needed to build accurate classifiers. In WSI datasets, this requires scanning tens of millions of image regions in near real time. In addition to building software systems for active learning, we are researching new methods for selecting samples for labeling and for calculating features from large volumes of unlabeled data.
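
The sample-selection step can be illustrated with a minimal sketch of uncertainty sampling, one common active learning strategy. This is a generic illustration and not necessarily the selection method HistomicsML uses: the function name `select_most_uncertain` and the probability values are hypothetical, and the sketch assumes a binary classifier that outputs a positive-class probability for each unlabeled image region.

```python
from typing import List, Sequence


def select_most_uncertain(probabilities: Sequence[float], k: int) -> List[int]:
    """Return indices of the k samples whose predicted positive-class
    probability is closest to 0.5 (the least confident predictions)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank samples by distance from the decision boundary at 0.5;
    # a smaller distance means the classifier is less certain.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical classifier outputs for six unlabeled image regions.
predicted = [0.97, 0.52, 0.10, 0.45, 0.80, 0.31]
to_label = select_most_uncertain(predicted, k=2)
print(to_label)  # [1, 3]
```

In a full loop, the user would label the selected regions, the classifier would be retrained on the enlarged labeled set, and the selection would be repeated. Sorting every sample is adequate for a sketch; at the scale of tens of millions of regions, a partial selection of the top k would likely be preferable.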


Sanghoon Lee, PhD, Postdoctoral Researcher
Lee Cooper, PhD, Principal Investigator


Lee S, Amgad M, Mobadersany P, McCormick M, Pollack BP, Elfandy H, Hussein H, Gutman DA, Cooper LAD. Interactive classification of whole-slide imaging data for cancer researchers. Cancer Research. 2021 Feb 15;81(4):1171-7.

Nalisnik M, Amgad M, Lee S, Halani SH, Vega JE, Brat DJ, Gutman DA, Cooper LA. Interactive phenotyping of large-scale histology imaging data with HistomicsML. Scientific Reports. 2017 Nov 6;7(1):1-2.


Multiscale Framework for Molecular Heterogeneity Analysis
NLM KLM011576A