Quantifying intraleukemic heterogeneity through single-cell RNA sequencing


Introduction: Chronic Myelomonocytic Leukemia (CMML) is a heterogeneous, lethal adult leukemia with a median survival of 34 months. This clonal hematopoietic malignancy is characterized by persistent monocytosis and a limited number of genetic clonal abnormalities. The median age at diagnosis is 72 years, and the only disease-modifying, curative treatment is an allogeneic stem cell transplant, which is often forgone because of age-related comorbidities in CMML patients. Eventually, CMML will progress or transform to aggressive disease. Intraleukemic heterogeneity (ILH) at diagnosis could be a key determinant of this progression, as it reflects the genotypes and phenotypes that are selected during this dynamic evolutionary process. We hypothesize that the outcome and timescale of this evolutionary process are hallmarked by ILH. CMML patients offer a unique opportunity to identify molecular and cell-phenotypic predictors of evolution and the therapeutic consequences of ILH in hematologic malignancies, because most patients can be followed longitudinally in a treatment-naïve state before they progress.
Materials and Methods: To develop a toolkit for quantifying evolutionary processes that involve many cellular subtypes, we first focused on bone marrow mononuclear cell samples from leukemia patients, compared with samples from healthy subjects, to describe ILH. Using single-cell RNA sequencing (scRNAseq) data from publicly available datasets, we took the FPKM measurements for each transcript across individual cells. We then applied graph-based clustering to the normalization-corrected FPKM values and compared the distribution of cells over clusters, as well as how the clusters were distributed across disease types and samples. We used a generalized diversity measure to quantify ILH and found that it can characterize disease stage at the level of the sub-population structures derived from scRNAseq data.
Results and Discussion: We developed a pipeline for analyzing scRNAseq data through multi-sample normalization, clustering, and mathematical interpretation, and we verified this platform against clinical data from publicly available datasets. To quantify ILH, we performed graph-based clustering on normalization-corrected FPKM values and used the cluster-structured data to calculate a generalized diversity index (GDI) over all values of q to reveal key diversity differences. This approach enabled us to distinguish between leukemic states based on high-dimensional single-cell patient samples. For low q, GDI represents clonal richness, assuming that each cluster of cells with similar gene expression represents a 'clone'. Intermediate values of q correspond to classical measures of sample diversity, such as the prominent Shannon index (H, q=1), which is often used in oncology and bioinformatics to analyze tumor evolution and single-cell tumor imaging data. Our analyses show that samples can have very similar diversity around q=1; H alone can therefore be a problematic diversity indicator. However, GDI evaluated over a range of q points to differences in the number of major drivers of tumor evolution, possibly prior to detection or sampling.
Conclusions: We demonstrated the utility of scRNAseq-based diversity scores for interpreting cellular heterogeneity by providing and testing a pipeline to calculate GDI (qD). This measure is elevated overall in disease states and may be elevated in highly resistant leukemia. Using previously published data without further processing, our metric accurately distinguished cancerous from healthy tissue samples and could robustly separate other biological conditions. Our analysis shows how structured scRNAseq data can become useful when clinical sampling is combined with computational analyses and mathematical modeling.
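
As an illustration of the diversity calculation described above, the following is a minimal sketch that assumes GDI (qD) takes the standard Hill-number form, qD = (sum_i p_i^q)^(1/(1-q)), where p_i is the fraction of cells in cluster i. Under that assumption, q=0 gives the number of clusters (richness) and the limit q→1 gives exp(H), the exponential of the Shannon index. The function names and the example cluster labels are hypothetical and are not taken from the authors' pipeline.

```python
import math
from collections import Counter


def generalized_diversity(cluster_labels, q):
    """Hill-number diversity of order q for one sample.

    cluster_labels: one cluster assignment per cell (any hashable values).
    Assumption: GDI is the Hill number qD = (sum_i p_i**q) ** (1 / (1 - q)).
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no cells")
    proportions = [c / total for c in counts.values()]

    # q = 1 is defined as a limit: the exponential of the Shannon index H.
    if math.isclose(q, 1.0):
        shannon = -sum(p * math.log(p) for p in proportions)
        return math.exp(shannon)

    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


def diversity_profile(cluster_labels, q_values):
    """Evaluate the diversity over a range of q for one sample."""
    return [(q, generalized_diversity(cluster_labels, q)) for q in q_values]


if __name__ == "__main__":
    # Hypothetical cluster assignments, for illustration only.
    even_sample = ["c1"] * 25 + ["c2"] * 25 + ["c3"] * 25 + ["c4"] * 25
    skewed_sample = ["c1"] * 85 + ["c2"] * 5 + ["c3"] * 5 + ["c4"] * 5
    for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
        for q, d in diversity_profile(sample, [0, 0.5, 1, 2]):
            print(f"{name:>6}  q={q:<4} qD={d:.3f}")
```

Both hypothetical samples have the same richness at q=0, but the profile of the skewed sample falls as q increases because larger q weights the dominant cluster more heavily. Comparing whole profiles rather than the single value at q=1 is the kind of comparison the analysis above relies on.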

2019 BMES Annual Meeting
Philadelphia, PA, USA


Ferrall-Fairbanks MC, Ball MC, Letson CT, Padron E, Altrock PM

Presented at BMES on Thursday, October 17, 2019 at 9:15am in the Computational Models of Cancer Session in Room 122B.