Presenter Abstracts – DS.3 Data Science: Bioinformatics

Session Chair
Dr. James Denvir, Marshall University (WV INBRE)

Dr. Victor M. Bii, Mississippi Valley State University

A comprehensive data mining and bioinformatics approach identifies gene signatures that promote prostate cancer (PC) progression in populations of African descent

Janani Kunrathur Pasupathy and Victor M. Bii

Mississippi Valley State University, Itta Bena, MS


Prostate cancer (PC) is the second most common cancer among men in the United States, with a projected 268,490 new cases and 34,500 deaths in 2022. It is estimated that one in seven African American (AA) men will develop PC in their lifetime. AA patients are more likely than their Caucasian counterparts to die from low-grade PC and have been shown to be widely underrepresented in most clinical trials. We hypothesize that AA PC patients may harbor genetic signatures driving cancer progression that differ from those of other racial groups. To test this, we analyzed gene expression profiles by applying comprehensive bioinformatics-based data mining approaches to RNA-seq data from AA PC patients. We identified candidate differentially expressed genes, including ELOVL fatty acid elongase 2 (ELOVL2), sorting nexin 31 (SNX31), crystallin beta B2 (CRYBB2), CROCC pseudogene 2 (CROCCP2), and mutS homolog 2 (MSH2). These gene signatures may point to pathways that promote cancer progression and could serve as biomarkers or drug targets to improve treatment outcomes in AA patients.
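No specific differential expression method is named above. As a rough illustration only, the following minimal sketch shows one simple way to flag candidate differentially expressed genes: a log2 fold change, an exact permutation p-value, and Benjamini-Hochberg false discovery rate (FDR) control. The gene names, the expression values, the comparison group, and the thresholds (|log2FC| ≥ 1, FDR ≤ 0.05) are all hypothetical assumptions, not data or settings from this study.

```python
import math
from itertools import combinations
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of mean(group_a) / mean(group_b), with a pseudocount to avoid log(0)."""
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the absolute difference in means.

    Enumerates every relabeling of the pooled samples, so it is only
    practical for small groups."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance so relabelings tied with the observed statistic count.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_expression(expression, lfc_threshold=1.0, fdr_threshold=0.05):
    """Return (gene, log2FC, FDR) for genes passing both thresholds."""
    genes = list(expression)
    lfcs = [log2_fold_change(*expression[g]) for g in genes]
    q_values = benjamini_hochberg([permutation_p_value(*expression[g]) for g in genes])
    return [
        (gene, lfc, q)
        for gene, lfc, q in zip(genes, lfcs, q_values)
        if abs(lfc) >= lfc_threshold and q <= fdr_threshold
    ]


# Hypothetical normalized expression: gene -> (tumor samples, comparison samples).
expression = {
    "GENE_1": ([52.0, 61.0, 58.0, 70.0], [10.0, 12.0, 9.0, 14.0]),
    "GENE_2": ([20.0, 22.0, 19.0, 21.0], [21.0, 20.0, 23.0, 18.0]),
    "GENE_3": ([5.0, 4.0, 6.0, 5.0], [30.0, 28.0, 35.0, 31.0]),
}

for gene, lfc, q in differential_expression(expression):
    print(f"{gene}: log2FC={lfc:.2f}, FDR={q:.3f}")
```

With four samples per group there are only 70 relabelings, so the smallest attainable p-value is 2/70 (about 0.029). For real RNA-seq count data, dedicated count-based tools such as DESeq2 or edgeR are the usual choice; this sketch only shows the shape of the filtering logic.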

Dr. Heather Dunn, Clemson University

Artificial Intelligence-Based Prediction to Analyze Tumor Tissue Morphologies

Department of Bioengineering, Clemson University, Clemson, SC

Women of African descent are disproportionately affected by breast cancer compared with women of European descent. In the USA, compared with Caucasian women, African American women have a 42% higher mortality rate and a higher incidence rate before age 40, and they are more frequently diagnosed with triple-negative breast cancer. Approximately 30% of breast cancer cases are associated with modifiable risk factors, including physical inactivity, excess body weight, and alcohol consumption, indicating that some breast cancers may be preventable. While breast cancer disparities can be partially attributed to genetics and social determinants of health, emerging evidence of ethnic variation in breast tumors suggests that tumor biology may also contribute to survival outcomes.

The primary aim of this project was to investigate potential differences in breast tumor morphology between African American and Caucasian racial groups using machine learning (ML) and artificial intelligence (AI) methods. We used a supervised AI approach to evaluate breast tumor images. Five pre-trained neural network models were adapted via transfer learning, yielding classification accuracies between 84% and 92%. We interpreted the model predictions using LIME and saliency mapping as explainers. Based on the images in our bi-racial test set, this study confirmed significant variations in tumor and extracellular matrix regions between the racial groups evaluated. Therefore, further analysis and characterization beyond tumor morphology may provide new insight into disparities in breast cancer incidence.
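Saliency mapping, one of the two explainers named above, scores each pixel by how strongly the model's output changes when that pixel changes, that is, the magnitude of the gradient of the class score with respect to the input. The minimal sketch below assumes a hypothetical toy logistic scorer (`toy_tumor_score`) in place of the authors' trained networks, and it approximates the gradient with central finite differences rather than backpropagation so that it runs with the Python standard library alone. LIME, the other explainer, instead fits a local surrogate model to perturbed inputs and is not shown here.

```python
import math


def toy_tumor_score(image, weights, bias=0.0):
    """Hypothetical stand-in for a trained CNN: logistic score over pixel intensities."""
    z = bias + sum(
        w * p
        for w_row, p_row in zip(weights, image)
        for w, p in zip(w_row, p_row)
    )
    return 1.0 / (1.0 + math.exp(-z))


def saliency_map(score_fn, image, eps=1e-4):
    """Vanilla gradient saliency |d score / d pixel|, approximated here by
    central finite differences instead of backpropagation."""
    work = [row[:] for row in image]  # copy so the caller's image is untouched
    saliency = [[0.0] * len(row) for row in image]
    for r, row in enumerate(work):
        for c, original in enumerate(row):
            row[c] = original + eps
            plus = score_fn(work)
            row[c] = original - eps
            minus = score_fn(work)
            row[c] = original  # restore before moving to the next pixel
            saliency[r][c] = abs(plus - minus) / (2 * eps)
    return saliency


# Hypothetical 2x2 "image" and weights, for illustration only.
weights = [[0.0, 2.0], [-1.0, 0.5]]
image = [[0.2, 0.4], [0.1, 0.9]]
saliency = saliency_map(lambda img: toy_tumor_score(img, weights), image)
for row in saliency:
    print(" ".join(f"{value:.4f}" for value in row))
```

Frameworks with automatic differentiation compute the same gradient in a single backward pass; finite differences need two forward passes per pixel and are only practical for tiny inputs like this one.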

Dr. Swarna Kanchan, Marshall University

In silico modelling and docking to explore the therapeutic potential of antivirals against SARS-CoV-2 Nsp4 and Nsp13 helicase

Swarna Kanchan, Minu Kesheri, Travis B. Salisbury, and James Denvir

Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV

Introduction/Background. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) relies on non-structural proteins such as Nsp4 and the Nsp13 helicase, key components of the replication-transcription complex (RTC), to complete its infectious life cycle.

Hypothesis/Goal of Study. Targeting these essential viral proteins with Food and Drug Administration (FDA)-approved drugs, and examining the structural details of SARS-CoV-2 Nsp4 and Nsp13 helicase in depth, will aid in combating disease pathogenesis.

Methods and Results. ProtParam was used to predict the physicochemical properties of Nsp4, giving a molecular weight of 56183.98 Da, a theoretical pI of 7.16, an instability index of 34.09, an aliphatic index of 95.50, and a GRAVY of 0.343. The corresponding values predicted for Nsp13 were 66911.06 Da, 8.66, 30.30, 84.38, and -0.127. The CD-Search tool predicted only two domains in Nsp4: cv_Nsp4_TM (cd21473), spanning residues 14 to 394, and Corona_NSP4_C (pfam16348) at the C terminus, spanning residues 406 to 484. Evolutionary conservation analysis of Nsp4 using ConSurf predicted 50 amino acid residues to be functional and exposed and 63 to be structural and buried; both sets were highly conserved (conservation score: 9) in SARS-CoV-2. The SARS-CoV-2 Nsp13 crystal structure from the Protein Data Bank and a newly generated and validated threading-based 3D model of Nsp4 were then used for virtual screening of FDA-approved antiviral drugs with AutoDock Vina. Pibrentasvir, Elbasvir, and Simeprevir were identified as common leads, showing higher binding affinity for both Nsp13 helicase and Nsp4 than the control inhibitors.
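Two of the ProtParam indices reported above have simple closed-form definitions: GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The minimal sketch below implements only these two formulas on a hypothetical peptide. It is not ProtParam itself, does not compute molecular weight, pI, or the instability index, and assumes the sequence contains only the 20 standard one-letter amino acid codes.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982), used by GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index (Ikai, 1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


peptide = "MKVLAAGIL"  # hypothetical peptide, not an Nsp4 or Nsp13 sequence
print(f"GRAVY = {gravy(peptide):.3f}, aliphatic index = {aliphatic_index(peptide):.2f}")
```

A non-standard letter (for example X or B) raises a `KeyError` in `gravy`, which is a deliberate choice here so that ambiguous residues are not silently scored.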

Discussion/Conclusions. Pibrentasvir, Elbasvir, and Simeprevir, which showed higher binding affinity for both Nsp13 and Nsp4 than the control inhibitors, might serve as potential dual-target inhibitors.
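The selection logic behind calling a drug a potential dual-target inhibitor can be sketched as a set intersection: for each target, keep ligands whose AutoDock Vina binding affinity (in kcal/mol, where more negative means stronger predicted binding) is better than that target's control inhibitor, then keep only ligands that pass for every target. The ligand names and scores below are hypothetical placeholders, not results from this study, and the strict "better than control" cutoff is an assumed criterion.

```python
def better_than_control(scores, control_affinity):
    """Ligands with a more favorable (more negative) Vina affinity than the control."""
    return {ligand for ligand, affinity in scores.items() if affinity < control_affinity}


def dual_target_leads(scores_by_target, control_by_target):
    """Ligands that beat the control inhibitor on every target."""
    hit_sets = [
        better_than_control(scores, control_by_target[target])
        for target, scores in scores_by_target.items()
    ]
    return set.intersection(*hit_sets) if hit_sets else set()


# Hypothetical affinities in kcal/mol; names and numbers are placeholders, not study results.
scores_by_target = {
    "Nsp4": {"drug_A": -9.1, "drug_B": -6.0, "drug_C": -8.4},
    "Nsp13": {"drug_A": -8.8, "drug_B": -9.5, "drug_C": -7.0},
}
control_by_target = {"Nsp4": -7.5, "Nsp13": -7.2}
print(sorted(dual_target_leads(scores_by_target, control_by_target)))
```

A ligand that was not docked against one of the targets simply drops out of the intersection, which matches the idea that a dual-target lead must be supported on both proteins.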

Grant/Funding Support. INBRE Grant