The iResearch Institute is the leader of national and international science research training!

iResearch Institute 2020 Student Highlights

Angela Mao

Analyzing Water Contaminants Through Image Processing of Chlorophyll-a
Mentor: Anjali Chadha, Massachusetts Institute of Technology (MIT)

Remote sensing of water quality parameters and contaminants is a cheap and efficient alternative to traditional in-situ water sampling methods. Sensors on satellites detect the reflectance of these parameters at various wavelengths. The pigment chlorophyll-a is one of the most common parameters that can be remotely sensed. Because chlorophyll-a is prevalent in cyanobacterial blooms, it is often used to detect and track them. Many water contaminants, however, cannot be tracked directly because they lack optical properties that satellites can detect. Satellite imagery of chlorophyll-a from the MODIS-Aqua satellite was analyzed using NASA’s SeaDAS and the image processing software ImageJ to determine trends in chlorophyll-a concentration from 2008 to 2020. A generalized additive model (GAM) was used to examine the relationship between chlorophyll-a and six water contaminants: cadmium, arsenic, lead, ammonia nitrate, nitrate, and phosphorus. Significant relationships (p<0.05) were found between chlorophyll-a and cadmium, arsenic, lead, and ammonia nitrate. When monitoring a body of water, sampling only the areas where remote sensing indicates a high likelihood of a given contaminant will conserve resources.

Miah Margiano

Synergy of Treatment: Therapeutic Vulnerabilities in Small Cell Lung Cancer
Mentor: Madhav Subramanian, Washington University in St. Louis

Small cell lung cancer (SCLC) is the most aggressive form of lung cancer, with a 5-year survival rate of <7%. Its high mortality rate is attributed to early metastasis and the development of resistance to chemotherapeutic and radiation regimens. Despite the development of targeted therapeutics during the last few years, including talazoparib, prexasertib, and immunotherapy, progress has remained minimal due to a poor understanding of the underlying mechanisms that contribute to treatment resistance. The aim of the present study was to identify target genes that promote cancer survival and treatment resistance following talazoparib or prexasertib treatment and to analyze the relationship between the target genes, immune cell proportions, and PD-L1 expression, which may be exploited in a synergistic treatment modality. Ten single-cell RNA-seq datasets from the study series GSE138474 were obtained. The samples were characterized as talazoparib-treated resistant (n = 9,914), prexasertib-treated resistant (n = 7,617), or untreated (n = 16,181). A bulk RNA-seq dataset containing 80 individuals was analyzed to determine the association of target gene expression with survival, number of mutations, immune cell proportions, and PD-L1 expression. Bioinformatics analyses demonstrated that AKR1C2 and PRDX1 were upregulated after talazoparib and prexasertib treatment. The lack of a significant difference in immune cell proportions and PD-L1 expression suggests that talazoparib and prexasertib may not lead to a better prognosis when combined with immunotherapy. However, direct studies are needed to identify potential biomarkers affecting immunotherapy response and to investigate the phenotypic diversity that may lead to different response rates.

Satvik Dasariraju

Detection and Classification of Immature Leukocytes for Diagnosis of Acute Myeloid Leukemia Using Random Forest
Hometown: Lawrenceville, NJ
Mentor: Marc Huo, Stanford University

Acute Myeloid Leukemia (AML) is a fatal blood cancer that must be detected early for effective treatment, but diagnosis is time-consuming and prone to error. This study presents a machine learning model that automatically screens for immature blood cells, a strong sign of AML, based on geometric and color features of cells. The proposed model detected immature cells with 92.99% accuracy and classified immature cells into four types with 93.45% accuracy, demonstrating that the model can be an efficient support tool for clinicians diagnosing AML.

Satvik Elayavalli

Link Between Systemic Lupus Erythematosus and Diffuse Large B Cell Lymphoma to Identify Common Targets for Therapy
Hometown: Bengaluru, Karnataka, India
Mentor: Kendra Zhang, Columbia University

The NF-κB signaling pathway has been explored in Systemic Lupus Erythematosus (SLE), an autoimmune disease, and Diffuse Large B Cell Lymphoma (DLBCL), an aggressive non-Hodgkin’s lymphoma. This pathway drives the synthesis of pro-inflammatory proteins in SLE and anti-apoptotic proteins in DLBCL. Datasets obtained from GEPIA2, STRING, and miRNet were analyzed using Cytoscape, and the analysis highlighted microRNA 21 (miR-21) as a potential therapeutic target due to its inhibitory effects. miR-21 inhibits the gene PTEN; this inhibition activates the PI3K/AKT pathway, which subsequently activates the NF-κB pathway. These relationships have not been fully explored in SLE, suggesting a potentially novel role for miR-21 as a therapeutic target.

Aaroosh Ramadorai

Searching for the Host Galaxy of FRB 181017
Hometown: Lexington, MA
Mentor: Kaitlyn Shin, Massachusetts Institute of Technology (MIT)

Fast radio bursts (FRBs) are millisecond-duration radio wave pulses of unknown astrophysical origin. Here, I search for the host galaxy of FRB 181017, discovered by the Molonglo Observatory Synthesis Telescope (UTMOST). Using comparisons with other FRBs, I propose a neutron star as a likely origin for this FRB. I then use publicly available code and data to constrain the FRB’s redshift and celestial coordinates, obtaining a sample of 13 host candidates. I consider 4 galaxies to be likely candidates because of their high Hα fluxes and sizes similar to the Milky Way, both of which may imply high star formation rates and therefore large neutron star populations from which FRB 181017 could originate.

Ryan Rudes

A Simple System for Strengthening Visual Representations
Hometown: Dix Hills, NY
Mentor: David Xu, Columbia University

Semi-supervised learning involves pretraining models on large collections of arbitrary data; the knowledge acquired may then be applied to a downstream task, allowing a network to generalize well from a small quantity of pre-separated training data. Recent state-of-the-art approaches to self-supervised pretraining are predominantly contrastive learning techniques. The current state of the art, SimCLR, learns generalized visual representations by comparing transformed copies of each instance in a large image dataset. We investigate the benefits of applying a learning problem of progressively growing difficulty to this dominant approach. Specifically, we propose a mechanism that enables more explicit control over the strength of the data augmentation operator throughout training, allowing us to intrinsically disincentivize the temporary exploitation of non-generalizable features and instead enforce the gradual attainment of reliable feature information.
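
To illustrate what controlling augmentation strength over the course of training could look like, here is a minimal sketch. The linear ramp, the function names `augmentation_strength` and `jitter_range`, and the `start`, `end`, and `max_jitter` values are all assumptions made for this example; the abstract does not state the actual schedule or how strength maps onto individual augmentations.

```python
def augmentation_strength(step: int, total_steps: int,
                          start: float = 0.1, end: float = 1.0) -> float:
    """Linearly ramp augmentation strength from `start` to `end` (assumed schedule)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay bounded.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def jitter_range(strength: float, max_jitter: float = 0.8) -> float:
    """Hypothetical mapping from a strength value to a color-jitter magnitude."""
    return max_jitter * strength


# Early steps get mild augmentations; later steps get harder ones.
schedule = [augmentation_strength(s, 100) for s in (0, 50, 100)]
```

In a training loop, the value returned for the current step would be passed to whatever augmentation pipeline is in use, so the contrastive task becomes harder gradually instead of all at once.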

Katie Sie

Bioinformatic Analyses Determine Biomarkers for Resistive Small Cell Lung Cancer
Hometown: Oakland Gardens, NY
Mentor: Madhav Subramanian, Washington University in St. Louis (WashU)

Treatment refractoriness is a hallmark of small cell lung cancer (SCLC), necessitating further research to improve its poor prognosis. Datasets of untreated and DNA-damage-repair-inhibitor-treated samples were obtained. Gene set enrichment analysis (GSEA) was performed to identify upregulated pathways, elucidating resistance mechanisms in treated samples. Prominent genes in the identified pathways were visualized in RStudio and analyzed in GEPIA2 for their impact on survival. The reactive oxygen species and TGF-beta signaling pathways, whose genes are involved in evading apoptosis and promoting tumor growth, were upregulated. This study identifies novel genes that hold promise as therapeutic targets to resensitize tumors.

Sahand Adibnia

Modeling the Onset of Parkinson’s Disease: Dopamine Oxidation and Age-Dependent Neuromelanin Accumulation as Combined Multiclass Markers
Hometown: Dublin, CA
Mentor: Kendra Zhang, Columbia University

Parkinson’s disease is characterized by the loss of neuromelanin-containing dopaminergic neurons in the substantia nigra, while non-melanized dopaminergic neurons are relatively spared. Aminochrome has been suggested as the initial trigger of neurotoxicity in Parkinson’s. The enzymes encoded by the genes NQO1 and GSTM2 metabolize aminochrome into non-neurotoxic compounds, which convert to neuromelanin. The non-melanized striatum exhibits higher expression of NQO1 and GSTM2 than the substantia nigra. This indicates that striatal aminochrome is metabolized without forming neuromelanin. NQO1 expression in the substantia nigra correlates negatively with age, suggesting that aminochrome levels increase with age. This could contribute to the increased incidence of Parkinson’s in the elderly.

Srihitha Dasari

Multiclass Classification of Alzheimer’s Disease: MRI Extraction of Cortical Volumetry and Inter-Cortical Ratios as Combined Markers
Hometown: Cumming, GA
Mentor: Marc Huo, Stanford University

The clinical diagnostic limitations and progressive severity of Alzheimer’s disease necessitate improved early detection for effective treatment administration. This study aimed to identify features that increase the accuracy of multiclass classification, extracting inter-cortical volumetric ratios as a mode of normalization rather than relying on traditional absolute volumes. Cortical tissues were segmented from pre-processed T1w MRIs, and their volumes were extracted and used, along with computed ratios, in three multiclass classifiers: one using conventional volumes, one using the proposed ratios, and one using combined features. The combination of raw volumes and ratios achieved the greatest accuracy, and certain ratios were assigned greater feature importance than absolute volumes, indicating that ratios are promising volumetric features for increasing performance. The proposed algorithm included more classes (5) and achieved a greater accuracy (81.03%) than state-of-the-art supervised approaches, suggesting an increased ability to delineate different stages and potentially greater efficacy in medication administration.
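
The idea of using ratios between regional volumes as features can be sketched as follows. This is only an illustration under stated assumptions: the region names, the example volumes, and the choice to take every pairwise ratio are hypothetical, since the abstract does not say which regions or ratios were actually used.

```python
from itertools import combinations
from typing import Dict


def ratio_features(volumes: Dict[str, float]) -> Dict[str, float]:
    """Compute one ratio per unordered pair of regions (assumed feature set)."""
    features = {}
    # Sort names so the feature order and naming are deterministic.
    for a, b in combinations(sorted(volumes), 2):
        features[f"{a}/{b}"] = volumes[a] / volumes[b]
    return features


def combined_features(volumes: Dict[str, float]) -> Dict[str, float]:
    """Raw volumes plus ratios, mirroring the 'combined features' setting."""
    return {**volumes, **ratio_features(volumes)}


# Hypothetical regional volumes (arbitrary units) for one scan.
volumes = {"frontal": 6.0, "temporal": 3.0, "parietal": 1.5}
ratios = ratio_features(volumes)
features = combined_features(volumes)
```

Because each ratio divides one region by another within the same scan, overall differences in head or brain size cancel out, which is one way to read the abstract's description of ratios as a mode of normalization.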


Isabelle Garcia-Fischer

Identifying the Potential Roles of Cadherins in a Defective Wnt/β-Catenin Pathway in Hypermobile Ehlers-Danlos Syndrome
Hometown: Danbury, CT
Mentor: Swati Madankumar, Barnard College

Hypermobile Ehlers-Danlos Syndrome (hEDS) is a rare connective tissue disorder with no known genetic biomarker. To identify possible genetic etiologies of hEDS, the functions of genes dysregulated in hEDS skin fibroblasts were correlated with known cellular phenotypes and symptoms of hEDS. The downregulation of cadherin 2 and upregulation of cadherin 11 may cause mechanical stress and inflammation, respectively, which may disrupt Wnt/β-catenin signal transduction, ECM protein organization, and connective tissue homeostasis, likely in multiple organs, in hEDS patients. This study guides future in vitro experiments to confirm the roles of these genes, which could potentially inform hEDS diagnosis.

Ethan Horowitz

CET-CNN: Modular Hierarchical Image Classification Using Conditional-Execution Tree CNNs
Hometown: Manhasset, NY
Mentor: David Xu, Columbia University

Sequential neural network classifiers overgeneralize between classes. To address this, a tree-structured architecture, the Conditional-Execution Tree Convolutional Neural Network (CET-CNN), was designed to enable more specialized classification using subunits that classify and process inputs before passing them to further subunits. CET-CNN-A subunits performed multi-class classification, while CET-CNN-B subunits performed binary classification. CET-CNNs were compared to sequential networks and run with full, conditional, and single-path execution. CET-CNN-Bs achieved higher accuracy than CET-CNN-As and sequential networks. CET-CNN-Bs and CET-CNN-As were most accurate with full execution and single-path execution, respectively. CET-CNN-Bs improved accuracy over sequential networks by 8.25%. Single-path execution reduced effective network size by up to 81%.
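
One way to picture single-path execution through a tree of classifier subunits is the sketch below. It is an interpretation, not the project's implementation: the `Subunit` class, the `single_path` function, the toy tree, and the use of precomputed scores in place of real CNN outputs are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


@dataclass
class Subunit:
    """One node of a hypothetical classification tree."""
    name: str
    # Stand-in for a small CNN: returns one score per outgoing branch.
    classify: Callable[[dict], Scores]
    # Branch label -> child subunit; a label with no child is a final class.
    children: Dict[str, "Subunit"] = field(default_factory=dict)


def single_path(root: Subunit, x: dict) -> Tuple[str, List[str]]:
    """Follow only the top-scoring branch; return (label, subunits executed)."""
    node, executed = root, []
    while True:
        executed.append(node.name)
        scores = node.classify(x)
        best = max(scores, key=scores.get)
        if best not in node.children:
            return best, executed
        node = node.children[best]


# Toy tree: the input carries precomputed scores instead of real CNN outputs.
animal = Subunit("animal", lambda x: x["animal"])
vehicle = Subunit("vehicle", lambda x: x["vehicle"])
root = Subunit("root", lambda x: x["root"], {"animal": animal, "vehicle": vehicle})

x = {
    "root": {"animal": 0.9, "vehicle": 0.1},
    "animal": {"cat": 0.7, "dog": 0.3},
    "vehicle": {"car": 0.5, "truck": 0.5},
}
label, executed = single_path(root, x)
```

In this toy run only two of the three subunits execute, which is the kind of saving the abstract describes as a reduction in effective network size.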

Emma Wang

Investigating Potential Agro-economic Benefits of Solar Pollinator Habitats
Hometown: Manhasset, NY
Mentor: Anjali Chadha, Massachusetts Institute of Technology (MIT)

American beekeepers lose approximately 42% of their colonies annually, affecting hundreds of pollination-dependent crops and costing the agro-economy $15 billion each year. This study analyzed the potential benefits of establishing solar-pollinator habitats in the U.S., as co-location of solar energy and agriculture can protect pollinators and crops and simultaneously produce energy. Crops were classified by pollinator-dependence, and overlapping areas of solar facilities and cropland were measured. Highly pollinator-dependent crops were more economically valuable than less-dependent crops, and solar-agriculture overlap areas had enormous power capacities, showing the potential to provide clean energy for millions while providing habitats for plant and pollinator species.