
2021 Regeneron International Science and Engineering Fair FINALIST ABSTRACTS




Using Machine Learning to Repurpose FDA-Approved Drugs to Treat Cancers and Inflammatory Diseases

Aakarsh Vermani, Grade 11, Westview High School



p38 alpha (MAPK14) is a protein kinase implicated in the pathological mechanisms of BAG3 P209L myofibrillar myopathy, cancers, and inflammatory diseases such as Alzheimer’s disease and rheumatoid arthritis. Traditional drug discovery methods are slow and costly, and they have not produced p38 inhibitors that are both effective and safe enough to treat these diseases. This project addressed these shortcomings by using machine learning to identify potential p38 inhibitors among FDA-approved drugs. It was hypothesized that the predicted inhibitors would show a significantly higher binding affinity for p38 than a control group of randomly selected FDA-approved compounds.
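The abstract does not say how binding affinity was scored or which statistical test was used. As a minimal sketch, assuming each compound has a docking score in kcal/mol (more negative meaning stronger predicted binding), a one-sided permutation test can check whether the predicted inhibitors score better than the random control group. The function name `permutation_test` and the scoring convention are illustrative assumptions, not the project's method.

```python
import random
from statistics import mean

def permutation_test(predicted, control, n_perm=10000, seed=0):
    """One-sided permutation test (hypothetical helper).

    Assumes docking-style scores where more negative = stronger predicted binding.
    Returns the observed difference in means and an estimated p-value for
    'predicted scores are lower than control scores'.
    """
    rng = random.Random(seed)
    observed = mean(control) - mean(predicted)  # > 0 when predicted binds more strongly
    pooled = list(predicted) + list(control)
    n = len(predicted)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of compounds into two groups
        if mean(pooled[n:]) - mean(pooled[:n]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A permutation test makes no normality assumption about the score distributions, which suits small groups of compounds; a t-test or Mann-Whitney U test would be common alternatives.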



Molecular Dynamics Simulation of Novel Klebsiella pneumoniae Treatment Disrupting the Outer Membrane

Eleanor Jung, Grade 10, Mt. Carmel High School



Klebsiella pneumoniae exhibits some of the highest rates of antibiotic resistance and is currently the main cause of carbapenem-resistant infections, yet many patients who contract this Gram-negative bacterium have no clinically effective treatment available. Artilysins, proteins each composed of an endolysin fused to a lipopolysaccharide (LPS)-degrading peptide, are a possible alternative to antibiotics for Gram-negative bacterial infections. In an earlier phase of this project, an Artilysin targeting K. pneumoniae was designed and its endolysin portion was modeled. In the current phase, a computational model of the rest of the Artilysin is created, validated, and used to demonstrate stability in water at body temperature. Because fusing domains can cause unforeseen changes in protein behavior, molecular dynamics is used to provide insight into mechanisms at the molecular level. However, computational cost rises steeply with accuracy, system size, and simulation time. To evaluate the efficacy of the designed Artilysin before investing substantial resources, the Artilysin is simulated above a model outer membrane resembling that of K. pneumoniae for 2 microseconds at body temperature. This is compared against two controls: one with the endolysin above the outer membrane and one with the outer membrane alone. Compared with the endolysin, the Artilysin is shown to disrupt membrane density deeper into the bilayer. Further investigation reveals changes in specific bonds in the LPS backbone and carbon tails that accompany the density changes.
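The abstract does not describe how membrane density was measured. A common approach is a number-density profile along the membrane normal: bin the z-coordinates of membrane atoms into slabs, average over trajectory frames, and compare the Artilysin and control profiles to see where density is depleted. The sketch below assumes the membrane-atom z-coordinates have already been extracted as NumPy arrays (one per frame) and that the lateral box area is constant; `density_profile` is a hypothetical helper, and MD packages provide equivalents such as GROMACS `gmx density`.

```python
import numpy as np

def density_profile(z_frames, box_xy_area, z_range, n_bins=50):
    """Frame-averaged number density of atoms in slabs along the membrane normal.

    z_frames:    list of 1D arrays, the z-coordinates of the selected atoms in each frame
    box_xy_area: lateral (x * y) box area, in squared length units (assumed constant)
    z_range:     (z_min, z_max) interval to bin, in the same length units
    """
    z_all = np.concatenate(z_frames)
    counts, edges = np.histogram(z_all, bins=n_bins, range=z_range)
    slab_volume = box_xy_area * (edges[1] - edges[0])
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Dividing by the number of frames averages the counts over the trajectory.
    density = counts / (slab_volume * len(z_frames))
    return centers, density
```

Subtracting the Artilysin-system profile from the membrane-only control profile (computed on the same bins) gives a depletion curve that is positive wherever the protein lowers local density.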



Detection of Arsenic Contamination Using Satellite Imagery and Machine Learning

Ayush Agrawal, Grade 11, Canyon Crest Academy



Arsenic, a WHO-classified carcinogen and neurotoxin, is a notorious drinking-water contaminant, and concern about toxic contaminants in drinking water has grown since the Flint water crisis (a lead contamination event). Arsenic is very stable and does not readily degrade in the natural environment, so it accumulates in soil and can leach into water supplies; arsenic contamination affects over 200 million people globally. Unfortunately, current arsenic detection methods require extensive manual labor, equipment costing tens of thousands of dollars, and several days of processing time.

In this study, a backpropagation neural network (BPNN) was tested to determine whether environmental arsenic contamination could be detected and predicted from visible/near-infrared (VNIR) and shortwave-infrared (SWIR) hyperspectral data from NASA’s EO-1 Hyperion satellite, with partial least squares regression (PLSR) as a control. If successful, this would provide a simple, rapid, low-cost, labor-free, and versatile approach to arsenic detection. The neural network learns a relationship between the soil’s hyperspectral data and its arsenic content, then predicts contamination based on training and validation. Because satellite data contain substantial noise and irregularities arising from various factors, the data were preprocessed according to selected atmospheric windows and with Kernel Principal Component Analysis (kPCA), Continuum Removal (CR), and Savitzky-Golay (S-G) filtering with a Second Derivative (SD) transformation.
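Two of the preprocessing steps named above can be sketched directly. Continuum removal divides each spectrum by its upper convex hull so that absorption features are expressed relative to 1, and a Savitzky-Golay second derivative fits a local least-squares polynomial in a sliding window and differentiates it. This NumPy-only sketch is illustrative: the function names, the assumption of increasing and evenly spaced wavelengths, and the choice to return only interior ("valid") filter points are assumptions, not the project's settings. SciPy's `scipy.signal.savgol_filter(..., deriv=2)` is a standard alternative that also handles the spectrum edges.

```python
import math
import numpy as np

def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum (increasing wavelengths) by its upper convex hull."""
    hull = []
    for p in zip(wavelengths, reflectance):
        # Drop the last hull point while it lies on or below the segment
        # from the point before it to p (keeps only the upper hull).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    hull_x, hull_y = zip(*hull)
    continuum = np.interp(wavelengths, hull_x, hull_y)
    return np.asarray(reflectance, dtype=float) / continuum

def savgol_coefficients(half_window, polyorder, deriv=0, delta=1.0):
    """Correlation weights for a Savitzky-Golay smoothing/derivative filter."""
    x = np.arange(-half_window, half_window + 1, dtype=float)
    design = np.vander(x, polyorder + 1, increasing=True)  # columns x**0 .. x**polyorder
    # Row `deriv` of the pseudo-inverse gives the least-squares coefficient a_deriv;
    # the deriv-th derivative of the fitted polynomial at x = 0 is deriv! * a_deriv.
    return math.factorial(deriv) * np.linalg.pinv(design)[deriv] / delta**deriv

def savgol_valid(y, half_window, polyorder, deriv=0, delta=1.0):
    """Apply the filter, returning only points whose full window lies inside y."""
    weights = savgol_coefficients(half_window, polyorder, deriv, delta)
    return np.correlate(np.asarray(y, dtype=float), weights, mode="valid")
```

For a spectrum on an evenly spaced grid, `savgol_valid(continuum_removed(wl, refl), 5, 2, deriv=2, delta=wl[1] - wl[0])` would give a second-derivative continuum-removed spectrum; the window size is only an illustration. Because the filter reproduces any polynomial up to `polyorder` exactly, the second derivative of a quadratic comes out constant.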

Results were compared with commonly used models and indicated success for the BPNN model (R² = 0.8741, RMSE = 0.0833), which achieved high accuracy despite the generally poor quality of remote sensing data and the immaturity of hyperspectral technology. Overall, these results suggest that such a methodology is practical and implementable for environmental contamination monitoring.
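For reference, the two reported metrics are the coefficient of determination, R² = 1 - SS_res / SS_tot, and the root-mean-square error of the predictions. A minimal sketch of both (the function name is illustrative):

```python
import math

def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error of predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)
```

R² can be negative for a model that does worse than always predicting the mean, and RMSE carries the units (or scaling) of the arsenic target, so an RMSE value is only interpretable alongside those units.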



NEREID: Microplastic Detector Using Laser Microscopy and Image Processing Powered by the Raspberry Pi

Kyle Tianshi, Grade 9, The Cambridge School



Microplastics are a rapidly emerging contaminant in water sources. California is in the process of adopting a standardized testing method to monitor microplastics in drinking water (SB 1422, California Safe Drinking Water Act: Microplastics). Existing technologies such as dynamic light scattering, turbidity meters, and SDI kits cannot efficiently measure microscopic solid particles smaller than 1 µm, especially at very low concentrations.

The objective of this project is to develop a device that can characterize nanoscopic and microscopic particles in water. This portable detector can be used in microplastic contamination research, in industrial water quality control, and in homes without access to filtration systems.

NEREID uses 405 nm and 532 nm lasers to illuminate solid particles in the water. A digital microscope records video of the illuminated particles, and an image processing algorithm in Python analyzes particle characteristics, producing histograms of particle size and particle count. NEREID can measure the shape and size of particles larger than 10 µm, and it can detect particles as small as 5 nm at concentrations as low as 40,000 particles per liter in under 10 seconds.
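The abstract does not detail NEREID's image processing algorithm. One plausible approach for a single video frame, sketched here as an assumption, is to threshold the grayscale image, group bright pixels into 8-connected blobs with a flood fill, and convert each blob's pixel area into an equivalent circular diameter for the size histogram. The function names, the threshold, and the `um_per_pixel` calibration are hypothetical. Blobs from particles below the optical resolution reflect scattered light rather than true particle size, which is why size and shape are only meaningful for the larger particles.

```python
from collections import deque
import numpy as np

def label_particles(frame, threshold):
    """Return the pixel area of each 8-connected bright blob in a grayscale frame."""
    mask = np.asarray(frame) > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    areas = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill of one blob starting at (r, c).
            area = 0
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            areas.append(area)
    return areas

def size_histogram(areas, um_per_pixel, bin_edges_um):
    """Equivalent circular diameters (micrometres) and their histogram counts."""
    diameters = 2.0 * np.sqrt(np.asarray(areas, dtype=float) / np.pi) * um_per_pixel
    counts, _ = np.histogram(diameters, bins=bin_edges_um)
    return diameters, counts
```

Dividing the blob count by the sampled water volume (the illuminated region's area times its depth) would turn counts into a concentration; that volume depends on the optical setup and is not given in the abstract.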

With the 405 nm laser, particle fluorescence emission can be analyzed through different color filter lenses, allowing different solid particle contaminants to be characterized. This feature can help distinguish microplastics from other microscopic particles. Integrated with the Raspberry Pi, NEREID is compact and portable. The mockup cost $50 in individual parts purchased from Amazon.



Evolution of the Cat’s Eye Nebula Revealed Through Morpho-Kinematic and Hydrodynamic Modeling

Ryan Clairmont, Grade 11, Canyon Crest Academy



The Cat’s Eye Nebula (NGC 6543) has a complex, bipolar morphology that is not readily explained by the current theory of planetary nebula formation, the interacting stellar winds model. To clarify the structure of NGC 6543, I created a 3D spatio-kinematic model using an [N II] image from the Hubble Space Telescope and position-velocity diagrams obtained from the San Pedro Martir Kinematic Catalogue of Planetary Nebulae. The model showed that NGC 6543 has a bipolar structure with an inner ellipsoid, ansae, and jets. I also created a hydrodynamic model of NGC 6543, which indicates that its complex morphology likely results from two separate wind events, high-density jet-like ejections, and a precessing collimated outflow. These features suggest a binary central star and allow further refinement of the interacting stellar winds model.
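Spatio-kinematic models of planetary nebulae commonly assume homologous expansion, in which each gas element's velocity is proportional to its distance from the central star (v = k r). The abstract does not name the software or velocity law used, so this is an assumption. Under it, a model shell maps to a position-velocity (PV) diagram as in the sketch below for a thin prolate ellipsoid tilted toward the observer; the function name, parameters, and slit geometry are hypothetical.

```python
import numpy as np

def shell_pv(a, b, k, inclination_deg, slit_halfwidth=0.05, n=20000, seed=0):
    """Slit position and line-of-sight velocity for points on a thin ellipsoidal shell.

    a, b:            polar and equatorial semi-axes (same length units)
    k:               expansion rate, so velocity = k * position (homologous flow)
    inclination_deg: tilt of the polar axis out of the sky plane toward the observer
    The slit lies along the sky-projected polar axis (x); z is the line of sight.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # random unit directions
    # Stretch the unit sphere into an ellipsoid with its polar axis along x.
    # (Points are not uniform in surface area; adequate for a qualitative sketch.)
    x, y, z = a * u[:, 0], b * u[:, 1], b * u[:, 2]
    i = np.radians(inclination_deg)
    # Rotate about the y axis so the polar axis tilts toward the line of sight.
    x_sky = x * np.cos(i) - z * np.sin(i)
    z_los = x * np.sin(i) + z * np.cos(i)
    on_slit = np.abs(y) < slit_halfwidth
    # Homologous expansion: line-of-sight velocity is k times the line-of-sight coordinate.
    return x_sky[on_slit], k * z_los[on_slit]
```

Plotting the returned velocities against slit positions gives a model PV diagram to compare with an observed long-slit spectrum: a spherical shell traces a circle of radius R in (x, v/k), while an elongated, inclined shell traces a tilted ellipse.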



The Age of the Coma Ber Cluster Constrained Using Amateur Data

Andrew Li, Grade 10, Canyon Crest Academy



There is a disparity in estimates of the age of the Coma Ber Cluster. A recent study used the turnoff point method to estimate an age of 800 Myr, significantly older than the conventional 400-650 Myr in the literature. However, that study used a sample of only four stars, all of which were evolved and not close to the turnoff point. The goal of this project is to use a larger sample, including less evolved stars, to obtain a better estimate of the turnoff point and age. It is hypothesized that the result will align with the 800 Myr estimate.

Procedures: A telescope, a phone camera, and colored filters were used to photograph the cluster in two wavebands. The images were analyzed in APT (NASA’s Aperture Photometry Tool) to obtain the fluxes. These data were then processed to determine the individual magnitudes and B-V indices, the turnoff mass, and the overall age of the cluster.
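Turning APT fluxes into magnitudes and a B-V color relies on the standard relation m = -2.5 log10(F) + constant. The sketch below calibrates against a reference star of known catalog magnitude imaged in the same frame and filter (differential photometry); the function names and the reference-star approach are assumptions, since the abstract does not say how the magnitudes were calibrated.

```python
import math

def calibrated_mag(flux, ref_flux, ref_mag):
    """Magnitude of a star from its flux relative to a reference star in the same band."""
    return ref_mag - 2.5 * math.log10(flux / ref_flux)

def color_index(flux_b, flux_v, ref_b, ref_v):
    """B-V color; ref_b and ref_v are (flux, catalog magnitude) pairs for the reference star."""
    ref_flux_b, ref_mag_b = ref_b
    ref_flux_v, ref_mag_v = ref_v
    m_b = calibrated_mag(flux_b, ref_flux_b, ref_mag_b)
    m_v = calibrated_mag(flux_v, ref_flux_v, ref_mag_v)
    return m_b - m_v
```

A star 100 times brighter than the reference comes out 5 magnitudes brighter (smaller magnitude). Phone-camera color filters do not match the standard Johnson B and V passbands, so a full reduction would also need a color transformation.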

Results: Fluxes and magnitudes were derived for a total of 42 stars over three observing runs. The mean turnoff B-V index is -0.053. The mean turnoff point mass is 2.64 M☉. The final age estimate is 893 (+363/-330) Myr, obtained by taking the mean of the three observing-run results and adjusting the error bars to be consistent with all three.
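As a rough consistency check (not necessarily the method used in this project), a common textbook scaling gives the main-sequence lifetime of a star of mass M as about 10 Gyr × (M/M☉)^-2.5. Treating the lifetime of a star at the turnoff mass as the cluster age:

```latex
t_{\mathrm{MS}} \approx 10\,\mathrm{Gyr}\left(\frac{M}{M_\odot}\right)^{-2.5}
  = 10\,\mathrm{Gyr}\times(2.64)^{-2.5}
  \approx 10\,\mathrm{Gyr}\times 0.0883
  \approx 880\,\mathrm{Myr}
```

This lands close to the 893 Myr estimate, though the exponent in the scaling is approximate, so ages obtained this way are only indicative.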

Conclusions: This project’s age estimate supports the older age of the Coma Ber Cluster. The result of 893 Myr is consistent with that of Tang et al. (2018) and significantly older than the conventional 400-650 Myr estimate. This disparity is probably due to improved measurements of the Hyades cluster (Brandt and Huang 2015), from which the age of the Coma Ber Cluster was extrapolated, as well as the discovery of tidal interaction in the cluster (Tang et al. 2019). This also has implications for the formation of the Sun, which is believed to have formed in an open cluster similar to the Coma Ber Cluster. In addition, this project shows that useful astronomical results can be extracted from amateur data.



Kawasak.AI: A Robust Deep Ensemble Network with Joint Detection for Differential Diagnosis of Kawasaki Disease

Ellen Xu, Grade 10, Del Norte High School



Kawasaki disease (KD) is the leading cause of acquired heart disease in children in the US, and early diagnosis is crucial. However, because the disease is uncommon, available datasets are small and imbalanced, which can bias deep learning models during training and prevent them from generalizing well to out-of-domain samples.

The goal of this project is to explore techniques for robust prediction and diagnosis. Data for six key symptoms are collected from publicly available images and from an AI Photo Project conducted in collaboration with the KD Foundation, then adjudicated by KD experts. Random data augmentation with various transformations is used to generate a larger dataset. A transfer learning model based on VGG+ is constructed and tuned for KD diagnosis and combined with a weighted loss function to counter class imbalance and the resulting bias. To improve performance, techniques such as early stopping, batch size configuration, and learning rate adaptation are implemented, and metrics are chosen to provide unbiased and interpretable indicators of model performance across configurations.
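The abstract does not specify the weighting scheme. One common choice, shown here as an assumption, weights each class inversely to its frequency, n_samples / (n_classes × class_count), the heuristic scikit-learn calls "balanced", and applies those weights in a binary cross-entropy loss so that errors on the rarer class cost more. The function names are illustrative.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def weighted_bce(y_true, p_pred, w_pos, w_neg, eps=1e-7):
    """Mean binary cross-entropy with separate weights for positive and negative samples."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    per_sample = -(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))
    return per_sample.mean()
```

With one positive among four samples, the positive class gets weight 2.0 and the negative class 2/3. Deep learning frameworks expose the same idea directly, for example Keras `Model.fit(class_weight=...)` or PyTorch `BCEWithLogitsLoss(pos_weight=...)`.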

The model reached 0.726 sensitivity, 0.827 specificity, 0.789 accuracy, 0.856 ROC AUC, and a diagnostic odds ratio (DOR) of 18.63, averaged across the six key symptoms. Random data augmentation, dynamic weight balancing for handling the imbalanced dataset, and k-fold cross-validation (averaged across 10 folds) were found to produce the best and most reliable results. Together, these techniques achieved promising results for early KD diagnosis, and further exploration of ensemble models is ongoing.
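The reported threshold metrics all derive from the confusion matrix of each binary symptom classifier. The minimal sketch below (names illustrative) shows the definitions, including DOR = (TP × TN) / (FP × FN), which equals [sensitivity / (1 - sensitivity)] / [(1 - specificity) / specificity], plus a per-fold average as used in k-fold cross-validation. Because the reported figures are averages over six symptoms, the average DOR need not equal the DOR computed from the average sensitivity and specificity.

```python
def binary_metrics(tp, fp, tn, fn):
    """Threshold metrics from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Diagnostic odds ratio; raises ZeroDivisionError when fp or fn is 0
    # (a 0.5 continuity correction is often added in that case).
    dor = (tp * tn) / (fp * fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "dor": dor}

def average_metrics(per_fold):
    """Average each metric over folds, e.g. the 10 folds of k-fold cross-validation."""
    keys = per_fold[0].keys()
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in keys}
```

ROC AUC is not included because it needs the model's continuous scores rather than a single confusion matrix.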