HBCU-UP-TIP Summer (2016) Research Program
Biomathematics and Bioinformatics


(L to R): Dr. Qing Xia Lee, Dr. Sanjukta Hota, Jerry Brown, Lynnette Robinson, Jiah Toms, Sunteasja Bowen, Sharee Brewer, Shelethia Jordan, Dr. Steven Damo, Dr. Phylis Freeman-junior, Dr. Brian Nelms and Dr. Lei Qian

Abstracts of Summer Research Projects:

Investigating Dopamine Neuron Genes linked to Human Disorder

Jiah Toms
Biology & Spanish Major
Advisor: Dr. Brian Nelms, Department of Biology
Abstract: Dopamine is a neurotransmitter that acts in the central nervous system, where it helps regulate functions of the frontal lobes of the brain. Studies have linked it to human disorders such as Parkinson’s disease, ADHD, drug addiction, schizophrenia, and depression. The purpose of my research is to determine whether genes found by RNA-Seq to be enriched in C. elegans dopamine neurons have predicted dopamine neuron functions and/or connections to these disorders. We use C. elegans as a model system because many of its genes are conserved between worms and humans. Genes that we find to be related to dopamine function and human disorders will then be tested in C. elegans by comparing worms lacking these genes to normal worms. We have examined a list of 534 differentially expressed genes generated by comparing RNA-Seq data from isolated dopamine neurons and whole worms. We have used various bioinformatics tools to find out which genes are linked to dopamine and human disorders. Some promising genes that we have identified include cpx-1 (Complexin 1), faah-5 (Fatty acid amide hydrolase), K03B4.4 (GTPase-activating protein SynGAP), and C33A12.4 (Ubiquitin carboxyl-terminal hydrolase 28 isoform X12). From this research we hope to clarify the relationship between dopamine neuron genes and human disorders.

Estimating the Parameters and Computing the Reproduction Number, R0, for Ebola Epidemics

Jerry Brown IV
Department of Biology
Advisor: Dr. Sanjukta Hota, Department of Mathematics
Abstract: Ebola hemorrhagic fever, known today simply as Ebola, has caused some of the worst recorded epidemics in history. The virus can be transmitted through contact with bodily fluids or contaminated hospital equipment, such as needles and gloves. My research involved analyzing existing CDC data on the 2014 Ebola epidemics in Sierra Leone, Liberia, and Guinea, three of the deadliest outbreaks of that year, and estimating the important parameters associated with each outbreak. A key aim was to determine whether the disease would reach an epidemic state. Following the studies by Althaus and Chowell, the parameters were calculated and the basic reproduction number, R0, was estimated.
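
As a rough illustration of how a basic reproduction number can be estimated, the Python sketch below uses two standard textbook relations for a simple SEIR model: R0 = beta / gamma, and R0 = (1 + r/sigma)(1 + r/gamma), where r is the early exponential growth rate of case counts. The function names, the case counts, and the latent and infectious periods are illustrative assumptions; they are not the project's data or its estimates, and the project may have used a different fitting procedure.

```python
import math


def r0_from_rates(beta, gamma):
    """R0 for a basic SEIR model: transmission rate over removal rate."""
    return beta / gamma


def growth_rate(times, cases):
    """Least-squares slope of log(cases) against time (early growth rate r)."""
    logs = [math.log(c) for c in cases]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def r0_from_growth_rate(r, sigma, gamma):
    """R0 implied by growth rate r when the latent and infectious periods
    are exponentially distributed with rates sigma and gamma."""
    return (1 + r / sigma) * (1 + r / gamma)


if __name__ == "__main__":
    # Synthetic weekly case counts (NOT real outbreak data).
    times = [0, 7, 14, 21, 28, 35]
    cases = [20 * math.exp(0.05 * t) for t in times]

    r = growth_rate(times, cases)
    sigma = 1 / 5.0   # assumed mean latent period of 5 days (placeholder)
    gamma = 1 / 6.0   # assumed mean infectious period of 6 days (placeholder)
    print("growth rate per day:", r)
    print("implied R0:", r0_from_growth_rate(r, sigma, gamma))
```

An R0 above 1 indicates that each case produces more than one new case on average, which is the condition for an outbreak to grow into an epidemic.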


Machine Learning: Predicting Secondary Structure of Proteins

Sharee Brewer
Department of Biology
Advisor: Dr. Lei Qian, Department of Computer Science
Abstract: Proteins are key biological macromolecules essential for the structure, function, and regulation of the body’s tissues and organs. Proteins consist of long chains of amino acids, referred to as residues, that fold and bind together because of the chemical attraction between residues. Folding determines not only the shape of the molecule but also its function. Proteins have four structural levels: primary, secondary, tertiary, and quaternary. Prediction of a protein’s 3D structure relies heavily on accurate prediction of its secondary structure. The goal of the project is to develop a new algorithm that uses machine learning methods to improve the prediction accuracy of protein secondary structure. This will be done using datasets obtained from “Context-Based Features Enhance Protein Secondary Structure Prediction Accuracy” by Ashraf Yaseen and Yaohang Li. The target is at least 85% accuracy in the testing results. Machine learning methods will include Support Vector Machines and Stochastic Gradient Descent, implemented with Python packages such as NumPy and Google’s TensorFlow.
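
One common way to present a protein sequence to a classifier such as a Support Vector Machine is a sliding window of one-hot encoded residues, scored with the three-state (helix, strand, coil) per-residue accuracy often called Q3. The Python sketch below shows that general idea only; the window size, the function names, and the padding scheme are assumptions made for illustration and are not taken from the project or from the cited paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
VECTOR_SIZE = len(AMINO_ACIDS) + 1    # extra slot for padding or unknown


def one_hot(residue):
    """Encode one residue as a 21-element 0/1 list."""
    vec = [0] * VECTOR_SIZE
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else VECTOR_SIZE - 1] = 1
    return vec


def window_features(sequence, window=7):
    """Return one feature row per residue, built from the residue and its
    neighbours inside an odd-sized window. Sequence ends are padded."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    rows = []
    for i in range(len(sequence)):
        row = []
        for residue in padded[i:i + window]:
            row.extend(one_hot(residue))
        rows.append(row)
    return rows


def q3_accuracy(predicted, actual):
    """Fraction of residues whose predicted state (H, E or C) is correct."""
    if len(predicted) != len(actual):
        raise ValueError("label strings must have equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


if __name__ == "__main__":
    features = window_features("MKVLAAGIE", window=7)
    print(len(features), "rows of", len(features[0]), "features each")
    print("Q3:", q3_accuracy("HHHHCCEEE", "HHHCCCEEC"))
```

Each row can then be passed to any classifier as the input for the residue at the centre of its window.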

A Mathematical Model For The Ebola Transmission Dynamics

Lynnette Robinson
Mathematics Major
Advisor: Dr. Sanjukta Hota, Department of Mathematics
Abstract: Ebola is a deadly disease that has affected multiple countries in West Africa and still persists. The disease was discovered in 1976 and has since caused many outbreaks in different countries. There have been 31,077 infections and 12,922 deaths to date. The 2014 outbreak severely affected three countries in West Africa: Guinea, Liberia, and Sierra Leone.

In this project a SEIRQD model was developed, consisting of six nonlinear differential equations that describe what happens to the population at each stage of an Ebola epidemic. The system was solved using Python, and plots were generated showing the profile and phases of the outbreak. Stability analysis about the equilibrium points was performed and the basic reproduction number, R0, was computed. The value of R0 obtained through the model was compared against the value obtained from 2014 outbreak data from the Centers for Disease Control and Prevention (CDC). The variation in the value of R0 was investigated by changing different parameter values of the model. It was found that an increase in the contact rate resulted in an increase in the value of R0 and an overall decrease in population.
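
The abstract does not list the six equations, so the Python sketch below uses one plausible SEIRQD formulation as an assumption: susceptible (S), exposed (E), infectious (I), quarantined (Q), recovered (R), and dead (D) compartments, with quarantined individuals assumed not to transmit. All parameter names and values are hypothetical placeholders, and the fixed-step Runge-Kutta integrator is just one simple way to solve such a system.

```python
def seirqd_rhs(state, p):
    """Right-hand side of an ASSUMED SEIRQD system (not the project's)."""
    S, E, I, Q, R, D = state
    N = sum(state)  # constant: every outflow below is another inflow
    new_infections = p["beta"] * S * I / N
    dS = -new_infections
    dE = new_infections - p["sigma"] * E
    dI = p["sigma"] * E - (p["gamma"] + p["q"] + p["mu"]) * I
    dQ = p["q"] * I - (p["gamma_q"] + p["mu_q"]) * Q
    dR = p["gamma"] * I + p["gamma_q"] * Q
    dD = p["mu"] * I + p["mu_q"] * Q
    return [dS, dE, dI, dQ, dR, dD]


def rk4_step(state, p, dt):
    """Advance the system by one classical fourth-order Runge-Kutta step."""
    k1 = seirqd_rhs(state, p)
    k2 = seirqd_rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], p)
    k3 = seirqd_rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], p)
    k4 = seirqd_rhs([s + dt * k for s, k in zip(state, k3)], p)
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(state, p, days, dt=0.1):
    """Integrate for the given number of days and return the final state."""
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, p, dt)
    return state


def basic_reproduction_number(p):
    """R0 for this assumed model: transmission rate times the mean time
    spent infectious before recovery, quarantine, or death."""
    return p["beta"] / (p["gamma"] + p["q"] + p["mu"])


if __name__ == "__main__":
    # Placeholder parameters (per day); not fitted to any outbreak data.
    params = {"beta": 0.4, "sigma": 0.1, "gamma": 0.05, "q": 0.1,
              "mu": 0.05, "gamma_q": 0.07, "mu_q": 0.03}
    start = [9990.0, 0.0, 10.0, 0.0, 0.0, 0.0]
    for beta in (0.4, 0.6):
        trial = dict(params, beta=beta)
        final = simulate(start, trial, days=200)
        print("beta:", beta, "R0:", basic_reproduction_number(trial),
              "deaths:", final[5])
```

Under these assumptions, raising the contact rate beta raises R0 and the cumulative number of deaths, which matches the qualitative finding reported above.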

Ebola Virus Dynamics and Immune System Response to Ebola Infection

Sunteasja Bowen
Biology Major
Advisor: Dr. Sanjukta Hota
Abstract: Ebola, also known as Ebola hemorrhagic fever, is a highly infectious and often fatal disease. Caused by a virus of the Filoviridae family, it is characterized by organ failure, fever, and severe internal bleeding. Ebola Virus Disease (EVD) has a fatality rate of up to 90%. It is known for its quick spread, usually through direct contact with bodily fluids, specifically blood, feces, and vomit. Humans generally develop symptoms within 8 to 10 days. The goal of this project is to investigate the immune system response to the Ebola virus and to analyze the dynamics of Ebola quantitatively by developing an SEIR-type mathematical model. The Python programming language is used for numerical simulation of the system of differential equations in order to investigate the immune response in the presence of the Ebola virus. Data on the 2014 West Africa outbreak are collected from the CDC and WHO and analyzed to understand Ebola virus dynamics and the role of the reproduction number in predicting the transmission of Ebola.

Fertility: Predicting Seminal Quality with Artificial Intelligence Methods

Shelethia T. Jordan
Department of Biology
Advisor: Dr. Lei Qian, Department of Computer Science
Abstract: Researchers have found that fertility rates have drastically decreased in the last two decades, particularly in men. It has been hypothesized that environmental factors may contribute to the lower fertility rates in men. In this research, multiple artificial intelligence techniques, including Random Forest, Artificial Neural Network, Support Vector Machine, and Linear Discriminant Analysis, were tested. These techniques were used to analyze the relations between environmental factors and the quality of semen.

The goal of this research was to predict male fertility from many different environmental factors and to analyze the impact of each factor individually. The original research showed that Multilayer Perceptron and Support Vector Machines gave the highest accuracy, with prediction accuracy values of eighty-six percent. Similarly, in this research Support Vector Machines and Artificial Neural Network gave the highest accuracy, with prediction accuracy values of eighty-eight percent and eighty-five percent respectively, close to the original value of eighty-six percent.

In contrast, Random Forest provided a more visual approach but gave a lower prediction accuracy of sixty percent. To better analyze the relations between each environmental factor and the quality of semen, Linear Discriminant Analysis was performed, resulting in a prediction accuracy of eighty percent. Of the methods studied, Support Vector Machines, Artificial Neural Network, and Linear Discriminant Analysis were the most accurate. Of the environmental factors analyzed, the age and alcohol consumption of the infertile male have the most significant impact on seminal quality.