Major: Computer Science
“Using Structural Parameters in Transcription Factor Binding Site Prediction”
This project addresses the problem of identifying transcription factor binding sites using physical parameters, as opposed to current methods that use only nucleotide sequences to predict possible binding sites. Proteins called “transcription factors” bind to sites on the DNA strand and control whether the transcription of genes is promoted or inhibited. Identifying the sites where they bind gives researchers a clearer picture of how genes are regulated. Typically, prediction is done by analyzing DNA sequences with a sequence-based model of transcription factor binding. Research has shown, however, that some transcription factors bind to a specific site at one genomic location but not to a site with the same nucleotide sequence at another location. This suggests that binding by some transcription factors depends on the structure of the DNA molecule and not just on its nucleotide sequence. The goal of this project is to calculate structural parameters from a nucleotide sequence and then use those parameters to enhance the prediction of binding sites. This project builds on previous research by creating portable libraries for the inference of structural parameters that can be deployed inside existing prediction programs. Currently, curvature profiles computed with these libraries are being used as features for Machine Learning algorithms that predict binding sites. These algorithms “learn” from existing data and use that knowledge to classify new sites. Specifically, an Artificial Neural Network, the Random Forest algorithm, and genetic algorithms are being used for prediction.
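As a rough illustration of the idea of calculating structural parameters from a nucleotide sequence, the sketch below turns a DNA sequence into a smoothed numeric profile that could serve as a feature vector for a classifier. It is only a minimal sketch under stated assumptions, not the project's actual library: the function names (`step_profile`, `smooth`, `structural_features`) are hypothetical, and the per-dinucleotide values in `PLACEHOLDER_STEP_VALUES` are arbitrary placeholders rather than measured structural data.

```python
from itertools import product

BASES = "ACGT"

# PLACEHOLDER values (assumption): one arbitrary number per dinucleotide step,
# assigned in alphabetical order (AA=0.0, AC=1.0, ..., TT=15.0).
# A real table would come from published structural measurements.
PLACEHOLDER_STEP_VALUES = {
    a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def step_profile(sequence, table=PLACEHOLDER_STEP_VALUES):
    """Look up one structural value for each overlapping dinucleotide step."""
    seq = sequence.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(profile, window=3):
    """Average the profile over a sliding window of the given width."""
    if window < 1 or window > len(profile):
        raise ValueError("window must be between 1 and the profile length")
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]


def structural_features(sequence, window=3):
    """Return a smoothed structural profile usable as a feature vector."""
    return smooth(step_profile(sequence), window)


if __name__ == "__main__":
    # Two sites with the same sequence get the same profile here; a real model
    # would also need context (flanking sequence) to tell them apart.
    print(structural_features("ACGTACGT", window=3))
```

A sequence of length n yields n - 1 dinucleotide steps and, after smoothing, n - window feature values. Vectors like these could then be passed to a learning algorithm alongside, or instead of, sequence-based features.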
What research experiences have you had?
Since fall 2013, I’ve been doing a research internship in the Erill Lab of the Biology department. My research has been in the field of Bioinformatics, which uses Computer Science techniques to solve problems in the field of Biology.
How did you find the research opportunity?
I found the lab website while searching for information on a Biology minor. I emailed Dr. Erill about a research internship, and we set up a meeting that week.
Who did you work with on this project?
Technically I only worked with Dr. Erill on this project, but I received a lot of assistance from other students in the lab.
Was this your first independent research project?
No, I completed an independent research project in Linear Algebra in 2012. My research was on lossy video compression. However, that project was confined to a single semester, while my current research has lasted several. This project has also been much more difficult because of my limited knowledge of Biology.
Do you get course credit for this work? Paid? How much time do you put into it?
I was not directly paid for it, but I got an Undergraduate Research Assistantship Stipend, which gave me $750/semester. I put in around 10-15 hours every week, depending on my momentum. I put in twenty hours in the week leading up to URCAD, and that was the most I’ve ever done.
What academic background did you have before you started?
I was beginning my junior year when I started my research. Most of my classes had been in the Math and Computer Science departments, but I had also taken several Biology classes.
How did you learn what you needed to know to be successful in this lab?
I learned through trial and error. Time management and individual commitment are probably the two traits most important to success. Balancing schoolwork and research was difficult at first, but I’ve drastically improved my time management skills.
What was the hardest part about your research?
The hardest part was dealing with failure so often in short amounts of time. Sometimes things don’t work out even though you’ve done everything “right.” This research experience has been a good lesson in staying motivated through setbacks.
What was the most unexpected thing?
The most unexpected thing was how easy it was to get involved. Other than that, I was surprised by how rewarding my research has been. It’s also surprising how difficult it can be to describe my research. I get so focused on small details that I forget the larger question and spend fifteen minutes trying to explain it.
How does this research experience relate to your work in other classes?
Last semester I took BIOL313 (Bioinformatics) and found that I had learned a lot of the material during my research. This semester I was able to integrate my CMSC478 (Intro to Machine Learning) project into my research.
What is your advice to other students about getting involved in research?
Getting involved in research is easier than it seems. My biggest fear was that I was unqualified and would make a lot of mistakes. That hasn’t happened, and it’s actually been very rewarding. Getting up to speed can be difficult, but people are willing to help.
What are your career goals?
I would like to do something that involves Artificial Intelligence and/or music. There are a lot of services that analyze listening habits to suggest new music, and I think that would be fun to work on. I’m also considering whether to pursue a master’s degree in Computer Science. Whatever I do, I’d like it to help people outside of Computer Science.
What are you doing next for research?
I’m beginning to use Machine Learning techniques to analyze the data I collected. Hopefully I’ll find some patterns that identify sites as bound or unbound.