Have you ever heard of bioinformatics?
A cross between computer science, statistics, and biology, bioinformatics is a relatively new interdisciplinary field that uses software to analyze biological data, especially genomic data.
Dr. Garrett Dancik, Associate Professor of Computer Science and Bioinformatics at Eastern Connecticut State University, has collaborated with research biologists over the years as a bioinformatician, analyzing gene expression data associated with bladder cancer.
As a professor, Dr. Dancik has dedicated much of his work to developing new and useful bioinformatics tools to streamline the genomic analysis process for researchers.
One of his most recent creations is shinyGEO, a tool developed to help biologists analyze genomic data with the goal of better understanding genetic diseases like cancer, as well as the genetics of normal growth and development.
There are some complex concepts at work behind shinyGEO, so some background is essential to understanding what Dr. Dancik’s most recent bioinformatics tool accomplishes for the field of biology. The human genome is 3 billion characters long and contains around 20,000 protein-coding genes. Numbers this large translate into an enormous amount of data, which makes analyzing genomic information by hand extremely tedious. Because of the sheer volume of data involved in this type of research, biologists need computers to facilitate the process.
One of the largest repositories of genomic data is the Gene Expression Omnibus (GEO). As Dr. Dancik explains, “The data is based on what’s known as gene expression, which is the idea that your genes, which are composed of DNA, provide a recipe or a template for the cell, to be able to produce a protein. It’s that process of gene expression that determines how cells behave and how cells change over time.”
What most people don’t realize is that the data hosted on the Gene Expression Omnibus is free for anyone to access and download. However, without programming knowledge or a programmer to help parse and analyze the information, most biological researchers would be metaphorically all dressed up with nowhere to go. “There’s an enormous amount of data available,” Dr. Dancik commented. “The bottleneck is just being able to analyze it all.”
Dr. Dancik developed shinyGEO to help alleviate this problem. Users of shinyGEO select the genes of interest to their research through simple drop-down menus. Once biologists select the genes they’re interested in, shinyGEO performs a differential expression analysis in the background: a statistical comparison of how strongly those genes are expressed across certain groups (such as tumor cells and normal cells).
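To make the idea of differential expression concrete, here is a minimal sketch of one common way to compare a single gene between two groups: Welch's t-test on log2 expression values. The article does not describe how shinyGEO computes its results, so this is an illustration of the general technique rather than the tool's actual method. The sample values and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 expression values for one gene (not real GEO data)
tumor = [8.1, 7.9, 8.4, 8.6, 8.0]
normal = [6.2, 6.5, 6.1, 6.4, 6.3]

t_stat, dof = welch_t(tumor, normal)
log2_fc = mean(tumor) - mean(normal)  # difference of log2 means = log2 fold change
print(f"log2 fold change: {log2_fc:.2f}, t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t statistic suggests the gene is expressed differently in the two groups. In practice a p-value would be derived from the t distribution, and analyses covering many genes would also correct for multiple testing.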
Much like an internet browser that uses code to facilitate access to the internet and to render web content in a visually appealing format, shinyGEO takes all the GEO data of interest to researchers, analyzes it, and presents it back to them in an easily understood format. Scientists working with shinyGEO don’t need to understand any of what the program is doing; they merely select the gene or genes they’re interested in and the dataset they’d like to better understand. Then, that massive amount of data is automatically pulled from GEO, analyzed, and organized in an accessible way, with no programming knowledge required.
What does this analysis mean for the larger population? Dr. Dancik has been utilizing shinyGEO with the main goal of understanding cancer. “Basically, cancer is where you start with a normal cell, but then there’s a genetic alteration that results in changes in the gene expression process. That causes the cell to become cancerous. So by looking at the gene expression data, biologists and researchers can try to untangle what is happening at the genetic level that is causing a cell to develop into a tumor.”
A common analysis that shinyGEO conducts involves two different groups, for example cancer cells and normal cells, where the researcher wants to understand how the groups differ in terms of gene expression. “With cancer, a common analysis would be to see if the expression of a particular gene is associated with the survival or prognosis of a cancer patient,” explained Dr. Dancik.
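One standard way to ask whether a gene's expression is associated with survival is to split patients into high- and low-expression groups and estimate a Kaplan-Meier survival curve for each. The sketch below assumes a median cutoff and uses made-up patient records. It illustrates the general technique and is not a description of shinyGEO's internals, and the names `kaplan_meier` and `curve_for` are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time a death is observed.

    events[i] is True if a death was observed at times[i],
    and False if the patient was censored (lost to follow-up) at that time.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Hypothetical patients: (log2 expression, months of follow-up, death observed)
patients = [
    (9.1, 5, True), (8.7, 8, True), (8.9, 12, False), (9.4, 3, True),
    (6.2, 20, False), (6.8, 15, True), (7.0, 24, False), (6.5, 30, False),
]

cutoff = median(p[0] for p in patients)  # assumed grouping rule: median split
high = [p for p in patients if p[0] > cutoff]
low = [p for p in patients if p[0] <= cutoff]


def curve_for(group):
    return kaplan_meier([p[1] for p in group], [p[2] for p in group])


km_high = curve_for(high)
km_low = curve_for(low)
print("high expression:", km_high)
print("low expression:", km_low)
```

If the two curves separate clearly, the gene may be associated with prognosis. A formal comparison would typically use a log-rank test or a Cox proportional hazards model.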
The larger goal of Dr. Dancik’s work with gene expression analysis is to make progress in what’s called personalized medicine. You might think that all medicine is personal, but the term is actually based on the idea that the same drug doesn’t necessarily work for every individual, especially with cancer: a drug that works for one patient might not work for another.
Personalized medicine is still evolving, and it would be highly valuable to identify effective treatment options at the genetic level before committing patients to a treatment plan. “The question is, if we can predict ahead of time whether a drug is likely to benefit an individual with a certain gene expression, then doctors can make better decisions in terms of how each patient is treated. Doctors would be able to tell in advance whether aggressive treatment is necessary or not.”
“Cancer is a disease with a lot of genetic mutations and genetic instability,” Dr. Dancik elaborated. “So often what happens is you have an accumulation of mutations in one tumor. That means that within a single tumor, there are different groups of cells with different mutations from one another. What the field has been learning over the course of the past couple of years is that to really understand the genetics of a tumor, you can’t just look at one region of the tumor.” This means that researchers must sequence the genome’s 3 billion characters in several different regions of a tumor to understand how those regions differ. In some cases, only this fine-grained level of sequencing and analysis lets researchers figure out what is driving a tumor’s growth and how best to treat it.
“I think that genomic analysis will eventually just become part of the process when someone is diagnosed with cancer,” predicted Dr. Dancik. “They’ll have their genome sequenced because the information in the genome of the tumor is going to provide all sorts of information about the cancer, including what might be the best possible treatment for the individual.”
Analyzing a patient’s genetic data was once prohibitively expensive, but thanks to the rapid development of computer technology, the cost has fallen over the past two decades. While cost was the first hurdle to wide-scale biological advances of this kind, the second is making sense of all that genetic information. “That’s where bioinformatics comes in because the analysis is a really complicated and involved process,” said Dr. Dancik. “But it’s a very fascinating one.”
Dr. Dancik has been working at Eastern Connecticut State University since 2013 and launched the bioinformatics minor at Eastern, officially offered to students in Fall 2016. He also develops tools to help students of statistics and computer science learn to code in the statistical programming language R. He recently developed swirl-tbp, an extension of the teaching software known as swirl. The extension, which he currently uses in his own courses, randomizes aspects of learning to code in R. More information about Dr. Dancik can be found at his website.