Bioinformatics specialist Nicholas Ho, from the Kids Cancer Alliance, has been running workshops to teach our researchers the programming language R. We asked him about R and whether researchers are any good at coding – or if we should leave it to people in baseball caps!
What is R?
R is a statistical programming language developed about 20 years ago. It’s gained popularity in the last 5-10 years with the huge increase in data, not just in medical research but also in business, government and other areas. R is a foundation for analysing data, finding the gold nuggets of information in big pools of data.
Why are you teaching coding?
I’m running workshops in R because I think it’s useful for researchers at Children’s Cancer Institute. They’re generating a lot of data themselves and this will help them analyse it to test their hypotheses and extract knowledge they can act on, deciding on what experiments come next.
What kind of data can R handle?
DNA and RNA sequencing data is big. But one of the more common sources of data is microarray experiments. Microarrays are like computer chips loaded up with thousands of genetic probes, each representing a particular DNA sequence or marker. Researchers add their sample of, say, tumour cell RNA to the chip and then measure gene expression and compare it to healthy cell RNA. With 20,000 or so probes to test on each chip, that’s a phenomenal amount of data to work with.
R can help you analyse that data automatically instead of manually, using computer power to do the hard work for you. It’s much quicker. You can analyse 20,000 or 50,000 markers in a day, even overnight. With automation, things can be set up so you come in to work the next day and there are your results. The slow but rewarding bit is interpreting the data, seeing if it supports your hypothesis and deciding what experiments you need next.
R is also a platform for beautiful data visualisations. A correlation heat map, for example, can identify genes that have similar or opposite expression patterns. A volcano plot can visualise genes that are differentially expressed, with higher or lower expression in one state (e.g. tumour tissue) versus another (e.g. normal tissue).
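As a rough illustration of the numbers behind these two plots, here is a minimal sketch. It is written in Python purely to keep the example self-contained; in R, the built-in `cor()` function computes the same correlation coefficient. The gene names, expression values and helper functions are all hypothetical, made up for this example, and a real volcano plot would also need a statistical test for its second axis, which is left out here.

```python
import math

# Hypothetical expression values (arbitrary units) for three made-up genes
# across four samples: the first two are normal tissue, the last two tumour.
expression = {
    "geneA": [10.0, 12.0, 40.0, 44.0],
    "geneB": [5.0, 6.0, 20.0, 22.0],
    "geneC": [44.0, 40.0, 12.0, 10.0],
}


def pearson(x, y):
    """Pearson correlation: the value behind each cell of a correlation heat map."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def log2_fold_change(values, n_normal):
    """log2(mean tumour / mean normal): the horizontal axis of a volcano plot."""
    normal = values[:n_normal]
    tumour = values[n_normal:]
    return math.log2((sum(tumour) / len(tumour)) / (sum(normal) / len(normal)))


if __name__ == "__main__":
    genes = list(expression)
    # Pairwise correlations: close to +1 means similar patterns,
    # close to -1 means opposite patterns.
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            print(f"{g1} vs {g2}: r = {r:.3f}")
    # Positive values mean higher expression in tumour, negative means lower.
    for gene in genes:
        lfc = log2_fold_change(expression[gene], n_normal=2)
        print(f"{gene}: log2 fold change = {lfc:.3f}")
```

In this toy data, geneA and geneB rise together in the tumour samples while geneC falls, so the first pair would show up as strongly similar on a heat map and geneC as their opposite.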
Be honest – are researchers any good at coding?
At the workshops, we’ve had a lot of researchers who’ve had no computer programming background. The workshops really did build up their confidence to do more programming. They’ve said it will help them in their research, and I’m very happy about that. They’re eager to analyse data from their own projects. I’ve always found programming to be a way of thinking. It’s a step-by-step, logical thinking process. Researchers, anyone really, can benefit from that.
Top image: A volcano plot generated using R visualises gene expression data.