Research reflections April 2019 from Parnika Mukherjee

Parnika Mukherjee (HUB)

Published April 2019 


1. Joanna Polanska’s talk at Max Planck Institute of Infection Biology, Berlin.


Early this year, I had the pleasure of hearing Prof. Joanna Polanska of the Silesian University of Technology in Poland speak about her group's work on interaction mining. The talk was titled “Analysis and visualization of feature interactions in high dimensional classification, regression and survival analysis problems”.


I was interested in attending this talk because in my PhD project, I want to study the interaction of gene products. The main aim of my project and that of the talk were similar, but the inputs, and therefore the methods, were very different. As input, I use gene expression values and then compute pairwise gene correlations as a measure of interaction strength. Prof. Polanska’s inputs were features such as age, weight and height. Her group used machine learning methods, both supervised and unsupervised, on these features to predict an output (yes/no for interaction).
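To make the idea of "pairwise gene correlation as a measure of interaction strength" concrete, here is a minimal sketch in Python. It assumes Pearson correlation, which is only one possible choice of correlation measure, and the gene names and expression values are made up purely for illustration. They are not data from my project.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences vary (a constant sequence has zero variance
    and would cause a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values: one list per gene, one value per sample.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],  # rises together with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],  # falls as gene_a rises
}


def pairwise_correlations(expr):
    """Return {(gene1, gene2): r} for every unordered pair of genes."""
    return {
        (g1, g2): pearson(expr[g1], expr[g2])
        for g1, g2 in combinations(sorted(expr), 2)
    }


correlations = pairwise_correlations(expression)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

With real data, the number of gene pairs grows quadratically with the number of genes, which is one reason computing capacity matters for this kind of analysis.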


She spoke about their newly developed tool, BroadSide, an interaction mining and feature selection tool. Feature selection means choosing a subset of features that reduces redundancy and dimensionality while still allowing a good model to be built. BroadSide also ranks the interactions it finds and constructs easily interpretable interaction networks. It can work on high-dimensional problems with millions of features, such as RNA-Seq, as well as on smaller sets of well-chosen clinical or PCR variables.


2. Frank Seeber’s talk at GRK2046 lecture series (winter semester 2018/19)


In February this year, I was at Prof. Frank Seeber’s talk on “Innate immune responses of wild rodents upon parasite infections”. Even though I don’t design mouse experiments as a part of my PhD, I do use malaria RNA-Seq datasets that come from human, mouse and macaque experiments.


Prof. Seeber spoke about the different kinds of mice used in biological experiments, where they come from, and examples of what researchers should be careful about when extrapolating results from mice to humans. One striking case study was the use of Thalidomide by pregnant women. Thalidomide was tested on animals as a sedative pill, but when administered to pregnant women, it caused severe birth defects. The other case study was that of TGN1412. TGN1412, an immunomodulatory drug also known as Theralizumab, was also tested on animals and found to be safe, but it caused multiple organ dysfunction during clinical trials.

Since there are differences between the immune systems of humans and mice, efforts have been made to humanize mice. For example, to study malaria, humanized mice with altered blood and liver are available. Then again, wild mice are very different from lab mice, and this difference may affect the results of biomedical experiments. For example, the virulence of parasites in wild mice is not the same as in lab mice. To overcome this issue, organoids are often used.


This talk was a good introduction to the differences between diseases in animal models and diseases in humans. Later on in my project, I will have gene co-expression networks from human, mouse and monkey studies, and this talk served as a guide to the kind of caution I will need to exercise when comparing the networks from the three organisms.


3. High-Performance Bioinformatics workshop at Cineca, Rome.


In December 2018, I participated in the “High-Performance Bioinformatics” workshop organised at Cineca, Rome. Cineca, a part of PRACE (Partnership for Advanced Computing in Europe), is a supercomputing centre in Italy. The goal of the workshop was to familiarize the participants with the architecture of a supercomputer and with programming on one.


Supercomputers, or “high-performance computing” systems, allow a user to run large-scale calculations in parallel and so speed up analysis. They are, essentially, thousands of individual computing units (CPUs) designed to work in parallel. The program of the workshop fit very well with the current stage of my project. The case study used to demonstrate programming methods was an RNA-Seq dataset. There were a few lectures on the biology of the dataset, RNA-Seq and RNA-Seq analysis. We had hands-on sessions for executing Bash commands and for Python programming. Finally, there were lectures on the hardware and software of a typical supercomputer in general and of Cineca in particular. We were given limited student accounts on the supercomputer to practice the exercises.


In my PhD project, I need to compute the correlation of gene expression between host and Plasmodium in several datasets, in different combinations, and then visualise the results as gene co-expression networks. To do this in a reasonable amount of time, I would benefit greatly from large computational capacity in terms of memory and storage space, both of which supercomputers provide. Although I am trying to accomplish some of these tasks on the already powerful server we use at work, I would like to see how much faster they would run on a supercomputer. I expect that at some point in my project I will add datasets to the existing ones, and repeating the analysis on a supercomputer might save me an enormous amount of time.
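As a rough illustration of the host-parasite step described above, the sketch below correlates every host gene with every parasite gene and keeps the strongly correlated pairs as edges of a network. It is a toy example under several assumptions: the gene names, values and the 0.9 cut-off are hypothetical, Pearson correlation stands in for whatever measure is eventually used, and host and parasite values are assumed to come from the same samples in the same order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_species_edges(host, parasite, threshold=0.9):
    """Return (host_gene, parasite_gene, r) for pairs with |r| >= threshold.

    The threshold is an arbitrary illustrative cut-off, not a recommendation.
    """
    edges = []
    for h_gene, h_values in host.items():
        for p_gene, p_values in parasite.items():
            r = pearson(h_values, p_values)
            if abs(r) >= threshold:
                edges.append((h_gene, p_gene, r))
    return edges


# Hypothetical expression values measured in the same four samples.
host = {
    "host_gene_1": [1.0, 2.0, 3.0, 4.0],
    "host_gene_2": [1.0, 3.0, 2.0, 4.0],
}
parasite = {
    "parasite_gene_1": [2.0, 4.0, 6.0, 8.0],
    "parasite_gene_2": [3.0, 1.0, 4.0, 2.0],
}

edges = cross_species_edges(host, parasite)
for h_gene, p_gene, r in edges:
    print(h_gene, p_gene, round(r, 3))
```

The nested loop makes the cost visible: the work scales with the number of host genes times the number of parasite genes, and it has to be repeated for every dataset combination. Each pair is independent of the others, which is what makes this kind of task a natural fit for parallel hardware.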