Skip to content

Variant Visualisation

Important

This is an advanced module.

This module was written by Mathieu Bourgey and the original on-line version is available here.


Key Learning Outcomes

After completing this practical the trainee should be able to:

  • Generate Circos like graphics using R

Resources You’ll be Using

Tools Used

R:
https://cran.r-project.org/

R package circlize:
https://cran.r-project.org/web/packages/circlize/index.html


Author Information

Primary Author(s):
Mathieu Bourgey mathieu.bourgey@mcgill.ca

Contributor(s):


Introduction

This short workshop will show you how to visualize your data.

We will be working on 3 types of somatic calls:

  • SNV calls from MuTect (vcf)

  • SV calls from DELLY (vcf)

  • CNV calls from SCoNEs (tsv)


Prepare the Environment

We will use a dataset derived from the analysis of whole genome sequencing paired normal/tumour samples.

The call files are contained in the folder visualization:

  • mutect.somatic.vcf
  • delly.somatic.vcf
  • scones.somatic.tsv

Many tools are available to do this and the most commonly known is Circos. Circos is a really not user-friendly. In this tutorial, we show you an easy alternative to build a circular representation of genomic data.

First we need to go in the folder visualization to do the analysis:

cd /home/trainee/visualization/

Let’s see what is in this folder:

ls

circos.R delly.somatic.vcf mutect.somatic.vcf scones.somatic.tsv

Take a look at the data files.

This data has not been restricted to a short piece of a chromosome and SNVs have already been filtered:

less mutect.somatic.vcf
less delly.somatic.vcf
less scones.somatic.tsv

Question

What can you see from this data?

Answer
  • the filtered output of 3 different software: Mutect (SNVs), Delly (SVs), SCoNEs (CNVs).

  • The 3 files show 2 different formats (vcf, tsv).

  • Almost all type of variants are represented here: mutations, deletion, inversion, translocation, large amplification and deletion (CNVs).

Question

Why don’t we use the vcf format for all types of calls?

Answer

The 1000 Genomes project tries to use/include SV calls in the vcf format. Some tools like Delly use this format for SVs. It is a good idea to try to include everything together but this not the be the best way to handle SVs and CNVs.

Question

Why?

Answer

Due to the nature of these calls, you can not easily integrate the positional information of the two breakpoints (that could be located far away or in an other chromosome) using a single position format.


The analysis will be done using the R program:

R

We will use the circlize package from the cran R project. This package generates circular plots and has the advantage of being able to provide pre-built functions for genomic data. One of the main advantages of this tool is the use of bed format as input data.

library(circlize)


Let’s import the variants:

snp=read.table("mutect.somatic.vcf")
sv=read.table("delly.somatic.vcf")
cnv=read.table("scones.somatic.tsv",header=T)


We need to set up the generic graphical parameters:

x11()
par(mar = c(1, 1, 1, 1))
circos.par("start.degree" = 90)
circos.par("track.height" = 0.05)
circos.par("canvas.xlim" = c(-1.3, 1.3), "canvas.ylim" = c(-1.3, 1.3))


Let’s draw hg19 reference ideograms:

circos.initializeWithIdeogram(species = "hg19")


We need to ensure our data to fits the hg19 standards so the main thing to check is that we have chr at the beginning of the chromosome names.

We can now draw 1 track for somatic mutations:

snv_tmp=read.table("mutect.somatic.vcf",comment.char="#")
snv=cbind(paste("chr",as.character(snp[,1]),sep=""),snp[2],snp[,2]+1)
circos.genomicTrackPlotRegion(snv,stack=TRUE, panel.fun = function(region, value, ...) {
    circos.genomicPoints(region, value, cex = 0.05, pch = 9,col='orange' , ...)
})


Let’s draw the 2 tracks for CNVs. One track for duplications in red and one blue track for deletions.

dup=cnv[cnv[,5]>2,]
dup[,1]=paste("chr",as.character(dup[,1]),sep="")
del=cnv[cnv[,5]<2,]
del[,1]=paste("chr",as.character(del[,1]),sep="")
circos.genomicTrackPlotRegion(dup, stack = TRUE,panel.fun = function(region, value, ...) {
        circos.genomicRect(region, value, col = "red",bg.border = NA, cex=1 , ...)
})
circos.genomicTrackPlotRegion(del, stack = TRUE,panel.fun = function(region, value, ...) {
        circos.genomicRect(region, value, col = "blue",bg.border = NA, cex=1 , ...)
})

We can clearly see a massive deletion in chromosome 3.


To finish we just need to draw 3 tracks + positional links to represent SVs.

Unfortunately the vcf format has not been designed for SVs. SVs are defined by 2 breakpoints and the vcf format stores the second one in the info field. So we will need to extract this information to draw these calls.

chrEnd=NULL
posEnd=NULL
for (i in 1:dim(sv)[1]) {
    addInfo=strsplit(as.character(sv[i,8]),split=";")
    chrInf=strsplit(addInfo[[1]][3],split="=")
    chrEnd=c(chrEnd,chrInf[[1]][2])
    posInf=strsplit(addInfo[[1]][4],split="=")
    posEnd=c(posEnd,posInf[[1]][2])
}
svTable=data.frame(paste("chr",sv[,1],sep=""),as.numeric(sv[,2]),as.numeric(posEnd),paste("chr",chrEnd,sep=""),as.character(sv[,5]))


Now that we have reformatted the SV calls, let’s draw them.

typeE=c("<DEL>","<INS>","<INV>")
colE=c("blue","black","green")
for (i in 1:3) {
        bed_list=svTable[svTable[,5]==typeE[i],]
        circos.genomicTrackPlotRegion(bed_list,stack=TRUE, panel.fun = function(region, value, ...) {
                circos.genomicPoints(region, value, cex = 0.5, pch = 16, col = colE[i], ...)
        })
}

bed1=cbind(svTable[svTable[,5]=="<TRA>",1:2],svTable[svTable[,5]=="<TRA>",2]+5)
bed2=cbind(svTable[svTable[,5]=="<TRA>",c(4,3)],svTable[svTable[,5]=="<TRA>",3]+5)

for (i in 1:dim(bed1)[1]) {
    circos.link(bed1[i,1],bed1[i,2],bed2[i,1],bed2[i,2])
}


A good graph needs a title and legend:

title("Somatic calls (SNV - SV - CNV)")
legend(0.7,1.4,legend=c("SNV", "CNV-DUPLICATION","CNV-DELETION","SV-DELETION","SV-INSERTION","SV-INVERSION"),col=c("orange","red","blue","blue","black","green","red"),pch=c(16,15,15,16,16,16,16,16),cex=0.75,title="Tracks:",bty='n')
legend(0.6,0.95,legend="SV-TRANSLOCATION",col="black",lty=1,cex=0.75,lwd=1.2,bty='n')


You should obtain a plot like this one:

image


Now save the image to the visualization directory as a .pdf:

dev.copy2pdf(file = "/home/trainee/visualization/variant_visualization.pdf")
dev.off()

Finally exit R
q("yes")


Acknowledgements

Mathieu Bourgey would like to thank and acknowledge Louis Letourneau for this help and for sharing his material. The format of the tutorial has been inspired from Mar Gonzalez Porta. I also want to acknowledge Joel Fillon, Louis Letrouneau (again), Francois Lefebvre, Maxime Caron and Guillaume Bourque for the help in building these pipelines and working with all the various datasets.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.