Use a Custom Genome

Overview

With a properly configured web server and a few annotation files, you can use igvR with any genome of interest. We demonstrate that here with the genome of SARS-CoV-2, the virus that causes COVID-19, obtained from the NCBI and hosted on an nginx web server running at the Institute for Systems Biology.

Explicit loading of “custom” hg38

We begin, however, by showing the explicit configuration required to load the annotated hg38 genome hosted by igv.org. This familiarizes you with the full range of parameters you can use if you have the data.

library(igvR)
igv <- igvR()
setBrowserWindowTitle(igv, "hg38 explicit")
setCustomGenome(igv,
                id="hg38",
                genomeName="Human (GRCh38/hg38)",
                fastaURL="https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa",
                fastaIndexURL="https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai",
                cytobandURL="https://s3.amazonaws.com/igv.broadinstitute.org/annotations/hg38/cytoBandIdeo.txt",
                chromosomeAliasURL=NA,
                geneAnnotationName="Refseq Genes",
                geneAnnotationURL="https://s3.amazonaws.com/igv.org.genomes/hg38/refGene.txt.gz",
                geneAnnotationTrackHeight=500,
                geneAnnotationTrackColor="darkBlue",
                initialLocus="chr5:88,621,308-89,001,037",
                visibilityWindow=5000000)

Use the SARS-CoV-2 genome


base.url <- "https://igv-data.systemsbiology.net/testFiles/sarsGenome"
fasta.file <- sprintf("%s/%s", base.url,"Sars_cov_2.ASM985889v3.dna.toplevel.fa")
fastaIndex.file <-  sprintf("%s/%s", base.url, "Sars_cov_2.ASM985889v3.dna.toplevel.fa.fai")
annotation.file <-  sprintf("%s/%s", base.url, "Sars_cov_2.ASM985889v3.101.gff3")

Sys.sleep(2)

setCustomGenome(igv,
                id="Sars_cov_2",
                genomeName="Sars_cov_2.ASM985889v3",
                fastaURL=fasta.file,
                fastaIndexURL=fastaIndex.file,
                geneAnnotationURL=annotation.file,
                geneAnnotationName="ASM985889v3",
                geneAnnotationTrackHeight=500,
                geneAnnotationTrackColor="darkBlue",
                visibilityWindow=30000)

Configure and run an nginx webserver, with CORS and Byte-Range support

If you work with a novel organism or a non-standard assembly, the free, open-source nginx web server is a good choice. I have had some success with a simple Python Flask server, but recently ran into errors that I could not fix, so I switched to (and now recommend) nginx.

CORS: cross-origin resource sharing

JavaScript interpreters running in compliant web browsers will (with some exceptions) refuse to load code or data from within a running script if that code or data comes from a web host other than the one that served the script.

The CORS protocol relaxes this restriction (counter-intuitively, to my mind). If the response carrying the cross-origin data includes a header announcing that cross-origin use is safe, the browser lets the script proceed. Thus nginx (or whatever alternative server you use) must be CORS-enabled.
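One way to confirm that a server is CORS-enabled is to request a file while sending an Origin header, then inspect the response headers. The sketch below is illustrative only: cors_allows() and fetch_headers() are hypothetical helper names, not part of igvR, and fetch_headers() assumes the curl package is installed. The origin http://localhost:15000 is a placeholder; substitute the origin from which your browser page is actually served.

```r
# Does a set of response headers permit cross-origin use by `origin`?
# `headers` is a named list of header values; names are matched
# case-insensitively.
cors_allows <- function(headers, origin) {
  names(headers) <- tolower(names(headers))
  allowed <- headers[["access-control-allow-origin"]]
  !is.null(allowed) && allowed %in% c("*", origin)
}

# Fetch the response headers for a one-byte GET sent with an Origin header.
# Assumes the curl package. Defined but not called here, since it needs
# network access.
fetch_headers <- function(url, origin) {
  h <- curl::new_handle()
  curl::handle_setheaders(h, Origin = origin, Range = "bytes=0-0")
  response <- curl::curl_fetch_memory(url, handle = h)
  curl::parse_headers_list(response$headers)
}

# Hand-written headers standing in for a real server response
example.headers <- list("Access-Control-Allow-Origin" = "http://localhost:15000",
                        "Vary" = "Origin")
cors_allows(example.headers, "http://localhost:15000")                       # TRUE
cors_allows(list("Content-Type" = "text/plain"), "http://localhost:15000")   # FALSE
```

Against a live server, cors_allows(fetch_headers(url, origin), origin) would combine the two steps. A GET is used rather than a HEAD request because a server may attach CORS headers only to specific request methods.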

Byte-range support

Genome files are often very large and are therefore read into igv.js only in manageable chunks. Your web server must respond to those “chunk” requests, a capability called “byte-range” support.
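To see why chunked reads matter, consider how a client can use a FASTA index (.fai) to fetch only the bases it needs. Each index line records, for one sequence, the byte offset of its first base, the number of bases per line, and the number of bytes per line (bases plus the line terminator). The sketch below turns a span of bases into an HTTP Range header value. The function names and the index values are hypothetical, chosen for illustration. It shows the general faidx arithmetic and is not a description of igv.js internals.

```r
# Byte position in the FASTA file of a 1-based base position, given the
# OFFSET, LINEBASES and LINEWIDTH fields of that sequence's .fai entry.
fai_byte_offset <- function(pos, offset, linebases, linewidth) {
  zero.based <- pos - 1
  offset + (zero.based %/% linebases) * linewidth + (zero.based %% linebases)
}

# HTTP Range header value covering bases start..end (inclusive).
# "%.0f" is used rather than "%d" so offsets beyond the 32-bit integer
# range (as in large genomes) still format correctly.
range_header <- function(start, end, offset, linebases, linewidth) {
  first <- fai_byte_offset(start, offset, linebases, linewidth)
  last  <- fai_byte_offset(end,   offset, linebases, linewidth)
  sprintf("bytes=%.0f-%.0f", first, last)
}

# Hypothetical index entry: sequence starts at byte 100, 60 bases per line,
# 61 bytes per line (60 bases plus a newline).
range_header(1, 120, offset = 100, linebases = 60, linewidth = 61)
# "bytes=100-220"
```

A server with byte-range support answers such a request with status 206 (Partial Content) and only the requested bytes. A server without it returns the entire file.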

A sample nginx configuration

This is what we use at the Institute for Systems Biology to serve the SARS-CoV-2 genome and annotation. This file is called default.conf and is supplied in the Docker command shown below.

server {
    listen       80;
    server_name  localhost;

    #access_log  /var/log/nginx/host.access.log  main;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;

        # max requests per keepalive connection
        keepalive_requests 55000;

        # CORS pre-flight (OPTIONS) requests: answer with headers only
        if ($request_method = 'OPTIONS') {
            add_header "Access-Control-Allow-Origin" $http_origin;
            add_header "Vary" "Origin";
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'DNT,Range,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Ranges,Content-Encoding,Content-R$
            # Tell client that this pre-flight info is valid for 20 days
            add_header 'Access-Control-Max-Age' 1728000;
            add_header 'Content-Type' 'text/plain; charset=UTF-8';
            add_header 'Content-Length' 0;
            return 204;
        }

        if ($request_method = 'POST') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'DNT,Range,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Ranges,Content-Encoding,Content-R$
            add_header 'Access-Control-Expose-Headers' 'Accept-Ranges,Content-Encoding,Content-Length,Content-Range,Cache-Control,Content-Language,Content-Type,Expires,Last-Modified,Pragma,Date';
        }

        # GET requests: echo the requesting origin and expose the range-related headers
        if ($request_method = 'GET') {
            add_header "Access-Control-Allow-Origin" $http_origin;
            add_header "Vary" "Origin";
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'DNT,Range,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Accept-Ranges,Content-Encoding, Content-$
            add_header 'Access-Control-Expose-Headers' 'Accept-Ranges,Content-Encoding,Content-Length,Content-Range,Cache-Control,Content-Language,Content-Type,Expires,Last-Modified,Pragma,Date';
        }
    }

    # redirect server error pages to the static page /50x.html
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}

Run nginx out of a Docker container

The Docker image provided for free use by nginx.com is all you need. The two -v options mount your configuration file and your data directory, read-only, so that they are visible within the running container.

We run the nginx container on host port 60050 (container port 80). Other routers and proxies map port 60050 to a DNS-specified virtual host, visible outside our ISB firewall, as https://igv-data.systemsbiology.net.

That virtual host setup, the DNS entry, and the proxying are not covered here.

docker run -p 60050:80 \
      -v /yourConfigurationDirectory/fullPath/goesHere/default.conf:/etc/nginx/conf.d/default.conf:ro \
      -v /yourDataDirectory/fullPath/goesHere/:/usr/share/nginx/html:ro \
      --restart always \
      -d nginx
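With the container running, each file under the mounted data directory is served at the matching path beneath the server root. The sketch below maps a local file path to the URL you would pass to setCustomGenome. The helper served_url(), the example paths, and the host http://localhost:60050 are all assumptions made for illustration; the host applies only when you connect from the machine running the container, without the proxying described above.

```r
# Map a file inside the mounted data directory to the URL nginx serves it at.
# `data.dir` is the host directory given to the second -v option.
served_url <- function(local.path, data.dir, host = "http://localhost:60050") {
  data.dir <- sub("/+$", "", data.dir)                 # drop any trailing slash
  stopifnot(startsWith(local.path, paste0(data.dir, "/")))
  relative <- substring(local.path, nchar(data.dir) + 2)
  sprintf("%s/%s", host, relative)
}

# Hypothetical paths
served_url("/data/genomes/sars/genome.fa", "/data/genomes/")
# "http://localhost:60050/sars/genome.fa"
```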

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] BiocStyle_2.35.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37       R6_2.5.1            fastmap_1.2.0       xfun_0.49           maketools_1.3.1    
#>  [6] cachem_1.1.0        knitr_1.49          htmltools_0.5.8.1   png_0.1-8           rmarkdown_2.29     
#> [11] buildtools_1.0.0    lifecycle_1.0.4     cli_3.6.3           sass_0.4.9          jquerylib_0.1.4    
#> [16] compiler_4.4.2      sys_3.4.3           tools_4.4.2         evaluate_1.0.1      bslib_0.8.0        
#> [21] yaml_2.3.10         BiocManager_1.30.25 jsonlite_1.8.9      rlang_1.1.4