GEOfastq can be installed from Bioconductor as follows:
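A minimal sketch of the installation, assuming the standard BiocManager workflow used for Bioconductor packages:

```r
# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install GEOfastq from Bioconductor
BiocManager::install("GEOfastq")
```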
The NCBI Gene Expression Omnibus (GEO) offers a convenient interface to explore high-throughput experimental data such as RNA-seq. GEO deposits RNA-seq data as sra files in the Sequence Read Archive (SRA); these files can be converted to fastq files using fastq-dump. This conversion can be quite slow, so it is usually more convenient to download the fastq files that the European Nucleotide Archive (ENA) generates for a GEO accession. GEOfastq crawls GEO to retrieve metadata and ENA fastq URLs, and then downloads the fastq files.
To get fastq data for a GEO series, we first retrieve the metadata for a GEO accession:
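A minimal sketch of this step, assuming `crawl_gse()` accepts the series accession as a character string. The accession used here is an illustrative assumption; substitute the series of interest.

```r
library(GEOfastq)

# Assumption: illustrative GEO series accession, replace with your own
gse_name <- "GSE133758"

# Crawl the GEO series page and keep the result for the next step
gse_text <- crawl_gse(gse_name)
```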
Next, we extract the sample (GSM) accessions for this study, then retrieve the GEO metadata and ENA fastq URL for one example sample:
gsm_names <- extract_gsms(gse_text)
gsm_name <- gsm_names[182]
srp_meta <- crawl_gsms(gsm_name)
#> 1 GSMs to process
Now that we have retrieved the necessary metadata, we are ready to download the fastq files for this sample:
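A minimal sketch of the download, assuming `get_fastqs()` takes the sample metadata followed by a destination directory. The use of a temporary directory is an assumption for illustration; a persistent path is usually preferable for large files.

```r
library(GEOfastq)

# Assumption: download into a temporary directory for this session
data_dir <- tempdir()

# srp_meta is the metadata returned by crawl_gsms() in the previous step
res <- get_fastqs(srp_meta, data_dir)
```

Downloads can be large, so it is worth checking the available disk space under `data_dir` before running this for many samples.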
The following packages and versions were used in the production of this vignette.
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] GEOfastq_1.15.0 rmarkdown_2.28
#>
#> loaded via a namespace (and not attached):
#> [1] doParallel_1.0.17 cli_3.6.3 knitr_1.48 rlang_1.1.4
#> [5] xfun_0.48 jsonlite_1.8.9 RCurl_1.98-1.16 buildtools_1.0.0
#> [9] plyr_1.8.9 htmltools_0.5.8.1 maketools_1.3.1 sys_3.4.3
#> [13] sass_0.4.9 evaluate_1.0.1 jquerylib_0.1.4 bitops_1.0-9
#> [17] fastmap_1.2.0 yaml_2.3.10 foreach_1.5.2 lifecycle_1.0.4
#> [21] compiler_4.4.1 codetools_0.2-20 Rcpp_1.0.13 digest_0.6.37
#> [25] R6_2.5.1 parallel_4.4.1 bslib_0.8.0 tools_4.4.1
#> [29] iterators_1.0.14 cachem_1.1.0