Title: | Downloads ENA Fastqs With GEO Accessions |
---|---|
Description: | GEOfastq is used to download fastq files from the European Nucleotide Archive (ENA) starting with an accession from the Gene Expression Omnibus (GEO). To do this, sample metadata is retrieved from GEO and the Sequence Read Archive (SRA). SRA run accessions are then used to construct FTP and Aspera download links for fastq files generated by the ENA. |
Authors: | Alex Pickering [cre, aut] |
Maintainer: | Alex Pickering |
License: | MIT + file LICENSE |
Version: | 1.15.0 |
Built: | 2024-10-31 06:08:53 UTC |
Source: | https://github.com/bioc/GEOfastq |
Get GSE text from GEO
crawl_gse(gse_name)
gse_name | GEO series accession (e.g. 'GSE111459') to get metadata for |
Character vector of lines from the GSE record.
gse_text <- crawl_gse('GSE111459')
Visits each GSM page to get its SRA experiment (SRX) accession, then visits each SRX page to collect additional metadata.
crawl_gsms(gsm_names, max.workers = 50)
gsm_names | Character vector of GSMs. |
max.workers | Maximum number of parallel workers to split the task between. |
data.frame of sample metadata, or NULL when the records are held in dbGaP (see the first example).
srp_meta <- crawl_gsms("GSM3031462")  # returns NULL: records are held in dbGaP for privacy reasons
srp_meta <- crawl_gsms("GSM2439650")  # example with empty values
srp_meta <- crawl_gsms('GSM4043025')
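max.workers caps how many parallel workers share the crawl. This page does not say how GEOfastq divides the GSMs among those workers. The following is therefore only a sketch of one possible approach, using a hypothetical helper, chunk_gsms(), that assigns accessions round-robin to at most max.workers batches.

```r
# Hypothetical helper (not part of GEOfastq): split GSM accessions into
# at most `max_workers` batches, assigning them round-robin.
chunk_gsms <- function(gsm_names, max_workers = 50) {
  if (length(gsm_names) == 0) return(list())
  n_chunks <- min(max_workers, length(gsm_names))
  # (i - 1) %% n_chunks gives each GSM a batch index in 0..n_chunks-1
  batch <- (seq_along(gsm_names) - 1) %% n_chunks
  unname(split(gsm_names, batch))
}

gsms <- c("GSM3031462", "GSM2439650", "GSM4043025")
chunks <- chunk_gsms(gsms, max_workers = 2)
str(chunks)
```

Each batch could then be handed to a separate worker, for example with parallel::mclapply() on Unix-like systems. Capping the batch count at the number of GSMs avoids starting idle workers for small series.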
Extract GSMs needed to download RNA-seq data for a series
extract_gsms(gse_text)
gse_text | GSE text returned from crawl_gse |
Character vector of sample GSMs for the series gse_name
gse_text <- crawl_gse('GSE111459')
gsm_names <- extract_gsms(gse_text)
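extract_gsms() pulls the sample accessions out of the GSE text. The layout of that text is not documented here, so this sketch simply assumes that sample accessions appear as tokens of the form "GSM" followed by digits, and collects the unique ones with a regular expression. The helper find_gsms() and the input lines are hypothetical and made up for illustration.

```r
# Hypothetical helper (not GEOfastq's implementation): collect unique
# GSM accessions from a character vector of text lines.
find_gsms <- function(lines) {
  # every "GSM<digits>" token on every line (character(0) where none)
  hits <- regmatches(lines, gregexpr("GSM[0-9]+", lines))
  unique(as.character(unlist(hits)))
}

# made-up lines standing in for real GSE text
gse_lines <- c(
  "Series title: example series",
  "Sample: GSM0000001",
  "Sample: GSM0000002",
  "see also GSM0000001"
)
find_gsms(gse_lines)
```

unique() keeps accessions in order of first appearance. This is why the duplicate mention on the last line does not produce a second entry.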
Gets part of the path used to download a bulk RNA-seq sample from EBI or NCBI.
get_dldir(srr, type = c("ebi", "ncbi"))
srr | SRR/ERR run accession |
type | Either "ebi" or "ncbi". |
String path used by get_fastqs.
get_dldir('SRR014242')
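On the EBI side, ENA documents a directory layout for fastq files. The first level is the first six characters of the run accession. When the run number has more than six digits, an extra level is added, built from the trailing digits and zero-padded to three characters: "00" plus the last digit for seven digits, "0" plus the last two digits for eight, and the last three digits for nine.

The sketch below implements that published rule in a hypothetical helper, ena_fastq_dir(). It is not GEOfastq's code, and the exact string get_dldir() returns may differ. The FTP host, the Aspera address, and the "_1.fastq.gz" file name are assumptions based on ENA's public documentation, so verify them before relying on them.

```r
# Hypothetical helper following ENA's documented fastq directory layout
# (not GEOfastq's get_dldir): vol1/fastq/<first 6 chars>/[<extra>/]<run>
ena_fastq_dir <- function(run) {
  digits <- sub("^[A-Z]+", "", run)   # e.g. "014242" for SRR014242
  n <- nchar(digits)
  parts <- c("vol1", "fastq", substr(run, 1, 6))
  if (n > 6) {
    # extra level: trailing (n - 6) digits, zero-padded to width 3
    parts <- c(parts, sprintf("%03d", as.integer(substr(digits, 7, n))))
  }
  paste(c(parts, run), collapse = "/")
}

# assumed public ENA endpoints for building full download links
ena_ftp_url <- function(run, file) {
  paste0("ftp://ftp.sra.ebi.ac.uk/", ena_fastq_dir(run), "/", file)
}
ena_aspera_src <- function(run, file) {
  paste0("era-fasp@fasp.sra.ebi.ac.uk:/", ena_fastq_dir(run), "/", file)
}

ena_fastq_dir("SRR014242")
ena_fastq_dir("SRR1234567")
ena_ftp_url("SRR014242", "SRR014242_1.fastq.gz")
```

Only the directory part depends on the accession length. The host prefix is the only difference between the FTP and Aspera forms.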
First tries to get RNA-Seq fastq files from EBI.
get_fastqs(srp_meta, data_dir, method = c("ftp", "aspera"), max_rate = "1g")
srp_meta | data.frame of sample metadata, as returned by crawl_gsms. |
data_dir | Path to the folder that fastq files will be downloaded to. Created if it doesn't exist. |
method | One of "ftp" or "aspera". |
max_rate | Used when method is "aspera" (maximum transfer rate for ascp). |
Named vector of integer return codes from ascp or download.file. Names are SRR runs.
gsm_name <- 'GSM3926903'
srp_meta <- crawl_gsms(gsm_name)
data_dir <- tempdir()
res <- get_fastqs(srp_meta, data_dir)
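get_fastqs() returns one integer return code per run, named by SRR accession. Both download.file() and ascp signal success with 0, so non-zero entries mark runs that may need a retry. The vector below is made up for illustration; in practice it would be the res object from the example.

```r
# Made-up return codes standing in for the output of get_fastqs();
# names are SRR run accessions, 0 means the download succeeded.
res <- c(SRR9000001 = 0L, SRR9000002 = 1L, SRR9000003 = 0L)

# runs whose download reported a non-zero code
failed <- names(res)[res != 0]
if (length(failed) > 0) {
  message("Failed runs: ", paste(failed, collapse = ", "))
}
```

Note that download.file() can also stop with an error instead of returning a non-zero code. A real retry loop may therefore need tryCatch() around the call as well.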