Package 'packFinder' reference manual

Title:	de novo Annotation of Pack-TYPE Transposable Elements
Description:	Algorithm and tools for in silico pack-TYPE transposon discovery. Filters a given genome for properties unique to DNA transposons and provides tools for the investigation of returned matches. Sequences are input in DNAString format, and ranges are returned as a dataframe (in the format returned by as.dataframe(GRanges)).
Authors:	Jack Gisby [aut, cre] , Marco Catoni [aut]
Maintainer:	Jack Gisby <[email protected]>
License:	GPL-2
Version:	1.19.0
Built:	2025-03-05 05:35:21 UTC
Source:	https://github.com/bioc/packFinder

Arabidopsis thaliana Refseq Genome Chromosome 3 Subset

Description

The chromosome 3 reference sequence for Arabidopsis thaliana as a DNAStringSet. Can be used as a test data set, as in the associated introduction vignette. The DNA sequence between bases 10,500,000 and 14,300,000 was extracted for use in this dataset.

Usage

data(arabidopsisThalianaRefseq)
data(arabidopsisThalianaRefseq)

Format

A DNAStringSet object containing a DNAString for Arabidopsis thaliana's chromosome 3 sequence.

Author(s)

Jack Gisby

Source

The Arabidopsis thaliana genome was downloaded from the NCBI refseq database on 20/SEP/2019, using getGenome, and chromosome 3 was extracted. The genome may also be accessed from the NCBI ftp server: ftp://ftp.ncbi.nlm.nih.gov/genomes.

Examples

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

BLAST Analysis of PackTYPE Elements

Description

Run BLAST against user-specified databases of non-transposon and transposon-related proteins. Can be used to classify transposons based on their internal sequences.

Usage

blastAnalysis(
  packMatches,
  Genome,
  blastPath,
  protDb = NULL,
  autoDb = NULL,
  minE = 0.001,
  blastTask = "blastn-short",
  maxHits = 100,
  threads = 1,
  saveFolder = NULL,
  tirCutoff = 0
)
blastAnalysis(
  packMatches,
  Genome,
  blastPath,
  protDb = NULL,
  autoDb = NULL,
  minE = 0.001,
  blastTask = "blastn-short",
  maxHits = 100,
  threads = 1,
  saveFolder = NULL,
  tirCutoff = 0
)

Arguments

`packMatches`	A dataframe of potential Pack-TYPE transposable elements, in the format given by `packSearch`. This dataframe is in the format produced by coercing a `link[GenomicRanges:GRanges-class]{GRanges}` object to a dataframe: `data.frame(GRanges)`. Will be saved as a FASTA file for VSEARCH.
`Genome`	A DNAStringSet object containing sequences referred to in `packMatches` (the object originally used to predict the transposons `packSearch`).
`blastPath`	Path to the BLAST+ executable, or name of the BLAST+ application for Linux/MacOS users.
`protDb`	For assigning Pack-TYPE elements. Path to the blast database containing nucleotide or protein sequences to be matched against internal transposon sequences. Can be generated using BLAST+, or with `link{makeBlastDb}`.
`autoDb`	For assigning autonomous elements. Path to the blast database containing nucleotide or protein sequences to be matched against internal transposon sequences. Can be generated using BLAST+, or with `link{makeBlastDb}`.
`minE`	Blast results with e values greater than the specified cutoff will be ignored.
`blastTask`	Type of BLAST+ task, defaults to "blastn-short".
`maxHits`	Maximum hits returned by BLAST+ per query.
`threads`	Allowable number of threads to be utilised by BLAST+.
`saveFolder`	Directory to save BLAST+ results in; defaults to the working directory.
`tirCutoff`	How many bases to ignore at the terminal ends of the transposons to prevent hits to TIR sequences.

Value

No return value; executes BLAST+ to generate hits which are stored in a .blast file in the chosen directory.

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation).

Examples

## Not run: 
packMatches <- data(packMatches)
Genome <- data(arabidopsisThalianaRefseq)

blastAnalysis(packMatches, Genome, 
    protDb = "C:/data/TAIR10_CDS", 
    autoDb = "C:/data/TAIR10_transposons", 
    blastPath = "C:/blast/bin/blastn.exe")

## End(Not run)

## Not run: 
packMatches <- data(packMatches)
Genome <- data(arabidopsisThalianaRefseq)

blastAnalysis(packMatches, Genome, 
    protDb = "C:/data/TAIR10_CDS", 
    autoDb = "C:/data/TAIR10_transposons", 
    blastPath = "C:/blast/bin/blastn.exe")

## End(Not run)

Functional Annotation of PackTYPE Elements

Description

Uses hits, previously generated using blast, to annotate transposon hits. Transposons with non-redundant transposase hits are classed as autonomous ("auto"), while others are classed as "other" or "pack" based on whether the element has non-redundant hits to other proteins.

Usage

blastAnnotate(protHits, autoHits, packMatches)
blastAnnotate(protHits, autoHits, packMatches)

Arguments

`protHits`	BLAST results for non-transposon related genes or proteins (as a `data.frame`). Generated using `blastAnalysis`.
`autoHits`	BLAST results for transposon related genes or proteins (as a `data.frame`). Generated using `blastAnalysis`.
`packMatches`	A dataframe of potential Pack-TYPE transposable elements, in the format given by `packSearch`. This dataframe is in the format produced by coercing a `link[GenomicRanges:GRanges-class]{GRanges}` object to a dataframe: `data.frame(GRanges)`. Will be saved as a FASTA file for VSEARCH.

Value

Returns the original packMatches dataframe, with the addition of a "classification" column containing one of the following values:

auto - elements that match known transposases or transposon-related proteins are classified as autonomous elements
pack - elements that match other proteins or genic sequences may be classified as Pack-TYPE elements
other - elements that generate no significant hits

Note

Requires that the query ids in the protein and autonomous hits match the row names in packMatches.

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation).

Examples

data("packMatches")

# read in some protein hits
p <- data.frame(
    query_id = c(2, 3),
    subject_id = c("prot", "hyp")
)

# read in some autonomous hits
a <- data.frame(
    query_id = c(3, 4),
    subject_id = c("transposase", "mutator")
)

blastAnnotate(p, a, packMatches)

data("packMatches")

# read in some protein hits
p <- data.frame(
    query_id = c(2, 3),
    subject_id = c("prot", "hyp")
)

# read in some autonomous hits
a <- data.frame(
    query_id = c(3, 4),
    subject_id = c("transposase", "mutator")
)

blastAnnotate(p, a, packMatches)

Collapse Overlapping Sequences

Description

The sequences predicted by packSearch often overlap, which may be due to the presence of closely interspersed elements or false TIR identification. In such cases, these elements can be combined using link[GenomicRanges:GRanges-class]{GRanges} in order to collapse overlapping elements, preventing over-estimation of transposon numbers. Also removes duplicate elements that have been generated in the case of multiple searches.

Usage

collapseSeqs(packMatches, Genome)
collapseSeqs(packMatches, Genome)

Arguments

packMatches

A dataframe containing genomic ranges and names referring to sequences to be extracted. This dataframe is in the format produced by coercing a link[GenomicRanges:GRanges-class]{GRanges} object to a dataframe: data.frame(GRanges).

Must contain the following features:

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.

Genome

A DNAStringSet object containing sequences referred to in packMatches (the object originally used to predict the transposons packSearch).

Value

A set of non-overlapping transposon sequences in the format of the input dataframe.

Author(s)

Jack Gisby

Examples

data(packMatches)
data(arabidopsisThalianaRefseq)

packMatches$start <- 1
packMatches$end <- 10

collapseSeqs(packMatches, arabidopsisThalianaRefseq)

data(packMatches)
data(arabidopsisThalianaRefseq)

packMatches$start <- 1
packMatches$end <- 10

collapseSeqs(packMatches, arabidopsisThalianaRefseq)

Remove Low Quality Sequences

Description

Takes transposable elements detected by packSearch and removes those with large numbers of wildcard ("N") bases. Used by packClust and packAlign to remove poor quality sequences that may interfere with the quality of sequence alignments.

Usage

filterWildcards(packMatches, Genome, maxWildcards = 0.05)
filterWildcards(packMatches, Genome, maxWildcards = 0.05)

Arguments

`packMatches`	A dataframe containing genomic ranges and names referring to sequences to be extracted.
`Genome`	The original set of sequences used to generate the transposons detected by `packSearch`.
`maxWildcards`	The maximal allowable proportion of wildcards in the sequence of each match (defaults to `0.05`).

Value

The original dataframe, packMatches, with sequences removed that are found to contain a proportion of wildcards ("N") greater than that specified in maxWildcards.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)
    
filteredMatches <- filterWildcards(
    packMatches, 
    arabidopsisThalianaRefseq, 
    maxWildcards = 0.05
)

data(arabidopsisThalianaRefseq)
data(packMatches)
    
filteredMatches <- filterWildcards(
    packMatches, 
    arabidopsisThalianaRefseq, 
    maxWildcards = 0.05
)

Extract Sequences of Pack-TYPE Elements

Description

Method to quickly extract the sequences of predicted Pack-TYPE elements (as created by packSearch).

Usage

getPackSeqs(packMatches, Genome, output = "DNAStringSet")
getPackSeqs(packMatches, Genome, output = "DNAStringSet")

Arguments

packMatches

Must contain the following features:

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.

Genome

A DNAStringSet object containing sequences referred to in packMatches (the object originally used to predict the transposons packSearch).

output

The type of object to be returned:

output = "DNAStringSet", returns a DNAStringSet object (default).
output = "character", returns a character vector.

Value

transposon sequences extracted from packMatches. At default returns the sequences as a DNAStringSet or, if output is set to "character", returns a character vector.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

packSeqs <- getPackSeqs(packMatches, arabidopsisThalianaRefseq)

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

packSeqs <- getPackSeqs(packMatches, arabidopsisThalianaRefseq)

Retrieve Saved packFinder Results (.csv)

Description

Retrieves a dataframe of potential Pack-TYPE elements, previously saved using packSearch followed by packsToCsv.

Usage

getPacksFromCsv(file)
getPacksFromCsv(file)

Arguments

file

File path to predicted transposons in CSV format.

Value

Dataframe in the format used by packSearch.

Author(s)

Jack Gisby

Examples

data(packMatches)

packMatches <- getPacksFromCsv(
    system.file("extdata", "packMatches.csv", package = "packFinder")
)

data(packMatches)

packMatches <- getPacksFromCsv(
    system.file("extdata", "packMatches.csv", package = "packFinder")
)

Retrieve Saved packFinder Results (.fasta)

Description

Retrieves a dataframe of potential Pack-TYPE elements, previously saved using packSearch followed by packsToFasta. Parses the .fasta file and title field containing:

seqnames - name of origin sequence
start - transposon base start position on origin sequence
end - transposon base end position on origin sequence
width - width of transposon
strand - direction of transposon ("+", "-" or "*")
TSD - terminal site duplication (TSD) sequence

Usage

getPacksFromFasta(file)
getPacksFromFasta(file)

Arguments

file

Path to predicted transposons in FASTA format.

Value

Dataframe in the format used by packSearch.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

packMatches <- getPacksFromFasta(
    system.file("extdata", "packMatches.fasta", package = "packFinder")
)

data(arabidopsisThalianaRefseq)
data(packMatches)

packMatches <- getPacksFromFasta(
    system.file("extdata", "packMatches.fasta", package = "packFinder")
)

Retrieve packFinder Results from GRanges Object

Description

A link[GenomicRanges:GRanges-class]{GRanges} object, potentially generated using packSearch and packsToGRanges, can be converted to a dataframe. If a GRanges object is supplied without TSD information, this can be calculated and appended to the final dataframe.

Usage

getPacksFromGRanges(packGRanges, Genome = NULL, tsdLength = NULL)
getPacksFromGRanges(packGRanges, Genome = NULL, tsdLength = NULL)

Arguments

`packGRanges`	`link[GenomicRanges:GRanges-class]{GRanges}` object to be coerced.
`Genome`	(optional) Sequences referred to by `packGRanges`.
`tsdLength`	(optional) Length of TSD sequences.

Value

Dataframe in the format used by packSearch. If Genome and tsdLength are supplied, then TSD sequences are retrieved and returned as part of the dataframe.

Author(s)

Jack Gisby

Examples

data(packMatches)

GRangesObject <- packsToGRanges(packMatches)
packMatches <- getPacksFromGRanges(GRangesObject)

data(packMatches)

GRangesObject <- packsToGRanges(packMatches)
packMatches <- getPacksFromGRanges(GRangesObject)

Get Flanking Terminal Site Duplication Sequences

Description

Gets the flanking TSD sequences of TIRs or predicted Pack-TYPE transposable elements. A dataframe of these elements can be in tirMatches.

Usage

getTsds(tirMatches, Genome, tsdLength, strand = "+", output = "character")
getTsds(tirMatches, Genome, tsdLength, strand = "+", output = "character")

Arguments

`tirMatches`	A dataframe containing genomic ranges and names referring to TIR sequences or predicted Pack-TYPE transposable elements. Should be in the format used by `packSearch`.
`Genome`	A `DNAStringSet` object containing sequences referred to in `tirMatches`.
`tsdLength`	The length of the TSD region to be retrieved (integer).
`strand`	The strand of the TIR; "+" for forward, "-" for reverse. If the TSD sequences of transposable elements are being predicted, then this parameter can be left as default ("+"); if the TSD sequences of TIRs are being found then the strand direction must be supplied.
`output`	The type of object to be returned: output = "DNAStringSet", returns a `DNAStringSet` object. output = "character", returns a `character` vector (default).

Details

Called by packSearch. It is recommended to use the general pipeline function packSearch for identification of potential pack elements, which returns TSD sequences as a feature of results, however each stage may be called individually.

Value

Flanking TSD sequences as a vector of characters, or if output is specified as "DNAStringSet", TSD sequences will be returned as a DNAStringSet object.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

tsdSeqs <- getTsds(packMatches, arabidopsisThalianaRefseq, 3)

data(arabidopsisThalianaRefseq)
data(packMatches)

tsdSeqs <- getTsds(packMatches, arabidopsisThalianaRefseq, 3)

Pack Element Filtering

Description

Primary filtering stage for the packSearch algorithm. Identifies potential Pack-TYPE transposable elements based on proximity of matching inverted repeats and equality of TSD sequences.

Usage

identifyPotentialPackElements(
  forwardMatches,
  reverseMatches,
  Genome,
  elementLength,
  tsdMismatch = 0
)
identifyPotentialPackElements(
  forwardMatches,
  reverseMatches,
  Genome,
  elementLength,
  tsdMismatch = 0
)

Arguments

`forwardMatches`	A dataframe containing genomic ranges and names referring to forwards-facing TIR sequences and their respective TSD sequences.
`reverseMatches`	A dataframe containing genomic ranges and names referring to reverse-facing TIR sequences and their respective TSD sequences.
`Genome`	A DNAStringSet object containing the matches referred to in `forwardMatches` and `reverseMatches`
`elementLength`	A vector of two integers containing the minimum and maximum transposable element length.
`tsdMismatch`	An integer referring to the allowable mismatch (substitutions or indels) between a transposon's TSD sequences. `matchPattern` from Biostrings is used for pattern matching.

Details

Used by packSearch as a primariy filtering stage. Identifies matches likely to be transposons based on their TIR region, from identifyTirMatches, and their TSD region, from getTsds. It is recommended to use the general pipeline function packSearch for identification of potential pack elements, however each stage may be called individually. Note that only exact TSD matches are considered, so supplying long sequences for TSD elements may lead to false-negative results.

Value

A dataframe, packMatches, containing the locations of potential Pack-TYPE transposable elements in Genome.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)

forwardMatches <- identifyTirMatches(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "+"
)

reverseMatches <- identifyTirMatches(
    Biostrings::reverseComplement(Biostrings::DNAString("CACTACAA")),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "-"
)

packMatches <- identifyPotentialPackElements(
    forwardMatches, 
    reverseMatches,
    arabidopsisThalianaRefseq, 
    c(300, 3500)
)

data(arabidopsisThalianaRefseq)

forwardMatches <- identifyTirMatches(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "+"
)

reverseMatches <- identifyTirMatches(
    Biostrings::reverseComplement(Biostrings::DNAString("CACTACAA")),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "-"
)

packMatches <- identifyPotentialPackElements(
    forwardMatches, 
    reverseMatches,
    arabidopsisThalianaRefseq, 
    c(300, 3500)
)

Identify Terminal Inverted Repeat Matches

Description

Searches a DNAStringSet for potential TIRs based on sequence similarity.

Usage

identifyTirMatches(
  tirSeq,
  Genome,
  mismatch = 0,
  strand = "*",
  tsdLength,
  fixed = TRUE
)
identifyTirMatches(
  tirSeq,
  Genome,
  mismatch = 0,
  strand = "*",
  tsdLength,
  fixed = TRUE
)

Arguments

`tirSeq`	A `DNAString` object to be searched for.
`Genome`	A `DNAStringSet` object containing the `DNAString` objects to be searched.
`mismatch`	The allowable mismatch between `tirSeq` and a given slice of `Genome`. Includes indels.
`strand`	The directionality of the search string ("+" or "-"). Note that this does affect the search for tirSeqs, if you wish to search the reverse strand you should use the reverse complement of your sequence.
`tsdLength`	Integer referring to the length of the flanking TSD region.
`fixed`	Logical that will be passed to the 'fixed' argument of `matchPattern`. Determines the behaviour of IUPAC ambiguity codes when searching for TIR sequences.

Details

Called by packSearch. Used by packSearch as an initial filtering stage. matchPattern from Biostrings is used for pattern matching. It is recommended to use the general pipeline function packSearch for identification of potential pack elements, however each stage may be called individually.

Value

A dataframe, tirMatches, containing identified matches. The dataframe is in the format generated by packSearch.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)

forwardMatches <- identifyTirMatches(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "+"
)

data(arabidopsisThalianaRefseq)

forwardMatches <- identifyTirMatches(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "+"
)

Make Blast Database

Description

Generates a BLAST database to be queried. Required for identifying sequences using the BLAST+ software.

Usage

makeBlastDb(fastaFile, dbPath, blastPath, dbType = "nucl")
makeBlastDb(fastaFile, dbPath, blastPath, dbType = "nucl")

Arguments

`fastaFile`	FASTA file containing sequences to generate a BLAST database from.
`dbPath`	Path to save the BLAST database to.
`blastPath`	Path/name of BLAST program to use. Name of the application for Linux/MacOS, absolute path for the executable for windows users.
`dbType`	Type of BLAST database to create, e.g. "nucl" for a nucleotide database.

Value

No return value; generates a blast database in the chosen directory.

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation).

Examples

## Not run: 
makeBlastDb("genes.fasta", "blastdb.db", "C:/blast.exe")

## End(Not run)

## Not run: 
makeBlastDb("genes.fasta", "blastdb.db", "C:/blast.exe")

## End(Not run)

Global Alignment with VSEARCH

Description

A global pairwise alignment of pack-TYPE elements by sequence similarity. mIt may be useful to run packClust to identify groups of similar transposable elements, before generating alignments of each group.

Usage

packAlign(
  packMatches,
  Genome,
  identity = 0,
  threads = 1,
  identityDefinition = 2,
  maxWildcards = 0.05,
  saveFolder,
  vSearchPath = "vsearch"
)
packAlign(
  packMatches,
  Genome,
  identity = 0,
  threads = 1,
  identityDefinition = 2,
  maxWildcards = 0.05,
  saveFolder,
  vSearchPath = "vsearch"
)

Arguments

`packMatches`	A dataframe of potential Pack-TYPE transposable elements, in the format given by `packSearch`. This dataframe is in the format produced by coercing a `link[GenomicRanges:GRanges-class]{GRanges}` object to a dataframe: `data.frame(GRanges)`. Will be saved as a FASTA file for VSEARCH.
`Genome`	A DNAStringSet object containing sequences referred to in `packMatches` (the object originally used to predict the transposons `packSearch`).
`identity`	The sequence identity of two transposable elements in `packMatches` required to be grouped into a cluster.
`threads`	The number of threads to be used by VSEARCH.
`identityDefinition`	The pairwise identity definition used by VSEARCH. Defaults to 2, the standard VSEARCH definition.
`maxWildcards`	The maximal allowable proportion of wildcards in the sequence of each match (defaults to `0.05`).
`saveFolder`	The folder to save saveFolder files (uc, blast6out, FASTA)
`vSearchPath`	When the package is run on windows systems, the location of the VSEARCH executable file must be given; this should be left as default on Linux/MacOS systems.

Value

Saves alignment information, including a uc, blast6out and a pairwise alignment fasta file, to the specified location. Returns the uc summary file generated by the alignment.

Note

In order to align sequences using VSEARCH, the executable file must first be installed.

Author(s)

Jack Gisby

References

VSEARCH may be downloaded from https://github.com/torognes/vsearch, along with a manual documenting the program's parameters. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

# packAlign run on a Linux/MacOS system
## Not run: 
    packAlign(packMatches, Genome)

## End(Not run)

# packAlign run on a Windows system
## Not run: 
    packAlign(packMatches, Genome, 
            vSearchPath = "path/to/vsearch/vsearch.exe")

## End(Not run)

data(arabidopsisThalianaRefseq)
data(packMatches)

# packAlign run on a Linux/MacOS system
## Not run: 
    packAlign(packMatches, Genome)

## End(Not run)

# packAlign run on a Windows system
## Not run: 
    packAlign(packMatches, Genome, 
            vSearchPath = "path/to/vsearch/vsearch.exe")

## End(Not run)

Pipeline for BLAST/Classification of PackTYPE Elements

Description

Run BLAST against user-specified databases of non-transposon and transposon-relates proteins. Can be used to classify transposons based on their internal sequences.

Usage

packBlast(
  packMatches,
  Genome,
  blastPath,
  protDb,
  autoDb,
  minE = 0.001,
  blastTask = "blastn-short",
  maxHits = 100,
  threads = 1,
  saveFolder = NULL,
  tirCutoff = 100,
  autoCutoff = 1e-05,
  autoLength = 150,
  autoIdentity = 70,
  autoScope = NULL,
  protCutoff = 1e-05,
  protLength = 250,
  protIdentity = 70,
  protScope = 0.3
)
packBlast(
  packMatches,
  Genome,
  blastPath,
  protDb,
  autoDb,
  minE = 0.001,
  blastTask = "blastn-short",
  maxHits = 100,
  threads = 1,
  saveFolder = NULL,
  tirCutoff = 100,
  autoCutoff = 1e-05,
  autoLength = 150,
  autoIdentity = 70,
  autoScope = NULL,
  protCutoff = 1e-05,
  protLength = 250,
  protIdentity = 70,
  protScope = 0.3
)

Arguments

`packMatches`	A dataframe of potential Pack-TYPE transposable elements, in the format given by `packSearch`. This dataframe is in the format produced by coercing a `link[GenomicRanges:GRanges-class]{GRanges}` object to a dataframe: `data.frame(GRanges)`. Will be saved as a FASTA file for VSEARCH.
`Genome`	A DNAStringSet object containing sequences referred to in `packMatches` (the object originally used to predict the transposons `packSearch`).
`blastPath`	Path to the BLAST+ executable, or name of the BLAST+ application for Linux/MacOS users.
`protDb`	For assigning Pack-TYPE elements. Path to the blast database containing nucleotide or protein sequences to be matched against internal transposon sequences. Can be generated using BLAST+, or with `link{makeBlastDb}`.
`autoDb`	For assigning autonomous elements. Path to the blast database containing nucleotide or protein sequences to be matched against internal transposon sequences. Can be generated using BLAST+, or with `link{makeBlastDb}`.
`minE`	Blast results with e values greater than the specified cutoff will be ignored. This will be passed to BLASTN and applied to both transposon and non-transposon matches.
`blastTask`	Type of BLAST+ task, defaults to "blastn-short".
`maxHits`	Maximum hits returned by BLAST+ per query.
`threads`	Allowable number of threads to be utilised by BLAST+.
`saveFolder`	Directory to save BLAST+ results in; defaults to the working directory.
`tirCutoff`	How many bases to ignore at the terminal ends of the transposons to prevent hits to TIR sequences.
`autoCutoff`	Blast results for transposon-related elements will be filtered to ignore those with e values above the specified cutoff.
`autoLength`	Blast results for transposon-related elements containing hits with alignment lengths lower than this value will be ignored
`autoIdentity`	Blast results for transposon-related elements containing hits with sequence identities lower than this value will be ignored
`autoScope`	If specified, transposon-related blast results below the specified value will be ignored. Note that the dataframe of transposon matches must also be supplied to calculate scope. Scope is the proportion of the transposon's internal sequence occupied by the BLAST hit.
`protCutoff`	Blast results for genic/other matches will be filtered to ignore those with e values above the specified cutoff.
`protLength`	Blast results for genic/other matches containing hits with alignment lengths lower than this value will be ignored
`protIdentity`	Blast results for genic/other matches containing hits with sequence identities lower than this value will be ignored
`protScope`	If specified, genic/other blast matches below the specified value will be ignored. Note that the dataframe of transposon matches must also be supplied to calculate scope. Scope is the proportion of the transposon's internal sequence occupied by the BLAST hit.

Value

Returns the original packMatches dataframe, with the addition of a "classification" column containing one of the following values:

auto - elements that match known transposases or transposon-related proteins are classified as autonomous elements
pack - elements that match other proteins or genic sequences may be classified as Pack-TYPE elements
other - elements that generate no significant hits

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation).

Examples

## Not run: 
packMatches <- data(packMatches)
Genome <- data(arabidopsisThalianaRefseq)

packBlast(packMatches, Genome, 
    protDb = "C:/data/TAIR10_CDS", 
    autoDb = "C:/data/TAIR10_transposons", 
    blastPath = "C:/blast/bin/blastn.exe")

## End(Not run)

## Not run: 
packMatches <- data(packMatches)
Genome <- data(arabidopsisThalianaRefseq)

packBlast(packMatches, Genome, 
    protDb = "C:/data/TAIR10_CDS", 
    autoDb = "C:/data/TAIR10_transposons", 
    blastPath = "C:/blast/bin/blastn.exe")

## End(Not run)

Cluster Transposons with VSEARCH

Description

Cluster potential pack-TYPE elements by sequence similarity. Resulting groups may be aligned with packAlign, or the clusters may be analysed with tirClust

Usage

packClust(
  packMatches,
  Genome,
  identity = 0.6,
  threads = 1,
  identityDefinition = 2,
  maxWildcards = 0.05,
  strand = "both",
  saveFolder = NULL,
  vSearchPath = "vsearch"
)
packClust(
  packMatches,
  Genome,
  identity = 0.6,
  threads = 1,
  identityDefinition = 2,
  maxWildcards = 0.05,
  strand = "both",
  saveFolder = NULL,
  vSearchPath = "vsearch"
)

Arguments

`packMatches`	A dataframe of potential Pack-TYPE transposable elements, in the format given by `packSearch`. This dataframe is in the format produced by coercing a `link[GenomicRanges:GRanges-class]{GRanges}` object to a dataframe: `data.frame(GRanges)`. Will be saved as a FASTA file for VSEARCH.
`Genome`	A DNAStringSet object containing sequences referred to in `packMatches` (the object originally used to predict the transposons `packSearch`).
`identity`	The sequence identity of two transposable elements in `packMatches` required to be grouped into a cluster.
`threads`	The number of threads to be used by VSEARCH.
`identityDefinition`	The pairwise identity definition used by VSEARCH. Defaults to 2, the standard VSEARCH definition.
`maxWildcards`	The maximal allowable proportion of wildcards in the sequence of each match (defaults to `0.05`).
`strand`	The strand direction (+, - or *) to be clustered.
`saveFolder`	The folder to save output files (uc, blast6out, FASTA)
`vSearchPath`	When the package is run on windows systems, the location of the VSEARCH executable file must be given; this should be left as default on Linux/MacOS systems.

Value

Saves cluster information, including a uc and blast6out file, to the specified location. Returns the given packMatches dataframe with an additional column, cluster, containing cluster IDs.

Note

In order to cluster sequences using VSEARCH, the executable file must first be installed.

Author(s)

Jack Gisby

References

VSEARCH may be downloaded from https://github.com/torognes/vsearch. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

# packClust run on a Linux/MacOS system
## Not run: 
    packClust(packMatches, Genome)

## End(Not run)

# packClust run on a Windows system
## Not run: 
    packClust(packMatches, Genome, 
            vSearchPath = "path/to/vsearch/vsearch.exe")

## End(Not run)

data(arabidopsisThalianaRefseq)
data(packMatches)

# packClust run on a Linux/MacOS system
## Not run: 
    packClust(packMatches, Genome)

## End(Not run)

# packClust run on a Windows system
## Not run: 
    packClust(packMatches, Genome, 
            vSearchPath = "path/to/vsearch/vsearch.exe")

## End(Not run)

packFinder: a package for the de novo Annotation of Pack-TYPE Transposable Elements

Description

Algorithm and tools for in silico pack-TYPE transposon discovery. Filters a given genome for properties unique to DNA transposons and provides tools for the investigation of returned matches.

Main Algorithm

The goal of packFinder was to implement a simple tool for the prediction of potential Pack-TYPE elements. packFinder uses the following prior knowledge, provided by the user, to detect transposons:

Terminal Inverted Repeat (TIR) Base Sequence
Length of Terminal Site Duplication (TSD)
Length of the Transposon

These features provide enough information to detect autonomous and pack-TYPE elements. For a transposon to be predicted by packFinder its TSD sequences must be identical to each other, its forward TIR sequence must match the base sequence provided and its reverse TIR sequence must match its reverse complement.

Transposons are therefore predicted by searching a given genome for these characteristics, and further analysis steps can reveal the nature of these elements - while the packFinder tool is sensitive for the detection of transposons, it does not discriminate between autonomous and Pack-TYPE elements. Autonomous elements will contain a transposase gene within the terminal inverted repeats and tend to be larger than their Pack-TYPE counterparts; pack-TYPE elements instead capture sections of host genomes. Following cluster analysis, BLAST can be used to discern which predicted elements are autonomous (transposase-containing) and with are true Pack-TYPE elements.

Workflow

An example of a standard workflow can be found using browseVignettes(package = "packFinder"). The primary functions include:

packSearch - the packSearch algorithm uses simple pattern matching to detect DNA transposons.
packClust - VSEARCH is used for clustering elements based on sequence similarity.

Having obtained the sequences of transposable elements in a given genome, it is recommende to carry out a BLAST search for each transposon cluster. This can identify which elements are likely autonomous, and which may be Pack-TYPE.

The packFinder functions report the position of elements in a given genome using a dataframe in the format of packMatches. This dataframe is in the format produced by coercing a link[GenomicRanges:GRanges-class]{GRanges} object to a dataframe: data.frame(GRanges).

Author(s)

Jack Gisby

Sample packFinder Output

Description

A sample output from packSearch with cluster information. This dataframe is in the format produced by coercing a link[GenomicRanges:GRanges-class]{GRanges} object to a dataframe: data.frame(GRanges).

Usage

data(packMatches)
data(packMatches)

Format

A dataframe of 9 obs. and 7 variables.

Details

Was obtained from running packSearch on the Arabidopsis thaliana chromosome 3 reference sequence, followed by clustering using packClust. Contains the following features:

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.

The dataset was generated as in the example below.

Examples

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

packFinder Algorithm Pipeline

Description

General use pipeline function for the Pack-TYPE transposon finding algorithm.

Usage

packSearch(
  tirSeq,
  Genome,
  mismatch = 0,
  elementLength,
  tsdLength,
  tsdMismatch = 0,
  fixed = TRUE
)
packSearch(
  tirSeq,
  Genome,
  mismatch = 0,
  elementLength,
  tsdLength,
  tsdMismatch = 0,
  fixed = TRUE
)

Arguments

`tirSeq`	A `DNAString` object containing the TIR sequence to be searched for.
`Genome`	A `DNAStringSet` object to be searched.
`mismatch`	The maximum edit distance to be considered for TIR matches (indels + substitions). See `matchPattern` for details.
`elementLength`	The maximum element length to be considered, as a vector of two integers. E.g. `c(300, 3500)`
`tsdLength`	Integer referring to the length of the flanking TSD region.
`tsdMismatch`	An integer referring to the allowable mismatch (substitutions or indels) between a transposon's TSD sequences. `matchPattern` from Biostrings is used for pattern matching.
`fixed`	Logical that will be passed to the 'fixed' argument of `matchPattern`. Determines the behaviour of IUPAC ambiguity codes when searching for TIR sequences.

Details

Finds potential pack-TYPE elements based on:

Similarity of TIR sequence to tirSeq
Proximity of potential TIR sequences
Directionality of TIR sequences
Similarity of TSD sequences

The algorithm finds potential forward and reverse TIR sequences using identifyTirMatches and their associated TSD sequence via getTsds. The main filtering stage, identifyPotentialPackElements, filters matches to obtain a dataframe of potential PACK elements. Note that this pipeline does not consider the possibility of discovered elements being autonomous elements, so it is recommended to cluster and/or BLAST elements for further analysis. Furthermore, only exact TSD matches are considered, so supplying long sequences for TSD elements may lead to false-negative results.

Value

A dataframe, containing elements identified by thealgorithm. These may be autonomous or pack-TYPE elements. Will contain the following features:

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.
width - the width of the predicted element.
strand - the strand direction of the transposable element. This will be set to "*" as the packSearch function does not consider transposons to have a direction - only TIR sequences. Passing the packMatches dataframe to packClust will assign a direction to each predicted Pack-TYPE element.

This dataframe is in the format produced by coercing a link[GenomicRanges:GRanges-class]{GRanges} object to a dataframe: data.frame(GRanges). Downstream functions, such as packClust, use this dataframe to manipulate predicted transposable elements.

Note

This algorithm does not consider:

Autonomous elements - autonomous elements will be predicted by this algorithm as there is no BLAST step. It is recommended that, after clustering elements using packClust, the user analyses each group to determine which predicted elements are autonomous and which are likely Pack-TYPE elements. Alternatively, databases such as Repbase (https://www.girinst.org/repbase/) supply annotations for autonomous transposable elements that can be used to filter autonomous matches.
TSD Mismatches - if two TIRs do not have exact matches for their terminal site duplications they will be ignored. Supplying longer TSD sequences will likely lead to a lower false-positive rate, however may also cause a greater rate of false-negative results.

Pattern matching is done via matchPattern.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

data(arabidopsisThalianaRefseq)

packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)

Save packFinder Results in CSV Format (.csv)

Description

Saves a dataframe of potential Pack-TYPE elements, usually generated via packSearch. May be retrieved using getPacksFromCsv.

Usage

packsToCsv(packMatches, file)
packsToCsv(packMatches, file)

Arguments

packMatches

A dataframe containing genomic ranges and names referring to sequences to be extracted. Can be obtained from packSearch or generated from a GRanges object, after conversion to a dataframe. Must contain the following features:

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.

file

CSV file save path.

Value

Save location of csv file.

Author(s)

Jack Gisby

Examples

data(packMatches)

packsToCsv(
    packMatches, 
    system.file("extdata", "packMatches.csv", package = "packFinder")
)

data(packMatches)

packsToCsv(
    packMatches, 
    system.file("extdata", "packMatches.csv", package = "packFinder")
)

Save packFinder Results in FASTA Format (.fasta)

Description

Saves a dataframe of potential Pack-TYPE elements, usually generated via packSearch. May be retrieved using getPacksFromFasta.

Usage

packsToFasta(packMatches, file, Genome)
packsToFasta(packMatches, file, Genome)

Arguments

packMatches

taframe containing genomic ranges and names referring to sequences to be extracted. Can be obtained from packSearch or generated from a GRanges object, after conversion to a dataframe. Must contain the following features:

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.

file

FASTA file save path.

Genome

A DNAStringSet object containing sequences referred to in packMatches (the object originally used to predict the transposons packSearch).

Value

Save location of Fasta file.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

packsToFasta(
    packMatches, 
    system.file("extdata", "packMatches.fasta", package = "packFinder"), 
    arabidopsisThalianaRefseq
)

data(arabidopsisThalianaRefseq)
data(packMatches)

packsToFasta(
    packMatches, 
    system.file("extdata", "packMatches.fasta", package = "packFinder"), 
    arabidopsisThalianaRefseq
)

Export packFinder Results to a GRanges Object

Description

A dataframe containing genomic ranges and names referring to sequences to be extracted, likely obtained from packSearch, can be converted to a GRanges object. Can be converted back to a dataframe using getPacksFromGRanges. Additional features, such as clusters and TSD sequences, will be included in the object as metadata columns.

Usage

packsToGRanges(packMatches)
packsToGRanges(packMatches)

Arguments

packMatches

start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the sequence name in Genome to which start and end refer to.

Value

A GRanges object containing the ranges contained in packMatches and additional metadata columns. May be easily converted between dataframe and GRanges format for use in the packFinder package and link[GenomicRanges:GRanges-class]{GRanges} package. Note that most functions in the packFinder package require sequence ranges to be provided in dataframe format.

Author(s)

Jack Gisby

Examples

data(packMatches)
packGRanges <- packsToGRanges(packMatches)

data(packMatches)
packGRanges <- packsToGRanges(packMatches)

Convert NCBI BLAST+ Files to Dataframe

Description

Reads .blast6out files (NCBI Blast Format) generated by the VSEARCH clustering and alignment algorithms.

Usage

readBlast(
  file,
  minE = 1,
  length = 0,
  identity = 0,
  removeExactMatches = FALSE,
  scope = NULL,
  packMatches = NULL
)
readBlast(
  file,
  minE = 1,
  length = 0,
  identity = 0,
  removeExactMatches = FALSE,
  scope = NULL,
  packMatches = NULL
)

Arguments

`file`	The file path of the blast file.
`minE`	Blast results with e values greater than the specified cutoff will be ignored.
`length`	Blast results alignment lengths lower below this value will be ignored
`identity`	Blast results with target sequence identities below this value will be ignored.
`removeExactMatches`	If true, matches with 100 be ignored to prevent self-hits.
`scope`	If specified, blast results below the specified value will be ignored. Note that the dataframe of transposon matches must also be supplied to calculate scope. Scope is the proportion of the transposon's internal sequence occupied by the BLAST hit.
`packMatches`	taframe containing genomic ranges and names referring to sequences to be extracted. Can be obtained from `packSearch` or generated from a `GRanges` object, after conversion to a dataframe. Must contain the following features: start - the predicted element's start base sequence position. end - the predicted element's end base sequence position. seqnames - character string referring to the sequence name in `Genome` to which `start` and `end` refer to.

Details

blast6out file is tab-separated text file compatible with NCBI BLAST m8 and NCBI BLAST+ outfmt 6 formats. One cluster/alignment can be found for each line.

Value

A dataframe containing the converted .blast6out file. The file contains the following features:

Query sequence ID
Target sequence ID
Percenty sequence identity
Alignment length
Number of mismatches
Number of gaps
Base position of alignment start in query sequence
Base position of alignment end in query sequence
Base position of alignment start in target sequence
Base position of alignment end in target sequence
E-value
Bit score

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation). VSEARCH may be downloaded from https://github.com/torognes/vsearch; see https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

Examples

readBlast(system.file(
    "extdata", 
    "packMatches.blast6out", 
    package = "packFinder"
))

readBlast(system.file(
    "extdata", 
    "packMatches.blast6out", 
    package = "packFinder"
))

Convert .uc Files to Dataframe

Description

Reads .uc files (USEARCH Cluster Format) generated by the VSEARCH clustering and alignment algorithms.

Usage

readUc(file, output = "cluster")
readUc(file, output = "cluster")

Arguments

file

The file path of the .uc file.

output

The type of analysis that was carried out to produce the .uc file.

If output is specified as "cluster", VSEARCH clustering was carried out.
If output is specified as "alignment", VSEARCH pairwise global alignment was carried out.

Note that clustering produces one "H" record for each sequence, and one "C" record for each cluster, while an alignment produces an "H" record for each alignment (see details).

Details

USEARCH cluster format is a tab separated text file that contains clustering and/or alignment information for a set of sequences. For each sequence a record type, "H, C or N", is provided providing information about the type of "hit" in the dataframe. These refer to:

H - Hit - for alignments, indicates an identified alignment of two supplied sequences. For clustering, indicates the cluster assignment for a query.
C - Cluster record - a record for each cluster generated.
N - No hit - indicates that no cluster was assigned or no alignment was found with a target sequence. For clustering, a query with no hits becomes the centroid of a new cluster.

Additionally, for each record a "compressed alignment" is generated. This is the alignment represented in a compact format including the letters "M", "D", and "I". Before each letter, the number of consecutive columns of the given letter type is also given. The letter types are as follows:

"M" - Match - Identical bases between the query and target sequence
"D" - Deletion - A gap in the target sequence
"I" - Insertion - A gap in the query sequence

An example of this would be "13M", referring to 13 consecutive matches between the query and target sequence.

Value

A dataframe containing the converted .uc file. The fields contained within are as follows:

Record type - "H, C or N", see details for further information.
Cluster designation (output = "cluster" only)
Sequence length, or cluster size
Percent identity to target
The nucleotide strand (output = "cluster" only)
A compressed alignment - see details for further information.
ID of query sequence
ID of target sequence ("H" records only)

Author(s)

Jack Gisby

References

VSEARCH may be downloaded from https://github.com/torognes/vsearch. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

Examples

readUc(system.file(
    "extdata", 
    "packMatches.uc", 
    package = "packFinder"
))

readUc(system.file(
    "extdata", 
    "packMatches.uc", 
    package = "packFinder"
))

Analyse TIR Sequences of Pre-clustered Transposable Elements

Description

Takes transposable elements clustered by VSEARCH, packClust, and produces consensus sequences for the terminal inverted repeats of each. Allows for the visualisation of TIR similarities between clusters for both forward and reverse strands.

Usage

tirClust(
  packMatches,
  Genome,
  tirLength = 25,
  plot = TRUE,
  plotSavePath = NULL,
  k = 5,
  output = "consensus"
)
tirClust(
  packMatches,
  Genome,
  tirLength = 25,
  plot = TRUE,
  plotSavePath = NULL,
  k = 5,
  output = "consensus"
)

Arguments

`packMatches`	A dataframe containing genomic ranges and names referring to sequences to be extracted. This dataframe is in the format produced by coercing a `link[GenomicRanges:GRanges-class]{GRanges}` object to a dataframe: `data.frame(GRanges)`. Must contain the following features: start - the predicted element's start base sequence position. end - the predicted element's end base sequence position. seqnames - character string referring to the sequence name in `Genome` to which `start` and `end` refer to.
`Genome`	A DNAStringSet object containing sequences referred to in `packMatches` (the object originally used to predict the transposons `packSearch`).
`tirLength`	The TIR size to be considered. Consensus sequences will be generated based on the first and last `tirLength` bases of a transposon.
`plot`	Argument specifying whether the TIR consensus sequences should be plottted as a dendrogram.
`plotSavePath`	File path for the dendrogram plot. If unspecified, the dendrogram plot is not saved.
`k`	The k-mer size to be used for calculating a distance matrix between TIR consensus sequences. See `kdistance`. Larger word sizes will not be suitable for longer TIR sequences, due to processing time required. Additionally, k must be greater than the TIR sequence length.
`output`	Controls the output of `tirClust`. If output is specified as "consensus", the consensus sequences of each TIR cluster will be returned; else, if output is specified as "dendrogram", a dendrogram object will be returned for creation of customisable plots.

Value

If output is specified as "consensus" (default), returns a list of consensus sequences for each cluster specified in packMatches as a DNAStringSet. Else if output is specified as "dendrogram", returns a dendrogram object used to create hierarchical clustering diagrams.

Author(s)

Jack Gisby

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

tirClust(packMatches, arabidopsisThalianaRefseq)

data(arabidopsisThalianaRefseq)
data(packMatches)

tirClust(packMatches, arabidopsisThalianaRefseq)

Package 'packFinder'

Help Index

Arabidopsis thaliana Refseq Genome Chromosome 3 Subset

Description

Usage

Format

Author(s)

Source

See Also

Examples

BLAST Analysis of PackTYPE Elements

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Functional Annotation of PackTYPE Elements

Description

Usage

Arguments

Value

Note

Author(s)

References

See Also

Examples

Collapse Overlapping Sequences

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Remove Low Quality Sequences

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Extract Sequences of Pack-TYPE Elements

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Retrieve Saved packFinder Results (.csv)

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Retrieve Saved packFinder Results (.fasta)

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Retrieve packFinder Results from GRanges Object

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Get Flanking Terminal Site Duplication Sequences

Description

Usage