Title: | Reporting and data analysis functionalities for Rep-Seq datasets of antibody libraries |
---|---|
Description: | AbSeq is a comprehensive bioinformatic pipeline for the analysis of sequencing datasets generated from antibody libraries and abseqR is one of its packages. abseqR empowers the users of abseqPy (https://github.com/malhamdoosh/abseqPy) with plotting and reporting capabilities and allows them to generate interactive HTML reports for the convenience of viewing and sharing with other researchers. Additionally, abseqR extends abseqPy to compare multiple repertoire analyses and perform further downstream analysis on its output. |
Authors: | JiaHong Fong [cre, aut], Monther Alhamdoosh [aut] |
Maintainer: | JiaHong Fong <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 1.25.0 |
Built: | 2024-12-06 05:59:10 UTC |
Source: | https://github.com/bioc/abseqR |
Conducts abundance analysis
.abundanceAnalysis(abundanceDirectories, abunOut, sampleNames, combinedNames, mashedNames, skipDgene = FALSE, .save = TRUE)
abundanceDirectories |
list type. List of sample directories |
abunOut |
string type. Output directory |
sampleNames |
vector type. 1-1 correspondence with abundanceDirectories |
combinedNames |
string type. Title "combined" sample names |
mashedNames |
string type. File "mashed" names - avoid special chars |
skipDgene |
logical type. Skip D gene plots? |
.save |
logical type. Save ggplot as Rdata |
None
Abundance distribution
.abundancePlot(files, sampleNames, outputDir, skipDgene = FALSE, .save = TRUE)
files |
list type. list of files in abundance directory |
sampleNames |
vector type. 1-1 correspondence to files |
outputDir |
string type. |
skipDgene |
logical type. Skip D germline abundance plot if TRUE. |
.save |
logical type. Save Rdata ggplot item |
None
Plots alignment quality vs:
mismatches
gaps
bitscore
percent identity
subject start
.alignQualityHeatMaps(abundanceDirectory, sampleName)
abundanceDirectory |
character type. fully qualified path to abundance directory |
sampleName |
character type. sample name |
list of ggplotly heatmaps
Collect primer names from FASTA
.allPrimerNames(primerFile)
primerFile |
string type. Path to primer file |
vector of primer names as seen in primerFile
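As an illustration of collecting names from a FASTA file, here is a minimal base R sketch. It assumes that a primer name is the full header line without the leading ">"; fastaNames is a hypothetical helper and not the package's implementation.

```r
# Hypothetical helper (not part of abseqR): collect sequence names from the
# header lines of a FASTA file. Assumes the name is the whole header line.
fastaNames <- function(path) {
    lines <- readLines(path, warn = FALSE)
    headers <- lines[startsWith(lines, ">")]
    # drop the leading ">" and any surrounding whitespace
    trimws(sub("^>", "", headers))
}

# Usage with a throwaway FASTA file
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">primerA", "ACGT", ">primerB", "TTGA"), tmp)
primerNames <- fastaNames(tmp)
```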
Plots amino acid composition logo
.aminoAcidBar(df, scale, region, germ = "")
df |
dataframe |
scale |
logical. scale to proportion? |
region |
string. which region is this |
germ |
string. V germline family |
ggplot2 object
Plots 2 kinds: scaled and unscaled composition logos
.aminoAcidPlot(compositionDirectory, outdir, sampleName, regions = c("FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"), .save = TRUE)
compositionDirectory |
string type. |
outdir |
string type. |
sampleName |
string type. |
regions |
vector type. vector of FR/CDR regions (strings) to plot |
.save |
logical type. save ggplot object |
none
Plots the distribution of valid, faulty, and missing start codon in IGV germlines (repeated for gene and family levels).
.analyzeUpstreamValidity(upstreamDirectories, upstreamOut, expectedLength, upstreamLengthRange, sampleNames, combinedNames, mashedNames, .save = TRUE)
upstreamDirectories |
list type. List of sample directories |
upstreamOut |
string type. Output directory |
expectedLength |
int type. Expected length of upstream sequences. (i.e. upstream_end - upstream_start + 1). If this is infinite, no plots will be generated. |
upstreamLengthRange |
string type. start_end format |
sampleNames |
vector type. 1-1 with upstream directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
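The start_end format of upstreamLengthRange and the expectedLength formula (upstream_end - upstream_start + 1) can be illustrated with a short sketch. parseLengthRange is a hypothetical helper, not a function of abseqR, and the value "1_15" is only an example.

```r
# Hypothetical helper (not part of abseqR): split a "start_end" string into
# its bounds and derive the expected upstream length from them.
parseLengthRange <- function(upstreamLengthRange) {
    bounds <- as.integer(strsplit(upstreamLengthRange, "_", fixed = TRUE)[[1]])
    stopifnot(length(bounds) == 2, !anyNA(bounds), bounds[1] <= bounds[2])
    list(
        start = bounds[1],
        end = bounds[2],
        # expected length = upstream_end - upstream_start + 1
        expectedLength = bounds[2] - bounds[1] + 1L
    )
}

rng <- parseLengthRange("1_15")
```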
Annotation analysis
.annotAnalysis(annotDirectories, annotOut, sampleNames, mashedNames, .save = TRUE)
annotDirectories |
list type. List of sample directories |
annotOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with annotDirectories |
mashedNames |
string type. File output "mashed" sample names |
.save |
logical type. Saves ggplot object |
none
Accessor for the alignlen slot
.asRepertoireAlignLen(object, collapse = " - ")
object |
AbSeqRep object |
collapse |
character type, collapse the range using this string. |
character type. If collapse is a string, then the ranges are represented as 'start - end' in a string, if collapse is NULL, returns a character vector of length two, denoting the start and end value respectively.
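The documented behaviour of collapse can be sketched as follows. This assumes the slot holds a start and an end value; formatRange is a hypothetical helper, not the accessor itself.

```r
# Hypothetical helper (not part of abseqR) mirroring the documented
# behaviour of the `collapse` argument for a (start, end) range.
formatRange <- function(range, collapse = " - ") {
    values <- as.character(range)
    if (is.null(collapse)) {
        # no collapsing: character vector of length two, c(start, end)
        return(values)
    }
    paste(values, collapse = collapse)
}

asString <- formatRange(c(260, 300))
asVector <- formatRange(c(260, 300), collapse = NULL)
```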
Accessor for the bitscore slot
.asRepertoireBitscore(object, collapse = " - ")
object |
AbSeqRep object |
collapse |
character type, collapse the range using this string. |
character type. If collapse is a string, then the ranges are represented as 'start - end' in a string, if collapse is NULL, returns a character vector of length two, denoting the start and end value respectively.
Accessor for the chain slot
.asRepertoireChain(object)
object |
AbSeqRep object |
character type, the chain type of this sample
Accessor for the outdir slot
.asRepertoireDir(object)
object |
AbSeqRep object |
character type, the output directory of this object
Accessor for the name slot
.asRepertoireName(object)
object |
AbSeqRep object |
character type, the sample name of this object.
Accessor for the primer3end slot
.asRepertoirePrimer3(object)
object |
AbSeqRep object |
character type, the FASTA file name for primer 3' end sequences
Accessor for the primer5end slot
.asRepertoirePrimer5(object)
object |
AbSeqRep object |
character type, the FASTA file name for primer 5' end sequences
Accessor for the qstart slot
.asRepertoireQueryStart(object, collapse = " - ")
object |
AbSeqRep object |
collapse |
character type, collapse the range using this string. |
character type. If collapse is a string, then the ranges are represented as 'start - end' in a string, if collapse is NULL, returns a character vector of length two, denoting the start and end value respectively.
Accessor for the sstart slot
.asRepertoireSubjectStart(object, collapse = " - ")
object |
AbSeqRep object |
collapse |
character type, collapse the range using this string. |
character type. If collapse is a string, then the ranges are represented as 'start - end' in a string, if collapse is NULL, returns a character vector of length two, denoting the start and end value respectively.
Accessor for the upstream slot
.asRepertoireUpstream(object)
object |
AbSeqRep object |
character type
Creates a box plot
.boxPlot(dataframes, sampleNames, plotTitle, xlabel = "", ylabel = "", subs = "")
dataframes |
list type. List of sample dataframes |
sampleNames |
vector type. 1-1 with dataframes |
plotTitle |
string type |
xlabel |
string type |
ylabel |
string type |
subs |
string type |
ggplot2 object
Calculates the "standard" diversity indices
.calculateDInd(df)
df |
clonotype dataframe in vegan format: each species (S.1, S.2, S.3, S.4, ...) should have its own column, and each species' count values (v1, v2, v3, ...) are placed in the corresponding column. |
dataframe with the column headers: shannon , simpson.con , simpson.inv , simpson.gini , renyi.0 , renyi.1 , renyi.2 , renyi.Inf , hill.0 , hill.1 , hill.2 , hill.Inf
The Renyi columns correspond to: renyi.0 => species richness, renyi.1 => shannon entropy, renyi.2 => inv.gini, renyi.Inf => min.entropy.
Finally: hill_a = exp(renyi_a)
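The relation hill_a = exp(renyi_a) can be checked with a small base R sketch. It assumes the textbook definition of the Renyi entropy of order a, log(sum(p^a)) / (1 - a), with the usual limits at a = 1 and a = Inf; under that definition the order-0 value is the logarithm of the species richness. renyiEntropy and hillNumber are hypothetical helpers and do not reproduce the package's own computation.

```r
# Hypothetical helpers (not part of abseqR), using the textbook definitions.
renyiEntropy <- function(counts, a) {
    p <- counts[counts > 0] / sum(counts)
    if (a == 1) {
        return(-sum(p * log(p)))   # limit a -> 1: Shannon entropy
    }
    if (is.infinite(a)) {
        return(-log(max(p)))       # limit a -> Inf: min-entropy
    }
    log(sum(p^a)) / (1 - a)
}

# Hill number of order a is the exponential of the Renyi entropy of order a
hillNumber <- function(counts, a) exp(renyiEntropy(counts, a))

# Four equally abundant species: every Hill number equals the richness
counts <- c(10, 10, 10, 10)
orders <- c(0, 1, 2, Inf)
hill <- vapply(orders, function(a) hillNumber(counts, a), numeric(1))
```

For a perfectly even community all orders agree, which makes this a convenient sanity check; uneven counts give decreasing Hill numbers as the order grows.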
Employ common techniques to calculate LBE for unseen species and commonly used diversity indices
.calculateDiversityEstimates(diversityDirectories, diversityOut, sampleNames)
diversityDirectories |
list type. List of directories to diversity dir |
diversityOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with diversityDirectories sample names |
None
Convert file names to human friendly text
.canonicalizeTitle(str)
str |
string type |
string
str
Helper function to capitalize the first letter of str
.capitalize(str)
str |
string type |
string, str capitalized
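A minimal sketch of first-letter capitalization in base R, assuming only the first character is changed and the rest of the string is left untouched. capitalizeFirst is a hypothetical stand-in, not the package's helper.

```r
# Hypothetical stand-in (not part of abseqR): upper-case the first character
# and keep the remainder of the string unchanged.
capitalizeFirst <- function(str) {
    paste0(toupper(substring(str, 1, 1)), substring(str, 2))
}

label <- capitalizeFirst("abundance")
```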
Checks if abseqPy has a metadata line that suggests the orientation
.checkVert(filename)
filename |
csv filename |
True if CSV metadata says "plot vertically"
Marginal histogram of clonotypes (blue for shared, grey for total). The y axis is scaled by sqrt (but it doesn't really matter anyway, since we're stripping away the y-ticks)
.cloneDistHist(df.original, otherClones, lim.min, flip)
df.original |
dataframe with all clones |
otherClones |
clones from the other dataframe |
lim.min |
x-axis minimum limit |
flip |
logical type |
ggplot2 object
Marginal density graph of clonotypes (blue for shared, grey for total, purple for exclusive clones)
.cloneDistMarginal(df.original, otherClones, lim.min, flip)
df.original |
dataframe with all clones |
otherClones |
clones from the other dataframe |
lim.min |
x-axis minimum limit |
flip |
logical type |
ggplot2 object
Comprehensive clonotype analyses
.clonotypeAnalysis(diversityDirectories, clonotypeOut, sampleNames, mashedNames, .save = TRUE)
diversityDirectories |
list type. List of directories to diversity dir |
clonotypeOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with diversityDirectories |
mashedNames |
string type. Prefix for output files using "mashed-up" sample names |
.save |
logical type. Save ggplot object? |
Nothing
Collates all HTML reports into a single directory and creates an entry index.html file that redirects to all collated HTML files
.collateReports(reports, individualSamples, outputDirectory)
reports |
list/vector type. Collection of strings that are path(s) to <sample>_report.html |
individualSamples |
list type. list of AbSeqRep objects. Used to extract filtering information and % read counts. |
outputDirectory |
string type. Where should the report be placed. |
Nothing
Collect the intersection of all primer names within a collection of primer files
.commonPrimerNames(primerFiles)
primerFiles |
list / vector type. Collection of primer files |
vector type. Vector of primerNames that are present in ALL primerFiles. NULL if there's no intersection at all
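The intersection across several collections of names can be sketched with Reduce and intersect. The sketch works on already-extracted name vectors rather than on files, and commonNames is a hypothetical helper, not the package function.

```r
# Hypothetical helper (not part of abseqR): names present in ALL sets,
# or NULL if the sets share nothing.
commonNames <- function(nameSets) {
    shared <- Reduce(intersect, nameSets)
    if (length(shared) == 0) NULL else shared
}

sets <- list(c("p1", "p2", "p3"), c("p2", "p3", "p4"), c("p3", "p2"))
shared <- commonNames(sets)
disjoint <- commonNames(list("a", "b"))
```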
Conducts pearson and spearman correlation analysis on dataframe
.correlationTest(df)
df |
dataframe with at least the following 2 columns: prop.x and prop.y, the normalized counts (i.e. frequencies) of the clones. A column may contain 0 to denote that a clone is missing from sample x or y. |
named list of pearson, pearson.p, spearman, spearman.p
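A sketch of how such a named list could be assembled with stats::cor.test. The input data are made up for illustration, and correlationSummary is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper (not part of abseqR): Pearson and Spearman correlation
# between the prop.x and prop.y columns, returned as a named list.
correlationSummary <- function(df) {
    pearson <- cor.test(df$prop.x, df$prop.y, method = "pearson")
    # exact = FALSE uses the asymptotic p-value, which also tolerates ties
    spearman <- cor.test(df$prop.x, df$prop.y, method = "spearman",
                         exact = FALSE)
    list(
        pearson = unname(pearson$estimate),
        pearson.p = pearson$p.value,
        spearman = unname(spearman$estimate),
        spearman.p = spearman$p.value
    )
}

# Illustrative clone frequencies; a 0 marks a clone absent from that sample
df <- data.frame(prop.x = c(0.40, 0.30, 0.20, 0.10, 0.00),
                 prop.y = c(0.35, 0.30, 0.15, 0.00, 0.20))
res <- correlationSummary(df)
```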
Computes the distance between pairwise samples
.distanceMeasure(df)
df |
dataframe with at least the following 2 columns: prop.x and prop.y, the normalized counts (i.e. frequencies) of the clones. A column may contain 0 to denote that a clone is missing from sample x or y. |
named list of bray.curtis, jaccard, and morisita.horn
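The three measures can be sketched in base R under stated assumptions: all three are expressed as dissimilarities, Jaccard is computed on presence/absence, and the Morisita-Horn formula is the simplified form that holds when both columns are frequencies summing to 1. Whether the package uses these exact variants is not stated in this manual; pairwiseDistances is a hypothetical helper.

```r
# Hypothetical helper (not part of abseqR). x and y are clone frequencies
# (each summing to 1) for the same set of clones in two samples.
pairwiseDistances <- function(x, y) {
    # Bray-Curtis dissimilarity
    brayCurtis <- sum(abs(x - y)) / sum(x + y)
    # Jaccard dissimilarity on presence/absence (assumed variant)
    shared <- sum(x > 0 & y > 0)
    total <- sum(x > 0 | y > 0)
    jaccard <- 1 - shared / total
    # Morisita-Horn dissimilarity, simplified for inputs that sum to 1
    morisitaHorn <- 1 - 2 * sum(x * y) / (sum(x^2) + sum(y^2))
    list(bray.curtis = brayCurtis, jaccard = jaccard,
         morisita.horn = morisitaHorn)
}

x <- c(0.5, 0.5, 0.0)
y <- c(0.5, 0.0, 0.5)
d <- pairwiseDistances(x, y)
```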
Diversity analysis
.diversityAnalysis(diversityDirectories, diversityOut, sampleNames, mashedNames, .save = TRUE)
diversityDirectories |
list type. List of directories to diversity dir |
diversityOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with diversityDirectories |
mashedNames |
string type. Prefix for output files using "mashed-up" sample names |
.save |
logical type. Save ggplot object? |
None
Creates and returns an empty plot
.emptyPlot()
empty ggplot2 object
A sample_vs_sample directory will not have these files.
.findRepertoires(directory)
directory |
string. Path up until <abseqPy_outputdir>/RESULT_DIR/ |
vector of strings that are samples in 'directory', note, this is NOT a full path, but just the sample/repertoire name itself
Generates all FR/CDR spectratypes
.generateAllSpectratypes(diversityDirectories, diversityOut, sampleNames, mashedNames, .save = TRUE)
diversityDirectories |
list type. List of directories to diversity dir |
diversityOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with diversityDirectories |
mashedNames |
string type. Prefix for output files using "mashed-up" sample names |
.save |
logical type. Save ggplot object? |
Nothing
This function is needed because we are delaying the generation of reports until after all threads/processes have joined. There's currently an issue with rmarkdown::render() in parallel execution, see: https://github.com/rstudio/rmarkdown/issues/499
.generateDelayedReport(root, compare, interactivePlot)
root |
string, project root directory. |
compare |
vector of strings, each string is a comparison defined by the user (assumes that this value has been checked). |
interactivePlot |
logical, whether or not to plot interactive plotly plots. |
a named list of samples, each an AbSeqRep object found in "root"
Generates HTML report from AbSeqRep and AbSeqCRep objects
.generateReport(object, root, outputDir, interactivePlot = TRUE, .indexHTML = "#")
object |
AbSeqCRep type. |
root |
string type. Root directory of the sample(s) |
outputDir |
string type. The path where the HTML will be generated |
interactivePlot |
logical type. Interactive or not |
.indexHTML |
character type. The back button will redirect to this link. This is typically used to redirect users back to index.html page |
path (including HTML name) where the report (HTML file) was saved to
In the aesthetics of diversity plots (rarefaction, recapture, and duplication), the line types should emphasise the most important antibody regions. The regions are ranked in ascending order of importance: "FR4", "FR1", "FR2", "FR3", "CDR1", "CDR2", "CDR3", "V".
.getLineTypes(regions)
regions |
a list/vector of strings (regions) |
vector of strings, each corresponding to the appropriate line type for regions.
Often enough, the CSV values supplied do not contain raw counts but percentages (so this value will let us know exactly the sample size).
.getTotal(filename)
filename |
csv filename |
string, sample size.
Plots a plotly heatmap from provided matrix
.hmFromMatrix(m, title, xlabel = "", ylabel = "")
m |
matrix type |
title |
character type |
xlabel |
character type |
ylabel |
character type |
list with keys: static and interactive (ggplot2 object and plotly object respectively)
Returns all samples found under sampleDirectory
.inferAnalyzed(sampleDirectory)
sampleDirectory |
string, path to sample directory. |
un-normalized path to all samples under sampleDirectory
Given a dataframe with the columns "from", "to", and value.var, return a symmetric matrix (with diagonal values = diag). I.e. a call to isSymmetric(return_value_of_this_function) will always be TRUE.
.loadMatrixFromDF(dataframe, value.var, diag, unidirectional = TRUE)
dataframe |
dataframe with 3 required columns, namely: from, to, and value.var (additional columns are allowed), where value.var is the string provided in the function parameter |
value.var |
the column to use as the matrix value |
diag |
what should the diagonal values be if the dataframe doesn't provide them |
unidirectional |
logical type. If the dataframe provided has the reverse pairs (i.e. a from-to pair AND a to-from pair with the same values in the value.var column), then this should be FALSE. Otherwise, this function will flip the from-to columns to generate a symmetric dataframe (and hence, a symmetric matrix). |
a symmetric matrix with rownames(mat) == colnames(mat) The diagonal values are filled with diag if the dataframe itself doesn't have diagonal data
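The construction of a symmetric matrix from unidirectional from/to pairs can be sketched in base R. symmetricMatrixFromPairs and the column name "dist" are hypothetical; the sketch only covers the case where each pair appears once.

```r
# Hypothetical helper (not part of abseqR): build a symmetric matrix from a
# dataframe of from/to pairs, mirroring every pair across the diagonal.
symmetricMatrixFromPairs <- function(pairs, value.var, diagValue) {
    from <- as.character(pairs$from)
    to <- as.character(pairs$to)
    labels <- sort(unique(c(from, to)))
    mat <- matrix(NA_real_, nrow = length(labels), ncol = length(labels),
                  dimnames = list(labels, labels))
    # default diagonal, overwritten below if the dataframe has from == to rows
    diag(mat) <- diagValue
    idx <- cbind(from, to)                 # character index: (row, column)
    mat[idx] <- pairs[[value.var]]
    mat[idx[, 2:1, drop = FALSE]] <- pairs[[value.var]]   # mirrored pairs
    mat
}

pairs <- data.frame(from = c("s1", "s1", "s2"),
                    to = c("s2", "s3", "s3"),
                    dist = c(0.1, 0.2, 0.3))
mat <- symmetricMatrixFromPairs(pairs, "dist", diagValue = 0)
```

Pairs that never occur in the dataframe stay NA in this sketch, which keeps missing comparisons distinguishable from a genuine value.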
Loads AbSeqCRep or AbSeqRep objects from a list of sampleNames
.loadSamplesFromString(sampleNames, root, warnMove = TRUE)
sampleNames |
vector, singleton or otherwise |
root |
string type. root directory |
warnMove |
logical type. Warning message ("message" level, not "warning" level) if the directory has been moved? |
AbSeqRep or AbSeqCRep object depending on sampleNames
Conduct all vs all pairwise comparison analyses
.pairwiseComparison(dataframes, sampleNames, outputPath, .save = TRUE)
dataframes |
list of dataframes |
sampleNames |
1-1 vector corresponding to dataframes |
outputPath |
string |
.save |
logical |
nothing
V-J association plot
.plotCirclize(sampleName, path, outputdir)
sampleName |
string type |
path |
string type. Path to _vjassoc.csv |
outputdir |
string type |
None
Plots a barplot for all samples in dataframes. If length(sampleNames) == 1, the bars are also labelled with their y-values (or x-values for a horizontal plot). Use 'perc' to control whether the values are percentages.
.plotDist(dataframes, sampleNames, plotTitle, vert = TRUE, xlabel = "", ylabel = "", perc = TRUE, subs = "", sorted = TRUE, cutoff = 15, legendPos = "right")
dataframes |
list type. List of dataframes |
sampleNames |
vector type. 1-1 correspondence to dataframes. |
plotTitle |
string type. |
vert |
boolean type. True if the plot should be vertical |
xlabel |
string type |
ylabel |
string type |
perc |
boolean type. True if data's axis is a percentage proportion (instead of 0-1). Only used if length(sampleNames) == 1 |
subs |
string type |
sorted |
boolean type. True if bar plot should be sorted in descending order |
cutoff |
int type. Number of maximum ticks to show (x on vert plots, y on hori plots). |
legendPos |
string type. Where to position legend (see ggplot's theme()) |
ggplot2 object
Plots rarefaction, recapture, and de-dup plots for the specified region
.plotDiversityCurves(region, diversityDirectories, sampleNames, mashedNames, diversityOut, .save = TRUE)
region |
string type. One of: "cdr", "cdr_v", and "fr". "cdr" means CDR1-3, "cdr_v" means CDR3 and V only, and finally "fr" means FR1-4. |
diversityDirectories |
list type. List of directories to diversity dir |
sampleNames |
vector type. 1-1 with diversityDirectories |
mashedNames |
string type. Prefix for output files using "mashed-up" sample names |
diversityOut |
string type. Output directory |
.save |
logical type. Save ggplot object? |
Nothing
Bins singletons, doubletons, and higher order clonotypes into a line plot
.plotDuplication(files, sampleNames, regions = c("CDR3", "V"))
files |
list type. List of strings to _cdr_v_duplication.csv pathname |
sampleNames |
vector type. Vector of strings each representing sample names |
regions |
vector type. Which regions to include in the plot. Default = c("CDR3", "V") |
ggplot2 object
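The binning of clonotypes into singletons, doubletons, and higher orders can be sketched from raw clonotype counts. This is an assumption about the idea only: the layout of the _cdr_v_duplication.csv files is not described here, and duplicationLevels is a hypothetical helper.

```r
# Hypothetical helper (not part of abseqR): proportion of clonotypes seen
# once, twice, ..., with everything at or above maxLevel pooled together.
duplicationLevels <- function(counts, maxLevel = 3) {
    binned <- tabulate(pmin(counts, maxLevel), nbins = maxLevel)
    names(binned) <- c(as.character(seq_len(maxLevel - 1)),
                       paste0(">=", maxLevel))
    binned / sum(binned)
}

# Eight clonotypes: four singletons, two doubletons, two higher-order
counts <- c(1, 1, 1, 2, 2, 5, 9, 1)
props <- duplicationLevels(counts)
```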
Plots the distribution of indels (gaps), indels in out-of-frame sequences, and the distribution of mismatches for CDRs, FRs, IGV, IGD, and IGJ.
.plotErrorDist(productivityDirectories, prodOut, sampleNames, combinedNames, mashedNames, .save = TRUE)
productivityDirectories |
list type. List of directories |
prodOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with productivity directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Plots the distribution of in-frame unproductive, out-of-frame unproductive, and productive IGV germlines.
.plotIGVErrors(productivityDirectories, prodOut, sampleNames, combinedNames, mashedNames, .save = TRUE)
productivityDirectories |
list type. List of directories |
prodOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with productivity directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type, save Rdata? |
None
Given an upstream length range (upstreamLengthRange), plots the distribution of IGV families without showing their actual lengths. If their actual lengths matter, refer to .plotIGVUpstreamLenDistDetailed.
.plotIGVUpstreamLenDist(upstreamDirectories, upstreamOut, upstreamLengthRange, lengthType, sampleNames, combinedNames, mashedNames, .save = TRUE)
upstreamDirectories |
list type. List of sample directories |
upstreamOut |
string type. Output directory |
upstreamLengthRange |
string type. The range of upstream sequences to be included in this plot. This is usually determined by abseqPy and the format should be as follows: "min_max", e.g.: 1_15 means range(1, 15) inclusive. |
lengthType |
string type. "" (the empty string) denotes everything and "_short" denotes a short sequence. abseqPy dictates this because it's used for locating the files. |
sampleNames |
vector type. 1-1 with upstream directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Plots a boxplot for each IGV family showing the IQR of upstream lengths, in contrast to .plotIGVUpstreamLenDist, which only shows the distribution of IGV families over upstreamLengthRange.
.plotIGVUpstreamLenDistDetailed(upstreamDirectories, upstreamOut, upstreamLengthRange, lengthType, sampleNames, combinedNames, mashedNames, .save = TRUE)
upstreamDirectories |
list type. List of sample directories |
upstreamOut |
string type. Output directory |
upstreamLengthRange |
string type. The range of upstream sequences to be included in this plot. This is usually determined by abseqPy and the format should be as follows: "min_max", e.g.: 1_15 means range(1, 15) inclusive. |
lengthType |
string type. "" (the empty string) denotes everything and "_short" denotes a short sequence. abseqPy dictates this because it's used for locating the files. |
sampleNames |
vector type. 1-1 with upstream directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Plots the abundance of indelled primers relative to IGV germlines. For a given category and pend, the primer IGV indelled distribution is shown in a bar plot.
.plotPrimerIGVStatus(primer, pend, category, primerDirectories, sampleNames, primerOut, combinedNames, mashedNames, .save = TRUE)
primer |
string, primer name |
pend |
string, either 3 or 5 (primer end) |
category |
string, either "all", "productive", or "outframe" |
primerDirectories |
string type. Path to primer analysis directory |
sampleNames |
vector type. 1-1 with primerDirectories |
primerOut |
string type. output directory |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Plots the distribution of primer integrity for a given category and 5' or 3' pend
.plotPrimerIntegrity(primerIntegrity, pend, category, primerDirectories, sampleNames, primerOut, combinedNames, mashedNames, .save = TRUE)
primerIntegrity |
string. One of "stopcodon", "integrity", "indelled", "indel_pos" |
pend |
string, either 3 or 5 (primer end) |
category |
string, either "all", "productive", or "outframe" |
primerDirectories |
string type. Path to primer analysis directory |
sampleNames |
vector type. 1-1 with primerDirectories |
primerOut |
string type. output directory |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Plots the number of unique clonotypes (on the y-axis) drawn from a sample size on the x axis. The number of unique clonotypes is averaged over 5 repeated rounds.
.plotRarefaction(files, sampleNames, regions = c("CDR3", "V"))
files |
list type. A list of files consisting of path to samples |
sampleNames |
vector type. A vector of strings, each being the name of samples in files |
regions |
vector type. A vector of strings, regions to be included. Defaults to c("CDR3", "V") |
ggplot2 object
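The quantity on the y-axis, the number of unique clonotypes averaged over 5 rounds for each sample size, can be sketched in base R. Sampling without replacement and the toy clonotype labels are assumptions; in the pipeline these values are read from files rather than computed this way. rarefy is a hypothetical helper.

```r
# Hypothetical helper (not part of abseqR): mean number of unique clonotypes
# observed when drawing n sequences, averaged over `rounds` repeated draws.
rarefy <- function(clonotypes, sampleSizes, rounds = 5) {
    vapply(sampleSizes, function(n) {
        # assumption: draws are made without replacement
        uniqueCounts <- replicate(rounds,
                                  length(unique(sample(clonotypes, n))))
        mean(uniqueCounts)
    }, numeric(1))
}

set.seed(1)
# Toy repertoire: 100 sequences spread over four clonotypes
clonotypes <- rep(c("A", "B", "C", "D"), times = c(50, 30, 15, 5))
meanUnique <- rarefy(clonotypes, sampleSizes = c(10, 50, 100))
```

Drawing the full repertoire always recovers every clonotype, so the last point of the curve equals the total number of distinct clonotypes.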
Plots the percentage of recaptured clonotypes (on the y-axis) drawn from a repeated (with replacement) sample size on the x-axis. The percentage of recaptured clonotypes is averaged over 5 recapture rounds.
.plotRecapture(files, sampleNames, regions = c("CDR3", "V"))
files |
list type. List of _cdr_v_recapture.csv.gz files. |
sampleNames |
vector type. A vector of strings each representing the name of samples in files. |
regions |
vector type. A vector of strings, regions to be included in the plot. defaults to c("CDR3", "V") |
ggplot2 object
Monolith AbSeq Plot function - the "driver" program
.plotSamples(sampleNames, directories, analysis, outputDir, primer5Files, primer3Files, upstreamRanges, skipDgene = FALSE)
sampleNames |
vector type. Vector of sample names in strings |
directories |
vector type. Vector of directories in strings, must be 1-1 with sampleNames |
analysis |
vector / list type. What analysis to plot for. If sampleNames or directories is > 1 (i.e. AbSeqCRep), then make sure that it's an intersection of all analysis conducted by the repertoires, otherwise, it wouldn't make sense |
outputDir |
string type. Where to dump the output |
primer5Files |
vector / list type. Collection of strings that the sample used for primer 5 analysis. If sample N doesn't have a primer 5 file, leave it as anything but a valid file path. |
primer3Files |
vector / list type. Collection of strings that the sample used for primer 3 analysis. If sample N doesn't have a primer 3 file, leave it as anything but a valid file path. |
upstreamRanges |
list type. Collection of "None"s or vector denoting upstreamStart and upstreamEnd for each sample. |
skipDgene |
logical type. Whether or not to skip D gene distribution plot |
none
Plots length distribution
.plotSpectratype(dataframes, sampleNames, region, title = "Spectratype", subtitle = "", xlabel = "Length(AA)", ylabel = "Distribution", showLabel = FALSE)
dataframes |
list type. List of dataframes. |
sampleNames |
vector type. 1-1 correspondence with dataframes |
region |
string type. Region that will be displayed in the plot title. This specifies which region this spectratype belongs to. If not supplied, a default (start, end) range will be displayed instead |
title |
string type. Ignored if region is specified. |
subtitle |
string type |
xlabel |
string type |
ylabel |
string type |
showLabel |
bool type. Show geom_text? - Ignored if samples > 1 |
ggplot2 object
Plot upstream distribution
.plotUpstreamLength(upstreamDirectories, upstreamOut, expectedLength, upstreamLengthRange, sampleNames, combinedNames, mashedNames, .save = TRUE)
upstreamDirectories |
list type. List of sample directories |
upstreamOut |
string type. Output directory |
expectedLength |
int type. Expected length of upstream sequences. (i.e. upstream_end - upstream_start + 1). |
upstreamLengthRange |
string type. start_end format |
sampleNames |
vector type. 1-1 with upstream directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Given an upstream length range (upstreamLengthRange), plots the distribution of upstream sequence lengths.
.plotUpstreamLengthDist(upstreamDirectories, upstreamOut, upstreamLengthRange, lengthType, sampleNames, combinedNames, mashedNames, .save)
upstreamDirectories |
list type. List of sample directories |
upstreamOut |
string type. Output directory |
upstreamLengthRange |
string type. The range of upstream sequences to be included in this plot. This is usually determined by abseqPy and the format should be as follows: "min_max", e.g.: 1_15 means range(1, 15) inclusive. |
lengthType |
string type. "" (the empty string) denotes everything and "_short" denotes a short sequence. abseqPy dictates this because it's used for locating the files. |
sampleNames |
vector type. 1-1 with upstream directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
Conducts primer specificity analysis
.primerAnalysis(primerDirectories, primer5Files, primer3Files, primerOut, sampleNames, combinedNames, mashedNames, .save = TRUE)
primerDirectories |
string type. Path to primer analysis directory |
primer5Files |
vector / list type. 5' end primer files |
primer3Files |
vector / list type. 3' end primer files |
primerOut |
string type. output directory |
sampleNames |
vector type. 1-1 with primerDirectories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata? |
None
A wrapper for plotDist
.prodDistPlot(productivityDirectories, sampleNames, title, reg, outputFileName, region, .save = TRUE)
productivityDirectories |
vector type. Directories where all productivity csv files live (usually <samplename>/productivity/) |
sampleNames |
vector type. |
title |
string type. |
reg |
string type. Regular expression to find the right files for this particular distribution plot |
outputFileName |
string type.
Vector of file names to save in the order of |
region |
string type. Most of the dist plots are region-based. Use "" if no regions are involved |
.save |
logical type. Save Rdata? |
None
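The reg argument is a regular expression used to pick the right files out of each productivity directory. A minimal sketch of that lookup with base R's list.files follows; the file names and the pattern are made up for illustration and are not the names abseqPy actually writes.

```r
# Build a throwaway directory that mimics <samplename>/productivity/
productivityDir <- file.path(tempdir(), "PCR1", "productivity")
dir.create(productivityDir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical file names, used only to demonstrate the matching
file.create(file.path(productivityDir,
                      c("PCR1_cdr3_gaps_dist.csv.gz",
                        "PCR1_fr1_gaps_dist.csv.gz")))

# A regular expression that selects only the CDR3 distribution file
reg <- ".*_cdr3_gaps_dist\\.csv(\\.gz)?$"
matches <- list.files(productivityDir, pattern = reg, full.names = TRUE)
print(basename(matches))
```

Escaping the dots and anchoring the pattern with $ keeps it from matching unrelated files that merely contain the same substring.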
Conducts productivity analysis
.productivityAnalysis(productivityDirectories, prodOut, sampleNames, combinedNames, mashedNames, .save = TRUE)
productivityDirectories |
list type. List of directories |
prodOut |
string type. Output directory |
sampleNames |
vector type. 1-1 with productivity directories |
combinedNames |
string type. Title friendly "combined" sample names |
mashedNames |
string type. File friendly "mashed-up" sample names |
.save |
logical type. Save Rdata |
None
Shows the percentage of (1) productive sequences and (2) non-functional sequences, together with the reason for being unproductive, i.e. "Stop Codon", "Out of frame" or "Stop & Out"
.productivityPlot(dataframes, sampleNames)
dataframes |
list type. List of sample dataframes |
sampleNames |
vector type. 1-1 with dataframes |
ggplot2 object
Return value specified by key from AbSeq's summary file
.readSummary(sampleRoot, key)
sampleRoot |
sample's root directory. For example,
|
key |
character type. Possible values are (though they might change)
|
value associated with key from summary file, or "NA" (as a string) if the field is not available. Refer to util.R for the key values.
Shows varying regions for a given clonotype defined by its CDR3
.regionAnalysis(path, sampleName, top = 15)
path |
string type. Path to diversity folder where <sampleName>_clonotype_diversity_region_analysis.csv.gz is located |
sampleName |
string type |
top |
int type. Top N number of clones to analyze |
ggplot2 object
Reports abundance-based (Lower bound) diversity estimates using the Vegan package
.reportLBE(df)
df |
clonotype dataframe in Vegan format: each species has its own column (S.1, S.2, S.3, S.4, ...) and each species' count values (v1, v2, v3, ...) are placed in the corresponding column. |
dataframe with one row of estimates (v1, v2, ...) under the columns S.obs, S.chao1, se.chao1, S.ACE, se.ACE, s.jack1 and s.jack2.
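To make the input layout and one of the reported columns concrete, here is a base-R sketch that computes the observed richness and a bias-corrected Chao1 lower bound from a one-row, species-per-column dataframe. It is an illustration of the estimator only: the function chao1Sketch is hypothetical, it does not call the Vegan package, and it does not reproduce the standard errors or the ACE and jackknife columns.

```r
# One row of counts, one column per species (the layout described above)
clonotypes <- data.frame(S.1 = 10, S.2 = 5, S.3 = 1, S.4 = 1,
                         S.5 = 1, S.6 = 2, S.7 = 2)

# Hypothetical helper: bias-corrected Chao1 lower bound on richness
chao1Sketch <- function(counts) {
    counts <- counts[counts > 0]
    sObs <- length(counts)       # number of observed species (S.obs)
    f1 <- sum(counts == 1)       # singletons
    f2 <- sum(counts == 2)       # doubletons
    c(S.obs = sObs,
      S.chao1 = sObs + f1 * (f1 - 1) / (2 * (f2 + 1)))
}

estimates <- chao1Sketch(unlist(clonotypes[1, ]))
print(estimates)
```

The estimate grows with the number of singletons relative to doubletons, which is why it is read as a lower bound on the number of unseen clonotypes.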
A convenience function that does the check and saves at the same time, for brevity within other areas of the code (to eliminate repeated if checks).
.saveAs(.save, filename, plot)
.save |
logical type. Whether or not we should save. |
filename |
string. |
plot |
ggplot object. |
nothing
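The check-then-save pattern that this helper wraps can be sketched as follows. The function name saveIfRequested is hypothetical, and the use of save() is an assumption: the documentation only says the plot is saved as Rdata.

```r
# Hypothetical stand-in for the check-and-save helper (not abseqR's code)
saveIfRequested <- function(.save, filename, plot) {
    if (isTRUE(.save)) {
        # Assumption: the object is serialised with save()
        save(plot, file = filename)
    }
    invisible(NULL)
}

# A plain list stands in for a ggplot object in this sketch
placeholderPlot <- list(title = "not a real ggplot object")
savedFile <- file.path(tempdir(), "sketch_plot.RData")
skippedFile <- file.path(tempdir(), "sketch_skipped.RData")

saveIfRequested(TRUE, savedFile, placeholderPlot)
saveIfRequested(FALSE, skippedFile, placeholderPlot)
print(file.exists(c(savedFile, skippedFile)))
```

Callers can then pass their .save flag straight through instead of repeating the same if block around every plot.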
Creates a scatter plot
.scatterPlot(df1, df2, name1, name2, cloneClass)
df1 |
dataframe for sample 1 |
df2 |
dataframe for sample 2 |
name1 |
string type, Sample 1 name |
name2 |
string type. Sample 2 name |
cloneClass |
string type. What region was used to classify clonotypes - appears in title. For example, CDR3 or V region |
ggplot2 object
Creates a complex scatter plot
.scatterPlotComplex(df.union, df1, df2, name1, name2, cloneClass)
df.union |
a 'lossless' dataframe created by merging sample1's and sample2's dataframes so that every clonotype from either sample is kept. It should contain NAs where a clone that appears in one sample doesn't appear in the other. For example, with the columns Clonotype, prop.x, prop.y, Count.x and Count.y, a clone found only in sample 2 could have the row: ABCDEF, NA, 0.01, NA, 210. |
df1 |
dataframe for sample 1 |
df2 |
dataframe for sample 2 |
name1 |
string type, Sample 1 name |
name2 |
string type. Sample 2 name |
cloneClass |
string type. What region was used to classify clonotypes - appears in title. For example, CDR3 or V region. This plotting technique was adapted, with minor modifications, from https://github.com/mikessh/vdjtools/blob/master/src/main/resources/rscripts/intersect_pair_scatter.r (VDJTools) |
ggplot2 object
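One way to obtain a 'lossless' dataframe of the shape described for df.union is a full outer join with base R's merge. This is a sketch with invented counts; the column names Clonotype, Count and prop are taken from the description above, and the package may build the table differently.

```r
# Invented example data for two samples
df1 <- data.frame(Clonotype = c("GHIJKL", "MNOPQR"),
                  Count = c(300, 100),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Clonotype = c("ABCDEF", "GHIJKL"),
                  Count = c(210, 20790),
                  stringsAsFactors = FALSE)

# Proportion of each clonotype within its own sample
df1$prop <- df1$Count / sum(df1$Count)
df2$prop <- df2$Count / sum(df2$Count)

# all = TRUE keeps clonotypes that occur in only one sample; merge() adds
# the .x / .y suffixes and fills the missing side with NA
df.union <- merge(df1, df2, by = "Clonotype", all = TRUE)
print(df.union)
```

Here ABCDEF gets NA for Count.x and prop.x because it is absent from sample 1, while MNOPQR gets NA on the sample 2 side.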
Generates all the required plots for Secretion signal analysis. This includes upstream length distributions and upstream sequence validity.
.secretionSignalAnalysis(secDirectories, secOut, sampleNames, combinedNames, mashedNames, upstreamRanges, .save = TRUE)
secDirectories |
list type. Secretion signal directories where files are located |
secOut |
string type. Where to dump output |
sampleNames |
vector type. 1-1 with secDirectories |
combinedNames |
string type. Title friendly string |
mashedNames |
string type. File name friendly string |
upstreamRanges |
list type. Upstream ranges for each sample. If length(secDirectories) > 1, the plots will only be generated for upstream ranges that are present in ALL samples. (i.e. the intersection) |
.save |
logical type, save Rdata? |
none
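The rule that plots are only generated for upstream ranges present in ALL samples amounts to a set intersection over the per-sample ranges. A minimal sketch, with made-up range strings in the "min_max" format:

```r
# Invented per-sample upstream ranges
upstreamRanges <- list(
    PCR1 = c("1_15", "16_30", "31_45"),
    PCR2 = c("1_15", "16_30"),
    PCR3 = c("16_30", "1_15", "46_60")
)

# Fold intersect() over the list to keep only ranges shared by every sample
commonRanges <- Reduce(intersect, upstreamRanges)
print(commonRanges)
```

Reduce applies intersect pairwise from left to right, so the same line works for any number of samples, including a single one.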
Substitutes the first occurrence of 'key' with 'value' in 'filename'
.substituteStringInFile(filename, key, value, fixed = FALSE)
filename |
character type |
key |
character type |
value |
character type |
fixed |
logical type |
None
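A first-occurrence substitution of this kind can be sketched with readLines, sub and writeLines. The helper substituteFirst is hypothetical, and it assumes "first occurrence" means the first match in the whole file rather than the first match on every line.

```r
# Hypothetical sketch (not abseqR's implementation)
substituteFirst <- function(filename, key, value, fixed = FALSE) {
    lines <- readLines(filename, warn = FALSE)
    hits <- grep(key, lines, fixed = fixed)
    if (length(hits) > 0) {
        # sub() replaces only the first match within the chosen line
        lines[hits[1]] <- sub(key, value, lines[hits[1]], fixed = fixed)
        writeLines(lines, filename)
    }
    invisible(NULL)
}

templateFile <- tempfile(fileext = ".txt")
writeLines(c("title: TITLE", "again: TITLE"), templateFile)
substituteFirst(templateFile, "TITLE", "PCR1", fixed = TRUE)
print(readLines(templateFile))
```

Setting fixed = TRUE treats key as a literal string, which avoids surprises when the key contains regular expression metacharacters.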
Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
adapted from http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
.summarySE(data = NULL, measurevar, groupvars = NULL, na.rm = FALSE, conf.interval = 0.95, .drop = TRUE)
data |
a data frame. |
measurevar |
the name of a column that contains the variable to be summarized |
groupvars |
a vector containing names of columns that contain grouping variables |
na.rm |
a boolean that indicates whether to ignore NA's |
conf.interval |
the percent range of the confidence interval (default is 95%) |
.drop |
logical. |
dataframe
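The quantities listed above (count, mean, standard deviation, standard error of the mean and confidence interval) can be computed per group in base R as sketched below. The function summarySketch is a simplified, hypothetical stand-in that handles a single grouping column and ignores the na.rm and .drop options.

```r
# Simplified sketch: per-group N, mean, sd, se and CI half-width
summarySketch <- function(data, measurevar, groupvar, conf.interval = 0.95) {
    groups <- split(data[[measurevar]], data[[groupvar]])
    rows <- lapply(names(groups), function(g) {
        x <- groups[[g]]
        n <- length(x)
        s <- sd(x)
        se <- s / sqrt(n)
        # t-multiplier for a two-sided interval with n - 1 degrees of freedom
        ci <- se * qt(conf.interval / 2 + 0.5, df = n - 1)
        data.frame(group = g, N = n, mean = mean(x), sd = s, se = se, ci = ci,
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
}

measurements <- data.frame(sample = c("a", "a", "a", "b", "b"),
                           value = c(1, 2, 3, 4, 6))
stats <- summarySketch(measurements, "value", "sample")
print(stats)
```

The ci column is a half-width, so error bars would span mean - ci to mean + ci.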
Clonotype table
.topNDist(dataframes, sampleNames, top = 10)
dataframes |
list type. List of dataframes. |
sampleNames |
vector type. Vector of strings representing sample names; should have one-to-one correspondence with dataframes |
top |
int type. Top N clonotypes to plot |
None
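Selecting the top N clonotypes of a sample is essentially a sort followed by a truncation. The sketch below assumes a dataframe with Clonotype and Count columns; the helper topClonotypes and the example values are hypothetical.

```r
# Hypothetical helper: keep the `top` most abundant clonotypes
topClonotypes <- function(df, top = 10) {
    ranked <- df[order(df$Count, decreasing = TRUE), , drop = FALSE]
    # min() guards against asking for more rows than exist (e.g. top = Inf)
    ranked[seq_len(min(top, nrow(ranked))), , drop = FALSE]
}

clones <- data.frame(Clonotype = c("AAA", "BBB", "CCC", "DDD"),
                     Count = c(5, 40, 12, 1),
                     stringsAsFactors = FALSE)
top2 <- topClonotypes(clones, top = 2)
print(top2)
```

Passing top = Inf returns every clonotype, which is how an "all clones" default can share the same code path as a finite cutoff.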
Generates all the required plots for 5' UTR analysis. This includes upstream length distributions and upstream sequence validity.
.UTR5Analysis(utr5Directories, utr5Out, sampleNames, combinedNames, mashedNames, upstreamRanges, .save = TRUE)
utr5Directories |
list type. 5UTR directories where files are located |
utr5Out |
string type. Where to dump output |
sampleNames |
vector type. 1-1 with utr5Directories |
combinedNames |
string type. Title friendly string |
mashedNames |
string type. File name friendly string |
upstreamRanges |
list type. Upstream ranges for each sample. If length(utr5Directories) > 1, the plots will only be generated for upstream ranges that are present in ALL samples (i.e. the intersection). |
.save |
logical type, save Rdata? |
none
Creates Venn diagram for clonotype intersection
.vennIntersection(dataframes, sampleNames, outFile, top = Inf)
dataframes |
list type. List of sample dataframes. Only accepts 2 - 5 samples. Warning message will be generated for anything outside of the range |
sampleNames |
vector type. 1-1 with dataframes |
outFile |
string type. Filename to be saved as |
top |
int type. Top N cutoff, defaults to ALL clones if not specified |
Nothing
Combines 2 AbSeqCRep objects together for comparison
## S4 method for signature 'AbSeqCRep,AbSeqCRep'
e1 + e2
e1 |
AbSeqCRep. |
e2 |
AbSeqCRep. |
AbSeqCRep object. Calling abseqR's functions on this object will always result in a comparison.
abseqReport returns a list of AbSeqReps
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)

# The provided example data has PCR1, PCR2, and PCR3 samples contained within
# pcr12 and pcr13 are instances of AbSeqCRep
pcr12 <- samples[["PCR1"]] + samples[["PCR2"]]
pcr13 <- samples[["PCR1"]] + samples[["PCR3"]]

# all_S is also an instance of AbSeqCRep
all_S <- pcr12 + pcr13

# you can now call the report function on this object
# report(all_S)           # uncomment this line to execute report
Combines an AbSeqCRep object with an AbSeqRep object for comparison
## S4 method for signature 'AbSeqCRep,AbSeqRep'
e1 + e2
e1 |
AbSeqCRep. |
e2 |
AbSeqRep. |
AbSeqCRep object. Calling abseqR's functions on this object will always result in a comparison.
abseqReport returns a list of AbSeqReps
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)

# The provided example data has PCR1, PCR2, and PCR3 samples contained within
# pcr12 is an instance of AbSeqCRep
pcr12 <- samples[["PCR1"]] + samples[["PCR2"]]

# pcr3 is an instance of AbSeqRep
pcr3 <- samples[["PCR3"]]

# pcr123 is an instance of AbSeqCRep
pcr123 <- pcr12 + pcr3

# you can now call the report function on this object
# report(pcr123)           # uncomment this line to execute report
Combines an AbSeqRep object with an AbSeqCRep object for comparison
## S4 method for signature 'AbSeqRep,AbSeqCRep'
e1 + e2
e1 |
AbSeqRep. |
e2 |
AbSeqCRep. |
AbSeqCRep object. Calling abseqR's functions on this object will always result in a comparison.
abseqReport returns a list of AbSeqReps
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)

# The provided example data has PCR1, PCR2, and PCR3 samples contained within
# pcr1 is an instance of AbSeqRep
pcr1 <- samples[["PCR1"]]

# pcr23 is an instance of AbSeqCRep
pcr23 <- samples[["PCR2"]] + samples[["PCR3"]]

# pcr123 is an instance of AbSeqCRep
pcr123 <- pcr1 + pcr23

# you can now call the report function on this object
# report(pcr123)           # uncomment this line to execute report
Combines 2 AbSeqRep objects together for comparison
## S4 method for signature 'AbSeqRep,AbSeqRep'
e1 + e2
e1 |
AbSeqRep object. |
e2 |
AbSeqRep object. |
AbSeqCRep object. Calling abseqR's functions on this object will always result in a comparison.
abseqReport returns a list of AbSeqReps
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)

# The provided example data has PCR1, PCR2, and PCR3 samples contained within
# pcr1 and pcr2 are instances of AbSeqRep
pcr1 <- samples[["PCR1"]]
pcr2 <- samples[["PCR2"]]

# pcr12 is an instance of AbSeqCRep
pcr12 <- pcr1 + pcr2

# you can now call the report function on this object
# report(pcr12)           # uncomment this line to execute report
AbSeqCRep is a collection of AbSeqRep S4 objects.
This S4 class contains multiple samples (repertoires) and it can be "combined" with other samples by using the + operator to create an extended AbSeqCRep object. This value, in turn, can be used as the first argument to report, which generates a comparison between all samples included in the + operation.
Users do not manually construct this class, but rather indirectly obtain this class object as a return value from the + operation between two AbSeqRep objects, which are, in turn, obtained indirectly from the abseqReport and report functions. It is also possible to obtain this class object by + (adding) AbSeqCRep objects.
AbSeqCRep
repertoires
list of AbSeqRep objects.
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)

# The provided example data has PCR1, PCR2, and PCR3 samples contained within
# pcr12 and pcr13 are instances of AbSeqCRep
pcr12 <- samples[["PCR1"]] + samples[["PCR2"]]
pcr13 <- samples[["PCR1"]] + samples[["PCR3"]]

# all_S is also an instance of AbSeqCRep
all_S <- pcr12 + pcr13
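For readers unfamiliar with how a + operator can turn single-sample objects into a collection, the following toy sketch mimics the idea with two stand-in S4 classes. ToyRep, ToyCRep and asRepList are hypothetical names invented for this illustration; they are not the abseqR classes, and the real methods may perform additional checks.

```r
library(methods)

# Toy stand-ins: a single sample and a collection of samples
setClass("ToyRep", representation(name = "character"))
setClass("ToyCRep", representation(repertoires = "list"))

# View either class as a plain list of ToyRep objects
asRepList <- function(x) {
    if (is(x, "ToyCRep")) x@repertoires else list(x)
}

# Register + for all four signature combinations, mirroring the four
# documented methods (Rep + Rep, Rep + CRep, CRep + Rep, CRep + CRep)
for (lhs in c("ToyRep", "ToyCRep")) {
    for (rhs in c("ToyRep", "ToyCRep")) {
        setMethod("+", signature(lhs, rhs), function(e1, e2) {
            new("ToyCRep", repertoires = c(asRepList(e1), asRepList(e2)))
        })
    }
}

pcr1 <- new("ToyRep", name = "PCR1")
pcr2 <- new("ToyRep", name = "PCR2")
pcr3 <- new("ToyRep", name = "PCR3")

# Pairwise addition, or folding + over a whole list of samples
pcr12 <- pcr1 + pcr2
allSamples <- Reduce("+", list(pcr1, pcr2, pcr3))
print(vapply(allSamples@repertoires, function(r) r@name, character(1)))
```

Because every combination returns the collection class, the result of one addition can always be fed into the next, which is what makes chained expressions and Reduce work.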
The AbSeqRep object contains all metadata associated with the AbSeq (python backend) run conducted on it. This S4 class represents a single sample (repertoire) and it can be "combined" with other samples by using the + operator to create an AbSeqCRep object. This value, in turn, can be used as the first argument to report, which generates a comparison between all samples included in the + operation.
Users do not manually construct this class, but rather indirectly obtain this class object as a return value from the abseqReport and report functions.
AbSeqRep
f1
character. Path to FASTA/FASTQ file 1.
f2
character. Path to FASTA/FASTQ file 2.
chain
character. Type of chain, possible values: hv, lv, kv, klv, each representing Heavy, Lambda and Kappa respectively.
task
character. Type of analysis conducted, possible values: all, annotate, abundance, diversity, productivity, fastqc, primer, 5utr, rsasimple, seqlen, secretion, seqlenclass.
name
character. Name of analysis.
bitscore
numeric. Part of filtering criteria: V gene bitscore filter value.
qstart
numeric. Part of filtering criteria: V gene query start filter value.
sstart
numeric. Part of filtering criteria: V gene subject start filter value.
alignlen
numeric. Part of filtering criteria: V gene alignment length filter value.
clonelimit
numeric. Number of clones to export into csv file. This is only relevant in -t all or -t diversity, where clonotypes are exported into <outdir>/<name>/diversity/clonotypes
detailedComposition
logical. Plots composition logo by IGHV families if set to true, otherwise, plots logos by FR/CDRs.
log
character. Path to log file.
merger
character. Merger used to merge paired-end reads.
fmt
character. File format of file1 and (if present) file2. Possible values are FASTA or FASTQ.
sites
character. Path to restriction sites txt file. This option is only used if -t rsasimple
primer5end
ANY. Path to 5' end primer FASTA file.
primer3end
ANY. Path to 3' end primer FASTA file.
trim5
numeric. Number of nucleotides to trim at the 5' end.
trim3
numeric. Number of nucleotides to trim at the 3' end.
outdir
character. Path to output directory
primer5endoffset
numeric. Number of nucleotides to offset before aligning 5' end primers in primer5end FASTA file.
threads
numeric. Number of threads to run.
upstream
character. Index (range) of upstream nucleotides to analyze. This option is only used if -t 5utr or -t secretion.
seqtype
character. Sequence type, possible values are either dna or protein.
database
character. Path to IgBLAST database.
actualqstart
numeric. Query sequence's starting index (indexing starts from 1). This value overrides the inferred query start position by AbSeq.
fr4cut
logical. The end of FR4 is marked as the end of the sequence if set to TRUE, otherwise the end of the sequence is either the end of the read itself, or trimmed to --trim3 <num>.
domainSystem
character. Domain system to use in IgBLAST, possible values are either imgt or kabat.
abseqReport returns a list of AbSeqRep objects.
# this class is not directly constructed by users, but as a return
# value from the abseqReport method.

# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)
Plots all samples in the output directory supplied to abseqPy's --outdir or -o argument. Users can optionally specify which samples in directory should be compared. Doing so generates additional plots for clonotype comparison and the usual plots will also conveniently include these samples using additional aesthetics.
Calling this function with a valid directory will always return a named list of objects; these individual objects can be combined using the + operator to form a new comparison, which the report function accepts as its first parameter.
abseqReport(directory, report, compare, BPPARAM)
directory |
string type. directory as specified in
|
report |
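(optional) integer type. The possible values are: 0 (no plots, no HTML report, nothing interactive), 1 (plot pngs only), 2 (plot pngs and collate a non-interactive HTML report) and 3 (plot pngs and collate an interactive HTML report; the default).
Each higher value also does what the previous values do. For example, report = 2 also plots the pngs that report = 1 produces. |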
compare |
(optional) vector of strings. From the samples found in |
BPPARAM |
(optional) BiocParallel backend. Configures the parallel implementation. Refer to BiocParallel for more information. By default, use all available cores. |
named list. List of AbSeqRep objects. The names of the list elements are taken directly from the repertoire object itself. This return value is consistent with the return value of report.
report. Analogous function, but takes input from an AbSeqRep or AbSeqCRep object instead.
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)

### 1. The `report` parameter usage example:

# report = 0; don't plot, don't collate a HTML report, don't show anything interactive
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)
# samples is now a named list of AbSeqRep objects

# report = 1; just plot pngs; don't collate a HTML report; nothing interactive
# samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 1)
# samples is now a named list of AbSeqRep objects

# report = 2; plot pngs; collate a HTML report; HTML report will NOT be interactive
# samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 2)
# samples is now a named list of AbSeqRep objects

# report = 3 (default); plot pngs; collate a HTML report; HTML report will be interactive
# samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 3)
# samples is now a named list of AbSeqRep objects

### 2. Using the return value of abseqReport:

# NOTE, often, this is used to load multiple samples from different directories
# using abseqReport (with report = 0), then the samples are added together
# before calling the report function. This is most useful when the samples
# live in different abseqPy output directory.

# Note that the provided example data has PCR1, PCR2, and PCR3
# samples contained within the directory
stopifnot(names(samples) == c("PCR1", "PCR2", "PCR3"))

# as a hypothetical example, say we found something
# interesting in PCR1 and PCR3, and we want to isolate them:
# we want to explicitly compare PCR1 with PCR3
pcr13 <- samples[["PCR1"]] + samples[["PCR3"]]

# see abseqR::report for more information.
# abseqR::report(pcr13)    # uncomment this line to run

### BPPARAM usage:

# 4 core machine, use all cores - use whatever value that suits you
nproc <- 4
# samples <- abseqReport(file.path(abseqPyOutput, "ex"),
#                        BPPARAM = BiocParallel::MulticoreParam(nproc))

# run sequentially - no multiprocessing
# samples <- abseqReport(file.path(abseqPyOutput, "ex"),
#                        BPPARAM = BiocParallel::SerialParam())

# see https://bioconductor.org/packages/release/bioc/html/BiocParallel.html
# for more information about how to use BPPARAM and BiocParallel in general.
Plots all samples in the object argument and saves the analysis in outputDir. Users can optionally specify which samples in object should be compared. Doing so generates additional plots for clonotype comparison and the usual plots will also conveniently include these samples using additional aesthetics.
This method is analogous to abseqReport. The only difference is that this method accepts AbSeqRep or AbSeqCRep objects as its first parameter, and the outputDir specifies where to store the result.
report(object, outputDir, report = 3)
## S4 method for signature 'AbSeqRep'
report(object, outputDir, report = 3)
## S4 method for signature 'AbSeqCRep'
report(object, outputDir, report = 3)
object |
AbSeqRep or AbSeqCRep object to plot. |
outputDir |
string type. Directory where analysis will be saved to. |
report |
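(optional) integer type. The values used in the examples for this method are 1 (plots only), 2 (plots and a non-interactive report) and 3 (plots and an interactive report; the default).
Each value also does what the previous values do. For example, report = 2 also generates the plots that report = 1 produces. |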
named list. List of AbSeqRep objects. The names of the list elements are taken directly from the repertoire object itself. This return value is consistent with the return value of abseqReport.
abseqReport. Analogous function, but takes a string that signifies the output directory of abseqPy as the first argument instead.
# Use example data from abseqR as abseqPy's output, substitute this
# with your own abseqPy output directory
abseqPyOutput <- tempdir()
file.copy(system.file("extdata", "ex", package = "abseqR"), abseqPyOutput, recursive=TRUE)
samples <- abseqReport(file.path(abseqPyOutput, "ex"), report = 0)

# The provided example data has PCR1, PCR2, and PCR3 samples contained within
# We can use the + operator to combine samples, thus requesting the
# report function to compare them:
pcr12 <- samples[["PCR1"]] + samples[["PCR2"]]

# generate plots and report for this new comparison
# report(pcr12, "PCR1_vs_PCR2")

# generate plots only
# report(pcr12, "PCR1_vs_PCR2", report = 1)

# generate plots, and a non-interactive report
# report(pcr12, "PCR1_vs_PCR2", report = 2)

# generate plots, and an interactive report
# report(pcr12, "PCR1_vs_PCR2", report = 3)   # this is the default