Title: | Analytical R tools for Mass Spectrometry |
---|---|
Description: | artMS provides a set of tools for the analysis of label-free proteomics datasets. It takes as input the MaxQuant search result output (evidence.txt file) and performs quality control, relative quantification using MSstats, downstream analysis, and integration. artMS also provides a set of functions to re-format the data and make it compatible with other analytical tools, including SAINTq, SAINTexpress, Phosfate, and PHOTON. Check [http://artms.org](http://artms.org) for details. |
Authors: | David Jimenez-Morales [aut, cre] , Alexandre Rosa Campos [aut, ctb] , John Von Dollen [aut], Nevan Krogan [aut] , Danielle Swaney [aut, ctb] |
Maintainer: | David Jimenez-Morales <[email protected]> |
License: | GPL (>= 3) + file LICENSE |
Version: | 1.25.0 |
Built: | 2024-11-14 05:48:38 UTC |
Source: | https://github.com/bioc/artMS |
The configuration file in YAML format contains the configuration details required to run artmsQuantification(), which includes the quality control functions.
artms_config
The configuration (YAML) file contains the following sections:
files:
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  summary : /path/to/the/summary.txt
  output : /path/to/the/output/results/results.txt
qc:
  basic: 1 # 1 = yes; 0 = no
  extended: 1 # 1 = yes; 0 = no
  extendedSummary: 0 # 1 = yes; 0 = no
data:
  enabled : 1 # 1 = yes; 0 = no
  silac:
    enabled : 0 # 1 for SILAC experiments
  filters:
    enabled : 1
    contaminants : 1
    protein_groups : remove # remove, keep
    modifications : ab # PH, UB, AB, APMS
  sample_plots : 1 # correlation plots
msstats:
  enabled : 1
  msstats_input : # leave blank if no previous MSstats input file is available
  profilePlots : none # before, after, before-after, none
  normalization_method : equalizeMedians # globalStandards (include reference protein(s)), equalizeMedians, quantile, 0
  normalization_reference : # should be a value in the Protein column
  summaryMethod : TMP # "TMP" (default) means Tukey's median polish, a robust estimation method. "linear" uses a linear mixed model. "logOfSum" computes log2(sum of intensities) per run.
  censoredInt : NA # Missing values are censored or at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' uses zero intensities as censored intensities; in this case, NA intensities are missing at random. Output from Skyline should use '0'. NULL assumes that all NA intensities are missing at random.
  MBimpute : 1 # only for summaryMethod="TMP" and censoredInt='NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) using an accelerated failure time model. FALSE uses the values assigned by cutoffCensored.
  # For all other features, please check the documentation for the MSstats dataProcess function
output_extras:
  enabled : 1 # if 0, nothing in this section is run
  annotate:
    enabled: 1 # 1|0 whether to annotate the proteins in the results or not
    species : HUMAN # Supported species: HUMAN, MOUSE, ANOPHELES, ARABIDOPSIS, BOVINE, WORM, CANINE, FLY, ZEBRAFISH, ECOLI_STRAIN_K12, ECOLI_STRAIN_SAKAI, CHICKEN, RHESUS, MALARIA, CHIMP, RAT, YEAST, PIG, XENOPUS
  plots:
    volcano: 1
    heatmap: 1
    LFC : -1.5 1.5 # Range of minimal log2fc
    FDR : 0.05
    heatmap_cluster_cols : 0
    heatmap_display : log2FC # log2FC or pvalue
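To make the LFC and FDR plot settings concrete, here is a minimal sketch. It assumes that LFC holds a lower and an upper log2FC cutoff and that FDR is compared with the adj.pvalue column of an MSstats-style results table. flag_significant() is a hypothetical helper written for illustration; it is not an artMS function, and artMS may apply these thresholds differently when drawing volcano plots and heatmaps.

```r
# Hypothetical helper (not part of artMS): interpret the plots$LFC and
# plots$FDR settings to flag changes in an MSstats-like results table
# with columns log2FC and adj.pvalue.
flag_significant <- function(results, lfc = "-1.5 1.5", fdr = 0.05) {
  # "LFC : -1.5 1.5" is stored as a space-separated pair of numbers
  bounds <- as.numeric(strsplit(trimws(lfc), "\\s+")[[1]])
  stopifnot(length(bounds) == 2, bounds[1] < bounds[2])
  below <- results$log2FC <= bounds[1]
  above <- results$log2FC >= bounds[2]
  significant <- results$adj.pvalue <= fdr
  results$regulation <- ifelse(significant & above, "up",
                        ifelse(significant & below, "down", "none"))
  results
}

res <- data.frame(
  Protein    = c("P1", "P2", "P3", "P4"),
  log2FC     = c(2.1, -1.8, 0.4, 3.0),
  adj.pvalue = c(0.01, 0.03, 0.001, 0.20)
)
flag_significant(res)$regulation
```

P4 has a large fold change but is not flagged because its adjusted p-value exceeds the FDR cutoff; both conditions must hold.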
The list of protein complexes has been enriched with mitochondrial proteins from mouse, as described in this paper:
Ruchi Masand, Esther Paulo, Dongmei Wu, Yangmeng Wang, Danielle L. Swaney, David Jimenez-Morales, Nevan J. Krogan, and Biao Wang. Proteome Imbalance of Mitochondrial Electron Transport Chain in Brown Adipocytes Leads to Metabolic Benefits. Cell Metab. 2018 Mar 06; 27(3):616-629.e4.
artms_data_corum_mito_database
Tab-delimited file.
To find out more about the format and columns available at CORUM, please visit the CORUM website.
LAST CORUM DOWNLOAD DATE: 2017-08-01
LPN PATHOGEN: Legionella pneumophila subsp. pneumophila (strain Philadelphia 1 / ATCC 33152 / DSM 7513) UniProt IDs
artms_data_pathogen_LPN
A data.frame of Entry IDs
TB PATHOGEN: Mycobacterium tuberculosis (strain ATCC 35801 / TMC 107 / Erdman) UniProt IDs
artms_data_pathogen_TB
A data.frame of Entry IDs
The configuration file with default options to run the available PH dataset with artmsQuantification()
artms_data_ph_config
The configuration (YAML) file contains the following sections:
files:
  evidence : # Empty. To test the example dataset, run artms_data_ph_config$files$evidence <- artms_data_ph_evidence
  keys : # Empty. To test the example dataset, run artms_data_ph_config$files$keys <- artms_data_ph_keys
  contrasts : # Empty. To test the example dataset, run artms_data_ph_config$files$contrasts <- artms_data_ph_contrast
  summary :
  output : "results.txt"
qc:
  basic: 0
  extended: 0
  extendedSummary: 0
data:
  enabled : 1
  silac:
    enabled : 0
  filters:
    enabled : 1
    contaminants : 1
    protein_groups : remove
    modifications : PH
  sample_plots : 1
msstats:
  enabled : 1
  msstats_input : # leave blank if no previous MSstats input file is available
  profilePlots : none # before, after, before-after, none
  normalization_method : equalizeMedians
  normalization_reference : # should be a value in the Protein column
  summaryMethod : TMP
  censoredInt : NA
  cutoffCensored : minFeature
  MBimpute : 1
  feature_subset: all
output_extras:
  enabled : 1
  annotate:
    enabled: 1
    species : HUMAN
  plots:
    volcano: 1
    heatmap: 1
    LFC : -1 1
    FDR : 0.05
    heatmap_cluster_cols : 0
    heatmap_display : log2FC
Contrast file with the relative quantification to be performed for the two conditions available in the example dataset: "Cal33-HSC6". See the vignette for more details on how to prepare the contrast file.
artms_data_ph_contrast
list with one comparison: "Cal33-HSC6"
Evidence file from a PH experiment consisting of two head and neck cancer cell lines (Conditions "Cal33" and "HSC6").
Unfortunately, the number of lines was reduced to 1/20 due to Bioconductor limitations on data size, but it should be enough to test the QC and quantification functions. The total number of columns from the original evidence file was also reduced to 36 (out of the original 66 columns). Check colnames(artms_data_ph_evidence) for details.
artms_data_ph_evidence
A data frame with a subset of the columns available in an evidence file generated with MaxQuant version 1.6.2.3
The artMS keys file provides the details of the experimental design for any given proteomics experiment.
This particular example belongs to a PH experiment consisting of two head and neck cancer cell lines (Conditions "Cal33" and "HSC6"), with 2 biological replicates each (in this reduced version).
artms_data_ph_keys
Tab-delimited file with the following columns:
RawFile: raw file processed. Each one should be a unique biological (or technical) replicate.
IsotopeLabelType: type of labeling. L is used for label-free experiments.
Condition: label for conditions. VERY IMPORTANT: only alphanumeric characters and underscore (_) are allowed.
BioReplicate: label for the biological replicates. VERY IMPORTANT: use the same label as the Condition, followed by a dash (-) and the number of the biological replicate. For example, for Condition "Cal", use Cal-1, Cal-2, Cal-3, etc. for the bioreplicates.
Run: the MS run number.
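The naming rules for the Condition and BioReplicate columns are easy to check before running an analysis. The sketch below is a hypothetical helper (not part of artMS) that assumes the keys table has been read into a data.frame with the column names listed above; it reports the RawFile entries that break either rule.

```r
# Hypothetical checker (not part of artMS) for the keys-file naming rules:
#  - Condition: only alphanumeric characters and underscores
#  - BioReplicate: "<Condition>-<replicate number>", e.g. Cal-1
check_keys <- function(keys) {
  bad_condition <- !grepl("^[A-Za-z0-9_]+$", keys$Condition)
  # whatever follows the Condition prefix must be "-<digits>"
  suffix <- substring(keys$BioReplicate, nchar(keys$Condition) + 1)
  good_biorep <- startsWith(keys$BioReplicate, keys$Condition) &
    grepl("^-[0-9]+$", suffix)
  list(bad_condition = keys$RawFile[bad_condition],
       bad_biorep = keys$RawFile[!good_biorep])
}

keys <- data.frame(
  RawFile = c("r1", "r2", "r3", "r4"),
  IsotopeLabelType = "L",
  Condition = c("Cal33", "Cal33", "HSC6", "HSC-6"),
  BioReplicate = c("Cal33-1", "Cal33-2", "HSC6_1", "HSC-6-1"),
  Run = 1:4,
  stringsAsFactors = FALSE
)
problems <- check_keys(keys)
```

Here r3 is flagged because its replicate label uses an underscore instead of a dash, and r4 because its condition contains a dash.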
Normalized data obtained from the artmsQuantification() step of the PH dataset (global analysis)
artms_data_ph_msstats_modelqc
A data frame resulting from running the latest version of the MSstats::groupComparison function, required as input for artmsAnalysisQuantifications()
Relative quantification results obtained by running MSstats on the available PH dataset (global analysis).
Changes in protein phosphorylation were quantified between two conditions (check artms_data_ph_contrast).
artms_data_ph_msstats_results
A data frame resulting from running the latest version of MSstats
Dataset randomly generated for testing purposes
artms_data_randomDF
A data frame with 100 rows and 10 variables:
Dataset generated using this code
data.frame(replicate(10, sample(0:1, 100, replace = TRUE)))
Analysis of relative quantifications, including:
Annotations
Summary files in different formats (xls, txt) and shapes (long, wide)
Numerous summary plots
Enrichment analysis using Gprofiler
PCA of quantifications
Clustering analysis
Basic imputation of missing values
To run this function, the following packages must be installed on your system:
From Bioconductor:
BiocManager::install(c("ComplexHeatmap", "org.Mm.eg.db"))
From CRAN:
install.packages(c("factoextra", "FactoMineR", "gProfileR", "PerformanceAnalytics"))
artmsAnalysisQuantifications( log2fc_file, modelqc_file, species, output_dir = "analysis_quant", outliers = c("keep", "iqr", "std"), enrich = TRUE, l2fc_thres = 1, choosePvalue = c("adjpvalue", "pvalue"), isBackground = "nobackground", isPtm = "global", mnbr = 2, pathogen = "nopathogen", plotPvaluesLog2fcDist = TRUE, plotAbundanceStats = TRUE, plotReproAbundance = TRUE, plotCorrConditions = TRUE, plotCorrQuant = TRUE, plotPCAabundance = TRUE, plotFinalDistributions = TRUE, plotPropImputation = TRUE, plotHeatmapsChanges = TRUE, plotTotalQuant = TRUE, plotClusteringAnalysis = TRUE, data_object = FALSE, printPDF = TRUE, verbose = TRUE )
log2fc_file | (char) MSstats results file location |
modelqc_file | (char) MSstats modelqc file location |
species | (char) Select one species. Species currently supported for a full analysis (including enrichment analysis): |
output_dir | (char) Name of the folder where the results of the function are written. Default: "analysis_quant" (providing a new folder name is recommended). |
outliers | (char) Keep or remove outliers. Options: |
enrich | (logical) Perform enrichment analysis using gProfileR? Only available for species HUMAN and MOUSE. |
l2fc_thres | (int) log2fc cutoff for enrichment analysis (default: 1) |
choosePvalue | (char) specify whether |
isBackground | (char) background of gene names for enrichment analysis. |
isPtm | (char) Is this a ptm-site quantification? |
mnbr | (int) PARAMETER FOR NAIVE IMPUTATION: "minimal number of biological replicates" for "naive imputation" and filtering. Default: 2 |
pathogen | (char) Is there a pathogen in the dataset as well? If there is none, use "nopathogen" (default) |
plotPvaluesLog2fcDist | (logical) If |
plotAbundanceStats | (logical) If |
plotReproAbundance | (logical) If |
plotCorrConditions | (logical) If |
plotCorrQuant | (logical) if |
plotPCAabundance | (logical) if |
plotFinalDistributions | (logical) if |
plotPropImputation | (logical) if |
plotHeatmapsChanges | (logical) if |
plotTotalQuant | (logical) if |
plotClusteringAnalysis | (logical) if |
data_object | (logical) flag to indicate whether the required files are data objects. Default is FALSE |
printPDF | If |
verbose | (logical) |
(data.frame) summary of quantifications, including annotations, enrichments, etc
# Testing that the files cannot be empty
artmsAnalysisQuantifications(log2fc_file = NULL, modelqc_file = NULL, species = NULL, output_dir = NULL)
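The mnbr argument ties naive imputation and filtering to a minimal number of biological replicates. As a rough illustration only, assuming a long table of protein abundances with Protein, Condition, BioReplicate, and Abundance columns (hypothetical names), a protein could be kept when at least mnbr biological replicates of some condition quantified it. filter_by_mnbr() is hypothetical and is not artMS's implementation; the package's exact rule may differ.

```r
# Hypothetical illustration of an mnbr-style filter (not artMS code):
# keep a protein if, in at least one condition, it has a non-NA abundance
# in at least `mnbr` distinct biological replicates.
filter_by_mnbr <- function(abundance, mnbr = 2) {
  quantified <- abundance[!is.na(abundance$Abundance), ]
  reps <- unique(quantified[, c("Protein", "Condition", "BioReplicate")])
  # number of replicates per protein (rows) and condition (columns)
  counts <- table(reps$Protein, reps$Condition)
  keep <- rownames(counts)[apply(counts >= mnbr, 1, any)]
  abundance[abundance$Protein %in% keep, ]
}

ab <- data.frame(
  Protein = rep(c("P1", "P2"), each = 4),
  Condition = rep(rep(c("A", "B"), each = 2), 2),
  BioReplicate = rep(c("A-1", "A-2", "B-1", "B-2"), 2),
  Abundance = c(20.1, 19.8, NA, 21.0,  # P1: 2 replicates in A, 1 in B
                NA, 18.5, NA, NA),     # P2: 1 replicate in A, 0 in B
  stringsAsFactors = FALSE
)
kept <- filter_by_mnbr(ab, mnbr = 2)
```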
Adds the species name to every protein.
This makes more sense if there is more than one species in the dataset, which must be specified in the pathogen option. Influenza is a special case that does not need to be specified, as long as the proteins were originally annotated as INFLUENZAGENE_STRAIN (strains covered: H1N1, H3N2, H5N1), for example, NS1_H1N1.
artmsAnnotateSpecie(df, pathogen = "nopathogen", species)
df | (data.frame) with a |
pathogen | (char) Is there a pathogen in the dataset as well? If there is none, use "nopathogen" (default) |
species | (char) Host organism (supported for now: |
(data.frame) The same data.frame but with an extra column specifying the species
# Adding a new column with the main species of the data. Easy.
# But the main functionality is to add both the host-species and a pathogen,
# which is not illustrated in this example
data_with_specie <- artmsAnnotateSpecie(df = artms_data_ph_msstats_results, species = "human")
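The influenza naming convention described above can be illustrated with a small sketch: IDs that end in one of the covered strains (for example, NS1_H1N1) are labelled as influenza and everything else as the host. label_species() and the labels it returns are hypothetical; artmsAnnotateSpecie() may use different labels and also handles other pathogens.

```r
# Hypothetical sketch (not artMS code): proteins named INFLUENZAGENE_STRAIN
# for the strains H1N1, H3N2, and H5N1 are labelled "influenza";
# all other IDs are assigned to the host species.
label_species <- function(proteins, host = "human") {
  is_flu <- grepl("_(H1N1|H3N2|H5N1)$", proteins)
  ifelse(is_flu, "influenza", host)
}

ids <- c("P12345", "NS1_H1N1", "HA_H5N1", "Q9NYV6")
label_species(ids)
```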
Annotates gene name and symbol based on UniProt IDs. It takes the column of your data.frame specified by the columnid argument, searches for the gene symbol, name, and Entrez ID based on the species (species argument), and merges the information back into the input data.frame.
artmsAnnotationUniprot(x, columnid, species, verbose = TRUE)
x | (data.frame) to be annotated (or file path and name) |
columnid | (char) The column with the UniProtKB IDs |
species | (char) The species name. Check |
verbose | (logical) |
(data.frame) with two new columns: Gene and Protein.name
# This example adds annotations to the example evidence file included in
# artMS, based on the column 'Proteins'.
evidence_anno <- artmsAnnotationUniprot(x = artms_data_ph_evidence, columnid = 'Proteins', species = 'human')
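The final step described above, merging the looked-up annotations back into the input table, can be sketched with base R's merge(). In this illustration, a small hand-made lookup table (hypothetical; artMS performs the lookup itself based on the species) stands in for the annotation source, and all.x = TRUE keeps proteins without an annotation. X00000 is a made-up ID used to show that case.

```r
# Sketch of the "merge back" step only. The lookup table is a
# hand-made stand-in for the annotations artMS retrieves itself.
annotate_with_lookup <- function(x, columnid, lookup) {
  # all.x = TRUE keeps rows whose ID has no annotation (Gene becomes NA)
  merge(x, lookup, by.x = columnid, by.y = "UNIPROT", all.x = TRUE)
}

results <- data.frame(Protein = c("P04637", "P38398", "X00000"),
                      log2FC = c(1.2, -0.7, 0.1),
                      stringsAsFactors = FALSE)
lookup <- data.frame(UNIPROT = c("P04637", "P38398"),
                     Gene = c("TP53", "BRCA1"),
                     Protein.name = c("Cellular tumor antigen p53",
                                      "Breast cancer type 1 susceptibility protein"),
                     stringsAsFactors = FALSE)
annotated <- annotate_with_lookup(results, "Protein", lookup)
```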
Input an evidence file from MaxQuant and (optionally) a file containing a list of proteins of interest. The function summarizes the evidence file and reports the average intensity, average retention time, and average calibrated retention time. If a list of proteins is provided, only those proteins are summarized and returned.
artmsAvgIntensityRT( evidence_file, protein_file = NULL, output_file = FALSE, verbose = TRUE )
evidence_file | (char) The file path to the MaxQuant searched data (evidence) file (tab-delimited txt file). |
protein_file | (char) The file path to a file or vector containing a list of proteins of interest. |
output_file | (char) The file name for the results (must have the extension |
verbose | (logical) |
An R object with the results and a file with the results (if the output_file argument is provided). It contains the averages of Intensity, Retention Time, and Calibrated Retention Time.
ave_int <- artmsAvgIntensityRT(evidence_file = artms_data_ph_evidence)
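The per-protein averaging can be approximated with aggregate(). This sketch assumes the evidence table was read with read.delim(), so MaxQuant's "Retention time" and "Calibrated retention time" headers become Retention.time and Calibrated.retention.time. Grouping by the Proteins column and the NA handling are assumptions, and avg_intensity_rt() is hypothetical rather than a description of artmsAvgIntensityRT() internals.

```r
# Hypothetical sketch: average Intensity, Retention.time and
# Calibrated.retention.time per protein group, optionally restricted
# to a vector of proteins of interest.
avg_intensity_rt <- function(evidence, proteins = NULL) {
  if (!is.null(proteins)) {
    evidence <- evidence[evidence$Proteins %in% proteins, , drop = FALSE]
  }
  aggregate(cbind(Intensity, Retention.time, Calibrated.retention.time) ~ Proteins,
            data = evidence,
            FUN = function(v) mean(v, na.rm = TRUE),
            na.action = na.pass)
}

ev <- data.frame(Proteins = c("P1", "P1", "P2"),
                 Intensity = c(100, 300, 50),
                 Retention.time = c(10, 12, 30),
                 Calibrated.retention.time = c(10.5, 12.5, 30.2))
avg_intensity_rt(ev)
avg_intensity_rt(ev, proteins = "P2")
```

Using na.action = na.pass with mean(na.rm = TRUE) keeps a row with a missing intensity from also discarding its retention times.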
Makes it easier to change a column name in any data.frame
artmsChangeColumnName(dataset, oldname, newname)
dataset | (data.frame) with the column name you want to change |
oldname | (char) the old column name |
newname | (char) the new name for that column |
(data.frame) with the new specified column name
artms_data_ph_evidence <- artmsChangeColumnName( dataset = artms_data_ph_evidence, oldname = "Phospho..STY.", newname = "PH_STY")
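In base R, the same rename can be written directly on colnames(). The sketch below uses a hypothetical helper, rename_column(), that stops when the old column is missing; that is one reasonable design choice, and artmsChangeColumnName() may handle the missing-column case differently.

```r
# Hypothetical base R equivalent of renaming one column.
rename_column <- function(dataset, oldname, newname) {
  if (!oldname %in% colnames(dataset)) {
    stop("Column not found: ", oldname)
  }
  colnames(dataset)[colnames(dataset) == oldname] <- newname
  dataset
}

df <- data.frame(Phospho..STY. = c(1, 0), Intensity = c(10, 20))
df <- rename_column(df, oldname = "Phospho..STY.", newname = "PH_STY")
colnames(df)
```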
artMS enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView.
MarkerView is ABSciex software that supports the files generated by the Analyst software (.wiff) used to run our specific mass spectrometer (ABSciex Triple TOF 5600+). It also supports .t2d files generated by the Applied Biosystems 4700/4800 MALDI-TOF.
MarkerView is used to align mass spectrometry data from several samples for comparison. Using the software's import feature, .wiff files (as well as .t2d MALDI-TOF files and tab-delimited .txt mass spectra data in mass-intensity format) are loaded for retention time alignment. Once the data files are selected, a series of windows appears in which peak finding, alignment, and filtering options can be entered and selected. These options include minimum spectral peak width, minimum retention time peak width, retention time and mass tolerance, and the ability to filter out peaks that do not appear in more than a user-selected number of samples.
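To illustrate the last filtering option, here is a sketch of an "appears in more than N samples" filter applied to an aligned peak table with one row per peak and one intensity column per sample. filter_peaks_by_presence() and the column names are hypothetical; this is not MarkerView or artMS code, and treating intensity > 0 as "present" is an assumption.

```r
# Hypothetical illustration (not MarkerView or artMS code): keep only
# peaks detected (intensity > 0) in more than `more_than` samples.
filter_peaks_by_presence <- function(peaks, sample_cols, more_than) {
  intensities <- as.matrix(peaks[, sample_cols, drop = FALSE])
  # count samples in which each peak was detected
  n_present <- rowSums(intensities > 0, na.rm = TRUE)
  peaks[n_present > more_than, , drop = FALSE]
}

peaks <- data.frame(
  mz = c(100.1, 200.2, 300.3),
  rt = c(1.2, 3.4, 5.6),
  S1 = c(10, 0, 5),
  S2 = c(12, 0, 0),
  S3 = c(0, 3, 0)
)
kept <- filter_peaks_by_presence(peaks, c("S1", "S2", "S3"), more_than = 1)
```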
artmsConvertMetabolomics() processes the MarkerView file to enable QC analysis and relative quantification using the artMS functions.
artmsConvertMetabolomics(input_file, out_file, id_file = NULL, verbose = TRUE)
input_file | (char) MarkerView input file |
out_file | (char) Output file name |
id_file | (char) KEGG database |
verbose | (logical) |
(text file) The converted output file
# Testing that the arguments cannot be null
artmsConvertMetabolomics(input_file = NULL, out_file = NULL)
Protein abundance dot plots for each unique UniProt ID. This can take a long time.
artmsDataPlots(input_file, output_file, verbose = TRUE)
input_file | (char) File path and name to the |
output_file | (char) Output file (path) name (add the |
verbose | (logical) |
(pdf) file with each individual protein abundance plot for each condition
## Not run:
artmsDataPlots(input_file = "results/ab-results-mss-normalized.txt", output_file = "results/ab-results-mss-normalized.pdf")
## End(Not run)
Enrichment analysis of the selected proteins
artmsEnrichLog2fc( dataset, species, background, heatmaps = FALSE, output_name = "enrichment.txt", verbose = TRUE )
dataset | (data.frame) with a |
species | (char) Species; only "human" and "mouse" are supported |
background | (vector) Background genes for the enrichment analysis. |
heatmaps | (logical) if |
output_name | (char) Name of the annotation files, which will also be used for the heatmaps (if |
verbose | (logical) |
(data.frame) Results from the enrichment analysis using Gprofiler and heatmaps (if selected)
## Not run:
# The data must be annotated (Protein and Gene columns)
data_annotated <- artmsAnnotationUniprot(
  x = artms_data_ph_msstats_results,
  columnid = "Protein",
  species = "human")
# And then the enrichment
enrich_set <- artmsEnrichLog2fc(
  dataset = data_annotated,
  species = "human",
  background = unique(data_annotated$Gene))
## End(Not run)
This function simplifies the enrichment analysis performed by the excellent tool gProfileR.
artmsEnrichProfiler( x, categorySource = c("GO"), species, background = NA, verbose = TRUE )
x | (list, data.frame) List of protein IDs. It can be anything: either a list of IDs, or a data.frame, in which case the function will find the columns with the IDs. Isn't that cool? Multiple lists can also be sent simultaneously, for example by running: |
categorySource | (vector) Resources providing the terms on which the enrichment will be performed. The resources supported by gprofiler are: |
species | (char) Species code: organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - "hsapiens", mouse - "mmusculus". Check gProfileR to find out more about supported species. |
background | (vector) gene list to use as background for the enrichment analysis. Default: NA |
verbose | (logical) |
This function uses the following gprofiler arguments as default:
ordered_query = FALSE
significant = TRUE
exclude_iea = TRUE
underrep = FALSE
evcodes = FALSE
region_query = FALSE
max_p_value = 0.05
min_set_size = 0
max_set_size = 0
min_isect_size = 0
correction_method = "analytical" #Options: "gSCS", "fdr", "bonferroni"
hier_filtering = "none"
domain_size = "known" # annotated or known
numeric_ns = ""
png_fn = NULL
include_graph = TRUE
The enrichment results as provided by gprofiler
## Not run:
# annotate the MSstats results to get the Gene name
data_annotated <- artmsAnnotationUniprot(
  x = artms_data_ph_msstats_results,
  columnid = "Protein",
  species = "human")
# Filter the list of genes with a log2fc > 2
filtered_data <- unique(data_annotated$Gene[which(data_annotated$log2FC > 2)])
# And perform enrichment analysis
data_annotated_enrich <- artmsEnrichProfiler(
  x = filtered_data,
  categorySource = c('KEGG'),
  species = "hsapiens",
  background = unique(data_annotated$Gene))
## End(Not run)
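To show what the background argument changes, here is a plain one-sided hypergeometric over-representation test for a single term. This is a simplified stand-in, not the g:Profiler algorithm, which also applies multiple-testing correction (see correction_method above) and its own handling of annotation domains. term_enrichment_p() is a hypothetical helper.

```r
# Simplified over-representation test for ONE term (not g:Profiler's
# algorithm): P(X >= k) with X ~ Hypergeometric, where
#   N = background size, K = term genes in the background,
#   n = query genes in the background, k = query genes in the term.
term_enrichment_p <- function(query, term_genes, background) {
  background <- unique(background)
  query <- intersect(unique(query), background)
  term_genes <- intersect(unique(term_genes), background)
  N <- length(background)
  K <- length(term_genes)
  n <- length(query)
  k <- length(intersect(query, term_genes))
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

background <- paste0("G", 1:100)
term <- paste0("G", 1:10)
query <- c(paste0("G", 1:5), paste0("G", 50:54))
p <- term_enrichment_p(query, term, background)
```

With the same query and term sizes, a smaller background raises the expected overlap n*K/N, so the same k becomes less significant; this is why the choice of background matters.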
Converts the MaxQuant evidence file to the 3 files required by SAINTexpress. One can choose to use either the spectral counts (use msspc) or the intensities (use msint) for the analysis.
artmsEvidenceToSaintExpress( evidence_file, keys_file, ref_proteome_file, quant_variable = c("msspc", "msint"), output_file, verbose = TRUE )
evidence_file | (char) The evidence file path and name |
keys_file | (char) Keys file with a SAINT column specifying test ( |
ref_proteome_file | (char) Reference proteome path file name in FASTA format |
quant_variable | (char) choose either |
output_file | (char) Output file name (must have extension .txt) |
verbose | (logical) |
The 3 files required by SAINTexpress:
interactions.txt
preys.txt
baits.txt
# Testing that the files cannot be empty
artmsEvidenceToSaintExpress(evidence_file = NULL, keys_file = NULL, ref_proteome_file = NULL)
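SAINTexpress works from an interaction table (IP, bait, prey, quantitative value), a bait table (IP, bait, T/C), and a prey table (prey, protein length, gene name). As a rough sketch of the first two, assume an evidence-like table with a hypothetical spectral_count column and a keys table whose SAINT column holds T (test) or C (control). Counts can then be summed per BioReplicate (used as the IP name), Condition (used as the bait), and protein. The prey table needs protein lengths from the reference FASTA and is omitted. make_saint_tables() is hypothetical, not artMS's code.

```r
# Hypothetical sketch of the interaction and bait tables for SAINTexpress.
# Assumed inputs (illustrative column names):
#   counts: RawFile, Proteins, spectral_count (one row per evidence entry)
#   keys:   RawFile, BioReplicate, Condition, SAINT ("T" or "C")
make_saint_tables <- function(counts, keys) {
  merged <- merge(counts, keys, by = "RawFile")
  # interaction table: IP (BioReplicate), bait (Condition), prey, summed counts
  interactions <- aggregate(spectral_count ~ BioReplicate + Condition + Proteins,
                            data = merged, FUN = sum)
  # bait table: IP, bait, T/C
  baits <- unique(keys[, c("BioReplicate", "Condition", "SAINT")])
  list(interactions = interactions, baits = baits)
}

counts <- data.frame(RawFile = c("f1", "f1", "f2", "f3"),
                     Proteins = c("P1", "P1", "P1", "P2"),
                     spectral_count = c(2, 3, 1, 4),
                     stringsAsFactors = FALSE)
keys <- data.frame(RawFile = c("f1", "f2", "f3"),
                   BioReplicate = c("BAIT-1", "BAIT-2", "CTRL-1"),
                   Condition = c("BAIT", "BAIT", "CTRL"),
                   SAINT = c("T", "T", "C"),
                   stringsAsFactors = FALSE)
tables <- make_saint_tables(counts, keys)
```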
Converts the MaxQuant evidence file to the files required by SAINTq. The user can choose to use either peptides with spectral counts (use msspc) or all the peptides (use all) for the analysis.
The quantitative variable can also be chosen (either MS intensity or spectral counts).
artmsEvidenceToSAINTq( evidence_file, keys_file, output_dir = "artms_saintq", sc_option = c("all", "msspc"), fractions = FALSE, quant_variable = c("msint", "msspc"), verbose = TRUE )
evidence_file | (char or data.frame) The evidence file path and name, or data.frame |
keys_file | (char) Keys file with a SAINT column specifying test ( |
output_dir | (char) New directory to create and save files. Default: "artms_saintq" (providing a new folder name is recommended). |
sc_option | (char) Filter peptides with spectral counts only. Two options: |
fractions | (logical) |
quant_variable | (char) Select the quantitative variable. Two options available: |
verbose | (logical) |
After running the script, the newly specified folder should contain the following files:
saintq-config-peptides
saintq-config-proteins
saintq_input_peptides.txt
saintq_input_proteins.txt
Then cd into the new folder and run either of the following two options (assuming that saintq is installed on your Linux/Unix/macOS system):
> saintq config-saintq-peptides
or
> saintq config-saintq-proteins
The input files required to run SAINTq
# Testing that the files cannot be empty
artmsEvidenceToSAINTq(evidence_file = NULL, keys_file = NULL, output_dir = NULL)
Removes contaminants and 'reverse' sequences erroneously identified by MaxQuant, as well as empty protein IDs
artmsFilterEvidenceContaminants(x, verbose = TRUE)
x | (data.frame) of the evidence file |
verbose | (logical) |
(data.frame) without REV__ and CON__ Protein ids
ef <- artmsFilterEvidenceContaminants(x = artms_data_ph_evidence)
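MaxQuant marks contaminants with a CON__ prefix and reverse (decoy) hits with a REV__ prefix. The following is a minimal base R sketch of the filter described above. It assumes that protein IDs are in a Proteins column and that a protein group is dropped if any of its semicolon-separated members carries either prefix. drop_rev_con() is hypothetical, and artMS's exact rule may differ.

```r
# Minimal sketch: drop protein groups containing a REV__ (reverse/decoy)
# or CON__ (contaminant) member, plus rows with empty or missing IDs.
drop_rev_con <- function(evidence) {
  ids <- trimws(as.character(evidence$Proteins))
  flagged <- grepl("(^|;)(REV__|CON__)", ids)
  keep <- !flagged & !is.na(ids) & ids != ""
  evidence[keep, , drop = FALSE]
}

ev <- data.frame(
  Proteins = c("P12345", "CON__P02768", "REV__Q9NYV6", "", "P1;CON__P2", "Q8N1"),
  Intensity = 1:6,
  stringsAsFactors = FALSE
)
clean <- drop_rev_con(ev)
```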
Generates an extended, detailed ph-site file in which every line is a ph site instead of a peptide. Therefore, if one peptide has multiple ph sites, it is broken down into each of its sites. This file helps generate input files for tools such as Phosfate or PHOTON.
artmsGeneratePhSiteExtended( df, pathogen = "nopathogen", species, ptmType, output_name )
df | (data.frame) of log2fc and imputed values |
pathogen | (char) Is there a pathogen in the dataset as well? Available pathogens are |
species | (char) Main organism (supported for now: |
ptmType | (char) It must be a ptm-site quantification dataset. Either: yes: |
output_name | (char) An output file name (extension |
(data.frame) extended version of the ph-site
## Not run:
artmsGeneratePhSiteExtended(df = dfobject, species = "mouse", ptmType = "ptmsites", output_name = log2fc_file)
## End(Not run)
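Breaking a multi-site peptide into one row per site can be sketched as follows. The ID format used here (a UniProt accession followed by underscore-separated sites, such as P12345_S10_T15) is a hypothetical illustration, not necessarily the format artMS produces. Every other column is repeated for each site, and expand_sites() is not an artMS function.

```r
# Hypothetical sketch: one row per modified site. IDs are assumed to look like
# "ACCESSION_SITE1_SITE2..." (e.g. "P12345_S10_T15"); other columns are repeated.
expand_sites <- function(df, column = "Protein") {
  parts <- strsplit(df[[column]], "_", fixed = TRUE)
  site_ids <- lapply(parts, function(p) {
    if (length(p) > 1) paste(p[1], p[-1], sep = "_") else character(0)
  })
  # repeat each input row once per site it contains
  out <- df[rep(seq_len(nrow(df)), times = lengths(site_ids)), , drop = FALSE]
  out[[column]] <- as.character(unlist(site_ids))
  rownames(out) <- NULL
  out
}

sites <- data.frame(Protein = c("P12345_S10_T15", "Q9NYV6_Y7"),
                    log2FC = c(1.5, -2),
                    stringsAsFactors = FALSE)
expand_sites(sites)
```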
MaxQuant introduced changes in the column names and number of columns of the evidence file in version 1 (we think). This function checks whether the evidence file comes from the latest version of MaxQuant.
artmsIsEvidenceNewVersion(evidence_file)
evidence_file | the evidence file name |
(logical) TRUE if it is a newer version of MaxQuant, FALSE otherwise
artmsIsEvidenceNewVersion(evidence_file = artms_data_ph_evidence)
Given a species name, checks whether it is supported and, if so, whether the corresponding annotation package is installed.
artmsIsSpeciesSupported(species, verbose = TRUE)
species | (char) The species name. Species currently supported as part of artMS:
And the following species can also be used, but the user needs to install the corresponding org.db package: |
verbose | (logical) |
(string) Name of the package for the given species
# Should return TRUE
artmsIsSpeciesSupported(species = "HUMAN")
artmsIsSpeciesSupported(species = "CHIMP")
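Checking whether a species is supported comes down to mapping its name to a Bioconductor OrgDb annotation package and testing whether that package is installed. This sketch uses a small, partial lookup of standard OrgDb package names (the list of species artMS supports is larger) and checks installation with system.file(), which returns an empty string for packages that are not installed. species_to_orgdb() is hypothetical, not the artMS function.

```r
# Hypothetical sketch with a PARTIAL species-to-OrgDb lookup
# (standard Bioconductor annotation package names).
species_to_orgdb <- function(species) {
  orgdb <- c(HUMAN = "org.Hs.eg.db", MOUSE = "org.Mm.eg.db",
             RAT = "org.Rn.eg.db", FLY = "org.Dm.eg.db",
             YEAST = "org.Sc.sgd.db", CHIMP = "org.Pt.eg.db")
  pkg <- unname(orgdb[toupper(species)])
  # system.file() returns "" when a package is not installed
  installed <- !is.na(pkg) && nzchar(system.file(package = pkg))
  list(package = pkg, installed = installed)
}

species_to_orgdb("human")
species_to_orgdb("unicorn")
```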
A reference UniProt FASTA database, when downloaded, includes several UniProt identifiers for every protein. If the regular expression available in MaxQuant is not activated, the full ID will be used in the Proteins, Lead Protein, and Leading Razor Protein columns. This function leaves only the Entry ID.
For example, values in a Protein column like this:
sp|P12345|Entry_name;sp|P54321|Entry_name2
will be replaced by
P12345;P54321
artmsLeaveOnlyUniprotEntryID(x, columnid)
x | (data.frame) that contains the |
columnid | (char) Column name with the full UniProt IDs |
(data.frame) with only Entry IDs.
# Example of data frame with full uniprot ids and sequences
p <- c("sp|A6NIE6|RN3P2_HUMAN;sp|Q9NYV6|RRN3_HUMAN",
       "sp|A7E2V4|ZSWM8_HUMAN",
       "sp|A5A6H4|ROA1_PANTR;sp|P09651|ROA1_HUMAN;sp|Q32P51|RA1L2_HUMAN",
       "sp|A0FGR8|ESYT2_HUMAN")
s <- c("ALENDFFNSPPRK", "GWGSPGRPK", "SSGPYGGGGQYFAK", "VLVALASEELAK")
evidence <- data.frame(Proteins = p, Sequences = s, stringsAsFactors = FALSE)
# Replace the Proteins column with only Entry ids
evidence <- artmsLeaveOnlyUniprotEntryID(x = evidence, columnid = "Proteins")
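The transformation from full UniProt headers to Entry IDs can be expressed as a single regular-expression substitution. This sketch assumes the sp|ACCESSION|ENTRY_NAME (or tr|...) header format shown above, with members of a protein group separated by semicolons. leave_entry_id() is hypothetical, and artmsLeaveOnlyUniprotEntryID() may be implemented differently.

```r
# Keep only the accession from each "sp|ACCESSION|ENTRY_NAME" (or "tr|...")
# member of a semicolon-separated protein group; other IDs are left as-is.
leave_entry_id <- function(ids) {
  gsub("(^|;)(sp|tr)\\|([^|;]+)\\|[^;]*", "\\1\\3", ids)
}

p <- c("sp|A6NIE6|RN3P2_HUMAN;sp|Q9NYV6|RRN3_HUMAN",
       "sp|A7E2V4|ZSWM8_HUMAN",
       "sp|A5A6H4|ROA1_PANTR;sp|P09651|ROA1_HUMAN;sp|Q32P51|RA1L2_HUMAN",
       "sp|A0FGR8|ESYT2_HUMAN")
leave_entry_id(p)
```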
Maps gene symbol, name, and Entrez ID to a vector of UniProt IDs
artmsMapUniprot2Entrez(uniprotkb, species)
uniprotkb | (vector) Vector of UniProtKB IDs |
species | (char) The species name. Species currently supported as part of artMS: check |
(data.frame) with ENTREZID and GENENAMES mapped on UniprotKB ids
# Load an example with human proteins
exampleID <- c("Q6P996", "B1N8M6")
artmsMapUniprot2Entrez(uniprotkb = exampleID, species = "HUMAN")
Merges the evidence and keys files on the given columns
artmsMergeEvidenceAndKeys( x, keys, by = c("RawFile"), isSummary = FALSE, verbose = TRUE )
x | (data.frame or char) The evidence data, either as a data.frame or the file name (and path). It also works for the summary.txt file |
keys | The keys data, either as a data.frame or file name (and path) |
by | (vector) specifying the columns used to merge the evidence and keys. Default: "RawFile" |
isSummary | (logical) TRUE or FALSE (default) |
verbose | (logical) |
(data.frame) with the evidence and keys merged
evidenceKeys <- artmsMergeEvidenceAndKeys(x = artms_data_ph_evidence, keys = artms_data_ph_keys)
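A common pitfall when merging is a raw file that appears in the evidence but not in the keys file. Its rows are then silently dropped by an inner join. The sketch below (hypothetical, not artMS's implementation) warns about such raw files before merging on the by columns.

```r
# Hypothetical sketch: warn about evidence keys missing from the keys
# table, then merge (inner join) on the `by` columns.
merge_with_check <- function(evidence, keys, by = "RawFile") {
  # build one string per row so multi-column keys can be compared
  ev_ids <- unique(do.call(paste, c(evidence[by], sep = "\t")))
  key_ids <- unique(do.call(paste, c(keys[by], sep = "\t")))
  missing <- setdiff(ev_ids, key_ids)
  if (length(missing) > 0) {
    warning("Not found in keys: ", paste(missing, collapse = ", "))
  }
  merge(evidence, keys, by = by)
}

evidence <- data.frame(RawFile = c("f1", "f1", "f9"),
                       Intensity = c(10, 20, 30),
                       stringsAsFactors = FALSE)
keys <- data.frame(RawFile = "f1", Condition = "A", BioReplicate = "A-1",
                   stringsAsFactors = FALSE)
merged <- suppressWarnings(merge_with_check(evidence, keys))
```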
Converts the MSstats results file to wide format (unique Protein IDs as rows and the comparisons as columns), and adds BioReplicate information about the number of unique peptides, spectral counts, and intensities for each protein. In cases where there are multiple values for a Protein-BioReplicate pair due to minute changes in sequence, the maximum value is taken for the pair. Any pairs without a value are assigned NA.
artmsMsstatsSummary( evidence_file, prot_group_file, keys_file, results_file, return_df = FALSE, verbose = TRUE )
evidence_file | (char or data.frame) The file path to the MaxQuant searched data (evidence) file (tab-delimited txt file). Only works for the newer versions of the evidence file. |
prot_group_file | (char) The file path to the MaxQuant |
keys_file | (char) The file path to the keys file used with MSstats (tab-delimited txt file). |
results_file | (char) The file path to the MSstats results file in the default long format (tab-delimited txt file or data.frame). |
return_df | (logical) Whether or not to return the results to the R environment upon completion. This is useful if the function is part of an R pipeline and you want to feed the results directly into the next stage of analysis via an R environment/terminal. Regardless, the results will be written to file. Default = FALSE |
verbose | (logical) |
(data.frame or txt file) with the summary
# Testing warning if files are not submitted
test <- artmsMsstatsSummary(evidence_file = NULL, prot_group_file = NULL, keys_file = NULL, results_file = NULL)
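The "keep the maximum, fill the rest with NA" rule described above can be sketched in base R with aggregate() and reshape(). The table below is hypothetical, and the column names (Protein, BioReplicate, Intensity) are chosen for illustration. It is not the artMS implementation.

```r
# Hypothetical long table: one row per Protein/BioReplicate observation,
# with a duplicated P1/A-1 pair (e.g. from slightly different peptide sequences).
long <- data.frame(
  Protein      = c("P1", "P1", "P1", "P2"),
  BioReplicate = c("A-1", "A-1", "B-1", "A-1"),
  Intensity    = c(10, 12, 8, 5)
)

# Keep the maximum value for each Protein-BioReplicate pair.
collapsed <- aggregate(Intensity ~ Protein + BioReplicate, data = long, FUN = max)

# Reshape to wide: one row per protein, one column per BioReplicate.
# Pairs with no observation (P2 in B-1) become NA.
wide <- reshape(collapsed, idvar = "Protein", timevar = "BioReplicate",
                direction = "wide")
print(wide)
```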
It takes as input the imputedL2fcExtended.txt results generated by the artmsAnalysisQuantifications() function and generates the Phosfate input file (or data.frame). Please note that Phosfate only supports human data.
artmsPhosfateOutput(inputFile, output_dir = ".", verbose = TRUE)
Argument | Description |
---|---|
inputFile | (char) the |
output_dir | (char) Name of the folder to output results (Default: current directory. Recommended: |
verbose | (logical) |
Multiple output files (inputs for Phosfate)
## Not run:
artmsPhosfateOutput(inputFile)
## End(Not run)
It takes as input the imputedL2fcExtended.txt results generated by the artmsAnalysisQuantifications() function and generates the PHOTON input file. Please note that PHOTON only supports human data.
artmsPhotonOutput(inputFile, output_dir = ".", verbose = TRUE)
Argument | Description |
---|---|
inputFile | (char) the |
output_dir | (char) Name of the folder to output results (Default: current. Recommended: "photon_input_files" or similar) |
verbose | (logical) |
Multiple output files (inputs for PHOTON)
## Not run:
artmsPhotonOutput(inputFile)
## End(Not run)
Heatmap of the Relative Quantifications (MSstats results)
artmsPlotHeatmapQuant( input_file, output_file = "quantifications_heatmap.pdf", species, labels = "*", cluster_cols = FALSE, display = "log2FC", lfc_lower = -2, lfc_upper = 2, whatPvalue = "adj.pvalue", FDR = 0.05, verbose = TRUE )
Argument | Description |
---|---|
input_file | (char) MSstats |
output_file | (char) Output file name (pdf format) and location. Default: "quantifications_heatmap.pdf" |
species | (char) Species name, used to add the gene names. To find out more about the supported species check |
labels | (vector) of UniProt IDs, if only specific labels should be plotted. Default: all labels |
cluster_cols | (boolean) |
display | Metric to be displayed. Options: |
lfc_lower | (int) Lower limit for the log2fc. Default: -2 |
lfc_upper | (int) Upper limit for the log2fc. Default: +2 |
whatPvalue | (char) |
FDR | (numeric) Upper limit of the false discovery rate (or p-value). Default: 0.05 |
verbose | (logical) |
(pdf or ggplot2 object) heatmap of the MSstats results using the selected metric
# Unfortunately, the example does not contain any significant hits
# Use for illustration purposes
artmsPlotHeatmapQuant(input_file = artms_data_ph_msstats_results,
                      species = "human",
                      output_file = NULL,
                      whatPvalue = "pvalue",
                      lfc_lower = -1,
                      lfc_upper = 1)
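The interplay of FDR, lfc_lower and lfc_upper can be illustrated with a small base R sketch. It makes two assumptions that are not confirmed by this page. First, a protein is kept if it passes the FDR cutoff in at least one comparison. Second, log2FC values are capped to [lfc_lower, lfc_upper] so that outliers do not dominate the colour scale. The results table is hypothetical, with column names modelled on MSstats output.

```r
# Hypothetical MSstats-like long results (not artMS example data).
res <- data.frame(
  Protein    = c("P1", "P1", "P2", "P2", "P3", "P3"),
  Label      = rep(c("B-A", "C-A"), 3),
  log2FC     = c(3.5, 0.2, -0.5, -2.8, 0.1, 0.3),
  adj.pvalue = c(0.01, 0.60, 0.40, 0.03, 0.70, 0.90)
)
lfc_lower <- -2
lfc_upper <- 2
FDR <- 0.05

# Keep proteins significant (adj.pvalue < FDR) in at least one comparison.
sig_proteins <- unique(res$Protein[res$adj.pvalue < FDR])
sig <- res[res$Protein %in% sig_proteins, ]

# Cap log2FC to [lfc_lower, lfc_upper] (assumed behaviour of the limits).
sig$log2FC <- pmin(pmax(sig$log2FC, lfc_lower), lfc_upper)

# Protein x comparison matrix, suitable for a heatmap function such as stats::heatmap().
mat <- tapply(sig$log2FC, list(sig$Protein, sig$Label), identity)
print(mat)
```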
ProteinID to ProteinID_AAnumber notation
It enables modified-peptide-specific quantification by converting the Protein column of the evidence file selected by the user to ProteinID_AAnumber notation. In this way, each modified peptide can be quantified independently across conditions.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: MaxQuant versions later than 1.6.3.0 output a "Modified sequence" column in the evidence file with two important changes in the annotation of phosphorylation. First, p is used instead of (ph). Second, the modified residue (i.e., S, T, or Y) is the residue to the right of the p, instead of the residue to the left of (ph) as before.
We have introduced a modification to detect and address this issue. However, we advise the user to double-check both the new evidence file with the new notation and the -mapping.txt file, and to verify that there are no NA values in the notation of phosphopeptides.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
artmsProtein2SiteConversion( evidence_file, ref_proteome_file, column_name = c("Leading razor protein", "Leading proteins", "Proteins"), output_file, mod_type, overwrite_evidence = FALSE, verbose = TRUE )
Argument | Description |
---|---|
evidence_file | (char) The evidence file name and location |
ref_proteome_file | (char) The reference proteome used as database to search the |
column_name | (char) The Protein Column Name to map. Options: |
output_file | (char) Output file name ( |
mod_type | (char) The posttranslational modification. Options: |
overwrite_evidence | (logical) if <output_file> is the same as <evidence_file>, |
verbose | (logical) |
(file) A new evidence file in which the specified Protein ID column is modified by appending the site location(s) and post-translational modification(s) to the UniProt entry or RefSeq ID.
Output ID examples: A34890_ph3; Q64890_ph24_ph456; Q64890_ub34_ub129_ub234; Q64890_ac35.
# Testing warning if files are not submitted.
artmsProtein2SiteConversion(evidence_file = NULL, ref_proteome_file = NULL, output_file = NULL)
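The site notation and the two MaxQuant phosphorylation notations in the warning above can be made concrete with a small base R sketch. It assumes the following:

* Only (ph) sites are mapped.
* Other modifications appear as short parenthesised tags such as (ox), and are removed.
* The peptide is placed at its first exact match in the protein sequence.

The helpers phospho_positions() and to_site_id(), the protein ID "PROT1" and the sequences are all hypothetical. This is not the artMS implementation.

```r
# Positions (1-based, within the peptide) of phosphorylated residues in a
# MaxQuant "Modified sequence" string such as "_AAS(ph)PEPTIDEK_".
phospho_positions <- function(mod_seq) {
  s <- gsub("_", "", mod_seq, fixed = TRUE)  # drop flanking underscores
  # Newer notation (p placed before S/T/Y) -> older notation ((ph) after residue)
  s <- gsub("p([STY])", "\\1(ph)", s)
  # Remove any other parenthesised tag, e.g. "(ox)" or "(ac)", keeping "(ph)"
  s <- gsub("\\((?!ph\\))[^()]*\\)", "", s, perl = TRUE)
  positions <- integer(0)
  residue_count <- 0L
  i <- 1L
  while (i <= nchar(s)) {
    if (substr(s, i, i + 3L) == "(ph)") {
      positions <- c(positions, residue_count)  # tag refers to previous residue
      i <- i + 4L
    } else {
      residue_count <- residue_count + 1L
      i <- i + 1L
    }
  }
  positions
}

# Build a ProteinID_phN[_phM...] identifier from the peptide's position in the protein.
to_site_id <- function(protein_id, protein_seq, peptide, mod_seq) {
  start <- regexpr(peptide, protein_seq, fixed = TRUE)[1]  # first match only
  if (start < 0) return(NA_character_)
  sites <- start + phospho_positions(mod_seq) - 1L
  paste(c(protein_id, paste0("ph", sites)), collapse = "_")
}

# Old and new notations of the same phosphopeptide give the same site ID.
to_site_id("PROT1", "MKAASPEPTIDEKR", "AASPEPTIDEK", "_AAS(ph)PEPT(ph)IDEK_")
to_site_id("PROT1", "MKAASPEPTIDEKR", "AASPEPTIDEK", "_AApSPEPpTIDEK_")
```

The conversion returns NA when the peptide is not found in the protein sequence. That mirrors the advice above to check the mapping for NA values.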
Quality Control analysis of the MaxQuant evidence file
artmsQualityControlEvidenceBasic( evidence_file, keys_file, prot_exp = c("AB", "PH", "UB", "AC", "APMS", "PTM:XXX:yy"), output_dir = "qc_basic", output_name = "qcBasic_evidence", isSILAC = FALSE, plotINTDIST = FALSE, plotREPRO = FALSE, plotCORMAT = TRUE, plotINTMISC = TRUE, plotPTMSTATS = TRUE, printPDF = TRUE, verbose = TRUE )
Argument | Description |
---|---|
evidence_file | (char or data.frame) The evidence file path and name, or data.frame |
keys_file | (char or data.frame) The keys file path and name, or data.frame |
prot_exp | (char) Proteomics experiment. 6 options available: |
output_dir | (char) Name for the folder to output the results plots. Default is "qc_basic". |
output_name | (char) prefix output name (no extension). Default: "qcBasic_evidence" |
isSILAC | if |
plotINTDIST | if |
plotREPRO | if |
plotCORMAT | if |
plotINTMISC | if |
plotPTMSTATS | if |
printPDF | If |
verbose | (logical) |
Quality control files and plots
artmsQualityControlEvidenceBasic(evidence_file = artms_data_ph_evidence,
                                 keys_file = artms_data_ph_keys,
                                 prot_exp = "PH",
                                 isSILAC = FALSE,
                                 plotINTDIST = FALSE,
                                 plotREPRO = TRUE,
                                 plotCORMAT = FALSE,
                                 plotINTMISC = FALSE,
                                 plotPTMSTATS = FALSE,
                                 printPDF = FALSE,
                                 verbose = FALSE)
# But we recommend the following test:
# 1. Go to a working directory:
# setwd("/path/to/your/working/directory/")
# 2. Run the following command to print out all the pdf files
# artmsQualityControlEvidenceBasic(evidence_file = artms_data_ph_evidence,
#                                  keys_file = artms_data_ph_keys,
#                                  prot_exp = "PH")
# 3. Check your working directory and you should find pdf files with
# all the QC plots
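A replicate correlation matrix is a common quality-control view of intensity data. We assume, but this page does not confirm, that it is the kind of plot plotCORMAT controls. The sketch below uses simulated, hypothetical intensities and computes Pearson correlations on the log2 scale, with missing values handled pairwise. It illustrates the idea only and is not artMS code.

```r
set.seed(1)
# Hypothetical protein x BioReplicate intensity matrix (simulated, not artMS data).
base <- rlnorm(50, meanlog = 20, sdlog = 2)
intensities <- cbind(
  "ctrl-1"  = base * rlnorm(50, meanlog = 0, sdlog = 0.1),
  "ctrl-2"  = base * rlnorm(50, meanlog = 0, sdlog = 0.1),
  "treat-1" = base * rlnorm(50, meanlog = 0, sdlog = 0.5)
)

# Work on the log2 scale, as is usual for MS intensities.
log_int <- log2(intensities)

# Pairwise Pearson correlation between replicates; NAs handled pairwise.
cor_mat <- cor(log_int, use = "pairwise.complete.obs")
print(round(cor_mat, 3))
```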
Performs quality control based on the information available in the MaxQuant evidence.txt file.
artmsQualityControlEvidenceExtended( evidence_file, keys_file, output_dir = "qc_extended", output_name = "qcExtended_evidence", isSILAC = FALSE, plotPSM = TRUE, plotIONS = TRUE, plotTYPE = TRUE, plotPEPTIDES = TRUE, plotPEPTOVERLAP = TRUE, plotPROTEINS = TRUE, plotPROTOVERLAP = TRUE, plotPIO = TRUE, plotCS = TRUE, plotME = TRUE, plotMOCD = TRUE, plotPEPICV = TRUE, plotPEPDETECT = TRUE, plotPROTICV = TRUE, plotPROTDETECT = TRUE, plotIDoverlap = TRUE, plotPCA = TRUE, plotSP = TRUE, printPDF = TRUE, verbose = TRUE )
Argument | Description |
---|---|
evidence_file | (char or data.frame) The evidence file path and name, or data.frame |
keys_file | (char or data.frame) The keys file path and name, or data.frame |
output_dir | (char) Name for the folder to output the results plots. Default is "qc_extended". |
output_name | (char) prefix output name (no extension). Default: "qcExtended_evidence" |
isSILAC | if |
plotPSM | (logical) |
plotIONS | (logical) |
plotTYPE | (logical) |
plotPEPTIDES | (logical) |
plotPEPTOVERLAP | (logical) |
plotPROTEINS | (logical) |
plotPROTOVERLAP | (logical) |
plotPIO | (logical) |
plotCS | (logical) |
plotME | (logical) |
plotMOCD | (logical) |
plotPEPICV | (logical) |
plotPEPDETECT | (logical) |
plotPROTICV | (logical) |
plotPROTDETECT | (logical) |
plotIDoverlap | (logical) |
plotPCA | (logical) |
plotSP | (logical) |
printPDF | If |
verbose | (logical) |
All plots are generated by default.
A number of QC plots based on the evidence file
# Testing warning if files are not submitted
test <- artmsQualityControlEvidenceExtended(evidence_file = NULL, keys_file = NULL)
Quality Control analysis of the evidence-like metabolomics dataset
artmsQualityControlMetabolomics( evidence_file, keys_file, met_exp = c("MV"), output_name = "qcPlots_metab", plotINTDIST = FALSE, plotCORMAT = TRUE, plotINTMISC = TRUE, printPDF = TRUE, verbose = TRUE )
Argument | Description |
---|---|
evidence_file | (char or data.frame) The evidence file path and name, or data.frame |
keys_file | (char or data.frame) The keys file path and name, or data.frame |
met_exp | (char) Metabolomics experiment. Only one option available (so far): |
output_name | (char) prefix output name (no extension). Default: "qcPlots_metab" |
plotINTDIST | if |
plotCORMAT | if |
plotINTMISC | if |
printPDF | If |
verbose | (logical) |
Quality control files and plots for metabolomics
# Testing that input arguments cannot be null
artmsQualityControlMetabolomics(evidence_file = NULL, keys_file = NULL, met_exp = "MV")
Performs quality control based on the information available in the MaxQuant summary.txt file.
artmsQualityControlSummaryExtended( summary_file, keys_file, output_dir = "qc_summary", output_name = "qcExtended_summary", isFractions = FALSE, plotMS1SCANS = TRUE, plotMS2 = TRUE, plotMSMS = TRUE, plotISOTOPE = TRUE, printPDF = TRUE, verbose = TRUE )
Argument | Description |
---|---|
summary_file | (char or data.frame) The summary file path and name, or data.frame |
keys_file | (char or data.frame) The keys file path and name, or data.frame |
output_dir | (char) Name for the folder to output the results plots. Default is "qc_summary". |
output_name | (char) prefix output name (no extension). Default: "qcExtended_summary" |
isFractions | (logical) |
plotMS1SCANS | (logical) |
plotMS2 | (logical) |
plotMSMS | (logical) |
plotISOTOPE | (logical) |
printPDF | If |
verbose | (logical) |
A number of plots from the summary file
# Testing warning if files are not submitted
test <- artmsQualityControlSummaryExtended(summary_file = NULL, keys_file = NULL)
Relative quantification using MSstats, including plots, quantifications (log2FC, p-values, etc.), and normalized abundance values.
artmsQuantification( yaml_config_file, data_object = FALSE, printPDF = TRUE, printTables = TRUE, display_msstats = FALSE, return_results_object = FALSE, verbose = TRUE )
Argument | Description |
---|---|
yaml_config_file | (char, required) The yaml file name and location |
data_object | (logical) Flag indicating whether yaml_config_file is a path to a file that should be opened or a config object (yaml). Default is FALSE |
printPDF | (logical) if |
printTables | (logical) |
display_msstats | (logical) if |
return_results_object | (logical) Default is FALSE |
verbose | (logical) |
The relative quantification of the conditions and comparisons specified in the keys/contrast file resulting from running MSstats, in addition to quality control plots (if selected)
# Recommended
# artmsQuantification(yaml_config_file = "your-config-file.yaml")

# Example to test this function using the example dataset available in artMS
# Step 1: Add evidence, keys, and contrast to configuration object
artms_data_ph_config$files$evidence <- artms_data_ph_evidence
artms_data_ph_config$files$keys <- artms_data_ph_keys
artms_data_ph_config$files$contrasts <- artms_data_ph_contrast

# Step 2: Run the quantification step
quant_results <- artmsQuantification(yaml_config_file = artms_data_ph_config,
                                     data_object = TRUE,
                                     display_msstats = FALSE,
                                     printPDF = FALSE,
                                     printTables = FALSE)

# Check the list of data frames "quant_results". Nothing should be printed out.
Converts the standard MSstats results.txt file into "wide" format, where each row represents a unique protein's results and each column represents a comparison made by MSstats. The fold change and p-value of each comparison each get their own column.
artmsResultsWide( results_msstats, output_file = NULL, select_pvalues = c("adjpvalue", "pvalue"), species, verbose = TRUE )
Argument | Description |
---|---|
results_msstats | (char) Input file name and location (MSstats |
output_file | (char) Output file name and location (e.g. |
select_pvalues | (char) Either |
species | (char) Species name for annotation purposes. Check |
verbose | (logical) |
(tab-delimited output file) reshaped file with unique protein IDs and one log2FC column and one adj.pvalue column for each available comparison
ph_results_wide <- artmsResultsWide( results_msstats = artms_data_ph_msstats_results, output_file = NULL, species = "human")
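The long-to-wide step can be sketched with base R's stats::reshape(). Each comparison in the Label column becomes its own pair of log2FC and adj.pvalue columns. The input table is hypothetical, and the column naming (value.comparison) is reshape()'s default rather than the artMS output format. The species annotation that artmsResultsWide performs is not reproduced.

```r
# Hypothetical MSstats-like long results: one row per protein and comparison.
res <- data.frame(
  Protein    = c("P1", "P1", "P2", "P2"),
  Label      = c("B-A", "C-A", "B-A", "C-A"),
  log2FC     = c(1.2, -0.4, 0.3, 2.1),
  adj.pvalue = c(0.01, 0.50, 0.80, 0.02)
)

# One row per protein; log2FC and adj.pvalue columns for each comparison.
wide <- reshape(res, idvar = "Protein", timevar = "Label", direction = "wide")
print(wide)
```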
Converts the evidence file from a SILAC search to a format compatible with MSstats. It essentially modifies the raw file names by adding the Heavy and Light labels.
artmsSILACtoLong(evidence_file, output = NULL, verbose = TRUE)
Argument | Description |
---|---|
evidence_file | (char) Text filepath to the evidence file |
output | (char) Text filepath of the output name. If NULL it does not write the output |
verbose | (logical) |
(data.frame) with SILAC data processed for MSstats (and output file)
## Not run:
evidence2silac <- artmsSILACtoLong(evidence_file = "silac.evidence.txt",
                                   output = "silac-evidence.txt")
## End(Not run)
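The idea of "adding the Heavy and Light label to the raw file" can be sketched in base R. The sketch stacks hypothetical light and heavy intensity columns into one long table and appends the label to the raw file name, so each channel looks like a separate run. The column names (Raw.file, Intensity.L, Intensity.H, IsotopeLabelType) and the "_L"/"_H" suffix are assumptions for illustration and may not match the names artmsSILACtoLong actually uses. Dropping channels without an intensity is also a choice made only for this sketch.

```r
# Hypothetical SILAC evidence rows with light (L) and heavy (H) intensities.
ev <- data.frame(
  Raw.file    = c("run1", "run2"),
  Proteins    = c("P1", "P1"),
  Intensity.L = c(1000, 1200),
  Intensity.H = c(500, NA)
)

# Build the long-format rows for one label from the matching intensity column.
stack_label <- function(ev, label, intensity_col) {
  data.frame(Raw.file = ev$Raw.file, Proteins = ev$Proteins,
             IsotopeLabelType = label, Intensity = ev[[intensity_col]])
}
long <- rbind(stack_label(ev, "L", "Intensity.L"),
              stack_label(ev, "H", "Intensity.H"))

# Append the label to the raw file name so each channel looks like its own run.
long$Raw.file <- paste(long$Raw.file, long$IsotopeLabelType, sep = "_")

# Drop channels without a measured intensity.
long <- long[!is.na(long$Intensity), ]
print(long)
```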
Outputs the spectral counts from the MaxQuant evidence file.
artmsSpectralCounts( evidence_file, keys_file, output_file = NULL, verbose = TRUE )
Argument | Description |
---|---|
evidence_file | (char) Maxquant evidence file or data object |
keys_file | (char) Keys file with the experimental design or data object |
output_file | (char) Output file name (add |
verbose | (logical) |
A txt file with biological replicates, protein id, and spectral count columns
summary_spectral_counts <- artmsSpectralCounts( evidence_file = artms_data_ph_evidence, keys_file = artms_data_ph_keys)
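A spectral-count summary of the kind described above (biological replicate, protein id, spectral count) can be sketched as a grouped sum. The sketch assumes the counts come from the evidence file's "MS/MS count" column, which read.delim() would rename to MS.MS.count. It also assumes the evidence rows have already been merged with the keys to obtain BioReplicate. Both are assumptions, and the data are hypothetical.

```r
# Hypothetical evidence rows, already merged with the keys (BioReplicate column).
ev <- data.frame(
  BioReplicate = c("ctrl-1", "ctrl-1", "ctrl-1", "treat-1"),
  Proteins     = c("P1", "P1", "P2", "P1"),
  MS.MS.count  = c(2, 1, 4, 3)
)

# Sum MS/MS counts per BioReplicate and protein.
spc <- aggregate(MS.MS.count ~ BioReplicate + Proteins, data = ev, FUN = sum)
names(spc)[names(spc) == "MS.MS.count"] <- "spectral_counts"
print(spc)
```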
Generates a volcano plot (log2FC versus -log10 p-value), a scatter plot used to quickly identify changes.
artmsVolcanoPlot( mss_results, output_name = "volcano_plot.pdf", lfc_upper = 1, lfc_lower = -1, whatPvalue = "adj.pvalue", FDR = 0.05, PDF = TRUE, decimal_threshold = 16, verbose = TRUE )
Argument | Description |
---|---|
mss_results | (data.frame or file) Selected MSstats results |
output_name | (char) Name for the output file (don't forget the |
lfc_upper | (numeric) log2fc upper threshold (positive value) |
lfc_lower | (numeric) log2fc lower threshold (negative value) |
whatPvalue | (char) |
FDR | (numeric) False Discovery Rate threshold |
PDF | (logical) Option to generate pdf format. Default: TRUE |
decimal_threshold | (numeric) Decimal threshold for the pvalue. Default: 16 (10^-16) |
verbose | (logical) |
(pdf) of a volcano plot
artmsVolcanoPlot(mss_results = artms_data_ph_msstats_results, whatPvalue = "pvalue", PDF = FALSE)
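The thresholds above can be illustrated with a base R volcano plot. The sketch makes two assumptions. First, decimal_threshold acts as a floor on the p-value (10^-decimal_threshold), so that a p-value of 0 still maps to a finite -log10 value. Second, a protein counts as changed when its p-value is below FDR and its log2FC is at or beyond lfc_upper or lfc_lower. Neither the exact comparison operators nor the flooring behaviour is confirmed by this page. The results table is hypothetical.

```r
# Hypothetical MSstats-like results (not artMS example data).
res <- data.frame(
  Protein = c("P1", "P2", "P3", "P4"),
  log2FC  = c(2.5, -1.8, 0.4, 3.0),
  pvalue  = c(1e-4, 0.01, 0.001, 0)
)
lfc_upper <- 1
lfc_lower <- -1
FDR <- 0.05
decimal_threshold <- 16

# Floor p-values at 10^-decimal_threshold so that p = 0 maps to a finite value.
res$neg_log10_p <- -log10(pmax(res$pvalue, 10^-decimal_threshold))

# Classify each protein using the fold-change and p-value thresholds.
res$status <- ifelse(res$pvalue < FDR & res$log2FC >= lfc_upper, "up",
              ifelse(res$pvalue < FDR & res$log2FC <= lfc_lower, "down",
                     "unchanged"))

# Base graphics volcano plot with dashed threshold lines.
cols <- c(up = "red", down = "blue", unchanged = "grey50")
plot(res$log2FC, res$neg_log10_p, col = cols[res$status], pch = 19,
     xlab = "log2FC", ylab = "-log10(p-value)")
abline(v = c(lfc_lower, lfc_upper), h = -log10(FDR), lty = 2)
```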
Creates a template of the artMS configuration file, which is required to run artmsQuantification(). Check ?artms_config and the vignettes to find out more about the structure of the file and how to fill it in.
artmsWriteConfigYamlFile( config_file_name = "artms_config_file.yaml", overwrite = FALSE, verbose = TRUE )
Argument | Description |
---|---|
config_file_name | (char) The name for the configuration file. It must have a |
overwrite | (logical) Default FALSE |
verbose | (logical) |
A file (or yaml data object) of the artMS configuration file
config_empty <- artmsWriteConfigYamlFile(config_file_name = NULL)