Title: | Identification and classification of plant transcription factors |
---|---|
Description: | planttfhunter is used to identify plant transcription factors (TFs) from protein sequence data and classify them into families and subfamilies using the classification scheme implemented in PlantTFDB. TFs are identified using pre-built hidden Markov model profiles for DNA-binding domains. Then, auxiliary and forbidden domains are used with DNA-binding domains to classify TFs into families and subfamilies (when applicable). Currently, TFs can be classified in 58 different TF families/subfamilies. |
Authors: | Fabrício Almeida-Silva [aut, cre], Yves Van de Peer [aut] |
Maintainer: | Fabrício Almeida-Silva <[email protected]> |
License: | GPL-3 |
Version: | 1.7.0 |
Built: | 2024-11-18 03:54:24 UTC |
Source: | https://github.com/bioc/planttfhunter |
PFAM domains are assigned to each sequence using HMMER.
annotate_pfam(seq = NULL, evalue = 1e-05)
seq |
An AAStringSet object as returned by Biostrings::readAAStringSet(). |
evalue |
Numeric indicating the E-value threshold for hmmsearch to be used for domains without pre-defined domain cutoffs. Only valid if parameter mode = 'local'. Default: 1e-05. |
A 2-column data frame with the variables Gene and Domain, which contain gene IDs and domain IDs, respectively.
data(gsu)
seq <- gsu[1:5]
if (hmmer_is_installed()) {
    annotate_pfam(seq)
}
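Because the returned data frame has one row per gene-domain pair, a gene with several domains spans several rows. The following is a minimal base R sketch of collapsing such a table to one entry per gene. The toy data frame and its gene and domain IDs are hypothetical stand-ins, not real annotate_pfam() output.

```r
# Toy stand-in for annotate_pfam() output; all IDs are made up for illustration
toy_annotation <- data.frame(
    Gene = c("gene1", "gene1", "gene2", "gene3"),
    Domain = c("PF00001", "PF00002", "PF00001", "PF00003"),
    stringsAsFactors = FALSE
)

# Split the Domain column by gene to get a named list of domain vectors
domains_per_gene <- split(toy_annotation$Domain, toy_annotation$Gene)

# Number of annotated domains per gene (named integer vector)
n_domains <- lengths(domains_per_gene)
```

A per-gene list like this is convenient when checking which genes carry a given combination of domains.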
The classification scheme is the same as the one used by PlantTFDB.
data(classification_scheme)
A data frame with the following variables:
TF family name.
TF subfamily name.
DNA-binding domain
Auxiliary domain
Forbidden domain
Jin, J., Tian, F., Yang, D. C., Meng, Y. Q., Kong, L., Luo, J., & Gao, G. (2016). PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic acids research, gkw982.
data(classification_scheme)
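To illustrate how DNA-binding, auxiliary, and forbidden domains can be combined into a family assignment, here is a minimal base R sketch. The toy scheme, its column names, the family and domain names, and the helper classify_gene() are all hypothetical. The matching rule is a simplification and should not be read as the package's actual implementation.

```r
# Hypothetical toy scheme: family and domain names are invented for illustration
toy_scheme <- data.frame(
    Family = c("FamA", "FamB"),
    DBD = c("D1", "D1"),
    Auxiliary = c("D2", NA),
    Forbidden = c(NA, "D2"),
    stringsAsFactors = FALSE
)

# Assign a family to a gene given the character vector of its domains.
# Simplified rule (an assumption): the DBD must be present, the auxiliary
# domain (if any) must be present, and the forbidden domain (if any) absent.
classify_gene <- function(domains, scheme) {
    for (i in seq_len(nrow(scheme))) {
        has_dbd <- scheme$DBD[i] %in% domains
        aux_ok <- is.na(scheme$Auxiliary[i]) || scheme$Auxiliary[i] %in% domains
        forb_ok <- is.na(scheme$Forbidden[i]) || !(scheme$Forbidden[i] %in% domains)
        if (has_dbd && aux_ok && forb_ok) {
            return(scheme$Family[i])
        }
    }
    NA_character_
}

res_both <- classify_gene(c("D1", "D2"), toy_scheme)  # DBD plus auxiliary domain
res_dbd <- classify_gene("D1", toy_scheme)            # DBD only
res_none <- classify_gene("D3", toy_scheme)           # no known DBD
```

In this toy setup, the same DNA-binding domain leads to different families depending on whether the second domain is present, which is the role auxiliary and forbidden domains play in the scheme.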
Identify TFs and classify them in families
classify_tfs(domain_annotation = NULL)
domain_annotation |
A 2-column data frame with the gene ID in the first column and the domain ID in the second column. |
A 2-column data frame with the variables Gene and Family representing gene ID and TF family, respectively.
data(gsu_annotation)
domain_annotation <- gsu_annotation
families <- classify_tfs(domain_annotation)
This function identifies and classifies TFs, and returns TF counts for each family as a SummarizedExperiment object.
get_tf_counts(proteomes, species_metadata = NULL)
proteomes |
List of AAStringSet objects |
species_metadata |
(Optional) A data frame containing species names in row names (names must match element names in the proteomes list), and species metadata (e.g., taxonomic information, ecological information) in columns. If NULL, the colData of the SummarizedExperiment object will be empty. |
A SummarizedExperiment object containing transcription factor frequencies per family in each species, as well as species metadata (if species_metadata is not NULL).
data(gsu)
set.seed(123)

# Pick random subsets of 50 genes to simulate other species
proteomes <- list(
    Gsu1 = gsu[sample(names(gsu), 50, replace = FALSE)],
    Gsu2 = gsu[sample(names(gsu), 50, replace = FALSE)],
    Gsu3 = gsu[sample(names(gsu), 50, replace = FALSE)],
    Gsu4 = gsu[sample(names(gsu), 50, replace = FALSE)]
)

# Create species metadata
species_metadata <- data.frame(
    row.names = names(proteomes),
    Division = "Rhodophyta",
    Origin = c("US", "Belgium", "China", "Brazil")
)

# Get SummarizedExperiment object
if (hmmer_is_installed()) {
    se <- get_tf_counts(proteomes, species_metadata)
}
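Conceptually, the assay of the returned object is a table of TF counts with families in rows and species in columns. The base R sketch below builds such a matrix from toy per-species classification results. The species, gene, and family names are hypothetical, and this is an illustration of the idea rather than the code used by get_tf_counts().

```r
# Toy classify_tfs()-like results for two hypothetical species
toy_families <- list(
    Sp1 = data.frame(
        Gene = c("g1", "g2", "g3"),
        Family = c("FamA", "FamA", "FamB"),
        stringsAsFactors = FALSE
    ),
    Sp2 = data.frame(
        Gene = c("g4", "g5"),
        Family = c("FamB", "FamC"),
        stringsAsFactors = FALSE
    )
)

# Use a shared set of family levels so every species gets the same rows
all_families <- sort(unique(unlist(lapply(toy_families, function(x) x$Family))))

# Count genes per family in each species; result is a families x species matrix
count_matrix <- vapply(
    toy_families,
    function(x) as.integer(table(factor(x$Family, levels = all_families))),
    integer(length(all_families))
)
rownames(count_matrix) <- all_families
```

Fixing the factor levels up front ensures that families absent from a species are reported as zero instead of being dropped.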
Data obtained from PLAZA Diatoms. To keep the package size small, only genes containing domains used for TF family classification were retained.
data(gsu)
An AAStringSet object as returned by Biostrings::readAAStringSet().
Osuna-Cruz, C. M., Bilcke, G., Vancaester, E., De Decker, S., Bones, A. M., Winge, P., ... & Vandepoele, K. (2020). The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms. Nature communications, 11(1), 1-13.
data(gsu)
Domain annotation for the algal species Galdieria sulphuraria
The data set was created using the function annotate_pfam() in local mode.
data(gsu_annotation)
A 2-column data frame with the following variables:
Gene ID
Domain ID or domain name when ID is not available in PFAM
data(gsu_annotation)
TF families of the algal species Galdieria sulphuraria
The data set was created using the function classify_tfs().
data(gsu_families)
A 2-column data frame with the following variables:
Gene ID
TF family
data(gsu_families)
Check if HMMER is installed
hmmer_is_installed()
Logical indicating whether HMMER is installed or not.
hmmer_is_installed()
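For context, one common way to test whether an external command-line tool is available from R is to look it up on the PATH with Sys.which(). The sketch below assumes the HMMER executable of interest is named hmmsearch. The helper tool_on_path() is hypothetical, and this is not necessarily how hmmer_is_installed() is implemented.

```r
# Hypothetical helper: TRUE if an executable with this name is found on the PATH
tool_on_path <- function(tool) {
    # Sys.which() returns an empty string when the tool is not found
    unname(nzchar(Sys.which(tool)))
}

hmmsearch_found <- tool_on_path("hmmsearch")
```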
Simulated species were created by sampling 100 genes from the example data set gsu after calling set.seed(123).
data(tf_counts)
A SummarizedExperiment with TF frequencies per family in each species in assay and species metadata in colData.
data(tf_counts)