Title: | Different test statistics based on co-citation. |
---|---|
Description: | A collection of software tools for dealing with co-citation data. |
Authors: | B. Ding and R. Gentleman |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | CPL |
Version: | 1.79.0 |
Built: | 2024-12-29 04:51:15 UTC |
Source: | https://github.com/bioc/CoCiteStats |
When two objects are related through a bipartite graph it is sometimes appropriate to carry out special adjustments. One of the adjustments is called actor size adjustment. In this case the counts are adjusted according to how often the objects are referenced.
actorAdjTable(twT, eps = 1e-08)
twT |
A two way table as produced by |
eps |
A small quantity used to assess approximate equality. |
The social networks literature has developed a number of tools for measuring associations between entities. We can think of genes (actors) as being joined by citation in papers (events); having two genes cited in the same paper (equivalent to two actors attending the same event) suggests that they are related to each other. However, some genes are cited in many papers, so we might want to discount their co-citations relative to those of genes that are cited less often. In addition, some papers cite very many genes, and hence typically say less about each of them than a paper that cites fewer genes.
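The actor/event picture above can be made concrete with a small incidence matrix. The following is a minimal base R sketch on made-up data; the gene and paper names are hypothetical, and it does not reproduce the adjustment that actorAdjTable itself applies. It only shows the quantities the adjustments are built on: how often each gene is cited, how many genes each paper cites, and how many papers cite a given pair.

```r
# Toy gene-by-paper incidence matrix (all values are made up for illustration).
# Rows are genes (actors), columns are papers (events); 1 means "paper cites gene".
inc <- matrix(
  c(1, 1, 0, 1,
    1, 0, 0, 1,
    0, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("paper", 1:4))
)

gene_size  <- rowSums(inc)  # number of papers citing each gene
paper_size <- colSums(inc)  # number of genes cited by each paper

# Co-citation counts: entry [i, j] is the number of papers citing both gene i and gene j.
cocite <- inc %*% t(inc)

gene_size
paper_size
cocite["geneA", "geneB"]
```

Here geneA and geneB share two papers, but one of those papers cites all three genes, which is the kind of evidence a paper size adjustment would down-weight.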
An adjusted two way table, with elements named u11, u12, u21 and u22.
R. Gentleman
Testing Gene Associations Using Co-citation, by B. Ding and R. Gentleman. Bioconductor Technical Report, 2004
tw1 = twowayTable("10", "100", FALSE)
actorAdjTable(tw1)
Computes gene-gene statistics.
gene.gene.statistic(g1, g2, paperLens = paperLen())
g1 |
The Entrez Gene identifier for one of the genes. |
g2 |
The Entrez Gene identifier for the other gene. |
paperLens |
A vector with the number of citations for each paper. |
For the two genes identified by their Entrez IDs, a number of two-way table statistics, i.e. those computed via twTStats, are returned, as are their gene and paper size adjusted variants.
A list with entries
original |
The output of |
gs |
The output of |
ps |
The output of |
both |
The output of |
B. Ding and R. Gentleman
Testing Gene Associations Using Co-citation, by B. Ding and R. Gentleman. Bioconductor Technical Report, 2004
g1 = "10"   # Entrez ID for gene 1
g2 = "101"  # Entrez ID for gene 2
pLens = paperLen()
gene.gene.statistic(g1, g2, pLens)
This function calculates Concordance, Jaccard's index and Hubert's Gamma with no adjustment, adjusting for paper size (PS), adjusting for gene size (GS) and adjusting for both, to evaluate the significance of co-citation of a gene of interest and a gene list.
gene.geneslist.sig(gene, geneslist, paperLens = paperLen(), n.resamp=100)
gene |
The Entrez Gene ID for the gene of interest. |
geneslist |
The list of Entrez Gene IDs for genes with which the co-citation of the gene of interest is to be evaluated. |
paperLens |
The sizes of the PubMed papers for consideration. |
n.resamp |
Number of resamplings used to generate empirical p-values. |
Statistics and resampling p-values for all 3 two-way tables along with the 4 adjustments for gene and geneslist based on n.resamp resamplings.
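As a rough illustration of what an empirical p-value from resampling means, the sketch below compares an observed statistic with a set of resampled ones. The package's own resampling scheme is not described here, so everything in this block is an assumption: empirical_p is a hypothetical helper, and the resampled values are simulated stand-ins rather than co-citation statistics.

```r
# Hypothetical helper: proportion of resampled statistics that are
# at least as large as the observed one.
empirical_p <- function(observed, resampled) {
  mean(resampled >= observed)
}

set.seed(42)                           # reproducible toy example
resampled <- rpois(100, lambda = 2)    # stand-in for statistics from 100 resamplings (made up)
p_value <- empirical_p(5, resampled)
p_value
```

With n.resamp resamplings the smallest nonzero p-value such a scheme can report is 1 / n.resamp, which is one reason to raise n.resamp when small p-values matter.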
Beiying Ding
Testing Gene Associations Using Co-citation, by B. Ding and R. Gentleman. Bioconductor Technical Report, 2004
actorAdjTable, paperLen, twTStats, twowayTable
gene <- "705"
geneslist <- "7216"
gene.geneslist.sig(gene, geneslist, n.resamp=50)
Measures whether a gene is associated with another gene, or with a set of genes, using co-citation in PubMed as the basis for the association.
gene.geneslist.statistic(gene, geneslist, paperLens = paperLen())
gene |
The Entrez Gene ID for the gene of interest. |
geneslist |
A vector of Entrez Gene IDs for the set of genes. |
paperLens |
A vector containing the number of genes cited by each paper. |
To be filled in later.
R. Gentleman
Testing Gene Associations Using Co-citation, by B. Ding and R. Gentleman. Bioconductor Technical Report, 2004
twowayTable, gene.gene.statistic
g1 = "101"
gl = c("10014", "10015", "10016", "10017", "10018")
pL = paperLen()
s1 = gene.geneslist.statistic(g1, gl, pL)
s1
The set of papers that cite the input Entrez Gene identifiers is found, and for each of these the number of genes cited in that paper is computed and returned.
paperLen(x)
x |
A vector of Entrez Gene identifiers. |
This function first finds the set of unique PMIDs associated with the input set of Entrez Gene IDs. Then for each PMID it finds the number of Entrez Gene identifiers associated with that paper. The function uses different sets of variable mappings from the org.Hs.eg.db package.
If x is missing then all Entrez Gene identifiers in org.Hs.egPMID are used.
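The two steps described above can be sketched in base R on a toy mapping. This is an illustration only: gene2pmid holds made-up identifiers rather than data from org.Hs.eg.db, paper_len_sketch is a hypothetical helper, and it returns just the per-paper counts rather than the full return value of paperLen.

```r
# Hypothetical gene -> PMID mapping (made-up identifiers, not real data).
gene2pmid <- list(
  g1 = c("p1", "p2", "p4"),
  g2 = c("p1", "p4"),
  g3 = c("p2", "p3", "p4")
)

paper_len_sketch <- function(x, mapping) {
  # Step 1: unique papers that cite any of the input genes.
  pmids <- unique(unlist(mapping[x], use.names = FALSE))
  # Step 2: for each of those papers, count the genes in the whole mapping that it cites.
  vapply(
    pmids,
    function(p) sum(vapply(mapping, function(g) p %in% g, logical(1))),
    integer(1)
  )
}

res <- paper_len_sketch(c("g1", "g2"), gene2pmid)
res
```

Note that the count for a paper uses every gene in the mapping, not only the genes passed in x, which matches the description of the paper size as the number of genes the paper cites.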
counts |
For each paper the number of Entrez Gene identifiers referred to. |
papers |
A list of the same length as |
R. Gentleman
ans = paperLen(c("10", "1001"))
ans$counts
ans$papers
This function computes a two way table for comparing the co-citation in PubMed of the two input genes. The values in the table can be adjusted according to either the paper size or the gene size.
twowayTable(g1, g2, weights = TRUE, paperLens=paperLen())
g1 |
The EntrezGene identifier for gene 1. |
g2 |
The EntrezGene identifier for gene 2. |
weights |
|
paperLens |
A vector containing the number of genes each paper refers to, or cites. |
To determine the association between two genes one can use co-citation in the medical literature. When weights is FALSE this function computes the number of papers that cite only gene 1, only gene 2, both and neither.
By default, we use the org.Hs.eg.db package to define the set of papers that are used in the computations. For other organisms, or for more restricted sets of papers, the user will need to supply the vector paperLens explicitly.
One can consider papers which cite many genes to be less informative than those that cite only a few genes. If weights is TRUE (the default) then papers are weighted by the inverse of the number of citations.
A vector of length four, with entries n11, n12, n21 and n22. These correspond to the number of papers that cite both genes, the number that cite only gene 1, the number that cite only gene 2, and the total number of papers minus those counted in n11, n12 and n21. In the default case the entries are the weighted versions of these quantities.
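To make the four entries concrete, here is a minimal base R sketch on made-up data. It is not the package's twowayTable: two_way_sketch, the PMIDs and the paper sizes are all hypothetical, and the weighted n22 is computed as the total weight minus the other three entries, which is an assumption about how the description above carries over to the weighted case.

```r
# Made-up inputs: PMIDs citing each gene, and the number of genes cited by every paper.
papers_g1  <- c("p1", "p2", "p4")
papers_g2  <- c("p1", "p4", "p5")
paper_lens <- c(p1 = 2, p2 = 2, p3 = 1, p4 = 4, p5 = 5)

two_way_sketch <- function(p1, p2, paper_lens, weights = TRUE) {
  # Each paper contributes 1, or 1 / (number of genes it cites) when weighted.
  w <- if (weights) 1 / paper_lens else rep(1, length(paper_lens))
  names(w) <- names(paper_lens)
  n11 <- sum(w[intersect(p1, p2)])  # papers citing both genes
  n12 <- sum(w[setdiff(p1, p2)])    # papers citing only gene 1
  n21 <- sum(w[setdiff(p2, p1)])    # papers citing only gene 2
  c(n11 = n11, n12 = n12, n21 = n21, n22 = sum(w) - n11 - n12 - n21)
}

unweighted <- two_way_sketch(papers_g1, papers_g2, paper_lens, weights = FALSE)
weighted   <- two_way_sketch(papers_g1, papers_g2, paper_lens, weights = TRUE)
unweighted
weighted
```

In the weighted table, the shared paper p4 cites four genes and so contributes only 0.25 to n11, while p1 cites two genes and contributes 0.5.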
R. Gentleman
pL = paperLen()
twowayTable("10", "100", paperLens=pL)
twowayTable("10", "100", FALSE, paperLens=pL)
For two way tables based on co-citations, four different test statistics are reported: the odds ratio, the Concordance, the Jaccard index and Hubert's Gamma.
twTStats(twT)
twT |
A two way table, as produced by |
The entries in the presumed 2 by 2 table are labeled n11, n12, n21, n22, corresponding to the entries in the first row first column, first row second column etc. The odds ratio is the product of n11 and n22 divided by the product of n12 and n21. The Concordance is simply the n11 entry. The Jaccard index is the n11 entry divided by the sum of n11, n12, and n21. Hubert's Gamma is slightly more complicated.
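The three statistics defined explicitly above can be checked by hand on a small table. The sketch below uses made-up counts and a hypothetical helper, two_way_stats_sketch; it is not the package's twTStats, and it leaves out Hubert's Gamma because its formula is not given here.

```r
# Made-up 2 by 2 table entries, named as in the text.
tw <- c(n11 = 2, n12 = 1, n21 = 1, n22 = 6)

two_way_stats_sketch <- function(twT) {
  n11 <- twT[["n11"]]
  n12 <- twT[["n12"]]
  n21 <- twT[["n21"]]
  n22 <- twT[["n22"]]
  c(
    Concordance = n11,                       # the n11 entry itself
    Jaccard     = n11 / (n11 + n12 + n21),   # n11 over the papers citing either gene
    OddsRatio   = (n11 * n22) / (n12 * n21)  # cross-product ratio
  )
}

stats <- two_way_stats_sketch(tw)
stats
```

The odds ratio is undefined or infinite when n12 or n21 is zero, which is worth checking for when a pair of genes is only ever cited together.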
Concordance |
The concordance statistic. |
Jaccard |
The Jaccard index. |
Hubert |
Hubert's Gamma |
OddsRatio |
The odds ratio. |
R. Gentleman
Testing Gene Associations Using Co-citation, by B. Ding and R. Gentleman. Bioconductor Technical Report, 2004
tw1 = twowayTable("10", "101", FALSE)
twTStats(tw1)