Each of the following files contains an R object with the associations calculated for all pairs of genes in the database. These files are useful for custom analyses across many genes. If you only need the associations between a single gene and all others, you can download them directly from that gene's page without working with the raw database.
Version | Database | File size | Link |
---|---|---|---|
TPM | Pearson (minimum across 20) | 662M | TPM_pearson_min.RData |
TPM | Spearman (minimum across 20) | 635M | TPM_spearman_min.RData |
TPM | G-statistic (minimum across 20) | 1.4G | TPM_G_min.RData |
Z-score | Pearson (minimum across 20) | 369M | Z-score_pearson_min.RData |
Z-score | Spearman (minimum across 20) | 370M | Z-score_spearman_min.RData |
Z-score | G-statistic (minimum across 20) | 1.1G | Z-score_G_min.RData |
Databases are represented as the upper triangle of the gene association/co-expression matrix "flattened" row by row into a vector (pairs (1, 2), (1, 3), ..., (1, n), (2, 3), ...), which avoids saving redundant associations and the diagonal elements. To further save space, the vectors are of integer type and keep only two decimal places of the estimates (divide by 100 to recover the real values). We provide the following R function to programmatically extract the associations between gene idx and all n genes in the database:
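```r
#' @param idx Index of gene in the database
#' @param db  CoGTEx raw associations/co-expression vector
#' @param n   Number of genes in the database
#' @return Vector of association/co-expression estimates between gene idx and
#'         all genes in the database (own association set to 0)
getIndexPairs <- function(idx, db, n) {
  pidx <- idx - 1
  if (idx > 1) {
    # Pairs (i, j) with i < j that involve gene idx:
    # (1, idx), ..., (idx - 1, idx), then (idx, idx), (idx, idx + 1), ..., (idx, n)
    i <- j <- 1:n
    i[idx:n] <- j[1:pidx] <- idx
    # Doubles keep k * n from overflowing R's 32-bit integers for large n
    k <- as.numeric(i - 1)
    # Position of pair (i, j) in the row-wise flattened upper triangle
    pairIdxs <- (k * n) - ((k * (k + 1)) / 2) + (j - i)
    pairIdxs[idx] <- 0L  # (idx, idx) is not stored; index 0 selects nothing
  } else {
    # Gene 1: its pairs are the first n - 1 elements of the vector
    pairIdxs <- 1:(n - 1)
  }
  r <- db[pairIdxs]
  return(append(r, 0L, after = pidx))  # own association is set to 0
}
```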
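The storage layout can be reproduced on a small scale. The sketch below is illustrative only: `flattenUpper` and `pairIndex` are hypothetical helper names, and scaling by 100 and rounding to integers is an assumption about how the two decimal places are kept (the database might truncate instead). It flattens the strict upper triangle of a toy correlation matrix row by row, the order `getIndexPairs` relies on, and checks that the position formula used there points at the right pair.

```r
# Hypothetical helper: strict upper triangle of a symmetric matrix, flattened
# row by row: (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n).
# Walking t(mat) column-wise below the diagonal visits mat row-wise above it.
flattenUpper <- function(mat) {
  t(mat)[lower.tri(mat)]
}

# Hypothetical helper: position of pair (i, j), i < j, in that vector
# (same formula as in getIndexPairs)
pairIndex <- function(i, j, n) {
  k <- i - 1
  k * n - k * (k + 1) / 2 + (j - i)
}

set.seed(1)
n <- 5
mat <- cor(matrix(rnorm(50), ncol = n))  # toy 5 x 5 co-expression matrix
flat <- flattenUpper(mat)                # n * (n - 1) / 2 = 10 values

# Assumed encoding: keep two decimals by scaling by 100 and rounding to integer
encoded <- as.integer(round(flat * 100))

# Every stored pair sits where pairIndex says it does
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    stopifnot(flat[pairIndex(i, j, n)] == mat[i, j])
  }
}
encoded[pairIndex(2, 4, n)] / 100  # decoded estimate for genes 2 and 4
```

The same arithmetic explains why no diagonal or lower-triangle values are needed: any pair (j, i) with j > i is simply looked up as (i, j).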
The following R script is an example of usage:
```r
# load the downloaded data
source("getIndexPairs.r")
x <- setNames(
  read.csv("geneinfo.txt", sep = "\t", header = FALSE, stringsAsFactors = FALSE),
  readLines("geneinfoHeaders.txt")
)
n <- nrow(x)
print(load("TPM_pearson_min.RData"))  # prints "m", the name of the loaded vector

# get the index in the database of genes of interest
query <- c("GAPDH", "TP53", "GH1")
cogtexIdxs <- match(query, x$'Gene Symbol')
estimates <- lapply(cogtexIdxs, getIndexPairs, m, n)
estimates <- lapply(estimates, '/', 100)  # divide by 100 to turn integers into real values

# returned vector elements are always in the same order (the database order)
estimates <- setNames(lapply(estimates, setNames, x$'Gene Symbol'), query)
lapply(estimates, head, 5)  # each vector in the list is of length n
# $GAPDH
#        WASH7P         OR4F5 RP11-34P13.15 RP11-34P13.16 RP11-34P13.14
#         -0.18          0.32         -0.26         -0.31         -0.01
# $TP53
#        WASH7P         OR4F5 RP11-34P13.15 RP11-34P13.16 RP11-34P13.14
#          0.20         -0.30          0.29          0.35          0.20
# $GH1
#        WASH7P         OR4F5 RP11-34P13.15 RP11-34P13.16 RP11-34P13.14
#          0.03         -0.07          0.04          0.03          0.00
```
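When only the associations among a few genes are needed (for instance, the three query genes in the script above), the pairs can be read straight from the vector instead of extracting a full row of length n per gene. `subsetAssociations` is a hypothetical helper, not part of CoGTEx; it is a minimal sketch that reuses the position formula of `getIndexPairs` and, like it, leaves each gene's own association at 0.

```r
# Hypothetical helper: p x p matrix of associations among the genes at database
# positions idxs, read directly from the flattened upper-triangle vector db
subsetAssociations <- function(idxs, db, n) {
  stopifnot(!anyNA(idxs), !anyDuplicated(idxs))  # e.g. genes not found by match()
  p <- length(idxs)
  out <- matrix(0L, p, p)  # own association stays 0, as in getIndexPairs
  for (a in seq_len(p - 1)) {
    for (b in (a + 1):p) {
      i <- min(idxs[a], idxs[b])  # the vector only stores pairs with i < j
      j <- max(idxs[a], idxs[b])
      k <- as.numeric(i - 1)      # doubles avoid integer overflow for large n
      pos <- k * n - k * (k + 1) / 2 + (j - i)
      out[a, b] <- out[b, a] <- db[pos]
    }
  }
  out
}

# Toy check: n = 4 genes, pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) stored as 1:6
db <- 1:6
subsetAssociations(c(4, 2), db, 4)  # off-diagonal entries are 5, the value of pair (2, 4)
```

With the objects from the usage script, `subsetAssociations(cogtexIdxs, m, n) / 100` would give the 3 x 3 matrix for GAPDH, TP53 and GH1. This reads p(p - 1)/2 values instead of p full rows, which matters only when the subset is small relative to n.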
This section is under construction.
For requests/changes please write to vtrevino at tec.mx
1) Search or browse for a gene in the "One-Gene" tab and click it.
2) Use "+" to see the overall expression of the gene across tissues.
3) Use the buttons that appear for the clicked gene to open information links or the CoGTEx page of that gene.