CoGTEx

Downloads

geneinfo.txt: tab-separated file with general information for all genes in the database (the same information shown in the "One gene" tab on this page).
geneinfoHeaders.txt: headers for the previous file.

Bulk associations/co-expression

Each of the following files contains an R object with the associations calculated for all pairs of genes in the database. These files are useful for custom analyses across many genes. If you only need the associations between a single gene and all others, you can download them directly from that gene's page without handling the raw database.

Version	Database	File size	Link
TPM	Pearson (minimum across 20)	662M	TPM_pearson_min.RData
TPM	Spearman (minimum across 20)	635M	TPM_spearman_min.RData
TPM	G-statistic (minimum across 20)	1.4G	TPM_G_min.RData
Z-score	Pearson (minimum across 20)	369M	Z-score_pearson_min.RData
Z-score	Spearman (minimum across 20)	370M	Z-score_spearman_min.RData
Z-score	G-statistic (minimum across 20)	1.1G	Z-score_G_min.RData

Each database stores the upper triangle of the gene association/co-expression matrix, flattened row by row into a vector, so that redundant associations and the diagonal elements are not saved. To save further space, the vectors are of integer type: each estimate is stored multiplied by 100, keeping only two decimal places. We provide the following R function to programmatically extract the associations between the gene at index idx and all n genes in the database:

                    #' @param idx Index of gene in the database 
                    #' @param db CoGTEx raw associations/co-expression vector
                    #' @param n Number of genes in the database
                    #' @return Vector of associations/co-expression estimates between idx and all genes in the database
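                    getIndexPairs <- function(idx, db, n) {

                        pidx <- idx - 1
                        if (idx > 1) {

                            # for each partner gene, (i, j) is the pair ordered so that i <= j
                            i <- j <- 1:n
                            i[idx:n] <- j[1:pidx] <- idx

                            # position of pair (i, j) in the row-wise flattened upper triangle
                            k <- i - 1
                            pairIdxs <- (k * n) - ( (k * (k + 1)) / 2 ) + (j - i)
                            pairIdxs[idx] <- 0L # a 0 index is dropped by R, skipping the (idx, idx) pair

                        } else {

                            # the first gene occupies the first n - 1 positions of the vector
                            pairIdxs <- 1:(n - 1)

                        }

                        r <- db[pairIdxs]
                        return(append(r, 0L, after = pidx)) # own association is set to 0

                    }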
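
To make the storage layout concrete, the following sketch rebuilds it on a toy 5-gene matrix and checks the index formula used inside getIndexPairs. The helper pairIndex and the values in vals are made up for illustration and are not part of CoGTEx. The row-by-row ordering is inferred from the formula above rather than from a separate specification.

```r
# Hypothetical helper (not part of CoGTEx): position of the pair (i, j), with i < j,
# in the upper triangle flattened row by row without the diagonal.
pairIndex <- function(i, j, n) {
    k <- i - 1
    (k * n) - (k * (k + 1)) / 2 + (j - i)
}

n <- 5
# Made-up integer-coded estimates (estimate * 100), one per gene pair
vals <- c(-18L, 32L, -26L, -31L, 20L, -30L, 29L, 3L, -7L, 4L)

# Build a symmetric toy matrix with a zero diagonal from those values
m <- matrix(0L, n, n)
m[lower.tri(m)] <- vals
m <- m + t(m)

# Flatten the upper triangle row by row, mimicking the stored vector
db <- t(m)[lower.tri(m)]

length(db) == n * (n - 1) / 2        # one entry per unordered pair of genes
db[pairIndex(2, 4, n)] == m[2, 4]    # same value as in the full matrix
db[pairIndex(2, 4, n)] / 100         # back to a real-valued estimate
```

Storing n(n-1)/2 integers instead of a full n-by-n matrix of doubles is what keeps the downloadable files at a manageable size.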

The following R script is an example of usage:

                    # load the downloaded data
                    source("getIndexPairs.r")
                    x <- setNames(
                        read.csv("geneinfo.txt", sep = "\t", header = FALSE, stringsAsFactors = FALSE),
                        readLines("geneinfoHeaders.txt")
                    )
                    n <- nrow(x)
                    print(load("TPM_pearson_min.RData"))
                    # the loaded object is named "m"

                    # get the index in the database of genes of interest
                    query <- c("GAPDH", "TP53", "GH1")
                    cogtexIdxs <- match(query, x$'Gene Symbol')
                    estimates <- lapply(cogtexIdxs, getIndexPairs, m, n)
                    estimates <- lapply(estimates, '/', 100) # divide by 100 to convert the stored integers back to real-valued estimates

                    # returned vector elements are always in the same order (the database order)
                    estimates <- setNames(lapply(estimates, setNames, x$'Gene Symbol'), query)
                    lapply(estimates, head, 5) # each vector in the list is of length n
                    # $GAPDH
                    #        WASH7P         OR4F5 RP11-34P13.15 RP11-34P13.16 RP11-34P13.14 
                    #         -0.18          0.32         -0.26         -0.31         -0.01 

                    # $TP53
                    #        WASH7P         OR4F5 RP11-34P13.15 RP11-34P13.16 RP11-34P13.14 
                    #          0.20         -0.30          0.29          0.35          0.20 

                    # $GH1
                    #        WASH7P         OR4F5 RP11-34P13.15 RP11-34P13.16 RP11-34P13.14 
                    #          0.03         -0.07          0.04          0.03          0.00
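
Once a named vector of estimates is available, a common next step is to list the strongest associations for a gene. The sketch below is one possible way to do that. The helper topAssociations is hypothetical, and ranking by absolute value is an assumption, not necessarily how the CoGTEx page orders its top associations. The toy vector reuses the first five GAPDH values printed above.

```r
# Hypothetical helper (not part of CoGTEx): the k strongest associations,
# ranked by absolute value so that strong negative associations are kept.
topAssociations <- function(est, k = 3) {
    est <- est[!is.na(est)]
    est[order(abs(est), decreasing = TRUE)][seq_len(min(k, length(est)))]
}

# Toy named vector: first five GAPDH estimates from the example output
gapdh <- c(WASH7P = -0.18, OR4F5 = 0.32, "RP11-34P13.15" = -0.26,
           "RP11-34P13.16" = -0.31, "RP11-34P13.14" = -0.01)

top <- topAssociations(gapdh, k = 3)
names(top)
```

Because getIndexPairs sets a gene's association with itself to 0, the query gene falls to the bottom of this ranking and does not need to be removed separately.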

CoGTEx: a database of system-level human gene expression associations
