Databases of genome-wide regulatory module and element predictions
| Database | Assembly | Search regions | Search region type | Nbr. of input species | Conserved motifs | Discovery p-value threshold | Ensembl compatibility | Release date |
|---|---|---|---|---|---|---|---|---|
| Human 9 | NCBI v36b | 18.7k | promoter | 41 | 236k | 0.01 | Build 38-49 | 26 Jul. 2007 |
| Mouse 4 | NCBI m37 | 17.5k | promoter | 38 | 223k | 0.1 | Build 47-49 | 26 Sep. 2007 |
| Mouse 3.1 | NCBI m35 | 17.5k | promoter | 38 | 223k | 0.1 | Build 38 | 18 Apr. 2007 |
| Rat 1.1 | RGSC v3.1 | 6.7k | promoter | 28 | 116k | 0.25 | n/a | 12 Feb. 2006 |
| C. elegans 4 | WormBase WS170 | 3.8k | promoter | 8 | 158k | 1.0 | Build 44-46 | 18 Jul. 2008 |
| Human Stat1 ChIP-seq peaks 1 | NCBI v35 | 226 | ChIP-seq | 23 | ~6k | 1.0 | n/a | 03 Apr. 2007 |
Overview
The cisRED database holds conserved sequence motifs identified by genome-scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations. Sequence inputs include low-coverage genome sequence data and ENCODE data. A Nucleic Acids Research article describes the system architecture; please use this publication to cite cisRED. PubMed publications that cite cisRED are listed here.
cisRED makes three levels of information available for regulatory elements:
'Atomic' motifs: These are conserved, over-represented, sequence sets, typically 6 to 12 bp long, that have been discovered in a 'search region' sequence set.
Groups of 'similar' motifs: These are identified either by a) annotating motifs with site sequences from TRANSFAC, JASPAR and ORegAnno databases (annotation-based groups), or by b) 'de novo' hierarchical clustering with the OPTICS algorithm ('de novo' groups).
Patterns of motif group labels that co-occur in many search regions: These putative regulatory modules are ranked using genome-scale statistical and functional properties. Motifs in highly ranked patterns are likely the most reliable predictions.
In promoter-based cisRED databases, sequence search regions for motif discovery extend from 1.5 kb upstream to 200 bp downstream of a transcription start site, net of most types of repeats and of coding exons. Many transcription factor binding sites are located in such regions. For each target gene's search region, we use a base set of probabilistic ab initio discovery tools, in parallel, to find over-represented atomic motifs. Discovery methods use comparative genomics with over 40 vertebrate input genomes.
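The promoter search-region definition can be illustrated with a minimal sketch. It assumes 1-based inclusive forward-strand coordinates and a single TSS position per gene, neither of which is specified above; the function name promoter_search_region is hypothetical, and the masking of repeats and coding exons is not modelled.

```python
from typing import Tuple

UPSTREAM_BP = 1500   # 1.5 kb upstream of the TSS, as stated in the text
DOWNSTREAM_BP = 200  # 200 bp downstream of the TSS, as stated in the text


def promoter_search_region(tss: int, strand: str) -> Tuple[int, int]:
    """Return (start, end) of the unmasked search region around a TSS.

    Assumption: coordinates are 1-based, inclusive, on the forward strand.
    Repeat and coding-exon masking is deliberately left out of this sketch.
    """
    if strand == "+":
        start, end = tss - UPSTREAM_BP, tss + DOWNSTREAM_BP
    elif strand == "-":
        # On the reverse strand, "upstream" means higher forward coordinates.
        start, end = tss - DOWNSTREAM_BP, tss + UPSTREAM_BP
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the first base so regions near a chromosome start stay valid.
    return max(1, start), end
```

For a forward-strand gene with a TSS at position 10000, this gives the interval 8500 to 10200; the same TSS on the reverse strand gives 9800 to 11500.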
In ChIP-seq-based cisRED databases, sequence search regions for motif discovery correspond to significant peaks that represent genome-wide sites of protein-DNA binding. Because such peaks occur in a wide range of genic and intergenic locations, ChIP-seq and promoter-based databases are complementary. Currently, motif discovery for ChIP-seq data uses scan-based approaches that make more explicit use of sets of sequences known to be functional transcription factor binding sites, and that consider a wide range of levels of conservation. For the human STAT1 ChIP-seq database, search regions in the target species (human) were selected as +/- 300 bp around the ChIP-seq peak maximum. Repeats and coding regions were masked. Multiple sequence alignments were used to assemble orthologous input sequences from other species.
You can access cisRED's data in three ways:
view predicted regulatory elements directly in cisRED's web user interface. From this interface, motifs can be viewed 'live' in the UCSC or Ensembl genome browsers.
download the data and SQL structure for each species' MySQL 4.x database, with a schema diagram and example SQL queries, from the Databases and Methods tab.
query the databases directly with SQL at db.cisred.org. Queries can be driven from command line or graphical clients (e.g. the MySQL QueryBrowser), or programmatically from Perl, Python, Java, Ruby, etc. The username is 'anonymous' and the password should be left blank.
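As a rough illustration of the third access route, the sketch below assembles an argument list for the standard mysql command-line client using the host and username given above. The helper name mysql_command is hypothetical, no database or table names are assumed (the schema is documented on the Databases and Methods tab), and the example only builds the command rather than contacting the server.

```python
import shlex
from typing import List, Optional

CISRED_HOST = "db.cisred.org"  # host named in the text
CISRED_USER = "anonymous"      # username named in the text; password is blank


def mysql_command(database: Optional[str] = None,
                  query: Optional[str] = None) -> List[str]:
    """Build an argument list for the `mysql` command-line client.

    No --password option is added because the password is left blank.
    `database` is optional; real database names must come from the
    cisRED documentation and are not assumed here.
    """
    args = ["mysql", "--host", CISRED_HOST, "--user", CISRED_USER]
    if query is not None:
        args += ["--execute", query]
    if database is not None:
        args.append(database)
    return args


# Print a shell-quoted command that lists the databases on the server.
print(shlex.join(mysql_command(query="SHOW DATABASES")))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell quoting problems when a query contains spaces or quotes.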
cisRED human motifs are available as a native data type at the Ensembl genome browser.
cisRED is an ongoing project. Updates will be released frequently.
Usage Notes
Filters and Cookies
cisRED manages your 'Filter' settings via a browser 'cookie'.
You must allow your web browser to accept cookies from cisred.org for your filter settings to take effect.
News
C. elegans v4 database published
January 16, 2009
The C. elegans cisRED database has been published in Nucleic Acids Research.
C. elegans v4 tables are now public
August 25, 2008
The new C. elegans database has been added to our public MySQL server.
C. elegans v4 database
July 18, 2008
This version of the C. elegans cisRED database features 8 nematode genomes and 3847 highly conserved transcripts.
New mouse v4 database
September 26, 2007
The v3.1 motif coordinates were 'lifted' to the NCBI m37 (mm9) assembly. The v4 motifs are compatible with (and will be available at) Ensembl 47.
New human v9 database
July 26, 2007
The new human database offers regulatory modules predicted with sequence data from 41 vertebrate species.
New browse/search options for the mouse database
June 20, 2007
You can now browse/search modules using GO terms, and by chromosome.
Browse by location
June 20, 2007
You can now find all search regions that overlap a genomic region defined by coordinates.
Search region masks
June 14, 2007
Our UCSC genome browser images now show the search region masks.
Improved search tools
June 13, 2007
You can now use IUPAC symbols in a motif search, or can search by cisRED group ID or cisRED module ID.
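To show what an IUPAC-symbol motif search involves, here is a minimal sketch that expands IUPAC nucleotide codes into a regular expression and reports match positions. It is not a description of how the cisRED web interface implements its search; the function names and the example motif are illustrative assumptions, and positions are 0-based offsets on the given strand only.

```python
import re
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif string into a regular-expression string."""
    try:
        return "".join(IUPAC[symbol] for symbol in motif.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide symbol: {err.args[0]}")


def find_sites(motif: str, sequence: str) -> List[int]:
    """Return 0-based start offsets of all (possibly overlapping) matches."""
    # A lookahead keeps the match zero-width so overlapping sites are found.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For example, find_sites("TTCNNNGAA", "ACTTCCGGGAATTCAAAGAA") finds two sites, at offsets 2 and 11.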
ORegAnno is an open access, LGPL/open source database that holds annotations for regulatory elements and polymorphisms. Its data are submitted and reviewed by a global user community. cisRED uses ORegAnno data for optimizing and assessing predictive performance, and for annotating computationally predicted DNA sequence motifs.