Syntenic Blocks in Ancestral Species

The Genomicus browser displays, when possible, the predicted order of genes in ancestral species. The method used to predict this order is briefly described here, and in more detail in a poster presented at the Cold Spring Harbor meeting on Genome Informatics in October 2009. The full method is described in a manuscript in preparation.

  1. Every pair of available species is compared to identify pairwise syntenic blocks. Two consecutive genes A1 and B1 in species 1 belong to a syntenic block with their respective orthologs A2 and B2 in species 2 if A2 and B2 are also consecutive and in the same relative orientation as A1 and B1. This definition is applied strictly and extends to runs of any number of consecutive genes.
  2. All pairwise syntenic blocks are compared, and when two blocks overlap without any inconsistency, they are merged into a larger block.
  3. Merged blocks represent the ancestral gene order in the last common ancestor of the extant species that contributed the pairwise syntenic blocks.
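The pairwise block definition in step 1 can be sketched in Python. This is a minimal illustration, not the Genomicus implementation, and it rests on assumptions the text does not state: each genome is a hypothetical dictionary mapping chromosome names to ordered (gene_id, strand) lists, orthology is a one-to-one dictionary from species-1 to species-2 gene IDs, and a pair that appears in reverse order in species 2 counts as having "the same relative orientation" when both strands are flipped as well.

```python
from typing import Dict, List, Tuple

# Sketch only: the input formats below and the treatment of inverted pairs
# are assumptions, not the Genomicus implementation.
Gene = Tuple[str, int]            # (gene_id, strand), strand is +1 or -1
Genome = Dict[str, List[Gene]]    # chromosome name -> genes in order
Position = Tuple[str, int, int]   # (chromosome, index, strand)


def index_genome(genome: Genome) -> Dict[str, Position]:
    """Map each gene ID to its chromosome, index and strand."""
    return {gid: (chrom, pos, strand)
            for chrom, genes in genome.items()
            for pos, (gid, strand) in enumerate(genes)}


def conserved_adjacency(a1: Gene, b1: Gene, orth: Dict[str, str],
                        idx2: Dict[str, Position]) -> bool:
    """True if the orthologs of consecutive genes a1, b1 are also
    consecutive in species 2 with the same relative orientation."""
    a2, b2 = orth.get(a1[0]), orth.get(b1[0])
    if a2 not in idx2 or b2 not in idx2:
        return False
    chrom_a, pos_a, strand_a = idx2[a2]
    chrom_b, pos_b, strand_b = idx2[b2]
    if chrom_a != chrom_b or abs(pos_a - pos_b) != 1:
        return False
    # d = +1: same order as in species 1; d = -1: the pair is inverted in
    # species 2, so both strands must be flipped as well.
    d = pos_b - pos_a
    return strand_a == d * a1[1] and strand_b == d * b1[1]


def pairwise_blocks(g1: Genome, g2: Genome,
                    orth: Dict[str, str]) -> List[List[str]]:
    """Maximal runs (>= 2 genes) of species-1 genes whose every adjacency
    is conserved in species 2."""
    idx2 = index_genome(g2)
    blocks: List[List[str]] = []
    for genes in g1.values():
        if not genes:
            continue
        current = [genes[0][0]]
        for a1, b1 in zip(genes, genes[1:]):
            if conserved_adjacency(a1, b1, orth, idx2):
                current.append(b1[0])
            else:
                if len(current) >= 2:
                    blocks.append(current)
                current = [b1[0]]
        if len(current) >= 2:
            blocks.append(current)
    return blocks
```

For example, with species 1 ordered A1(+), B1(+), C1(-), D1(+) and species 2 ordered D2(+), C2(+), B2(-), A2(-), the run A1, B1, C1 is reported as one (inverted) block, while the C1/D1 adjacency fails because D2 is not flipped relative to D1.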

Because the definition of pairwise syntenic blocks is very strict, it is assumed that the gene order within each block accurately reflects the order and orientation of genes in the last common ancestor of the species compared. Merging pairwise syntenic blocks overcomes gene losses or duplications in terminal branches of the tree, which would otherwise break blocks under this strict definition.
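The merging step can be illustrated with a hypothetical case: if species 1 has lost gene d and species 3 has lost gene b, the comparison of species 1 with species 2 might yield the block [a, b, c], and the comparison of species 2 with species 3 the block [c, d, e]; merging them recovers [a, b, c, d, e]. The Python sketch below shows one way to do this. It assumes blocks are already expressed as ordered lists of shared ancestral gene-family IDs, ignores gene orientation, treats "inconsistent" as any shared gene outside the overlap, and merges greedily; none of these choices is taken from the Genomicus method.

```python
from typing import List, Optional

# Hypothetical representation: a block is an ordered list of ancestral
# gene-family IDs; gene orientation is ignored in this sketch.
Block = List[str]


def _merge_oriented(x: Block, y: Block) -> Optional[Block]:
    """Merge y onto x keeping both orders, or return None if they conflict."""
    shared = set(x) & set(y)
    n = len(y)
    # Case 1: y lies inside x as a contiguous run.
    for i in range(len(x) - n + 1):
        if x[i:i + n] == y:
            return list(x)
    # Case 2: a suffix of x equals a prefix of y, and no other gene is shared.
    for k in range(min(len(x), n) - 1, 0, -1):
        if x[-k:] == y[:k] and len(shared) == k:
            return x + y[k:]
    return None


def merge_pair(x: Block, y: Block) -> Optional[Block]:
    """Try both block orders and both orientations of y."""
    for a, b in ((x, y), (x, y[::-1]), (y, x), (y[::-1], x)):
        merged = _merge_oriented(a, b)
        if merged is not None:
            return merged
    return None


def merge_all(blocks: List[Block]) -> List[Block]:
    """Greedily merge compatible blocks until no pair can be merged."""
    blocks = [list(b) for b in blocks]
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                merged = merge_pair(blocks[i], blocks[j])
                if merged is not None:
                    blocks[i] = merged
                    del blocks[j]
                    changed = True
                    break
            if changed:
                break
    return blocks
```

Blocks whose shared genes appear in conflicting positions, such as [a, b, c] and [c, a, z], are left unmerged.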

Conserved Non-coding Elements (CNEs)

CNEs were computed from multiple alignments of 46 vertebrate genomes projected onto the human genome. These alignments were generated with multiz and other tools by the UCSC and Penn State Bioinformatics groups, and were made available on the UCSC web site.

The current algorithm scans the alignment for conserved regions of a minimum length (10 bp) and minimum identity (90%), then extends them by accepting up to 3 non-conserved columns (identity below 88%) on each side. The algorithm does not require a fixed set of key species in the alignment, only a minimum number of species (eight). Displayed CNEs are filtered on a minimum size (20 bp). CNEs are excluded from regions overlapping protein-coding sequences in all of the species considered. By convention, intronic CNEs are displayed on the right-hand side of the gene that contains them.
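The scan can be sketched in Python using the thresholds quoted above. Many details are not given in the text and are assumptions here: alignment columns are hypothetical strings with the human base first; column identity is the fraction of aligned (non-gap) bases matching the human base; columns with fewer than eight aligned species count as non-conserved; the 3 tolerated non-conserved columns per side are cumulative, and each extension is trimmed back to its last conserved column; and coding overlap is checked against a single hypothetical boolean mask rather than per species.

```python
from typing import List, Sequence, Tuple

# Thresholds quoted in the text; everything else is an assumption.
SEED_LEN, SEED_ID = 10, 0.90
EXT_ID, MAX_EXT_MISMATCH = 0.88, 3
MIN_SPECIES, MIN_CNE_LEN = 8, 20
GAPS = "-."


def column_identity(col: str) -> float:
    """Fraction of aligned bases equal to the reference base col[0].
    Columns with fewer than MIN_SPECIES aligned bases score 0."""
    ref = col[0].upper()
    bases = [c.upper() for c in col if c not in GAPS]
    if ref in GAPS or len(bases) < MIN_SPECIES:
        return 0.0
    return sum(b == ref for b in bases) / len(bases)


def _extend(ids: Sequence[float], start: int, end: int) -> Tuple[int, int]:
    """Grow [start, end) outward, tolerating up to MAX_EXT_MISMATCH columns
    below EXT_ID per side, and stop at the last conserved column."""
    bad, j, last_good = 0, end, end
    while j < len(ids):
        if ids[j] < EXT_ID:
            bad += 1
            if bad > MAX_EXT_MISMATCH:
                break
        else:
            last_good = j + 1
        j += 1
    bad, i, first_good = 0, start - 1, start
    while i >= 0:
        if ids[i] < EXT_ID:
            bad += 1
            if bad > MAX_EXT_MISMATCH:
                break
        else:
            first_good = i
        i -= 1
    return first_good, last_good


def find_cnes(columns: Sequence[str],
              coding: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) CNE intervals in alignment columns."""
    ids = [column_identity(c) for c in columns]
    regions = []
    i = 0
    while i < len(ids):
        if ids[i] >= SEED_ID:
            j = i
            while j < len(ids) and ids[j] >= SEED_ID:
                j += 1
            if j - i >= SEED_LEN:          # seed: >= 10 columns at >= 90%
                regions.append(_extend(ids, i, j))
            i = j
        else:
            i += 1
    merged: List[Tuple[int, int]] = []     # fuse overlapping extensions
    for s, e in sorted(regions):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    # Keep elements of at least MIN_CNE_LEN columns that avoid coding columns.
    return [(s, e) for s, e in merged
            if e - s >= MIN_CNE_LEN and not any(coding[s:e])]
```

With these assumptions, two 12-column seeds separated by two poorly conserved columns are each extended across the gap and fused into one 26-column element that passes the 20 bp filter, whereas an isolated 15-column seed is discarded.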

Hovering the mouse over a CNE may highlight a neighbouring gene. In these cases, the CNE is a predicted regulatory enhancer, and the highlighted gene is a predicted target of this enhancer (M. Naville et al., submitted). Only CNEs with a high linkage score (score > 0.9) to their target human gene are displayed in Genomicus.
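The display filter can be sketched as a strict threshold on precomputed enhancer-target links, producing the CNE-to-gene mapping a mouse-over would use. The input format, a list of hypothetical (cne_id, gene, score) tuples, is an assumption; how the linkage scores are computed is described in the cited work and is not reproduced here.

```python
from typing import Dict, List, Tuple

MIN_LINKAGE = 0.9  # threshold quoted in the text (score > 0.9)


def enhancer_targets(links: List[Tuple[str, str, float]]) -> Dict[str, List[str]]:
    """Map each CNE ID to the genes it is linked to with score > MIN_LINKAGE.
    Input tuples (cne_id, gene, score) are a hypothetical format."""
    targets: Dict[str, List[str]] = {}
    for cne, gene, score in links:
        if score > MIN_LINKAGE:
            targets.setdefault(cne, []).append(gene)
    return targets
```

Because the comparison is strict, a link scoring exactly 0.9 is not kept.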