  • About
    CoDNaS-RNA: Conformational diversity of RNAs

    CoDNaS-RNA is a database of Conformational Diversity in the Native State of RNA molecules. Each entry in CoDNaS-RNA compiles the known structures of RNAs with the same sequence, as determined in separate experiments and possibly under different conditions. Consequently, these conformers can be considered alternative instances of the RNA structure in its native ensemble. Structural information is obtained directly from the worldwide Protein Data Bank (wwPDB), as provided by different structure determination techniques. As a result, while CoDNaS-RNA contains data on mRNAs, it is inherently and inevitably biased towards functional, non-coding RNA types such as rRNA, tRNA, ribozymes and others. The extent of observed conformational diversity of each RNA is mainly assessed by the maximum RMSD found between any pair of its conformers. TM-score, secondary structure information and intra- and inter-chain contacts (both with polynucleotides and proteins) provide additional information for comparison. CoDNaS-RNA also showcases possible associations between the diversity in the native ensemble and physicochemical, biological or functional modulators such as pH or temperature, binding to ligands or ions, and the source of the molecule. External data about RNAs is cross-referenced to facilitate further exploration of relevant features.

  • Development of CoDNaS-RNA

    Experimentally solved structures of RNAs are retrieved from the RCSB Protein Data Bank in mmCIF format, along with all available data about them as a tabular report. Standard and third-party Python libraries (such as Biopython and gemmi) are used to parse the sequence and structure data from these files.

    Native structure ensembles are collected by grouping identical sequences together using CD-HIT clustering, further validated with Blastclust. All possible pairwise structural comparisons of conformers within the resulting clusters are performed with TM-align. Distance metrics are taken from the output, among which the C3'-RMSD values are used to identify the pair of conformers that shows the maximum degree of conformational diversity for the molecule. Additional data is presented for this pair of structures to highlight changes in experimental conditions that may account for the observed diversity. DSSR is applied to extract the intra- and inter-chain contacts and secondary structure information of these conformers. 2D secondary structure plots are drawn with RNArtist. The superposition of the maximum-RMSD pair of conformers is displayed on the website with NGL Viewer. The API service and public PostgreSQL database of RNAcentral are queried to map the sequence of an RNA to relevant annotations such as RNA type, source (organism), extended descriptions, etc. RNAcentral also provides links to 40+ databases of RNA data.
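
    The grouping and pair-selection steps described above can be illustrated with a minimal Python sketch. This is not the CoDNaS-RNA pipeline itself: exact string matching stands in for the CD-HIT/Blastclust clustering, a small lookup table of made-up values stands in for the C3'-RMSD output of TM-align, and the requirement of at least two conformers per cluster is an assumption. The names `group_identical`, `max_diversity_pair`, `lookup` and the conformer identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_identical(chains):
    """Group chain IDs by exact sequence.

    A simple stand-in for clustering at 100% identity; the real
    pipeline uses CD-HIT validated with Blastclust.
    """
    clusters = defaultdict(list)
    for chain_id, sequence in chains.items():
        clusters[sequence.upper()].append(chain_id)
    # Assumption: an ensemble needs at least two conformers.
    return [sorted(ids) for ids in clusters.values() if len(ids) >= 2]


def max_diversity_pair(conformers, rmsd):
    """Compare all pairs in one cluster and return (rmsd, a, b) for the largest RMSD."""
    return max((rmsd(a, b), a, b) for a, b in combinations(conformers, 2))


# Hypothetical toy input: chain identifier -> RNA sequence.
chains = {
    "conf1": "GGCUAGCC",
    "conf2": "GGCUAGCC",
    "conf3": "GGCUAGCC",
    "other": "AUGCAUGC",
}

# Made-up pairwise RMSD values (in angstroms), standing in for TM-align output.
toy_rmsd = {
    frozenset(("conf1", "conf2")): 0.8,
    frozenset(("conf1", "conf3")): 3.1,
    frozenset(("conf2", "conf3")): 2.4,
}


def lookup(a, b):
    """Return the precomputed RMSD for an unordered pair of conformers."""
    return toy_rmsd[frozenset((a, b))]


clusters = group_identical(chains)
best = max_diversity_pair(clusters[0], lookup)
print(clusters)
print(best)
```

    In this toy example the single-member sequence is dropped, and the pair with the largest RMSD is the one that would be reported as the maximum conformational diversity of the cluster.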

    The first version of CoDNaS-RNA (v1.0.0) was released on July 1st, 2020 with 1000 clusters of RNA structures, comprising an average of 9.5 conformers per cluster.

    CoDNaS-RNA release information

    • Download:
    • Databases:
      • RCSB PDB: retrieved on 2020-02-13. All entries with at least one RNA chain are initially considered.
      • RNAcentral: release 15 from 2020-05-21. Contains ~16M sequences, ~46M mapped entries, with links to 40+ databases.
    • Software:
      • standalone:
        • BlastClust: v2.2.26
        • CD-HIT: v4.8.1
        • DSSR: v1.9.10
        • Gemmi: v0.3.6
        • Python: v3.6.9
        • RNArtistCore: v0.2.7
        • TM-align: v20190425
      • python libraries:
        • biopython: 1.76
        • gemmi: 0.3.6
        • matplotlib: 3.2.1
        • numpy: 1.18.2
        • pandas: 1.0.3
        • scipy: 1.5.4
        • seaborn: 0.11.1