CoDNaS-RNA is an online resource with collections of alternative conformers of RNA molecules.
CoDNaS-RNA was designed to facilitate research on conformational diversity in the native state of RNAs. It gathers alternative conformations of an RNA in the same entry, along with detailed information about the molecule. This allows an extensive characterization of the population of conformers in dynamic equilibrium that characterizes the native state. This information may help to better understand biological processes and functional aspects of RNAs by moving beyond the 'one structure - one function' approach.
CoDNaS-RNA is not restricted to any particular type of RNA, although non-coding RNAs are largely overrepresented due to the intrinsic bias in the original sources. More details about RNA type composition are provided on the Statistics page.
CoDNaS-RNA considers RNA molecules of all types that have a known structure in the Protein Data Bank; these structures are taken from the RCSB PDB along with related data on RNA sequence, type, source, experimental conditions, literature references, etc. Additional annotations on non-coding RNAs are cross-referenced from RNACentral through the RNA sequence. Information on secondary structure and on intra-chain and inter-chain contacts is generated from the mmCIF files by DSSR.
The sequences of RNAs included in CoDNaS-RNA are grouped by similarity into separate entries. Each entry is a cluster that presents the available conformers of a given RNA that have known structures. Every cluster can thus be seen as representative of the native structural ensemble of an RNA. These conformers have been structurally aligned to quantify the extent of the RNA's native conformational diversity. Each entry is extended with information about the RNA taken from external resources.
Each entry groups together all RNA structures from the RCSB PDB that are 100% identical in sequence with 98% coverage, as determined by CD-HIT. An alternative clustering at 98% identity and 90% coverage is calculated with Blastclust to identify 'gold' clusters, that is, clusters that remain unchanged regardless of the clustering procedure.
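As an illustration of the 'gold' cluster idea, here is a minimal sketch that compares two clusterings and keeps the clusters whose membership is identical in both. It is not the CoDNaS-RNA pipeline code: the function name `gold_clusters`, the chain identifiers and the list-of-lists input format are assumptions made for this example, and parsing the actual CD-HIT and Blastclust output files is left out.

```python
from typing import Iterable, List


def gold_clusters(
    clustering_a: Iterable[Iterable[str]],
    clustering_b: Iterable[Iterable[str]],
) -> List[List[str]]:
    """Return the clusters of clustering_a that appear unchanged in clustering_b.

    Each clustering is a collection of clusters, and each cluster is a
    collection of chain identifiers (hypothetical format for this sketch).
    """
    # Member order inside a cluster is irrelevant, so compare as frozensets.
    members_b = {frozenset(cluster) for cluster in clustering_b}
    return [
        sorted(cluster)
        for cluster in clustering_a
        if frozenset(cluster) in members_b
    ]


# Made-up identifiers, standing in for the two clustering results.
strict = [["1abc_A", "2xyz_B"], ["3def_A"], ["4ghi_A", "5jkl_A"]]
relaxed = [["2xyz_B", "1abc_A"], ["3def_A", "4ghi_A"], ["5jkl_A"]]

gold = gold_clusters(strict, relaxed)
print(gold)
```

In this toy input only the first cluster has the same members in both clusterings, so it is the only one reported as 'gold'.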
We initially accept all available RNA structures in the RCSB PDB, whether solved by X-ray Diffraction Crystallography (XRD), Nuclear Magnetic Resonance (NMR) or cryo-Electron Microscopy (cryo-EM). All models in the same mmCIF file are considered separately. Structures that do not meet a minimum standard of quality are filtered out by requiring a resolution of 3.5 Å or better and a minimum RNA sequence length of 10 nt. We take both natural and synthetic RNA structures into account.
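The two quality thresholds above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `RnaChain` record and the function `passes_quality_filter` are hypothetical names, and the choice to keep structures that report no resolution (as is typical for NMR) is an assumption of this sketch, not something stated here.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RESOLUTION_ANGSTROM = 3.5  # structures must be at this resolution or better
MIN_LENGTH_NT = 10             # minimum RNA sequence length


@dataclass
class RnaChain:
    """Hypothetical record for one RNA chain; not the database schema."""
    chain_id: str
    sequence: str
    resolution: Optional[float]  # None when no resolution is reported


def passes_quality_filter(chain: RnaChain) -> bool:
    """Apply the length and resolution thresholds to a single chain."""
    if len(chain.sequence) < MIN_LENGTH_NT:
        return False
    if chain.resolution is None:
        # ASSUMPTION of this sketch: entries without a reported resolution
        # are judged on length only.
        return True
    # A lower value in angstroms means a better-resolved structure.
    return chain.resolution <= MAX_RESOLUTION_ANGSTROM


chains = [
    RnaChain("ok_xrd", "GGCUAGCUAGCC", 2.1),
    RnaChain("too_coarse", "GGCUAGCUAGCC", 4.2),
    RnaChain("too_short", "GGCUAGC", 1.8),
    RnaChain("no_resolution", "GGCUAGCUAGCC", None),
]
kept = [c.chain_id for c in chains if passes_quality_filter(c)]
print(kept)
```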
The maximum RMSD between any pair of conformers is taken as the measure of the extent of conformational diversity of that particular RNA. All-vs-all structure comparisons are calculated ad hoc with TM-align, which also provides the TM-score similarity metric. Each entry shows possible sources of variation between this 'maximum pair' of conformers, such as differences in pH, temperature or bound molecules in the experiments. These may help to understand the causes and extent of the observed conformational diversity in the RNA molecule.
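Selecting the 'maximum pair' from an all-vs-all comparison can be sketched as follows. The RMSD values, the conformer identifiers and the function name `maximum_pair` are made up for this example; in practice the values would come from the TM-align comparisons, which this sketch does not run.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def maximum_pair(rmsd_by_pair: Dict[Pair, float]) -> Tuple[Pair, float]:
    """Return the conformer pair with the largest RMSD, and that RMSD."""
    if not rmsd_by_pair:
        raise ValueError("at least one pairwise comparison is required")
    return max(rmsd_by_pair.items(), key=lambda item: item[1])


# Hypothetical all-vs-all RMSD values (in angstroms) for three conformers.
rmsd_by_pair = {
    ("conf_1", "conf_2"): 1.4,
    ("conf_1", "conf_3"): 5.7,
    ("conf_2", "conf_3"): 3.2,
}

pair, rmsd = maximum_pair(rmsd_by_pair)
print(pair, rmsd)
```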
The superposition of the maximum pair of conformers and the interactive view of RCSB PDB entries are displayed on the website with NGLviewer.
We use the RNArtist software to plot 2D secondary structures of RNAs. As input for each conformer, we generate a Kotlin file (KTS) from the revised dot-bracket notation file (DBN (rev)). Each DBN (rev) is a Vienna Fasta-like file generated from DBN (orig), which is the original Vienna Fasta-like file generated by DSSR. All of these files are available to users by downloading a Cluster_ID.tar.gz file or by clicking on the download button when navigating the Cluster or viewing a particular pair of conformers.
As mentioned in another question, a revised dot-bracket notation file (DBN (rev)) is the original DBN file (from DSSR) after it has been programmatically revised to remove “&” and to replace unpaired dot-bracket characters (i.e. pseudoknots between different chains) with dots “.”. This is required for plotting the single-chain conformers in CoDNaS-RNA; otherwise most of the widely used plotting software will not work. We chose RNArtist because it generates descriptive and interactive plots. Be aware that even when a DBN (orig) has no “&” or unpaired dot-bracket characters, a DBN (rev) file is still provided, to inform users that the file went through our revision.
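The kind of revision described above can be sketched in a few lines. This is an illustrative reimplementation, not the script used to build the database: the function name `revise_dbn` is hypothetical, and the set of bracket symbols handled here (round, square, curly and angle brackets, plus upper/lowercase letter pairs) is an assumption about the notation.

```python
import string

# Closing symbol -> matching opening symbol (assumed notation for this sketch).
CLOSERS = {")": "(", "]": "[", "}": "{", ">": "<"}
CLOSERS.update(zip(string.ascii_lowercase, string.ascii_uppercase))
OPENERS = set(CLOSERS.values())


def revise_dbn(dbn: str) -> str:
    """Drop '&' separators and replace unpaired bracket symbols with dots."""
    chars = [c for c in dbn if c != "&"]
    paired = [False] * len(chars)
    stacks = {opener: [] for opener in OPENERS}  # open positions per type

    for i, c in enumerate(chars):
        if c in OPENERS:
            stacks[c].append(i)
        elif c in CLOSERS and stacks[CLOSERS[c]]:
            j = stacks[CLOSERS[c]].pop()
            paired[i] = paired[j] = True

    return "".join(
        "." if (c in OPENERS or c in CLOSERS) and not paired[i] else c
        for i, c in enumerate(chars)
    )


print(revise_dbn("((..))&.."))  # separator removed
print(revise_dbn("((.[.))."))   # '[' has no partner in this chain
```

Each bracket type is matched with its own stack, so a pseudoknot symbol whose partner lies in another chain is left unmatched and becomes a dot, while pairs fully contained in the chain are preserved.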
Some sections are missing in the ‘Cluster Details’ page of three clusters (Cluster_46, Cluster_283 and Cluster_890). This is due to unknown errors in the structural alignment step. We're working to fix them as soon as possible.
We provide different ways to download the data. No registration is required.
Several custom and third-party software packages are used to build the database (see About page). Among the most important, BioPython is used to handle sequences while gemmi is the main parser for structures. CD-HIT and Blastclust are used for sequence clustering. Structure alignments and similarity measurements are performed with TM-align. DSSR is used to extract secondary structure information and site-specific contacts from mmCIF files. RNArtist is used to plot 2D secondary structures of RNAs. The website is built on HTML+CSS+JavaScript using React with Material-UI.
We are not aware of other databases on conformational diversity of RNAs. However, you can explore the conformational diversity of protein tertiary structures in CoDNaS and PDBflex, or the conformational diversity of homo-oligomeric proteins in CoDNaS-Q.