Q.1: What is CoDNaS-RNA?

CoDNaS-RNA is an online resource with collections of alternative conformers of RNA molecules.

Q.2: What is the purpose of CoDNaS-RNA?

CoDNaS-RNA was designed to facilitate research on the conformational diversity in the native state of RNAs. CoDNaS-RNA gathers alternative conformations of an RNA in the same entry, along with detailed information about the molecule. It allows an extensive characterization of the population of conformers in dynamic equilibrium that characterizes the native state. This information may help to better understand biological processes and functional aspects of RNAs by avoiding the 'one structure - one function' approach.

Q.3: What types of RNAs are included?

CoDNaS-RNA is not restricted to any particular type of RNA, although non-coding RNAs are largely overrepresented due to the intrinsic bias in the original sources. More details about RNA type composition are provided on the Statistics page.

Q.4: Where does the data come from?

CoDNaS-RNA considers RNA molecules of all types that have a known structure in the Protein Data Bank; these are mapped from the RCSB PDB. The structures are taken along with related data on RNA sequence, type, source, experimental conditions, literature references, etc. Additional annotations on non-coding RNAs are cross-referenced from RNACentral through the RNA sequence. Information on secondary structure and on intra-chain and inter-chain contacts is generated by DSSR from the mmCIF files.

Q.5: How is CoDNaS-RNA organized?

The sequences of RNAs included in CoDNaS-RNA are grouped by similarity into separate entries. Each entry is a cluster that presents the available conformers of a given RNA that have known structures. Every cluster can thus be seen as representative of the native structural ensemble of an RNA. These conformers have been structurally aligned to quantify the extent of the RNA's native conformational diversity. Each entry is extended with information about the RNA taken from external resources.

Q.6: What is the relationship between members of any entry in CoDNaS-RNA?

Each entry groups together all structures of RNAs from the RCSB PDB that are 100% identical in sequence and share 98% coverage, as determined by CD-HIT. An alternative clustering at 98% identity and 90% coverage is calculated with Blastclust to identify 'gold' clusters, which are not modified regardless of the clustering procedure.
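The idea of a 'gold' cluster can be illustrated with a minimal Python sketch. It assumes that "not modified" means the cluster has exactly the same members under both clustering procedures; the function name and the chain identifiers are hypothetical and are not taken from the CoDNaS-RNA pipeline.

```python
def gold_clusters(clustering_a, clustering_b):
    """Return the clusters whose membership is identical in both clusterings.

    Each clustering is an iterable of iterables of structure identifiers.
    Assumption: a 'gold' cluster is one with exactly the same members
    under both procedures.
    """
    members_a = {frozenset(cluster) for cluster in clustering_a}
    members_b = {frozenset(cluster) for cluster in clustering_b}
    return members_a & members_b


# Hypothetical chain identifiers, for illustration only.
strict = [["1abc_A", "2def_A"], ["3ghi_B"], ["4jkl_A", "5mno_A"]]
relaxed = [["1abc_A", "2def_A"], ["3ghi_B", "4jkl_A", "5mno_A"]]
gold = gold_clusters(strict, relaxed)
```

In this toy example only the cluster containing 1abc_A and 2def_A is unchanged between the two clusterings, so it is the only one returned.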

Q.7: Are all experimentally-solved structures available considered?

We initially accept all available RNA structures in the RCSB PDB, whether solved by X-ray Diffraction Crystallography (XRD), Nuclear Magnetic Resonance (NMR) or cryo-Electron Microscopy (cryo-EM). All models in the same mmCIF file are considered separately. Structures that do not achieve a minimum standard of quality are filtered out by imposing a minimum resolution of 3.5 Å and a minimum RNA sequence length of 10 nt. We take both natural and synthetic RNA structures into account.
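The quality filter described above could be sketched as follows. This is only an illustration under two assumptions that the text does not state: a "minimum resolution of 3.5 Å" is read as a reported resolution value of 3.5 Å or lower, and structures with no reported resolution (as is typical for NMR) are checked for length only. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RESOLUTION_ANGSTROM = 3.5  # assumed reading of "minimum resolution of 3.5 Å"
MIN_LENGTH_NT = 10


@dataclass
class RnaChain:
    chain_id: str
    sequence: str
    resolution: Optional[float]  # None when no resolution is reported


def passes_quality_filter(chain):
    """Apply the length and resolution thresholds to one RNA chain."""
    if len(chain.sequence) < MIN_LENGTH_NT:
        return False
    if chain.resolution is None:
        # Assumption: chains without a resolution value are not rejected here.
        return True
    return chain.resolution <= MAX_RESOLUTION_ANGSTROM
```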

Q.8: How is conformational diversity measured?

The maximum RMSD between any pair of conformers is taken as the largest evidence of the conformational diversity of that particular RNA. All-vs-all structure comparisons are calculated ad hoc with TM-align, which also provides the TM-score metric of similarity. Each entry shows possible sources of variation between this 'maximum pair' of conformers, such as differences in pH, temperature or bound molecules in the experiment. These may help to understand the causes and extent of the observed conformational diversity in the RNA molecule.
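As a rough illustration of how a 'maximum pair' can be selected from all-vs-all comparisons, the Python sketch below takes a list of conformers and a function returning the RMSD of a pair. The RMSD values are made up, and in practice they would come from a structural alignment tool such as TM-align; the function name and conformer labels are hypothetical.

```python
from itertools import combinations


def maximum_pair(conformers, rmsd):
    """Return the pair of conformers with the largest RMSD, and that RMSD.

    `rmsd` is any callable taking two conformers and returning a number.
    At least two conformers are required.
    """
    best = max(combinations(conformers, 2), key=lambda pair: rmsd(*pair))
    return best, rmsd(*best)


# Made-up RMSD values (in Å) for three hypothetical conformers.
table = {
    frozenset(("conf_1", "conf_2")): 1.2,
    frozenset(("conf_1", "conf_3")): 4.7,
    frozenset(("conf_2", "conf_3")): 3.1,
}
pair, value = maximum_pair(
    ["conf_1", "conf_2", "conf_3"],
    lambda a, b: table[frozenset((a, b))],
)
```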

Q.9: How are 3D views implemented?

The superposition of the maximum pair of conformers and the interactive view of RCSB PDB entries are displayed on the website with NGLviewer.

Q.10: How are 2D secondary structure plots generated?

We use the RNArtist software to plot 2D secondary structures of RNAs. As input for each conformer, we generate a Kotlin file (KTS) from the revised dot-bracket notation file (DBN (rev)). Each DBN (rev) is a Vienna Fasta-like file generated from DBN (orig), which corresponds to the original Vienna Fasta-like file generated by DSSR. All of these files are available to users by downloading a Cluster_ID.tar.gz file or by clicking on the download button when navigating the Cluster or viewing a particular pair of conformers.

Q.11: How do you 'revise' the revised dot-bracket notation files, DBN (rev)?

As mentioned in the previous question, a revised dot-bracket notation file (DBN (rev)) consists of the original DBN file (from DSSR) that was programmatically revised to remove “&” and to replace unpaired dot-bracket characters (i.e. pseudoknots between different chains) with dots “.”. This is required for plotting the single-chain conformers in CoDNaS-RNA; otherwise most of the well-known plotting software will not work. Among these tools, we chose RNArtist because it generates descriptive and interactive plots. Note that even when a DBN (orig) contains no “&” or unpaired dot-bracket characters, a DBN (rev) file is still provided, to indicate that the file went through our revision.
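The revision described above can be sketched in Python as follows. This is not the actual CoDNaS-RNA code. It assumes that the structure line uses the four standard bracket types plus upper/lower-case letter pairs (Aa, Bb, ...) for higher-order pseudoknots, and it is meant for the structure line only, not for the sequence line of the file. The function name is hypothetical.

```python
import string

# Assumed bracket alphabet: closing character -> opening character.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}
PAIRS.update(zip(string.ascii_lowercase, string.ascii_uppercase))
OPENERS = set(PAIRS.values())


def revise_dbn(structure):
    """Drop '&' and replace bracket characters without a partner by '.'."""
    chars = [c for c in structure if c != "&"]
    stacks = {opener: [] for opener in OPENERS}  # one stack per bracket type
    for i, c in enumerate(chars):
        if c in OPENERS:
            stacks[c].append(i)
        elif c in PAIRS:
            stack = stacks[PAIRS[c]]
            if stack:
                stack.pop()      # closer matched with the latest opener
            else:
                chars[i] = "."   # closer with no opener in this chain
    for stack in stacks.values():
        for i in stack:
            chars[i] = "."       # opener with no closer in this chain
    return "".join(chars)
```

For example, `revise_dbn("((..)&.[.")` returns `".(..)..."`: the chain separator is removed, and the first "(" and the "[" are replaced by dots because they have no partner.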

Q.12: Why are certain sections missing in some 'Cluster Details' pages?

Some sections are missing in the ‘Cluster Details’ page of three clusters (Cluster_46, Cluster_283 and Cluster_890). This is due to unknown errors in the structural alignment step. We're working to fix them as soon as possible.

Q.13: Is CoDNaS-RNA data available?

We provide different ways to download the data. No registration is required.

  • Users can download specific data at cluster level in the Search Results page by selecting one or more clusters and clicking the Download button on the top-right of the table. This data can also be downloaded from the button on the top of each Cluster Details page. All data available for the cluster is provided.
  • Users can retrieve data from each section in the Cluster Details and Pairs Details view with the dedicated buttons on the top-right of each section. All data from that section available for the cluster is provided.
  • Users can download all data from the current release of the database (see About page) as a collection of tab-separated tables packed in a compressed tar.gz file. Only tabular data is provided.
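
Once the full release has been downloaded, the tab-separated tables can be read directly from the compressed archive. The Python sketch below uses only the standard library and assumes that each table is a UTF-8 file with a header row and a `.tsv` extension; the actual file names and layout inside the archive may differ.

```python
import csv
import io
import tarfile


def read_tsv_tables(archive):
    """Map each .tsv member of a tar.gz archive to a list of row dicts.

    `archive` is a binary file object opened on the tar.gz file.
    Assumption: tables are UTF-8, have a header row and end in '.tsv'.
    """
    tables = {}
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(".tsv"):
                continue
            text = tar.extractfile(member).read().decode("utf-8")
            reader = csv.DictReader(io.StringIO(text), delimiter="\t")
            tables[member.name] = list(reader)
    return tables
```

A typical call would open the downloaded file in binary mode, for example `with open(path, "rb") as handle: tables = read_tsv_tables(handle)`, where `path` points to the local copy of the archive.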

Q.14: How is the CoDNaS-RNA website implemented?

Several custom and third-party software packages are used to build the database (see About page). Among the most important, BioPython is used to handle sequences while gemmi is the main parser for structures. CD-HIT and Blastclust are used for sequence clustering. Structure alignments and similarity measurements are performed with TM-align. DSSR is used to extract secondary structure information and site-specific contacts from mmCIF files. RNArtist is used to plot 2D secondary structures of RNAs. The website is built on HTML+CSS+JavaScript using React with Material-UI.

Q.15: Are there other similar resources to study conformational diversity?

We are not aware of other databases on conformational diversity of RNAs. However, you can explore the conformational diversity of protein tertiary structures in CoDNaS and PDBflex, or the conformational diversity of homo-oligomeric proteins in CoDNaS-Q.