CoDNaS-RNA offers three ways to search. The quickest is to click the green 'Browse clusters list' button on the left of the Home page (red arrow in figure below), which lists all entries in CoDNaS-RNA. A more specific retrieval can be done with the basic search box on the right of the Home page (blue arrow in figure below): select a field and enter a keyword, then click the green 'Search' button. The third option is a dedicated ‘Advanced Search’ page that can be accessed from the navigation bar at the top of any page (yellow arrow in figure below).
Users can access a list of all clusters in CoDNaS-RNA with one click on the green 'Browse clusters list' button on the Home page. The result is a table listing one cluster entry per row, as described in the Search Results section.
Users can search by one of several fields, selected from the drop-down menu on the Home page:
'Cluster ID' allows selection of a specific cluster. Cluster IDs are arbitrarily assigned, so this option is most useful to retrieve a cluster of interest identified in previous visits.
The 'URS ID' field allows searches of clusters with a specific Unique RNAcentral Sequence IDentifier, as taken from RNAcentral.
'PDB ID' is an important case-insensitive field, allowing searches with any identifier from wwPDB or with more specific alternatives, according to these formats: 'PDB', 'PDB_CHAIN', 'PDB_MODEL_CHAIN'.
The 'RNA type' field is useful if the user has a preference for a certain type of RNA. Types are taken from the INSDC definition.
The remaining options ('RNA name', 'Taxon ID', 'Organism' and 'Title') work in a similar way to the PDB ID field.
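The three 'PDB ID' formats listed above can be told apart by splitting the keyword on underscores. The following is a minimal Python sketch of that idea, not the CoDNaS-RNA implementation. The function name `parse_pdb_query` and the keys of the returned dictionary are hypothetical. Lowercasing the PDB code is an assumption made to mirror the case-insensitive behaviour described above, and the chain is kept as typed.

```python
def parse_pdb_query(keyword):
    """Split a 'PDB ID' keyword into its parts (hypothetical helper)."""
    parts = keyword.strip().split("_")
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"unrecognized PDB ID format: {keyword!r}")

    # Assumption: only the PDB code is normalized to lowercase.
    query = {"format": "PDB", "pdb": parts[0].lower(), "model": None, "chain": None}
    if len(parts) == 2:
        query["format"] = "PDB_CHAIN"
        query["chain"] = parts[1]
    elif len(parts) == 3:
        query["format"] = "PDB_MODEL_CHAIN"
        query["model"] = int(parts[1])  # raises ValueError if not a number
        query["chain"] = parts[2]
    return query


# Conformer names used later in this tutorial follow the third format.
print(parse_pdb_query("1UTS_9_B"))
```

With this sketch, '1UTS_9_B' and '1uts_9_B' resolve to the same PDB code, model and chain.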
Users can search in two ways (indicated with green arrows in the image below):
When a search is successful, all matching clusters are summarized as separate rows in a table. For simplicity, only five entries are listed at a time by default (this number can be customized), with arrows at the bottom to page through the table. Each cluster is presented along with basic information on its ‘RNA Type’ and the maximum pairwise RMSD ('RMSD Max') and maximum TM-score ('TM-score Max') between its conformers. A thumbnail view of a representative tertiary structure of the RNA is also shown.
In the last column ('Gold') a golden star is shown for robust clusters, which grouped the exact same conformers when sequence clustering was analyzed by both CD-Hit and Blastclust. As cluster results show up, a quick string-based filter is made available for each column in the Search Results table. Users can also sort the table by clicking any of the column headers.
After a cluster row is selected, it expands to show a bigger representative view with some useful cluster information (indicated with a blue arrow on the next figure). This allows users to inspect the number of conformers in the cluster, RNAcentral-mapped information (URS IDs, organisms) and the range of structurally-aligned sequence identity percentages.
The primary details on a given cluster can be explored by clicking on its Cluster ID (indicated with a red arrow on the next figure), which takes the user to the ‘Cluster Details’ page. Once one or more clusters are selected (with the checkboxes on the left), the complete available information for them can be downloaded as a tar.gz file (one per selected cluster) by clicking the download icon button on the right side (yellow arrow in figure). Be aware that some clusters can be very large (e.g., Cluster_0 is 25 GB). Place the mouse cursor over the Cluster ID to see the size of the data available for that cluster, or select one or more clusters and hover over the download icon to see the total size to be downloaded.
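Because a downloaded archive can be large, it may be convenient to check its contents before extracting it. The sketch below uses Python's standard `tarfile` module to count the files in a tar.gz archive and add up their uncompressed sizes. The helper name `summarize_archive` and the file name in the usage comment are hypothetical, and nothing is assumed about the internal layout of the archive.

```python
import tarfile


def summarize_archive(path):
    """Return (number of regular files, total uncompressed bytes) in a tar.gz."""
    n_files = 0
    total_bytes = 0
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:  # streams the member headers, extracts nothing
            if member.isfile():
                n_files += 1
                total_bytes += member.size
    return n_files, total_bytes


# Usage (hypothetical file name for a downloaded cluster):
# n_files, total_bytes = summarize_archive("Cluster_64.tar.gz")
# print(f"{n_files} files, {total_bytes / 1e6:.1f} MB uncompressed")
```

Reading the headers still requires decompressing the stream, so this step can take a while on the largest clusters.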
If the user searches for a valid PDB ID or URS ID code that has not been included in any CoDNaS-RNA cluster, the ‘Search Results’ page will display one of two different messages:
1) 'Sorry! No conformational diversity data available in RCSB PDB for your search.' Indicates that no cluster could be built for the query term because there are fewer than two conformers available.
2) 'Sorry! Results not found!'. Indicates that the query has not been incorporated in the latest update of CoDNaS-RNA. This would happen if the query did not pass our quality filters or it was released after our latest update.
Other helper messages are also shown alongside the search options. Users may see a ‘typo’ message for Cluster ID, URS ID, Taxon ID or all fields when an invalid or empty keyword is searched:
The ‘Advanced Search’ page also includes the option 'Browse clusters list' to show the full list of clusters as different rows in the ‘Search Results’ table.
Once a cluster has been selected, the ‘Cluster Details’ page presents the available data for the cluster. At the top, next to the Cluster ID title, a gold star symbol indicates whether the cluster is robust (see FAQ for details). Next to it, a download button gives access to all data about this cluster (red arrow in figure below). Eight sections are available to explore:
General information about the cluster is presented at the top of the page (red box in figure below), with details about the following fields:
At the bottom of this tutorial (‘Example of a Biological Study Case’) we provide a brief description of how CoDNaS-RNA can help to gain biological insights about a system of interest. We used Cluster_64 as an example, which corresponds to the HIV-1 TransActivation Responsive element (HIV-1 TAR) RNA, and its interaction with the trans-activator protein TAT.
The Structural Information section provides comparative structural data determined among all members of the cluster (see red box in figure below). Data is provided for the following fields:
As can be seen in the figure above, Cluster_64 includes 128 alternative conformations related to the HIV-1 TAR element. The table reports a maximum RMSD of 4.24 Å, accompanied by a low minimum TM-score of 0.0843. This indicates that the RNA molecule is flexible and has at least two conformations with distinct structures.
Every cluster has at least one pair of conformers that displays the maximum RMSD value between them. We call this the 'maximum pair' of conformers. This section presents a table of structural features of each of these two conformers along with their comparison. These are TM-align (red arrow in figure below) structural values, cross-references from the RCSB PDB (blue arrow in figure below) and interaction features obtained with DSSR (yellow arrow in figure below).
The first column (see red box 1 in the figure below) identifies each feature, with information about individual conformers placed in the second (red box 2) and third (red box 3) columns. The features include:
The fourth column (rectangle 4) indicates if the values of the corresponding feature differ between conformers. For easy identification, rows in green indicate that the conformers have identical values, while red rows highlight differences.
As mentioned earlier, a relevant feature of CoDNaS-RNA is the interaction information. A dedicated table (see figure below) presents inter-chain contacts between RNA and other nucleotides or with proteins, and intra-chain contacts in the RNA, all extracted from mmCIF files with DSSR. After a row is selected (red arrow and circle in figure below), it expands to show all interactions at residue level in both conformers, allowing for their visual inspection and comparison. On this expanded table (red boxes in figure below) and from left to right, the columns indicate the residue name in one-letter code ('res id'), its position in the structurally observed sequence ('res num'), and the chain ('partner chain'), name ('partner res id') and position ('partner res num') of the partner residue it interacts with.
This section displays the optimal superposition (red box in figure below) of the maximum RMSD pair of conformers in the cluster, as calculated by TM-align. Conformers are displayed in different colors (see red arrow for legend). Within the interactive frame (see big red box in the figure below) users can move, rotate and zoom the superposed structures. Placing the mouse over the structures identifies the residue or bond under the cursor. Only the RNA conformer chains are displayed for simplicity, but the full structural data can be inspected at the bottom of the page by choosing one of the conformers in the ‘Available Conformers’ table of the cluster. A download button is provided (blue arrow in figure below) to download the structures of the fixed and superposed conformers in PDB format.
In the example above, 1uts_9_B and 2kdq_8_B from Cluster_64 can be viewed in a 3D superimposed representation. This facilitates detection of structural similarity and highlights regions where differences are more evident, such as the loop in the upper left on the figure.
In this first subsection, users can see, move and zoom the two-dimensional secondary structure representation of each conformer. Clicking the download button on the right side opens a list of download options (red arrow in figure below). Each option downloads two files, one per conformer, except for 'All', which downloads all secondary structure information as a single compressed tar.gz file. The options are:
In the example above, the secondary structure representations of 1uts_9_B and 2kdq_8_B from Cluster_64 do not show any difference. Secondary structures, taken together with the tertiary structure superposition, can help understand the extent and origin of conformational diversity in the depicted RNA.
In this second subsection two text boxes are available per conformer (see figure below). The box on top (red arrow in figure) shows the primary sequence of the conformer as taken from the revised dot-bracket file (see first subsection). The second box (blue arrow in figure below) shows the secondary structure in dot-bracket notation extracted from the same file. Users can select a sequence region on the top box and see the corresponding dot-bracket notation region highlighted on the box below.
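The link between a sequence region and its dot-bracket region can be illustrated with a short Python sketch. It is not the code behind the page: the function names are hypothetical and the ten-residue hairpin is a made-up toy example. In dot-bracket notation each position of the string corresponds to the residue at the same position in the sequence, a dot marks an unpaired residue, and matching brackets mark a base pair. Additional bracket types ('[]', '{}', '<>') are handled here on the assumption that the files may use them for pseudoknots.

```python
def pair_table(dot_bracket):
    """Map every paired position (0-based) to the position of its partner."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = {}
    for i, symbol in enumerate(dot_bracket):
        if symbol in stacks:
            stacks[symbol].append(i)  # remember the opening position
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {i}")
            j = stack.pop()  # most recent unmatched opener of the same type
            pairs[i], pairs[j] = j, i
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def matching_region(sequence, dot_bracket, start, end):
    """Return the sequence slice and the dot-bracket slice for [start, end)."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and dot-bracket lengths differ")
    return sequence[start:end], dot_bracket[start:end]


# Toy hairpin (made up for illustration, not taken from any cluster).
sequence = "GGGAAAUCCC"
structure = "(((....)))"
print(matching_region(sequence, structure, 2, 6))  # ('GAAA', '(...')
print(pair_table(structure)[0])                    # 9
```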
This section presents the two primary sequences from the pair of conformers in the cluster that displays the maximum RMSD value between them (the 'maximum pair'). For easier reading, the sequences are shown in blocks of ten positions with up to six blocks per line (see red boxes on figure below). The name of each conformer is shown on the left together with its number of residues, and a regular expression-compatible search field on the right allows users to find specific regions within the sequence. A download button on the right side (red arrow on figure below) allows a quick download of the conformers' FASTA files.
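The layout and search described above can be reproduced offline on a downloaded sequence. The following Python sketch is only an approximation of the page's behaviour: the function names are hypothetical, the sequence is a made-up toy string, and reporting matches as 1-based inclusive positions is an assumption.

```python
import re


def format_blocks(sequence, block=10, per_line=6):
    """Lay out a sequence in blocks of `block` residues, `per_line` blocks per line."""
    blocks = [sequence[i:i + block] for i in range(0, len(sequence), block)]
    lines = [" ".join(blocks[i:i + per_line]) for i in range(0, len(blocks), per_line)]
    return "\n".join(lines)


def find_regions(sequence, pattern):
    """Return (start, end, match) tuples with 1-based inclusive positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(pattern, sequence)]


toy = "ACGU" * 4  # made-up 16-residue sequence
print(format_blocks(toy))           # ACGUACGUAC GUACGU
print(find_regions(toy, "GU[AC]"))  # [(3, 5, 'GUA'), (7, 9, 'GUA'), (11, 13, 'GUA')]
```

A character class such as `[AC]` is one example of what a regular expression adds over a plain substring search.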
This section shows the structural similarity among all conformers in the cluster (see image below). The results of a complete-linkage hierarchical clustering of RMSD values among all conformers are shown. Structural differences between all pairs of conformers are represented by their relative RMSD values, such that more distant structures are at longer vertical distances in the dendrogram representation (see red box on the left side of image). The matrix on the right is composed of two merged heatmaps. Each cell of the matrix shows a pairwise RMSD value for a unique superposition of conformers. The color scale range of the upper triangle heatmap is based on RMSD values observed in the cluster (blue arrow in figure below). The lower triangle heatmap has the same RMSD values but the color scale range is based on RMSD values reported for the whole CoDNaS-RNA database (yellow arrow in figure below). Note: A few clusters display heatmap matrices with some rows in the 'wrong' order. This is due to limitations in the plotting library when all pairwise RMSD values are identical for two or more conformers. This does not affect the RMSD values and coloring scheme of the heatmap. A download button is provided (red arrow in figure below) to download the full-scale PNG image.
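Complete-linkage clustering, as used for the dendrogram above, repeatedly merges the two closest groups, where the distance between two groups is the largest pairwise RMSD between their members. The sketch below is a naive pure-Python illustration of that rule and makes no claim about the software used to build the CoDNaS-RNA plots. The function name, the conformer labels and the RMSD values are all made up.

```python
def complete_linkage(names, rmsd):
    """Naive agglomerative clustering; returns merges as (group_a, group_b, height)."""

    def distance(a, b):
        # Complete linkage: largest pairwise RMSD between members of a and b.
        return max(rmsd[frozenset((x, y))] for x in a for y in b)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up pairwise RMSD values (in Å) for three hypothetical conformers.
toy_rmsd = {
    frozenset(("conf_a", "conf_b")): 1.0,
    frozenset(("conf_a", "conf_c")): 4.0,
    frozenset(("conf_b", "conf_c")): 3.0,
}
for group_a, group_b, height in complete_linkage(["conf_a", "conf_b", "conf_c"], toy_rmsd):
    print(group_a, group_b, height)
```

In this toy case conf_a and conf_b merge first at a height of 1.0, and conf_c joins them at 4.0, the larger of its two distances to that group. For real clusters a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` would be a more practical choice.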
This section presents a table of all conformers present in the cluster with selected data of interest as sortable columns (see figure below). The fields include the conformer length, its original source and several details on the structural determination (the method name and associated resolution, plus experimental temperature and pH). Clicking anywhere on a row updates the subsection below, 'Overview of RCSB PDB entry from selected Conformer', to display extended structural information about the corresponding conformer. This table is also used to retrieve pairwise comparisons between two or more conformers in the cluster. Users can select their preferred conformers by clicking the checkboxes to the left of the names. After selecting the conformers, pressing the Compare button (see red arrow in figure below) opens a new browser tab with the ‘Pair Details’ page (see ‘Pair Details’ section below).
Once a conformer row is selected in the table above, this section loads a full 3D structure representation of the entire RCSB PDB entry that the selected conformer belongs to (see figure below). A dedicated button opens the 3D view on the RCSB PDB website in a new browser tab (see red box in figure below), allowing the user to explore the structure in more detail. Above this, two subsections display information about the molecule (red arrow in figure below) and its primary reference (blue arrow in figure below). In particular, details on the primary citation of the wwPDB entry include its Title, Abstract, Authors, Journal details, DOI, Pubmed ID and Pubmed Central ID. Data is mostly taken from the RCSB PDB entry directly, but Taxon ID is also mapped against the NCBI Taxonomy database to ensure Genus and Species are properly represented (see top red circle below). For convenience, users can jump to the external bibliographic references by clicking the DOI, Pubmed and PMC links.
On the 'Available Conformers' table of the ‘Cluster Details’ page, users can select two or more conformers to obtain their pairwise structural comparisons. As a result, the ‘Pair Details’ page is opened to allow inspection of the individual and compared features of the selected pair. The layout of the page is very similar to the ‘Cluster Details’ page for the maximum pair of conformers.
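The number of rows to expect on the ‘Pair Details’ page grows quickly with the number of selected conformers. Assuming that every unordered pair of selected conformers is retrieved (an assumption, not something stated by the site), n conformers give n(n-1)/2 pairs. The Python sketch below enumerates them with a hypothetical helper, using conformer names that appear in this tutorial.

```python
from itertools import combinations


def conformer_pairs(selected):
    """Return every unordered pair from a list of selected conformer names."""
    if len(selected) < 2:
        raise ValueError("select at least two conformers to compare")
    return list(combinations(selected, 2))


selected = ["1uts_9_B", "2kdq_8_B", "1arj_14_N", "1arj_5_N"]
pairs = conformer_pairs(selected)
print(len(pairs))  # 6
print(pairs[0])    # ('1uts_9_B', '2kdq_8_B')
```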
This section presents a table with each resulting conformer pair in a different row (see red box below). A green box is briefly shown on the bottom to indicate the total number of pairs retrieved (red arrow below). Comparative information about the pair is presented on sortable columns. The fields describe:
Once a pair is selected by clicking on its row, the page displays the data for that pair below the table. The table remains visible on top as an easy way to navigate the conformational ensemble of the RNA.
The sections shown for the selected pair are identical to those shown for the maximum pair described in the ‘Cluster Details’ page, with the exception of the 'Structural Clustering for all Conformers By RMSD' and 'Available Conformers' sections on the bottom. Therefore, the ‘Pair Details’ page includes the following sections:
Please refer to the ‘Cluster Details’ page description in this tutorial for more information on the contents of each section.
As we have shown throughout this tutorial, Cluster_64 includes 128 alternative conformations related to the human immunodeficiency virus type 1 (HIV-1) transactivation response (TAR) RNA element. The TAR element displays a stem-loop structural arrangement (Aboul-ela F. et al., 1996).
Previous work has shown that the HIV-1 transcriptional activator protein Tat binds TAR (Kligun E. et al., 2015). This interaction is crucial for the virus since it ultimately enhances transcription of the integrated proviral genome (Shortridge MD. et al., 2019). Tat binds to a trinucleotide bulge near the apical loop of TAR. This interaction is associated with a conformational change in the bases and backbone of TAR. The importance of this interaction for the virus has led to its consideration as the target of different antiviral drugs.
Structural information has provided important details about the relevance of different arrangements in TAR that could explain alternative responses to drugs. In particular, CoDNaS-RNA can provide alternative conformations of the U-C-U bulge and surrounding nucleotides that could contribute to a better understanding of the effect caused by different interactors.
Cluster_64 shows two independently solved TAR conformers (1uts_9_B and 2kdq_8_B) in the pair of conformers with maximum RMSD between them (RMSD = 4.24 Å). The structural superposition of these conformers shows evident displacements along the entire structure (see figure below, with 1uts_9_B in cyan and 2kdq_8_B in green):
Both structures were solved in the presence of ligands designed as synthetic drugs:
As presented in section 5.1 of this tutorial, the secondary structure representations of 1uts_9_B and 2kdq_8_B are identical. However, visual exploration of the complete RCSB PDB 3D representation of 2kdq (below, left) and 1uts (below, right) in CoDNaS-RNA shows that these structures are distinct and very mobile, as evidenced by the superposition of all models in each PDB entry:
While all 10 NMR models of 2kdq cluster together in the RMSD-based hierarchical clustering included in CoDNaS-RNA, inspection of these conformers suggests a high variability mostly in the apical loop, also evident in the trinucleotide bulge. Furthermore, 1uts has 16 NMR models that display an extended flexibility along the whole structure, which possibly translates to the observed distribution of 1uts models in different sub-groups of the hierarchical clustering.
An interesting question is whether this population of highly diverse TAR conformers requires different ligands to become available, or if they are also accessible in the free TAR conformational ensemble. The RCSB PDB entry 1anr corresponds to the ligand-free NMR structure of TAR. Analyzing its behaviour from the PDB 3D representation in CoDNaS-RNA (below, left) shows that its 20 conformers have a general flexibility that is more strongly reflected in the TAT binding pocket. These conformers appear dispersed throughout the hierarchical clustering tree, although always closer to 1uts_9_B than to 2kdq_8_B, the other member in the maximum pair for Cluster_64. In this way, it is possible to note that the free TAR molecule visits the conformations related to the inhibited, ligand-bound forms. Another set of TAR conformers that is similar to the free forms is captured by 1arj (below, right), one of the first structural determinations of this RNA, solved bound to the amino acid derivative argininamide. In these conformers the bulge region undergoes a local conformational rearrangement to form a more stable structure. Interestingly, while most argininamide-bound 1arj conformers are similar to different free 1anr models, two of them (1arj_14_N and 1arj_5_N) are clustered apart, closer to the 2kdq_8_B member from the maximum pair.
Overall, the high conformational diversity of TAR is evidenced by a flexibility range that allows this RNA to bind several compounds of different chemical properties and using alternative binding mechanisms. This high degree of conformational diversity may be present prior to the interaction with other ligands, which may instead impose a conformational selection that fixes the HIV-1 TAR RNA in a preferred, and likely functionally different, structural conformation.
The example above shows that placing data on alternative conformations in the hands of interested users, experts or not, could contribute significantly to a more complete description of an interaction mechanism. The take-home message is that the consideration of these alternative conformations in a structure-based database such as CoDNaS-RNA could help users find, analyze and compare these relevant aspects in an accessible and centralized way.