While curating pathway models for the various functions of SARS-CoV-2 proteins, I often stumbled upon previously published pathway figures and solved protein structures from earlier rounds of coronavirus research. The obvious question in these cases is: "How similar are the sequences, and are the critical residues conserved?" In the spirit of open science, I'm sharing these protein sequence alignments along with my observations on particularly interesting similarities and, by extension, structural and functional predictions.
SARS-CoV-2 shares 82.6% sequence identity with SARS-CoV over the 316-amino-acid PLpro domain, which was co-crystallized with its inhibitor GRL0617. Note the 100% identity over the 13 residues that participate in binding GRL0617:
L163, G164, D165, E168, P248, P249, Y265, G267-G272 (loop), Y269, Q270, Y274, T302.
The papain-like protease (PLpro) can inhibit multiple steps in the induction of the type I interferon signaling pathway.
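Percent identity figures like the 82.6% over 316 residues quoted above depend on how aligned columns are counted. The following is a minimal Python sketch, assuming two sequences that are already aligned (with `-` for gaps) and assuming the denominator is the number of gap-free columns; alignment tools differ on this convention, so this is not necessarily how the numbers in this post were computed. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are skipped, so the
    denominator is the number of gap-free columns (other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy example (not real viral sequence): 4 of 5 gap-free columns match.
print(percent_identity("MKV-LS", "MRVALS"))  # 80.0
```

Read in reverse, 82.6% identity over 316 residues corresponds to about 261 identical positions (261/316 ≈ 82.6%).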
Nsp5 Mpro/3CLpro
SARS-CoV-2 nsp5 encodes a 3C-like proteinase (a.k.a. main proteinase) that mediates cleavages downstream of nsp4.
Structures have been determined for both the SARS-CoV and SARS-CoV-2 proteins.
SARS-CoV-2 nsp16 encodes a 2'-O-ribose methyltransferase (OMT) that modifies the 5' end of the viral RNAs to mimic eukaryotic mRNAs, which is important for RNA stability, protein translation and evasion of the host immune response. In SARS-CoV, nsp16 was shown to require nsp10 to bind m7GpppA-RNA, and the structure of the nsp16/nsp10 complex has been solved.
SARS-CoV-2 nsp16 has 93.3% sequence identity over the 298 amino acids of this solved structure. The residues that bind nsp10 are 100% conserved (I40, T48, A83, V84, R86, Q87, D102, S105, D106, L244, M247), as are those that bind S-adenosylmethionine (SAM) (N43, Y47, G71, A72, G73, G81, D99, L100, N101, D114, C115, D130, M131, Y132, F149). The motif for methyl transfer is also 100% conserved (K46, D130, K170, E203).
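Checking whether a list of residues such as (K46, D130, K170, E203) is conserved amounts to mapping each reference residue number to an alignment column and reading off the other sequence in that column. A minimal Python sketch follows; it assumes residue numbers count from 1 along the ungapped reference sequence as given (structure numbering in a PDB entry may be offset), and the helper names are hypothetical.

```python
import re

LABEL = re.compile(r"([A-Z])(\d+)")  # e.g. "K46" -> ("K", "46")


def parse_labels(text: str) -> list[tuple[str, int]]:
    """Turn 'K46, D130, K170, E203' into [('K', 46), ('D', 130), ...]."""
    return [(aa, int(num)) for aa, num in LABEL.findall(text)]


def column_for_residue(aligned: str, resnum: int, gap: str = "-") -> int:
    """0-based alignment column of 1-based residue `resnum`, skipping gaps."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch != gap:
            count += 1
            if count == resnum:
                return col
    raise IndexError(f"residue {resnum} is beyond the end of the sequence")


def check_conserved(ref_aligned: str, query_aligned: str, labels: str) -> dict[str, str]:
    """Map each labeled reference residue to the query residue in the same column."""
    report = {}
    for aa, num in parse_labels(labels):
        col = column_for_residue(ref_aligned, num)
        if ref_aligned[col] != aa:
            raise ValueError(f"reference has {ref_aligned[col]} at {num}, not {aa}")
        report[f"{aa}{num}"] = query_aligned[col]
    return report


# Toy alignment (not real data): E5 in the reference is Q in the query.
print(check_conserved("MK-DAE", "MKSDAQ", "K2, D3, E5"))
# {'K2': 'K', 'D3': 'D', 'E5': 'Q'}
```

A residue set counts as 100% conserved when every value in the returned dictionary equals the residue letter in its key.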
Orf3a
One of the novel proteins originally characterized in SARS-CoV is a 274-amino-acid viroporin called Orf3a (also known as X1 or U274). Its sequence shares no significant similarity with any other known protein!
With an extracellular N-terminus, three transmembrane domains and a long C-terminus, Orf3a localizes to the plasma membrane and the perinuclear region.
In 2004, antibodies to Orf3a were reported in a majority of SARS patients.
Antibodies raised against the N-terminus have also been used in vitro to detect both surface expression and endocytosis of Orf3a. Given the lack of homologous sequences, many studies have attempted to elucidate the function and critical sequence/structure motifs of Orf3a. In the alignment below, a diverse set of Orf3a sequences is aligned and labeled by host organism, collection location and collection date. The SARS-CoV-2 sequence is therefore labeled "Human_China_2019" (in bold). Below the alignment are manual annotations and a secondary structure prediction from JPred.
Signal sequence: The YxxΦ (where Φ is a bulky hydrophobic residue) and diacidic motifs between positions 160 and 173 in the C-terminal domain have been implicated in the localization of Orf3a to the plasma membrane. Note, however, that position 171 (labeled E*) is acidic only in the 2003 human SARS-CoV sequence. Perhaps the other nearby acidic residues compensate, or perhaps plasma membrane localization is diminished.
Also note that even some of the more conserved acidic positions (165, 182, 192) are lost or even flipped to basic residues in the 2019 human SARS-CoV-2 sequence.
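The motif checks in this paragraph can be approximated with regular expressions. The Python sketch below assumes Φ is one of L, I, M, F or V, and takes "diacidic" to mean an acidic residue, any residue, then another acidic residue ([DE]x[DE]); published definitions vary. It also classifies residues at chosen positions as acidic, basic or other, counting H as basic, which is a simplification. Positions are 1-based along the ungapped sequence rather than alignment columns, and all names are hypothetical.

```python
import re
from typing import Optional

# Assumed motif definitions; the lookahead (?=...) lets overlapping hits be reported.
YXXPHI = re.compile(r"(?=(Y..[LIMFV]))")   # Y, any two residues, bulky hydrophobic
DIACIDIC = re.compile(r"(?=([DE].[DE]))")  # acidic, any residue, acidic


def find_motifs(seq: str, pattern: re.Pattern, start: int = 1, end: Optional[int] = None):
    """Return (1-based start, matched text) for hits beginning within [start, end]."""
    end = len(seq) if end is None else end
    hits = []
    for m in pattern.finditer(seq):
        pos = m.start() + 1
        if start <= pos <= end:
            hits.append((pos, m.group(1)))
    return hits


def charge_classes(seq: str, positions: list) -> dict:
    """Label residues at 1-based positions as acidic (D/E), basic (K/R/H) or other."""
    def classify(residue: str) -> str:
        if residue in "DE":
            return "acidic"
        if residue in "KRH":
            return "basic"
        return "other"
    return {p: classify(seq[p - 1]) for p in positions}


toy = "AAYKDLEDE"  # made-up sequence for illustration
print(find_motifs(toy, YXXPHI))        # [(3, 'YKDL')]
print(find_motifs(toy, DIACIDIC))      # [(5, 'DLE'), (7, 'EDE')]
print(charge_classes(toy, [4, 5, 6]))  # {4: 'basic', 5: 'acidic', 6: 'other'}
```

Running `charge_classes` on each aligned sequence at the same (mapped) positions is one way to spot acidic residues that are lost or flipped to basic ones.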
K+ channel activity: Given its membrane localization and the presence of potassium ion channels in many other viruses, Orf3a has been tested for potassium channel behavior. Lu et al. characterized the formation of homotetramers (common among K+ channels) as a dimer of disulfide-linked dimers (not common). They also demonstrated K+ conductance that depended on tetramer formation and was blocked by barium. While Orf3a may form a channel of some sort, it is most assuredly not a K+ ion channel. From Vibrio cholerae to methanobacteria to yeast, mice and humans, K+ channels share a signature selectivity filter between the last two transmembrane domains: TXXTXGYG. No part of Orf3a aligns with this essential K+ channel feature.
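A consensus such as TXXTXGYG (X = any residue) translates directly into a regular expression. The Python sketch below scans a sequence for exact matches; this is a much stricter test than an alignment, which can also tolerate substitutions. The example sequences are made up, and the function names are hypothetical.

```python
import re


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus string in which 'X' means any residue."""
    parts = ["." if ch == "X" else re.escape(ch) for ch in consensus.upper()]
    return re.compile("".join(parts))


def scan_consensus(seq: str, consensus: str) -> list:
    """Return (1-based start, matched text) for each non-overlapping exact match."""
    rx = consensus_to_regex(consensus)
    return [(m.start() + 1, m.group()) for m in rx.finditer(seq.upper())]


K_FILTER = "TXXTXGYG"  # K+ channel selectivity-filter signature from the text
print(consensus_to_regex(K_FILTER).pattern)      # T..T.GYG
print(scan_consensus("MKTVGTVGYGLL", K_FILTER))  # [(3, 'TVGTVGYG')]
print(scan_consensus("MKTVGAVGYGLL", K_FILTER))  # []
```

An empty result, as for the second toy input, only means there is no exact match; ruling out a diverged filter still requires an alignment.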
Viroporins are small transmembrane proteins that transport ions and small molecules and play diverse roles in the life cycles of all sorts of viruses. So, while selective K+ conductance is doubtful, Orf3a may still play a role as a viroporin. Unfortunately, attempts to align Orf3a with the sequences of many viral and bacterial channels and transporters have yet to reveal any similarities of note.
TRAF3 binding: SARS-CoV-2 has 72.7% sequence identity with SARS-CoV over the 275 amino acids of Orf3a, and SARS-CoV Orf3a was shown to bind TRAF3 and activate the NLRP3 inflammasome, contributing to the cytokine storm. TRAF3 binding is mediated by the PxQxS/T motif starting at residue 36, which is 100% conserved in SARS-CoV-2.
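Checking a motif at a fixed position, such as PxQxS/T at residue 36, can be done by anchoring a regular expression at that offset. The Python sketch below assumes both sequences share the same numbering around the site (no upstream insertions or deletions); otherwise positions must first be mapped through the alignment. The toy sequences and helper names are hypothetical.

```python
import re
from typing import Optional

TRAF3_MOTIF = re.compile(r"P.Q.[ST]")  # PxQxS/T


def motif_at(seq: str, position: int, motif: re.Pattern = TRAF3_MOTIF) -> Optional[re.Match]:
    """Match `motif` starting exactly at 1-based `position`, or return None."""
    return motif.match(seq, position - 1)


def motif_identical(seq_a: str, seq_b: str, position: int) -> bool:
    """True if both sequences carry the motif at `position` with identical residues."""
    a, b = motif_at(seq_a, position), motif_at(seq_b, position)
    return a is not None and b is not None and a.group() == b.group()


# Made-up sequences with a motif starting at residue 36.
seq1 = "A" * 35 + "PLQAS" + "GG"
seq2 = "A" * 35 + "PVQGT" + "GG"
print(motif_at(seq1, 36).group())       # PLQAS
print(motif_identical(seq1, seq1, 36))  # True
print(motif_identical(seq1, seq2, 36))  # False: motif present in both, residues differ
```

The distinction between "motif present" and "motif identical" matters here: a motif can be retained in function while individual x positions vary.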
Structural protein: There is evidence for Orf3a acting as a structural protein in association with proteins N, M, S and E, and specifically as a modulator of the trafficking properties of protein S. Interestingly, Orf3a can also be aligned with the sequence of protein M, which is likewise a three-transmembrane protein with a large C-terminus. The alignment reveals 14% sequence identity (34% similarity) between Orf3a and protein M of SARS-CoV-2. This is well below what is typically considered homologous and lower than, for example, the identity among M proteins across other coronaviruses (~23%). But given the striking similarity in predicted secondary structure (shown below the alignment) and the conspicuous viral genomic context, some questions arise:
Can Orf3a perform a function similar to that of protein M? Can it dimerize with protein M? Is it a novel structural protein that is "allowed" to mutate while the essential roles played by the other structural proteins are conserved?
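The gap between 14% identity and 34% similarity comes from also counting non-identical but chemically similar pairs. The Python sketch below illustrates the distinction on pre-aligned sequences. As an assumption, "similar" here means sharing one of Clustal's "strong" conservation groups; programs such as EMBOSS Needle instead count pairs with a positive BLOSUM62 score, so this sketch need not reproduce the figures quoted above. The function names and toy sequences are hypothetical.

```python
# Assumption: similarity groups modeled on Clustal's "strong" conservation groups.
SIMILAR_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def is_similar(a: str, b: str) -> bool:
    """Identical residues, or two residues that share a similarity group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple:
    """Percent identity and percent similarity over gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(is_similar(a, b) for a, b in pairs)
    n = len(pairs)
    return 100.0 * identical / n, 100.0 * similar / n


print(identity_and_similarity("MKVLS", "MRILT"))  # (40.0, 100.0)
print(identity_and_similarity("MK-LS", "MRILW"))  # (50.0, 75.0)
```

In the first toy pair, K/R, V/I and S/T are each counted as similar but not identical, which is how similarity can run well ahead of identity.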