SARS-CoV-2 Alignments and Homology Models

While curating pathway models for the various functions of SARS-CoV-2 proteins, I often stumbled upon previously published pathway figures and solved protein structures relating to the prior rounds of coronavirus research. The obvious question in these cases is "how similar are the sequences; are the critical residues conserved?" In the spirit of open science, I'm sharing these protein sequence alignments along with my observations on particularly interesting similarities and, by extension, structural and functional predictions.

Nsp3 (PLpro domain)

SARS-CoV-2 has 82.6% sequence identity with SARS-CoV over the 316 amino acid PLpro domain, which was co-crystallized with its inhibitor GRL0617. Note: there is 100% identity over the 13 residues that participate in binding GRL0617: L163, G164, D165, E168, P248, P249, Y265, G267-G272 (loop), Y269, Q270, Y274, T302. The papain-like protease (PLpro) can inhibit multiple steps in the induction of the type I interferon signaling pathway.

Alignment (position ruler marked every 10 residues, 10 through 310):

SARS-CoV-2
REVRTIKVFTTVDNINLHTQVVDMSMTYGQQFGPTYLDGADVTKIKPHNSHEGKTFYVLPNDDTLRVEAFEYYHTTDPSFLGRYMSALNHTKKWKYPQVNGLTSIKWADNNCYLATALLTLQQIELKFNPPALQDAYYRARAGEAANFCALILAYCNKTVGELGDVRETMSYLFQHANLDSCKRVLNVVCKTCGQQQTTLKGVEAVMYMGTLSYEQFKKGVQIPCTCGKQATKYLVQQESPFVMMSAPPAQYELKHGTFTCASEYTGNYQCGHYKHITSKETLYCIDGALLTKSSEYKGPITDVFYKENSYTTTIK

SARS-CoV
MEVKTIKVFTTVDNTNLHTQLVDMSMTYGQQFGPTYLDGADVTKIKPHVNHEGKTFFVLPSDDTLRSEAFEYYHTLDESFLGRYMSALNHTKKWKFPQVGGLTSIKWADNNCYLSSVLLALQQLEVKFNAPALQEAYYRARAGDAANFCALILAYSNKTVGELGDVRETMTHLLQHANLESAKRVLNVVCKHCGQKTTTLTGVEAVMYMGTLSYDNLKTGVSIPCVCGRDATQYLVQQESSFVMMSAPPAEYKLQQGTFLCANEYTGNYQCGHYTHITAKETLYRIDGAHLTKMSEYKGPVTDVFYKETSYTTTIK
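For readers who want to reproduce a percent identity figure from an aligned pair like the one above, here is a minimal Python sketch. It assumes the two sequences are already aligned to equal length with "-" as the gap character, and it counts identity only over columns where neither sequence has a gap. That denominator is one common convention, not necessarily the one used to obtain the numbers quoted in this post. The function name `percent_identity` and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped.

    Assumption: both inputs come from the same alignment, so they have
    equal length. Other tools may divide by the full alignment length or
    by the shorter sequence instead, which gives different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned pair (not real viral sequence): 5 matches in 6 ungapped columns.
toy_a = "ACDE-FG"
toy_b = "ACDQKFG"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

To apply it to real data, pass the two aligned strings exactly as exported from the alignment tool, gaps included.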

Nsp5 Mpro/3CLpro

SARS-CoV-2 nsp5 encodes a 3C-like proteinase (a.k.a. main proteinase) that mediates cleavages downstream of nsp4. Structures have been determined for both SARS-CoV and SARS-CoV-2 proteins.

Coming soon...

Nsp9

SARS-CoV-2 nsp9 forms a dimer through the interaction of parallel alpha-helices containing the interaction motif GXXXG.

Coming soon...

Nsp10

A unique feature of SARS-CoV is that nsp16 requires nsp10 as a stimulatory factor to execute its methyltransferase (MTase) activity.

Coming soon...

Nsp13 Helicase

SARS-CoV-2 nsp13 is a helicase and thus an important potential drug target.

Coming soon...

Nsp16 OMT

SARS-CoV-2 nsp16 encodes a 2'-O-ribose methyltransferase (OMT) that modifies the 5'-end of its viral RNAs to mimic eukaryotic mRNAs, which is important for RNA stability, protein translation and evading the host immune response. In SARS-CoV, nsp16 was shown to require nsp10 in order to bind m7GpppA-RNA, and the protein structure of the complex has been solved. SARS-CoV-2 nsp16 has 93.3% sequence identity over the 298 amino acids of this solved structure. The residues for binding nsp10 are 100% conserved (I40, T48, A83, V84, R86, Q87, D102, S105, D106, L244, M247). Those for SAM binding are 100% conserved (N43, Y47, G71, A72, G73, G81, D99, L100, N101, D114, C115, D130, M131, Y132, F149). The motif for methyl transfer is also 100% conserved (K46, D130, K170, E203).

Alignment (position ruler marked every 10 residues, 10 through 290):

SARS-CoV-2
SSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN

SARS-CoV
ASQAWQPGVAMPNLYKMQRMLLEKCDLQNYGENAVIPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPRTKHVTKENDSKEGFFTYLCGFIKQKLALGGSIAVKITEHSWNADLYKLMGHFSWWTAFVTNVNASSSEAFLIGANYLGKPKEQIDGYTMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKENQINDMIYSLLEKGRLIIRENNRVVVSSDILVNN
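The conservation claims for individual nsp16 residues can be spot-checked with a few lines of Python. The sketch below is illustrative rather than the procedure used for this post. It assumes the residue labels (such as K46) count from 1 at the first residue shown in the alignment, and that the two sequences align without gaps over the region tested. Only the first 48 residues of each sequence are included, and the helper name `check_residues` is hypothetical.

```python
import re

# First 48 residues of each nsp16 sequence, copied from the alignment.
NSP16_SARS_COV_2 = "SSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYT"
NSP16_SARS_COV = "ASQAWQPGVAMPNLYKMQRMLLEKCDLQNYGENAVIPKGIMMNVAKYT"


def check_residues(seq_a: str, seq_b: str, labels: list[str]) -> dict[str, bool]:
    """Return {label: True} when both sequences carry the expected residue.

    Each label is a one-letter amino acid code followed by a 1-based
    position, e.g. "K46". Assumption: ungapped alignment, so the same
    index applies to both sequences.
    """
    report = {}
    for label in labels:
        parsed = re.fullmatch(r"([A-Z])(\d+)", label)
        if parsed is None:
            raise ValueError(f"unrecognized residue label: {label!r}")
        expected = parsed.group(1)
        index = int(parsed.group(2)) - 1  # convert to 0-based indexing
        report[label] = seq_a[index] == expected and seq_b[index] == expected
    return report


# Residues from the nsp10-binding, SAM-binding and methyl-transfer lists
# that fall within the first 48 positions.
print(check_residues(NSP16_SARS_COV_2, NSP16_SARS_COV,
                     ["I40", "N43", "K46", "Y47", "T48"]))
```

A position that differs between the two sequences, such as 32 (D in SARS-CoV-2, E in SARS-CoV), is reported as False.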

Orf3a

One of the novel proteins originally characterized in SARS-CoV is a 274 amino acid viroporin called Orf3a (alternatively X1 or U274). This sequence shares no significant similarity with any other known protein! With an extracellular N-terminus, 3 transmembrane domains and a long C-terminus, Orf3a is localized to the cell membrane and perinuclear region. Antibodies to Orf3a were found in a majority of SARS patients in 2004. Likewise, antibodies to the N-terminus could be raised and used in vitro to detect surface expression as well as endocytosis. Given the lack of homologous sequences, there have been many studies attempting to elucidate the function and critical sequence/structure motifs of Orf3a. In the alignment below, a diverse set of Orf3a sequences is aligned and labeled by host organism, collection location and collection date. The SARS-CoV-2 sequence is thus labeled "Human_China_2019" (in bold). Below the alignment are manual annotations and a secondary structure prediction from JPred.

Signal sequence: The YxxΦ and diacidic motifs between 160 and 173 in the C-terminal domain have been implicated in the localization of Orf3a to the plasma membrane. Note, however, that position 171 (labeled E*) is acidic only in the human SARS-CoV from 2003. Perhaps the other nearby acidic residues compensate; or perhaps transmembrane localization is diminished. Also note that even some of the more conserved acidic positions (165, 182, 192) are lost or even flipped to basic residues in the 2019 human SARS-CoV-2 sequence.

K+ channel activity: Given the membrane localization and presence of potassium ion channels in many other viruses, Orf3a has been tested for potassium channel behavior. Lu et al. characterized the formation of homotetramers (common among K+ channels) as a dimer of disulfide-linked dimers (not common). They also demonstrated K+ conductance dependent on tetramer formation and blocked by barium. While Orf3a may form a channel of some sort, it is most assuredly not a K+ ion channel. From cholera to methanobacteria to yeast, mice and humans, K+ channels always have a signature selectivity filter between the last two transmembrane domains: TXXTXGYG. There is no part of Orf3a that aligns with this essential K+ channel feature. Viroporins are small transmembrane proteins that transport ions and small molecules and play diverse roles in the lifecycle of all sorts of viruses. So, while the selective conductance of K+ is doubtful, Orf3a may still be playing a role as a viroporin. Unfortunately, all attempts to align Orf3a with the sequences of many viral and bacterial channels and transporters have yet to yield any similarities of note.
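The absence of the TXXTXGYG signature can be checked with a simple pattern search. The Python sketch below is only a rough, exact-match screen, not a substitute for the alignment-based comparison described above. It translates the signature into the regular expression `T..T.GYG` and scans the SARS-CoV-2 Orf3a sequence from this page. The positive control is a short illustrative fragment built around the canonical TVGYG filter motif, and the function name `find_k_filter` is hypothetical.

```python
import re

# TXXTXGYG written as a regular expression ("." matches any residue).
K_FILTER = re.compile(r"T..T.GYG")

# SARS-CoV-2 Orf3a, copied from the alignment on this page.
ORF3A_SARS_COV_2 = (
    "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQ"
    "LALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLL"
    "YDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSD"
    "YYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL"
)

# Illustrative fragment containing a TXXTXGYG match (positive control).
CONTROL_FRAGMENT = "AAETATTVGYGDLY"


def find_k_filter(sequence: str) -> list[int]:
    """Return 1-based start positions of exact TXXTXGYG matches."""
    return [match.start() + 1 for match in K_FILTER.finditer(sequence.upper())]


print("control:", find_k_filter(CONTROL_FRAGMENT))
print("Orf3a:  ", find_k_filter(ORF3A_SARS_COV_2))
```

An exact-match search will miss degenerate variants of the filter, so an empty result here supports the argument but does not by itself rule out a divergent motif.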

TRAF3 binding: SARS-CoV-2 has 72.7% sequence identity with SARS-CoV over the 275 amino acids of Orf3a, a protein that was shown to bind TRAF3 and induce the NLRP3 inflammasome, contributing to the cytokine storm. TRAF3 binding is mediated by the PxQxS/T motif starting at residue 36, which is 100% conserved in SARS-CoV-2.
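The position of the PxQxS/T motif can be confirmed directly from the sequences. In the Python sketch below, the motif is written as the regular expression `P.Q.[ST]` and searched in the first 45 residues of the SARS-CoV and SARS-CoV-2 Orf3a sequences taken from the alignment on this page. The function name `motif_starts` is hypothetical, and positions are reported 1-based to match the residue numbering in the text.

```python
import re

# PxQxS/T: proline, any residue, glutamine, any residue, serine or threonine.
TRAF3_MOTIF = re.compile(r"P.Q.[ST]")

# First 45 residues of Orf3a, copied from the alignment on this page.
ORF3A_SARS_COV_NTERM = "MDLFMRFFTLGSITAQPVKIDNASPASTVHATATIPLQASLPFGW"
ORF3A_SARS_COV_2_NTERM = "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGW"


def motif_starts(sequence: str) -> list[int]:
    """Return 1-based start positions of every PxQxS/T match."""
    return [match.start() + 1 for match in TRAF3_MOTIF.finditer(sequence.upper())]


for name, seq in [("SARS-CoV", ORF3A_SARS_COV_NTERM),
                  ("SARS-CoV-2", ORF3A_SARS_COV_2_NTERM)]:
    for start in motif_starts(seq):
        # Slice out the five motif residues for display.
        print(name, start, seq[start - 1:start + 4])
```

Both sequences match at position 36 (PLQAS in SARS-CoV, PIQAS in SARS-CoV-2): the variable x positions differ while the P, Q and S positions are retained.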

Alignment (position ruler marked every 10 residues, 10 through 270):

Human_China_2003
MDLFMRFFTLGSITAQPVKIDNASPASTVHATATIPLQASLPFGWLVIGVAFLAVFQSATKIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINACRIIMRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTEGDGISTPKLKEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDPP-NVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL

Bat_China_2004
MDLFMSIFTLGSITRQPSKIENAFLASTVHATATIPLQASFSFRWLVIGVALLAVFQSASKVIALHKKWQLALYKGIQLVCNLLLLFVTIYSHFLLLAAGIEVQFLYIYALIYILQILSFCRFVMRCWLCWKCKSKNPLLYDANYFVCWHTYNYDYCIPYNSVTDTIVVTSGDGISTPELKEDYQIGGYSEDWHSGVKDYVVVHGYFTEVHYQLESTQITTDTGIQNATFFIFNKLVKDPP-NVQIHTIDGSSGVVNPAMDPIYDEPTTTTSVPL

Bat_China_2011
MDLFMRIFTLGSITAQPGKIDNASPASTVHATATIPLQATLPFGWLVIGVAFLAVFQSATKIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINACRIIMRCWLCWKCRSKNPLLYDANYFVCWHTNCYDYCIPYNSVTDTIVLTSSDGTNVPKLKEDYQIGGYSEDWHSGVKDYVVIHGYFTEIYYQLESTQLSTDTGAENATFFIYSKLVKDAD-HVQIHTIDGSSGVVNPAMDPIYDEPTTTTSVPL

Bat_China_2013
MDLFMRIFTLGSITAQSGKIDNASPASTVHATATIPLQASLPFGWLVIGVAFLAVFQSVTKIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINACRIIMRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTAGDGISTPKLKEDYQIGGYSEDWHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDPQ-NVQIHTIDGSSGVVNPAMDPIYDEPTTTTSVPL

Bat_Kenya_2007
MDLFISIFTLGSITR--GSVQNAVPANSLHATATIPLQATLPFGWLIVGVALLAVFQNASKVIPFNSLWQRCLYQSFQLVCSLLVGFLTVYVHLLLAAAGLEAPFLYLLALIYFLQCVVFGRFLLRCWLCWKCKSKNPLIYDASYFVCWHTHTHDYCIPYNSITETIVLTAGDGVTIPIKTQDYQIGGFVEKWESGVKDYVTLIGLFTEIHYQLESTQISADTGINNATFFLFSKYD-RESESVQVHTIDGSSGVVN----PIYDEPTPTTSVPL

Bat_Bulgaria_2008
MDLFLNIFTLGSITRQPGKVENVSPASSFHSTASIPLQATLPFGWLVVGVAFLAVFQSAAKLIPFNSLWQRCLYQSFQLLCNVLLIALTVYSHLLLVAAGLEAPFLYLLALIYFLQCVVFGRLLVRCWLCWKCKSKNPLIYDSNYFVCWHTHTHDYCIPYNSITNTIVLTAGDGVTIPIRTQDYQIGGYFEKWESGVKDYLTLIGPFTEVYYQLESTQISTDTGINNATFFLFSKNDEREQESVQVHTIDGSSGVVN----PIYDEPTPTTSVPL

Human_China_2019
MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL

Annotations beneath the alignment: PQS/T CCC* YxxΦ E* D TM1 TM2 TM3

Structural protein: There is evidence for Orf3a acting as a structural protein, in association with proteins N, M, S and E, and specifically as a modulator of the trafficking properties of protein S. Interestingly, Orf3a can also be aligned with the sequence of protein M, another 3-transmembrane protein with a large C-terminus. The alignment reveals 14% sequence identity (34% similarity) between Orf3a and protein M of SARS-CoV-2, which is well below what is typically considered homologous and less than the identity among M proteins across other coronaviruses (~23%), for example. But given the striking similarity in predicted secondary structure (shown below alignment) and conspicuous viral genomic context, some questions are raised: Can Orf3a perform a similar function to protein M? Can it dimerize with protein M? Is it a novel structural protein that is "allowed" to mutate while the essential role played by the other structural proteins is conserved?

Alignment (position ruler marked every 10 residues, 10 through 270):

SARS-CoV-2_orf3a/1-275
MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL

SARS-CoV_orf3a/1-274
MDLFMRFFTLGSITAQPVKIDNASPASTVHATATIPLQASLPFGWLVIGVAFLAVFQSATKIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINACRIIMRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTEGDGISTPKLKEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDPP-NVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL

SARS-CoV-2_M/1-222
-----------------MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFLWLLWPVTLACFVLAAVYRINWIT-GGIAIAMACLVGLMWLSYFIASF---RLFARTRSMWSFN------PETNILLNVPLHGTILTRPLLESELVIGAVILRG-----HLRIAGHHLGRCDIKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNYKLNTDHSSSSDNIALLVQ---------------------

SARS-CoV_M/1-221
-----------------MAD-NGTITVEELKQLLEQWNLVIGFLFLAWIMLLQFAYSNRNRFLYIIKLVFLWLLWPVTLACFVLAAVYRINWVT-GGIAIAMACIVGLMWLSYFVASF---RLFARTRSMWSFN------PETNILLNVPLRGTIVTRPLMESELVIGAVIIRG-----HLRMAGHSLGRCDIKDLPKEITVATSRTLSYYKLGASQRVGTDSGFAAYNRYRIGNYKLNTDHAGSNDNIALLVQ---------------------

Predicted secondary structure tracks: Orf3a, M