NHEJ1
This Project
This web page originated as an assignment in Emory University's Biology 142 lab course. Students were assigned proteins of interest and asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome had evidence of an orthologous protein.
Background Information
Non-homologous end-joining factor 1, also known as Cernunnos or XRCC4-like factor (XLF), is encoded by the NHEJ1 gene in the human genome. This protein is required for the non-homologous end joining (NHEJ) pathway of DNA repair. It also plays an important role in the immune system, because NHEJ helps generate a diversity of antibodies. It is thought to be involved in the end-bridging or ligation steps of NHEJ by working with DNA ligase.
Non-homologous end-joining factor 1 is a very important protein in the human body because the repair of DNA double-strand breaks is crucial. Without this protein, breaks in the human genome would go unrepaired and a large number of mutations would occur. Although NHEJ can have side effects, such as linking previously unlinked genes and causing minor changes at the break sites, it contributes far more to the stability of the genome than it costs. The NHEJ pathway exists in mammals, yeast and bacteria, in which it stabilizes and preserves the genome (Hefferin, 2005).
There are two main ways of repairing double-strand breaks: homologous recombination (HR) and NHEJ. HR is a very effective and accurate pathway that acts predominantly in the S and G2 phases of the cell cycle, whereas NHEJ, although less accurate, is the prevailing repair pathway during the G1 and M phases (Weterings, 2008). Many additional proteins are also required for the NHEJ pathway to work.
Here, we found a hit in the whale shark genome that shares the same domain as our protein of interest, but we are not confident enough to conclude that this protein is the whale shark ortholog.
Methods
Finding whale shark predicted orthologs:
The human protein sequence (ENSP00000349313) was used as the query in a BLAST search against the predicted whale shark protein database on the Galaxy server hosted by Georgia Aquarium. The full sequence of the best-matching predicted protein, chosen by lowest E-value and alignment length, was obtained. That sequence was then used as the query in a protein BLAST against the NCBI human protein database; a positive reciprocal match was taken as evidence of a possible ortholog. Using the bootstrap method, we were able to further confirm our best hit.
Predicted orthologs in other species:
Predicted orthologs of NHEJ1 (ENSP00000349313) were identified in other species using the NCBI BLAST server. The human NHEJ1 protein sequence was used as the query in these searches. Protein BLASTs were performed with default settings against single-species protein databases for mouse, zebrafish, rat, whale shark, and rabbit on the NCBI database.
Phylogenetic tree:
The whale shark predicted protein sequence and, for each of the other species, the top hit (lowest E-value) returned with the human sequence as the query were used to create a multiple sequence alignment and a phylogenetic tree. Both the multiple sequence alignment and the rooted phylogenetic tree were created with ClustalW2. None of the settings were altered.
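The reciprocal check described in the first Methods step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Galaxy or NCBI workflow itself: the function names, the dictionary layout, and the toy hit records are all hypothetical, and real BLAST output would first have to be parsed into this form.

```python
def best_hit(hits):
    """Return the hit with the lowest E-value; ties go to the longer alignment."""
    if not hits:
        return None
    return min(hits, key=lambda h: (h["evalue"], -h["length"]))


def reciprocal_best_hit(query_id, forward_hits, reverse_hits_by_subject):
    """Check whether the best forward hit points back to the original query.

    forward_hits: hits from searching the query against the target proteome.
    reverse_hits_by_subject: for each target protein ID, the hits obtained when
    that protein is searched back against the query's proteome.
    Returns (candidate_id, is_reciprocal).
    """
    top = best_hit(forward_hits)
    if top is None:
        return None, False
    candidate = top["subject"]
    back = best_hit(reverse_hits_by_subject.get(candidate, []))
    return candidate, back is not None and back["subject"] == query_id


# Toy records with made-up IDs and scores (hypothetical, not results from this project).
forward = [
    {"subject": "targetA", "evalue": 1e-20, "length": 70},
    {"subject": "targetB", "evalue": 1e-04, "length": 30},
]
reverse = {
    "targetA": [
        {"subject": "queryX", "evalue": 1e-15, "length": 68},
        {"subject": "queryY", "evalue": 1e-03, "length": 25},
    ]
}
print(reciprocal_best_hit("queryX", forward, reverse))
```

A reciprocal best hit is commonly used as supporting evidence for orthology, but it is not proof on its own, which is why the domain comparison and the tree are examined as well.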
Searching for NHEJ1 in whale sharks
The protein sequence for NHEJ1 in humans was used as the query against the whale shark predicted protein database. The top four hits in the BLAST are shown in Table 1 below. These four hits had the lowest E-values, the lowest being 3e-17. Because the E-value of the best hit was many orders of magnitude lower than those of the other three, only the top hit was used as the query in an NCBI BLASTp search against the human protein database.
| Whale Shark ID | E-value | Alignment length | Predicted protein length | % Identity |
| --- | --- | --- | --- | --- |
| g34279.t1 | 2e-05 | 87 | 122 | 29.63% |
| g21654.t1 | 3e-17 | 72 | 82 | 43.06% |
| g13492.t1 | 3e-05 | 27 | 138 | 52.38% |
| g21418.t1 | 3e-04 | 30 | 119 | 43.33% |

Table 1. Best BLASTp hits for human NHEJ1 against the whale shark predicted protein database. The Galaxy server was used to BLAST the human NHEJ1 protein sequence against the predicted whale shark protein database. The top four hits by E-value are displayed with their ID, alignment length, predicted protein length, and percent identity. The top hit (g21654.t1) was then used as the query in a BLAST against the human database.
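To make the comparison between the hits concrete, the rows of Table 1 can be ranked in a few lines of Python. This is only a sketch: the tuple layout and the function names are assumptions made for the example, and the numbers are copied from Table 1 as reported.

```python
import math

# Rows copied from Table 1:
# (ID, E-value, alignment length, predicted protein length, % identity)
TABLE_1 = [
    ("g34279.t1", 2e-05, 87, 122, 29.63),
    ("g21654.t1", 3e-17, 72, 82, 43.06),
    ("g13492.t1", 3e-05, 27, 138, 52.38),
    ("g21418.t1", 3e-04, 30, 119, 43.33),
]


def rank_by_evalue(rows):
    """Sort hits so that the lowest (most significant) E-value comes first."""
    return sorted(rows, key=lambda r: r[1])


def evalue_gap_log10(rows):
    """Orders of magnitude separating the best hit from the runner-up."""
    ranked = rank_by_evalue(rows)
    return math.log10(ranked[1][1] / ranked[0][1])


def aligned_fraction(row):
    """Fraction of the predicted protein's length covered by the alignment."""
    return row[2] / row[3]


best = rank_by_evalue(TABLE_1)[0]
print(best[0], round(evalue_gap_log10(TABLE_1), 1), round(aligned_fraction(best), 2))
```

With these rows, the E-value of g21654.t1 is roughly twelve orders of magnitude below the next-best hit, which is the gap that motivated following up only the top hit.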
When the top predicted whale shark protein (ID g21654.t1) was used as the query against the human protein database, it returned NHEJ1 as the best hit. The hit had an E-value of 6e-12 and an identity of 43.2%. Even with this reciprocal match, we were not confident that the best predicted whale shark protein and human NHEJ1 are orthologs. We believe that the two proteins share the same protein domain.
Protein Domains
The predicted NHEJ1 protein domain was conserved across all five organisms (human, whale shark, house mouse, brown rat, and zebrafish). The working mechanism of XLF is still not fully understood. However, XLF has been found to work with XRCC4, forming long filaments suited to protecting and aligning DNA ends during double-strand break repair (Mahaney, 2013).
[Homo sapiens] - Human
[Rhincodon typus] - Whale Shark
[Mus musculus] - House Mouse
[Rattus norvegicus] - Brown Rat
[Danio rerio] - Zebrafish
Figure 1. Domains of the best-hit predicted proteins for all the species compared. All the best hits contain XLF domains, as predicted by NCBI BLAST server analyses.
mouse MEELEQDLLLQPWAWLQLAENSLLAKVSITKHGYALLISDLQQVWHEQVDTSVVSQRAKE
rat MEELEQGLLMQPWAWLQLAENSLLAKASITKHGYALLISDLQQVWHEQVDTLEVSQRAKE
human MEELEQGLLMQPWAWLQLAENSLLAKVFITKQGYALLVSDLQQVWHEQVDTSVVSQRAKE
rabbit MEELEQGLLMQPWAWLQLAENSLMAKAYITKQGYALLVSDLQQVWHEQVDTNVVIQRAKE
elephantshark ------------------------------------MLSDLDSVWWEEMTSDNIRQRSQE
g21654.t1 ------------------------------------------------QMLLNVILFPQE
zebrafish ---MEAVLSALPWVPVNISGSDLLAKAWFGDSQYRVLLTDLSTVWEEEMSTDDIQSRAQD
: .::
mouse LNKRLTAPPAALLCHLDEALRPLFKD-----SAHPSKATFSCDRGEEGLILRVQSELSGL
rat LNKRLTAPPAAFLHHLDEVLRPLFKDSAHQDAAHPSKATFSCDRGEEVLILRVRSELSGL
human LNKRLTAPPAAFLCHLDNLLRPLLKD-----AAHPSEATFSCDCVADALILRVRSELSGL
rabbit LNKRLTAPPAALLCHLENLLHPLLKD-----AAHPSEATFSSDRAAEALILRVRSELSGL
elephantshark LNKRLKAPVPAFFRHLRDVMEPMLSG-----TGRERLSGFTSRRLHNQLHIQVRSELSGV
g21654.t1 LNKRLKAPVSAFCRYLREAVGPLLRG-----GGRDPLPGFVCERAQNLLSISLRSELSGV
zebrafish LNKRLRAPAQAFFSHLCSVARPCFSG---LDEDQISAAQAALEQHGESLTVKLKSELAGL
* *: :* . * : . : . : * : ::*:*:
mouse PFSWHFHCIPASSSLVSQHLIHPLMGVSLALQSHVRELAALLRMKDLEIQAYQESGAVLS
rat PFNWHFHCLPASSLLVSQHLICPLMGVSLALHSHVRELAALLRMKDLEIQAYQESGAVLS
human PFYWNFHCMLASPSLVSQHLIRPLMGMSLALQCQVRELATLLHMKDLEIQDYQESGATLI
rabbit PFCWHFHCTPASPFLVSQHLIRPLMGMSLALQNQVRELATLLRMKDLEIQEYRESGATLS
elephantshark PFYWAFHCSEAPISMVTRHMVCPLLGMVQALQRQSRDLVLLIARKDAEILDYRENGAALT
g21654.t1 PFYWTFRCTEAPMSM---------------------------------------------
zebrafish PFYWEFRCTTTPVAVVCRQLVRPLLAMTLVLQRQAEDLAALLARKDAEIQDYQENGAVLS
** * *:* :. :
mouse RSRLKTEPFEENSFLEQFMAEKLPEACAVGDGKPFAMSLQSLYVAVTKQQIQARQAHKDS
rat RGRLKTEPFEENSFLEQFMVEKLPEACAVGDGRPFAMNLQSLYVAVTKQQVQARQKHKGS
human RDRLKTEPFEENSFLEQFMIEKLPEACSIGDGKPFVMNLQDLYMAVTTQEVQVGQKHQGA
rabbit RGRLKTEPFEENSFLEQFMVEKLPEACSISDGRPFVLNLQSLYMAVTKQEVQVGQKHQGT
elephantshark RDRLETEIFDEGKFKERFLAEGLPEPVTVEDAGVFSSELQQLYTAVTATEANQRAALGNT
g21654.t1 ------------------------------------------------------------
zebrafish RARLQTEPFEVHQYKENFITQILPQMNVTLDSLGFDSELQALYMAVNSGKTGRKRKHSPD
mouse GETQASSSTSP-------RGTD----NQPEEP----VSLSSTLSEPEYEPVAASGPMHRA
rat GEPQTSSSTSP-------QGTDSQLQNQPEQQ----ISPTPTLSEPECEPMAASGPVHRA
human GDPHTSNSASL-------QGIDSQCVNQPEQL----VSSAPTLSAPEKESTGTSGPLQRP
rabbit GDPQTSSSASP-------QRTDSQLVVQPEQP----AFSALAPSGPEKEPVGISASLQRP
elephantshark GDGKTGHSELRGQLPAETREETEPVTSTPGEN----SEIKPKAPSKRPASGAGVSPRTES
g21654.t1 ------------------------------------------------------------
zebrafish SSPAAQENHITDHQHISESTDVGPSLASQEHNNAKESGRSQVANSQQTLPLSSTAGSEDR
mouse RLVKSKRKKPRGLFS
rat QLVKAKRKKPRGLFS
human QLSKVKRKKPRGLFS
rabbit QLSKVKRKKPRGLFS
elephantshark PVAKQRKRKGRGIFG
g21654.t1 ---------------
zebrafish STSRAKKKKAVGLFR
Figure 2. Multiple sequence alignment of the best hits across species, generated with ClustalW2. The XLF superfamily protein domain is highlighted in red.
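Percent identity between any two rows of an alignment like the one above can be computed by counting matching residues over the columns where neither sequence has a gap. The sketch below is an illustrative helper, not the calculation BLAST or ClustalW2 performs internally; the function name and the choice to skip gapped columns are assumptions. The two example strings are the first 18 columns of the human and g21654.t1 rows from the third block of the alignment.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where both rows are ungapped.

    Returns (matches, compared_columns, percent).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0, 0, 0.0
    matches = sum(a == b for a, b in pairs)
    return matches, len(pairs), 100.0 * matches / len(pairs)


# First 18 alignment columns of two rows from the third block above.
human = "PFYWNFHCMLASPSLVSQ"
whale_shark = "PFYWTFRCTEAPMSM---"

matches, compared, pct = percent_identity(human, whale_shark)
print(matches, compared, round(pct, 2))
```

Because gapped columns are skipped, a short fragment such as g21654.t1 can show a moderate percent identity while still covering only a small part of the full-length protein, so identity is best read alongside alignment length.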
Orthologs in Other Species
The human NHEJ1 protein sequence (ENSP00000349313) was used as the query in NCBI BLAST searches against individual species' protein databases. Predicted orthologs containing the XLF superfamily domain were found using this method in mice, rats, rabbits, zebrafish and elephant sharks.
Table 2. The best hits found when using the human NHEJ1 sequence as the query against the various species. The names, IDs, lengths and E-values are all listed above. All E-values are highly significant.
Phylogenetic Tree
The best hits from the BLASTs of the human NHEJ1 protein as query against other species are shown below in the phylogenetic tree (Figure 3). Since the elephant shark is closely related to the whale shark, we are not surprised to see that those two are very close on the tree. The tree also shows that the whale shark predicted protein is grouped quite closely with the human protein sequence, compared with the zebrafish sequence. This suggests there is close common ancestry in the protein sequence of NHEJ1 between humans and whale sharks.
Figure 3. Phylogenetic tree of NHEJ1 best hits across various species. The ClustalW2 program was used to create a phylogenetic tree from the best-hit protein sequences.
Conclusion
We found predicted orthologs of NHEJ1 in other species and searched for one among the whale shark predicted proteins using BLAST. Based on those results, g21654.t1 was the best candidate. Although we could not confidently identify it as an NHEJ1 ortholog because of the difference in length, we found evidence that the proteins share a domain, the XLF superfamily domain. Since NHEJ1 in humans and other species shares this protein domain with our whale shark hit, we speculate that the hit has a function similar to that of the NHEJ1 protein. The common ancestry seen in the phylogenetic tree further supports this claim.
References
Hefferin, Melissa L., and Alan E. Tomkinson. "Mechanism of DNA double-strand break repair by non-homologous end joining." DNA Repair 4.6 (2005): 639-648.
Weterings, Eric, and David J. Chen. "The endless tale of non-homologous end-joining." Cell Research 18.1 (2008): 114-124.
Xing, Mengtan, et al. "Interactome analysis identifies a new paralogue of XRCC4 in non-homologous end joining DNA repair pathway." Nature Communications 6 (2015).
"Non-homologous End-joining Factor 1." NHEJ1. UniProt, n.d. Web. 06 Apr. 2015.
Mahaney, Brandi L., et al. "XRCC4 and XLF form long helical protein filaments suitable for DNA end protection and alignment to facilitate DNA double strand break repair." Biochemistry and Cell Biology 91.1 (2013): 31-41.
"Basic Local Alignment Search Tool." BLAST. N.p., n.d. Web. 14 Apr. 2015. <http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome>.
"Galaxy / Whale Shark." Galaxy / Whale Shark. N.p., n.d. Web. 14 Apr. 2015. <http://whaleshark.georgiaaquarium.org/>.