CD74 (ENSP00000430614)


This Project
This web page originated as an assignment in Emory University's Biology 142 lab course. Students were assigned proteins of interest and asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome had evidence of an orthologous protein.


Background
CD74 is the protein encoded by the MHC class II invariant chain gene, and it associates with class II molecules of the major histocompatibility complex (MHC). The MHC gene family is central to immune function in vertebrates (Kudo, J). CD74 has several functions: it regulates antigen presentation in the immune response, it serves as a cell surface receptor, and it acts as an inducer of inflammation. As the cell surface receptor for macrophage migration inhibitory factor (MIF), it initiates survival pathways as well as cell growth and proliferation (CD74 Gene). When MIF is bound to CD74, MAPK proteins are activated, which also aid in anti-inflammatory responses (Pearson, G). CD74 is also important in facilitating the export of MHC class II products from the endoplasmic reticulum to vesicles (CD74 Gene). Defects in the CD74 gene have been linked to thyroid cancer and lymphoma, as well as glioblastoma, a malignant brain tumor (Piette, Caroline). The protein is reported to localize to the Golgi apparatus and the cytoskeleton.


Figure 1. Inhibition of the binding of CD74 to MIF


Figure 1. A small-molecule MIF inhibitor prevents MIF from binding to CD74, which in turn affects MAPK, a pathway that aids in anti-inflammatory responses.


Methods:
Finding whale shark predicted orthologs:
To find predicted whale shark orthologs, the human protein sequence (ENSP00000430614) was obtained from Ensembl and used as the query in a BLAST search against the predicted whale shark protein database on the Galaxy server (whaleshark.georgiaaquarium.org). From the Galaxy results, the top 4 predicted protein hits were chosen according to the lowest, most significant E-values. Each of these hits was then used as a query (using the full predicted sequence) in a protein BLAST against the NCBI human protein database. A predicted protein was considered an ortholog if it returned Homo sapiens CD74 as its top hit. Because none of the hits met this criterion, a bootstrapping approach was used: the human CD74 protein sequence was used as the query in an NCBI BLAST search against the elephant shark genome, and the protein sequence of the top hit was then used as the query in a BLAST search against the predicted whale shark protein database on the Galaxy server. The top hits from this search were compared with the top hits originally obtained with the human CD74 sequence. If any of the hits matched, a potential ortholog was considered found; if not, no ortholog was found.
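The reciprocal-BLAST test described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this project: the function names `best_hit` and `is_reciprocal_best_hit`, the example hit list, and the placeholder subject name are hypothetical, and each hit list is assumed to hold (subject ID, E-value) pairs already parsed from BLAST output.

```python
def best_hit(hits):
    """Return the subject ID with the lowest E-value, or None if there are no hits."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_hit(candidate_hits_vs_human, expected_human_protein):
    """A whale shark candidate counts as a putative ortholog only if its
    best hit in the human protein database is the original query protein."""
    return best_hit(candidate_hits_vs_human) == expected_human_protein


# Hypothetical reverse-BLAST results for one whale shark candidate:
# (human subject, E-value) pairs. The subject names are placeholders.
reverse_hits = [("human_protein_A", 2e-20), ("CD74", 5e-08)]

print(is_reciprocal_best_hit(reverse_hits, "CD74"))  # False: CD74 is not the top hit
```

A candidate that returns CD74 only as a weaker, secondary hit fails the test, which is the situation reported for all four whale shark candidates.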


Predicted orthologs
CD74 predicted orthologs were identified in species other than the whale shark using the NCBI BLAST server.
Protein BLASTs were performed against single-species protein databases for mouse, dog, fruit fly, yeast, and elephant shark to search for other potential orthologues. The human CD74 protein (ENSP00000430614) was used as the query sequence in these searches with default settings.


Phylogenetic tree
To construct the phylogenetic tree, the hit with the lowest E-value from each non-whale shark species search (using the human protein as query) was used to create a multiple sequence alignment. ClustalW2 with default settings was used to create both the alignment and the tree.
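Before an alignment program such as ClustalW2 can be run, the best-hit sequences have to be gathered into a single multi-FASTA input. A minimal sketch of that step follows, assuming the sequences are already in memory as (name, sequence) pairs. The function name `to_fasta`, the record names, and the short placeholder sequences are all hypothetical and are not the sequences used in this project.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-FASTA text,
    wrapping sequence lines at `width` residues."""
    lines = []
    for name, sequence in records:
        if not sequence:
            # An empty sequence cannot be aligned, so fail early.
            raise ValueError(f"empty sequence for {name}")
        lines.append(f">{name}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


# Placeholder sequences only; real input would be the best-hit protein
# sequence retrieved for each species.
best_hits = [
    ("human_query", "MKTAYIAK"),
    ("mouse_best_hit", "MKTAYLAK"),
]

print(to_fasta(best_hits, width=5))
```

Rejecting empty records at this stage surfaces a missing sequence before the alignment step rather than inside it.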


Searching for CD74 in the Whale Shark
The human CD74 protein sequence was used to query the whale shark predicted protein database; the results are shown in Table 1. There were 4 hits; the smallest and most significant E-value was 3e-12, and the next smallest was 1e-11. These 4 best hits were then used as queries against the human protein database using NCBI BLASTp. None of the 4 genes returned the human CD74 protein as its best hit. This suggests that the whale shark predicted protein set contains no ortholog of the human CD74 protein.


Table 1: Significant Predicted Protein Hits in the Whale Shark Genome
Gene ID      E-value   Alignment length   % Identical   Protein length
g44865.t1    3e-12     64                 46.88         367
g47927.t1    1e-11     46                 47.83         256
g47927.t1    6e-10     27                 59.26         256
g29064.t1    4e-07     45                 33.33         89
g22775.t1    8e-05     47                 27.66         134
Table 1. The best human CD74 BLASTp hits against the whale shark predicted protein database. The Galaxy server was used to query the predicted whale shark protein database with the human CD74 protein sequence. The top 4 hits by lowest, most significant E-value are reported with their database ID and amino acid length; g47927.t1 is listed twice because it produced two separate alignments. These sequences were also used as queries against the NCBI human protein database. None of them returned the human CD74 protein when blasted against the human protein database.
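Because one gene ID appears twice in Table 1, it can help to collapse the rows to one alignment per gene before counting distinct candidates. The sketch below does this with the values from Table 1; the function name `best_alignment_per_gene` is hypothetical, and keeping the lowest E-value per gene is an assumption about how duplicates should be resolved.

```python
# Rows from Table 1:
# (gene ID, E-value, alignment length, % identical, protein length)
table1 = [
    ("g44865.t1", 3e-12, 64, 46.88, 367),
    ("g47927.t1", 1e-11, 46, 47.83, 256),
    ("g47927.t1", 6e-10, 27, 59.26, 256),
    ("g29064.t1", 4e-07, 45, 33.33, 89),
    ("g22775.t1", 8e-05, 47, 27.66, 134),
]


def best_alignment_per_gene(rows):
    """Keep, for each gene ID, only the alignment with the lowest E-value."""
    best = {}
    for row in rows:
        gene_id, e_value = row[0], row[1]
        if gene_id not in best or e_value < best[gene_id][1]:
            best[gene_id] = row
    # Sort the surviving rows from most to least significant.
    return sorted(best.values(), key=lambda row: row[1])


for gene_id, e_value, *_ in best_alignment_per_gene(table1):
    print(gene_id, e_value)
```

Collapsing the five alignments this way leaves four distinct genes, matching the "top 4 hits" used in the reciprocal searches.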


Figure 2. Top hit in BLAST search for CD74 human protein sequence against elephant shark genome
Figure 2. Top hit for the human CD74 protein sequence against the elephant shark sequences. CD74 was returned as the top hit, with few gaps in the alignment.


Because no orthologs were found when the top whale shark protein hits were used as queries against the human protein database, the bootstrapping method was used: the human CD74 protein sequence was used as the query in an NCBI BLAST search against the elephant shark genome. CD74 was returned as the top hit. Figure 2 shows the similarity between the elephant shark CD74 protein sequence and the human CD74 protein sequence. This indicated that elephant shark CD74 may be an ortholog of human CD74 and, because the elephant shark and whale shark are related species, that the whale shark might also have a CD74 ortholog. However, when the protein sequence of the elephant shark top hit was used as the query against the whale shark predicted protein database on the Galaxy server, none of the hits returned matched the hits from the original human CD74 search, and the E-values differed between the two searches. This indicated that none of the whale shark predicted proteins is an ortholog of elephant shark CD74, and therefore that no ortholog of human CD74 was detected in the whale shark predicted protein set.
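The comparison at the heart of the bootstrapping step amounts to asking whether the two searches recovered any of the same whale shark genes. A minimal sketch, assuming the comparison is made on gene IDs: the function name `shared_hits` and the bridged hit IDs are hypothetical, since the IDs returned by the elephant shark query are not listed in this report.

```python
def shared_hits(direct_hits, bridged_hits):
    """Return whale shark gene IDs recovered by both searches, in sorted order."""
    return sorted(set(direct_hits) & set(bridged_hits))


# Whale shark gene IDs returned with human CD74 as the query (Table 1).
direct = ["g44865.t1", "g47927.t1", "g29064.t1", "g22775.t1"]

# Hypothetical IDs standing in for the hits returned with the elephant
# shark top hit as the query; the real IDs are not listed in this report.
bridged = ["hypothetical_gene_1", "hypothetical_gene_2"]

overlap = shared_hits(direct, bridged)
print(overlap if overlap else "no shared candidates")
```

An empty overlap corresponds to the outcome reported here: the intermediate species did not point back to any of the original whale shark candidates.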



Protein Domain:
The MHC class II-associated invariant chain superfamily is associated with CD74. It regulates antigen presentation in antigen-processing pathways. It does so by stabilizing peptide-free class II alpha/beta heterodimers soon after they are assembled and directing them from the rough ER into the compartments where peptides are loaded. Another domain associated with CD74 is Thyroglobulin type 1. It belongs to a MEROPS proteinase inhibitor family and is found in thyroglobulin, a large glycoprotein located in the thyroid gland, where it serves as the precursor to thyroid hormones such as thyroxine and triiodothyronine (Interpro). This domain is present in many different proteins, such as human pancreatic carcinoma marker proteins, nidogen, insulin-like growth factor binding proteins, and saxiphilin. Interestingly, this domain was found both in human CD74 and in the whale shark hits, even though those hits are not orthologs of CD74.


Figure 3. CD74 Putative conserved protein domains

Figure 3. These domains were identified for the CD74 protein (ENSP00000430614). The top 4 hits from Table 1 were all related to the MHC or Thyroglobulin superfamilies.


Orthologues:
The human CD74 protein sequence (ENSP00000430614) was used as the query in NCBI BLAST searches against the protein databases of other individual species. Candidate orthologues found this way were then confirmed by a reciprocal BLAST against the human database.

Table 2. BLAST Results for Top Hits for CD74 Human Protein Sequence in Other Species
Animal Name   Query Coverage   E-value   Protein length   Accession #       Ortholog?
Mouse         95%              0.0       296              NP_001020330.1    Yes
Dog           90%              1e-171    272              XP_536468.5       Yes
Fruit fly     17%              2e-10     629              NP_001163695.1    No
Yeast         9%               0.15      123              NP_011289.1       No
Table 2. The best hits for the human CD74 protein BLAST in other species. The animal name, query coverage, E-value, protein length, accession number, and ortholog call are shown.


Phylogeny:
The best hits from our query searches were used to create a phylogenetic tree. The tree shows that many of the whale shark predicted proteins are more distantly related to the human protein than are the best hits from other animals, such as the chimp or dog. One of our gene IDs, g22775.t1, could not be included in the tree because the ClustalW program reported it as an “empty gene sequence”. The same happened with the yeast protein sequence. Gene ID g47927.t1 appears twice in the BLAST results with two different E-values, so the one shown is the hit with the more significant E-value (1e-11).

Figure 4. Phylogenetic tree of CD74 best hits. The best 4 hits from the BLAST searches were used in the ClustalW2 program to create the phylogenetic tree. Branch lengths represent relative evolutionary distance, and the sequences that sit closest together on the tree are the most similar.


Conclusion:
We identified two orthologs of CD74 in our research, in the mouse and the dog. Our results are consistent with prior research showing that this gene functions in the regulation of the immune system in vertebrates. This is apparent in the query coverage values in Table 2. Of the organisms we examined, yeast and fruit fly were the only ones that did not show high-percentage matches with the CD74 protein sequence, which suggests that vertebrates benefit more from producing this protein than other organisms do. As discussed earlier, thyroid cancer and lymphoma have been linked to defects in this gene; further research should therefore be conducted to determine what preventative measures could decrease the number of defects that occur in this gene, or what could help correct a mutation after the fact.




References:
Alinari L, Yu B, Christian BA, Yan F, Shin J, Lapalombella R, Hertlein E, Lustberg ME, Quinion C, Zhang X, Lozanski G, Muthusamy N, Prætorius-Ibba M, O'Connor OA, Goldenberg DM, Byrd JC, Blum KA, Baiocchi RA. Blood. 2011 Apr 28;117(17):4530-41. doi: 10.1182/blood-2010-08-303354. Epub 2011 Jan 12.

Beswick, Ellen J, and Victor E Reyes. “CD74 in Antigen Presentation, Inflammation, and Cancers of the Gastrointestinal Tract.” World Journal of Gastroenterology: WJG 15.23 (2009): 2855–2861. PMC. Web. 12 Apr. 2015.

"CD74 Gene." - GeneCards. N.p., n.d. Web. 14 Apr. 2015.

Claesson, L., D. Larhammar, L. Rask, and P. A. Peterson. "CDNA Clone for the Human Invariant Gamma Chain of Class II Histocompatibility Antigens and Its Implications for the Protein Structure." Proceedings of the National Academy of Sciences of the United States of America. U.S. National Library of Medicine, n.d. Web. 14 Apr. 2015.

Christian BA, Poi M, Jones JA, Porcu P, Maddocks K, Flynn JM, Benson DM Jr, Phelps MA, Wei L, Byrd JC, Wegener WA, Goldenberg DM, Baiocchi RA, Blum KA. Br J Haematol. 2015 Apr 7. doi: 10.1111/bjh.13354. [Epub ahead of print]

Kudo, J., L. Y. Chao, F. Narni, and G. F. Saunders. "Structure of the Human Gene Encoding the Invariant Gamma-chain of Class II Histocompatibility Antigens." Nucleic Acids Research. U.S. National Library of Medicine, n.d. Web. 14 Apr. 2015.

Nature.com. Nature Publishing Group, n.d. Web. 14 Apr. 2015. <http://www.nature.com/nrd/journal/v5/n5/fig_tab/nrd2029_F5.html>.

Pearson, G. "Result Filters." National Center for Biotechnology Information. U.S. National Library of Medicine, n.d. Web. 14 Apr. 2015.

Piette, Caroline, Manuel Deprez, Thierry Roger, Agnès Noël, Jean-Michel Foidart, and Carine Munaut. "The Dexamethasone-induced Inhibition of Proliferation, Migration, and Invasion in Glioma Cell Lines Is Antagonized by Macrophage Migration Inhibitory Factor (MIF) and Can Be Enhanced by Specific MIF Inhibitors." The Journal of Biological Chemistry. American Society for Biochemistry and Molecular Biology, n.d. Web. 14 Apr. 2015.










