Introduction

Automatic annotation of genes based solely on sequence homology detected by BLAST is not sufficient to predict function accurately, since this approach can lead to false-positive identifications. Researchers wish to understand the exact biological processes and metabolic networks in which a protein acts, and this has led to the development of a more rigorous approach called metabolic reconstruction. Metabolic reconstruction is the process of associating genes with specific functions in metabolic networks and discriminating between the multiple possibilities that usually exist. This involves assigning the annotated genes to the metabolic pathways in which they act. The approach can reveal false positives, because such genes will either be isolated in branches of pathways that are absent from a particular organism or will fall in pathways not relevant to the organism, with no other pathway-specific enzymes present. The method also leads to the prediction of enzymes that do not appear to have a corresponding gene model but need to be present to complete an otherwise complete metabolic pathway [1].
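The reasoning in the paragraph above can be illustrated with a minimal Python sketch. It is not the procedure used to build any of the resources discussed here. The pathway names, the enzyme labels, the function name classify_pathways and the max_isolated threshold are all hypothetical and chosen only for illustration. Each pathway is treated as a simple list of enzyme steps, and the set of annotated enzymes is compared against it.

```python
from typing import Dict, List, Set


def classify_pathways(
    pathways: Dict[str, List[str]],
    annotated: Set[str],
    max_isolated: int = 2,  # hypothetical cut-off, not taken from the text
) -> Dict[str, dict]:
    """Label each pathway by how well the annotated enzymes cover it."""
    report = {}
    for name, steps in pathways.items():
        present = [e for e in steps if e in annotated]
        missing = [e for e in steps if e not in annotated]
        if not present:
            status = "absent"
        elif len(present) <= max_isolated and len(missing) > len(present):
            # A few isolated enzymes in a mostly empty pathway:
            # the annotations may be false positives or non-specific.
            status = "suspect"
        elif missing:
            # Mostly covered pathway: the gaps are candidate pathway holes,
            # i.e. enzymes predicted to exist without a gene model.
            status = "holes"
        else:
            status = "complete"
        report[name] = {"status": status, "present": present, "missing": missing}
    return report


# Toy data with placeholder enzyme labels (assumptions, not real annotations).
toy_pathways = {
    "pathway_A": ["a1", "a2", "a3", "a4", "a5"],
    "pathway_B": ["b1", "b2", "b3", "b4", "b5", "b6"],
}
toy_annotated = {"a1", "a2", "a3", "a5", "b4"}

for name, entry in classify_pathways(toy_pathways, toy_annotated).items():
    print(name, entry["status"], "missing:", entry["missing"])
```

In this toy example pathway_A has a single gap and is reported as having a hole, whereas pathway_B contains one isolated enzyme and is flagged as suspect. A real reconstruction would also need to handle branched pathways and enzymes shared between pathways, which this sketch ignores.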

 

Figure 1 - Steps involved in the annotation of functions from the genome sequence. The highlighted block shows the steps which are involved in the development of this resource.

 

The metabolic reconstruction resources presently available for Apicomplexa include the Kyoto Encyclopedia of Genes and Genomes (KEGG), ApiCyc, metaTIGER and, for Plasmodium falciparum, Malaria Parasite Metabolic Pathways (MPMP). Many of the pathways annotated for Plasmodium in KEGG contain only one or two annotated enzymes, which may be false positives or non-specific enzymes that participate in other pathways. In addition, annotated functional paralogs of enzyme-coding genes are missing from KEGG. PlasmoCyc, a derivative of MetaCyc, contains experimentally verified enzyme information from the scientific literature. However, this resource has a number of problems. The number of polypeptides in PlasmoCyc is much larger than in PlasmoDB, the Plasmodium database of the EuPathDB group of databases, and PlasmoCyc contains 320 pathway holes in its 143 annotated pathways [2]. The annotated pathways include the mevalonate pathway, which has been shown experimentally to be absent from Plasmodium [2, 3].

 

The automatic annotation of apicomplexans is challenging because the genomes of these parasites show high divergence from those of well-established model organisms. An additional difficulty is presented by the high AT content of the Plasmodium genome [4]. The reduction of metabolic pathways in the parasite, and the large holes in the pathways that are present, are attributed to lineage-specific adaptation through gene loss during evolution. This has led to increased dependence on the host for metabolic requirements [2]. Nutrients and metabolites provided by the host to Plasmodium include various amino acids, riboflavin, nicotinamide, biotin, pantothenic acid and thiamine [4, 5].

 

MPMP is a manually curated database of Plasmodium-specific metabolic pathways. A pathway was included only if at least three to four enzymes acting consecutively in it had been annotated in the genome [2, 6]. MPMP captures gene annotations from GeneDB and PlasmoDB, so each enzyme has links to the annotated genes in GeneDB and PlasmoDB and also to enzyme databases such as Expasy and BRENDA. MPMP is considered to be a gold standard of metabolic reconstruction for P. falciparum [7], as it possesses compact organism-specific pathways which are supported by logic and biochemical evidence. Because these pathways are specific to Plasmodium, they do not completely reflect other groups in the Apicomplexa, and so there is no general model for apicomplexan metabolism as a whole. There is considerable variation in metabolic capabilities within apicomplexans, with Toxoplasma depending less, and Theileria depending more, than Plasmodium on the host for nutrients and metabolites [7]. This is mainly because of the different environmental niches they occupy and the different stresses they undergo.
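The consecutive-enzyme criterion described above can be expressed as a short check. The following Python sketch is an assumption-laden illustration rather than the MPMP curation procedure, which is manual. It treats a pathway as a linear, ordered list of steps, and the function names, enzyme labels and the default min_run of 3 are hypothetical.

```python
from typing import List, Set


def longest_consecutive_run(steps: List[str], annotated: Set[str]) -> int:
    """Length of the longest stretch of adjacent steps that are all annotated."""
    best = current = 0
    for enzyme in steps:
        if enzyme in annotated:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a missing step breaks the run
    return best


def passes_run_criterion(steps: List[str], annotated: Set[str], min_run: int = 3) -> bool:
    """True if at least min_run consecutive steps of the pathway are annotated."""
    return longest_consecutive_run(steps, annotated) >= min_run


# Placeholder step labels for a linear pathway (assumption for illustration).
toy_steps = ["s1", "s2", "s3", "s4", "s5", "s6"]

print(passes_run_criterion(toy_steps, {"s1", "s3", "s4", "s5"}))
print(passes_run_criterion(toy_steps, {"s1", "s3", "s5"}))
```

The first call finds three adjacent annotated steps (s3 to s5) and accepts the pathway. The second finds the same number of scattered annotations as a pathway might have by chance, but no two are adjacent, so the pathway is rejected.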

 

Toxoplasma gondii is a widely studied model apicomplexan in experimental biology. This parasite causes toxoplasmosis in almost every warm-blooded animal and can cause death in immuno-compromised humans. Toxoplasmosis can pass from an infected mother to her child through the placenta and can damage brain and eye development. This condition is termed 'congenital toxoplasmosis'. Infections with Theileria and Babesia species are responsible for large numbers of deaths in livestock and cause huge economic losses. Cryptosporidium species, pathogens of the intestinal tract, cause diarrhoeal infections that can be fatal in immuno-compromised individuals. The metabolic pathway resources available for other apicomplexans include KEGG (for T. gondii, C. parvum, C. hominis, B. bovis, T. parva and T. annulata), metaTIGER (for T. gondii, N. caninum, C. parvum, C. hominis, T. parva and T. annulata) and ApiCyc (for T. gondii, C. parvum and C. hominis). The problems of automatic reconstruction detailed above also apply to T. gondii and other Apicomplexa. ToxoDB, the EuPathDB database of T. gondii, Neospora caninum and Eimeria tenella, possesses a variant of the KEGG metabolic pathways. Some of the metabolic pathways included in these resources are not specific to the organism, as is the case for P. falciparum above. Some of the pathways are annotated only on the basis of one or two enzyme annotations, which may be non-specific, may be false-positive identifications or may have a role in host biology. Some enzymes encoded in the T. gondii genome that function in pathways have not been incorporated into the metabolic pathways. CryptoDB, the EuPathDB database of Cryptosporidium species, and PiroplasmaDB, the database of Theileria species and Babesia bovis, do not have their genes actively linked to any metabolic pathways. However, the 'metabolic pathways' section of CryptoDB recommends ApiCyc as a metabolic pathway resource.

 

This LAMP web database provides compact organism-specific metabolic pathways for the apicomplexan genomes in EuPathDB other than those of Plasmodium species. The core metabolic pathways are now available for the T. gondii, N. caninum, C. muris, C. parvum, C. hominis, B. bovis, T. parva and T. annulata genomes. We expect the metabolic reconstructions on this website to be linked from the respective gene pages in EuPathDB.