Research Article - (2022) Volume 13, Issue 1

Predictive Functionality of Bacteria in Naturally Fermented Milk Products of India Using PICRUSt2 and Piphillin Pipelines
H. Nakibapher Jones Shangpliang and Jyoti Prakash Tamang*
 
Department of Microbiology, School of Life Sciences, Sikkim University, Gangtok 737102, Sikkim, India
 
*Correspondence: Jyoti Prakash Tamang, Department of Microbiology, School of Life Sciences, Sikkim University, Gangtok 737102, Sikkim, India, Email:

Received: 03-Jan-2022, Manuscript No. JDMGP-22-15356; Editor assigned: 05-Jan-2022, Pre QC No. JDMGP-22-15356; Reviewed: 17-Jan-2022, QC No. JDMGP-22-15356; Revised: 20-Jan-2022, Manuscript No. JDMGP-22-15356; Published: 24-Jan-2022, DOI: 10.4172/2153-0602.22.13.244

Abstract

Naturally Fermented Milk (NFM) products are popular food delicacies in Indian states of Sikkim and Arunachal Pradesh. Bacterial communities in these NFM products of India were previously analysed by high-throughput sequence method. However, predictive gene functionality of NFM products of India has not been studied. In this study, raw sequences of NFM products of Sikkim and Arunachal Pradesh were accessed from MG-RAST/NCBI database server. PICRUSt2 and Piphillin tools were applied to study microbial functional gene prediction. MUSiCC- normalized KOs and mapped KEGG pathways from both PICRUSt2 and Piphillin resulted in higher percentage of the former in comparison to the latter. Though, functional features were compared from both the pipelines, however, there were significant differences between the predictions. Therefore, a consolidated presentation of both the algorithms presented an overall outlook into the predictive functional profiles associated with the microbiota of the NFM products of India.

Keywords

Metagenome gene prediction; PICRUSt2; Piphillin; Naturally fermented milk products; Lactic acid bacteria

Introduction

Naturally Fermented Milk (NFM) products are popular food items in daily diets of ethnic people of Arunachal Pradesh and Sikkim in India, which include dahi, mohi, gheu, soft-chhurpi, hard-chhurpi, dudh- chhurpi, chhu, somar, maa, philu, shyow, mar, chhurpi/churapi, churkam and churtang/chhurpupu [1,2]. Previously, taxonomic analysis using High-Throughput Sequencing (HTS) of NFM products of Arunachal Pradesh and Sikkim viz. chhurpi, churkam mar/gheu and dahi, have been studied [3]. We have recorded the abundance of phylum Firmicutes with predominated species of Lactic Acid Bacteria (LAB) viz. Lactococcus lactis (19.7%) and Lactobacillus helveticus (9.6%) and Leuconostoc mesenteroides (4.5%) and Acetic Acid Bacteria (AAB): Acetobacter lovaniensis (5.8%), Acetobacter pasteurianus (5.7%), Gluconobacter oxydans (5.3%), and Acetobacter syzygii (4.8%) [3]. Application of shotgun metagenomics is one of the commonly used methods for understanding the microbial- associated gene functional characteristics [4]. However, alternately functional profiles of a microbial community can also be inferred indirectly by marker-gene surveys such as 16S rRNA gene [5,6]. Bioinformatics pipelines such as Phylogenetic Investigation of Communities by Reconstruction of Unobserved States version2 (PICRUSt2) [7] and Piphillin [8] among others are some of the well-known tools for microbial predictive functionality studies from various NGS-related metagenomic data [5,6]. These pipelines have also been applied in fermented milk products to infer the functional gene predictions [9-11]. Microbiota present in NFM products harbour probiotic properties and impart several health-promoting benefits to consumers [12,13,14]. Predictive gene functionality in NFM products of India has not been analysed yet. Hence, the present study is aimed to predict the microbial functional contents of 16S rRNA gene sequencing data of NFM products of India, previously analysed by high-throughput sequencing method [3], using PICRUSt2 and Piphillin pipelines.

Material and Methods

Pre-analysis prior to predictive functionality analysis

Raw sequences of NFM products of Arunachal Pradesh and Sikkim in India analysed by HTS method [3] were accessed from MG-RAST/NCBI database server and were used in this study. Raw reads were processed using QIIME2-2020.6 (https://docs.qiime2.org/2020.6/) [15]. After importing into QIIME2 environment, Q-score based filtering and denoising was performed using Divisive Amplicon Denoising Algorithm (DADA2) [16] via qiime dada2 denoise-paired plugin. Quality-filtered sequences were then clustered against SILVA v132 [17] database and followed by taxonomic assignment using q2-vsearch-cluster-features-closed- reference [18]. Taxonomic assigned of the ASVs was acheived using SILVA v132 [17] database and relative abundance of the predominant genera was calculated (Table 1).

Genus Region Relative percentage of abundance (%)
Lactococcus Lactobacillus Acetobacter Leuconostoc Pseudomonas Staphylococcus Gluconobacter Bacillus Acinetobacter Enterococcus Others (<1%)
Chhurpi Sikkim 30.54 19.18 15.9 12.7 6.04 4.06 2.45 1.66 0.71 1.28 5.48
Gheu Sikkim 28.92 4.43 4.51 4 49.81 0.45 1.13 0.24 2.15 0.21 4.15
Dahi Sikkim 42.29 13.49 9.32 17.68 9.57 0.86 2.56 0.55 1.59 0.19 1.9
Chhurpi Arunachal Pradesh 51.06 15.8 10.87 5.1 3.88 4.43 2.27 2.26 0.77 0.9 2.66
Churkam Arunachal Pradesh 38.98 14.81 11.95 5.05 3.46 13.16 2.36 3.16 1.29 1.38 4.4
Mar Arunachal Pradesh 35.22 14.12 19.22 8.65 4.56 4.65 7.11 2.56 0.82 0.98 2.11
Average (%) 37.84 13.64 11.96 8.86 12.89 4.6 2.98 1.74 1.22 0.82 3.45

Table 1: Bacterial genera identified from different NFM products of Sikkim and Arunachal Pradesh.

Predictive functionality analysis

PICRUSt2 analysis (https://github.com/picrust/picrust2/wiki): PICRUSt2 is a pipeline for predicting functional abundances based only on marker gene sequences, 16S rRNA gene [7]. Here, the quality-filtered sequences were fed into PICRUSt2 algorithm with the default parameters. The representative sequences were first clustered in QIIME2 against SILVA v132 database [17]. Functional prediction in PICRUSt2 involves three main steps- phylogenetic placement of reads, hidden state prediction, pathway inference. Firstly, for phylogenetic placement of reads, multiple assignment of the Exact Sequence Variants (ESVs) was performed using HMMER (http://www.hmmer.org/); and placements of the ESVs in the reference tree was performed using Evolutionary Placement-ng (EPA-ng) [19] and Genesis Applications for Phylogenetic Placement Analyses (GAPPA) [20]. Secondly, hidden state prediction of the gene families was run by castor R package [21] with the default “maximum parsimony” algorithm. Lastly, for pathway inference, a modified version of MinPath packaged within PICRUSt2 is used and metagenome prediction was achieved using the default “metagenome_pipeline.py” script [22]. The output features were then mapped against KEGG (Kyoto Encyclopaedia of Genes and Genomes) database for systematic analysis of gene functions [23]. Furthermore, the KEGG pathway information was then collapsed into three different levels- Category (Level-1), Super Pathway (Level-2) and Pathways (Level-3).

Piphillin analysis (https://piphillin.secondgenome.com/): Alternatively, functional prediction was also performed using Piphillin software [8]. Piphillin is a straightforward independent algorithm which predicts gene functionality from the structural 16S rRNA gene without the use of any proposed phylogenetic tree unlike PICRUSt or PICRUSt2 [24]. Most importantly, Piphillin is a web-based analysis software which is simplified, user-friendly and has been shown to have better accuracy in predicting genome function from 16S rRNA gene content [8]. Piphillin uses KEGG and BioCyc database as reference databases. Here, gene copy numbers within each genome were retrieved and formatted by KO. Inference of gene function or metagenomic content was achieved by simply matching each representative of OTU/ASVs directly to the nearest sequenced genome without placing the sequence on the phylogenetic tree [24]. The representative OTU abundance table is then transformed into genome abundance table by using USEARCH with global alignment (Edgar 2010) and the resulting closest matched genome to the 16S rRNA gene copy of each representative OTUs/ASVs above identity threshold is considered as the inferred genome for each OTU/ASV (Iwai et al. 2016). In Piphillin analysis, DADA2-clustered representative sequences (.fasta) and ASV abundance frequency table (.csv) were required to upload to the Piphillin server (https://piphillin.secondgenome.com/) via a web-browser. Inferred metagenomic content output was then collapsed into three different levels- Category (Level-1), Super Pathway (Level-2) and Pathways (Level-3).

Statistical analysis and data visualization

Unnormalized Kyoto Encyclopaedia of Genes and Genomes (KEGG) ortholog (KO) profiles of PICRUSt2 and Piphillin predictive were normalized using Metagenomic Universal Single- Copy Correction (MUSiCC), a normalization paradigm which combines universal single-copy genes with machine learning tools to correct biases and to obtain accurate biological measure of gene abundance in metagenomic studies [25]. The output features were then mapped to KEGG database for systematic analysis of gene functions [23]. Error-corrected functional abundance table was then used for downstream analysis and rleative abundances (%) was plotted in MS-Excel v365 as stacked bar-plot for both PICRUSt2 and Piphillin predictive outputs. To check the significant differences between the functional content as predicted by both PICRSUt2 and Piphillin, White’s non-parametric with Benjamini- Hochberg FDR (false discovery rate) was applied in STAMP- statistical analysis of taxonomic and functional profiles [26] and visualized as extended error-bar chart with alpha significance of 0.05 (q-value) for all the functional levels – category (level-1), super pathway (level-2) and pathway (level-3). Furthermore, relationship between bacteria (lactic acid bacteria, LAB; acetic acid bacteria, AAB; and non-LAB/AAB) and functionalities were analyzed using non-parametric Spearman’s correlation in Statistical Package for the Social Sciences (SPSS) v20 and the correlation matrix was visualized as heatmap using ClustVis [27]. Significant interaction between bacteria and function are denoted with “*” <0.05 and “**” <0.01.

Results

Microbial predictive gene functionality

A total of 1109 error-corrected ESVs was obtained from DADA2 analysis and about 268 SILVA-clustered sequences were used for the downstream predictive analysis. A total of 5995 MUSiCC- normalized KOs and 181 mapped KEGG pathways was obtained from PICRUSt2 analysis. Similarly, a total of 5245 MUSiCC- normalized KOs and 157 mapped KEGG pathways was obtained from Piphillin analysis. Overall, both PICRUSt2 and Piphillin pipelines showed a similar pattern (Figure 1), except in the metabolism category where the PICRUSt2 was significantly higher in comparison to that predicted by Piphillin pipeline (Figure 2). Additionally, at the super pathway level, PICRUSt2 prediction showed significantly high in amino acid metabolism, metabolism of cofactors and vitamins, energy metabolism, and biosynthesis of other secondary metabolites (Figure 2). On the other hand, predictive super pathways which included carbohydrate metabolism, xenobiotics biodegradation and metabolism, metabolism of other amino acids, lipid metabolism, metabolism of terpenoids and polyketides, glycan biosynthesis and metabolism, and nucleotide metabolism were significantly higher through Piphillin prediction (Figure 2). Significant metabolic-related pathways inferred by both PICRUSt2 and Piphillin tools were compared showing several functional features predicted by these two pipelines (Figure 3).

genomics-proteomics-categorical

Figure 1: An overall categorical representation of the MUSiCC-normalized predictive microbial functions as inferred by (a) PICRUSt2 and (b) Piphillin. Note: (a) PICRUSt2; KEGG Pathways Relative abundance; Image Environmental Information Processing (3.19%); Image Organismal Systems (0.68%); (b) Piphillin; KEGG Pathways Relative abundance; Image Human Diseases (5.01%); Image Genetic Information Processing (0.97%).

genomics-proteomics-predictive

Figure 2: Extended error bar chart representation of the significant predictive functionalities as inferred by both PICRUSt2 and Piphillin. (a) Overall, metabolism is significantly higher in PICRUSt2 analysis as compared to that of Piphillin, however, (b) a shared difference was observed at the super-pathway level. Significance (q-value>0.05) was calculated using White’s non-parametric test with Benjamini-Hochberg FDR (false discovery rate) in STAMP. Note: ImagePiphillin.

genomics-proteomics-significant

Figure 3: An overall comparison of the significant metabolic pathways as inferred by PICRUSt2 and Piphillin depicting a significant number of functional features predicted by these two pipelines. Significance (q-value>0.05) was calculated using White’s non-parametric test with Benjamini-Hochberg FDR (false discovery rate) in STAMP. Note:Image Piphillin.

Non-parametric correlation of bacteria with predictive functionality

Non-parametric Spearman’s correlation analysis resulted in a complex bacterial-functions interaction. Lactococcus showed a significant negative correlation with glycerolipid metabolism and ubiquinone and other terpenoid-quinone biosynthesis. Lactobacillus showed significant negative correlation with tryptophan metabolism, galactose metabolism, and lipoic acid metabolism while it was observed to be positively significantly correlated with sulphur metabolism. On the other hand, valine, leucine and isoleucine degradation, arginine biosynthesis and ubiquinone and other terpenoid-quinone biosynthesis was positively correlated with Leuconostoc, and negatively correlated with galactose metabolism. Furthermore, a significant negative correlation was observed between Acetobacter with pathways-tryptophan metabolism, valine, leucine and isoleucine biosynthesis, and lipoic acid metabolism. Gluconobacter also showed a significant negative correlation with phenylalanine metabolism, pentose and glucuronate interconversions, fructose and mannose metabolism, and nitrogen metabolism. Glycerolipid metabolism and ubiquinone and other terpenoid-quinone biosynthesis showed significant positive correlation with Staphylococcus, which significantly negatively correlated with propanoate metabolism. Pseudomonas showed significant negative correlation with fructose and mannose metabolism and significant positive correlation with tyrosine metabolism, valine, leucine and isoleucine degradation, arginine and proline metabolism, galactose metabolism, ubiquinone and other terpenoid-quinone biosynthesis and glutathione metabolism. Additionally, a significant positive correlation was observed between Acinetobacter with phenylalanine metabolism, streptomycin biosynthesis, ascorbate and aldarate metabolism, propanoate metabolism, nitrogen metabolism, and biosynthesis of ansamycins (Figure 4).

genomics-proteomics-carried

Figure 4: Non-parametric Spearman’s correlation of the ASV-associated predominant bacterial genera of the NFM products with a consolidated functional feature as inferred by both PICRUSt2 and Piphillin. Here, calculation was carried out using Statistical Package for the Social Sciences (SPSS) v20 and heatmap was generated using ClustVis. All significant correlation pairs are denoted by * (*<0.05 and **<0.01). LAB-lactic acid bacteria; AAB-acetic acid bacteria. Note:Image

Discussion

In this study, microbial predictive gene functional analysis from targetted-16S rRNA gene was explored using PICRUSt2 and Piphillin pipelines. Inference of predictive functionality using these two said pipelines showed a high metabolism rate, since most of these products are consortia of many metabolically active microbiota [3]. These findings are similar to recent studies reported from fermented dairy products [9-11]. The association of various metabolic pathways such as amino acid metabolism, carbohydrate metabolism, energy metabolism, lipid metabolism, metabolism of cofactors and vitamins, and other secondary metabolites with the bacterial genera indicated an active interaction of bacteria- function complexity. LAB are predominant microbiota in many ethnic fermented milk products of India followed by few AAB [28-32] Spearman’s correlation of the predominant bacterial genera with the predictive functionality resulted in a complex microbial- functions interaction in NFM products of Sikkim and Arunachal Pradesh. Metabolic activity such as amino acid metabolism is important in dairy products as they contribute to development of flavour [33]. Similarly, carbohydrate metabolism does also play a major role in flavour and aroma development in milk fermentation [34]. The abundance of functional pathways related to metabolism of amino acids, lipid, energy, and carbohydrates were earlier reported in fermented milk and milk products [35,9,36,10]. A high correlation of functional properties and LAB have also been reported in cheeses [37] since LAB are the most predominant microorganisms in fermented milk products [38,10]. We observed a positive correlation of Staphylococcus with the predictive metabolic features of these NFM products, and interestingly, Staphylococcus is metabolically active in dairy products playing functional activities such as amino acid metabolism, carbohydrate metabolism, lipid metabolism and nitrogen metabolism [39]. We also observed the presence of significant correlation of bacteria with cofactors and vitamins metabolism such as ubiquinone and other terpenoid- quinone biosynthesis and lipoic acid metabolism, which are essential for other microbial metabolism [40]. Apart from LAB, AAB have also contributed to many functional features in NFM products; AAB involve in protein metabolism, production of secondary metabolites and volatile compounds [41,42].

Functional profiles from both PICRUSt2 and Piphillin were normalized using MUSiCC [25], which is a marker gene-based method which uses universal single-copy genes for biasness correction of gene abundances [43]. Normalization using MUSiCC have proven necessary for gene functional studies [44], rescaling the abundant predicted KOs to the actual average gene copy number, correcting several known biases [45]. Piphillin is usually applied in human clinical samples (Iwai et al. 2016); whereas PICRUSt2 is widely used for environmental samples [7]. However, these pipelines have also been widely used in dairy products [11].

From our present analysis, PICRUSt2 analysis generated more predicted KOs and KEGG pathways in comparison to that of Piphillin. Though, significant differences were observed, however, there are functions which were predicted only from PICRUSt2 and missing in Piphillin and vice versa. Therefore, consolidated predictive functions from both these pipelines are necessary for a comprehensive outlook into the potential of bacteria associated with NFM products. Though predictive functionality study of the microbiota associated with NFM products at present is only speculations using bioinformatics tools, a general outlook into the potentiality of functions may be studied and compared. Nonetheless, in the absence of shotgun metagenomics data, using PICRUSt2 and Piphillin serves to be the reliable analysis for microbial predictive gene function.

Conclusion

Bacterial community in NFM products showed many functional features with many important health benefits to consumers. We applied PICRUSt2 and Piphillin tools to infer the predictive functional features of microbiota associated with the ethnic fermented milk products of India. The use of bioinformatics tools is cost-effective and straightforward where potential functional features can be mined using only structural gene, in this case, 16S rRNA gene, in the absence of shotgun metagenomics. Therefore, such studies may be used for future comparison with detailed gene functionality studies of other fermented foods elsewhere.

Acknowledgements

The authors are grateful to the Department of Biotechnology, Govt. of India through the DAICENTER project. HNJS is grateful to DBT for Junior Research Fellowship.

Funding

This current research is supported by Department of Biotechnology, Govt. of India through DBT-AIST International Centre for Translational and Environmental Research (DAICENTER) project.

Authors' Contributions

HNJS did analysis and bioinformatics analysis. JPT has supervised the bioinformatics analysis and finalised the manuscript.

Availability of Data and Materials

Raw sequences were accessed from MG-RAST server having the MG-RAST ID number 4732361 to 4732414. The same were accessed from NCBI database server under the BioProject No. PRJNA661385 with accession numbers SAMN16056817 to SAMN16056870.

Declaration of Competing Interest

The authors declare that they have no competing interests.

REFERENCES

Citation: Shangpliang HNJ, Tamang JP (2022) Predictive Functionality of Bacteria in Naturally Fermented Milk Products of India Using PICRUSt2 and Piphillin Pipelines. J Data Mining Genomics Proteomics. 13:244.

Copyright: © 2022 Shangpliang HNJ, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.