16S Metagenomics report
First、Project information
Second、Workflow
1. Sequencing process
16S metagenomics library was constructed using two rounds of PCR reactions. The PCR primers,
forward
(5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3’)
and reverse
(5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT-3’),
were designed to amplify the V3V4 domain of bacterial 16S ribosomal DNA as well as adding adaptor sequences. Illumina index sequences were subsequently attached into the amplicons in the second PCR.PCR amplification was performed in a 50 μl reaction volume containing 25 μl 2X Phusion High-Fidelity PCR Master Mix (New Enagland Biolabs), 0.2 μM of each forward and reverse primer, and 100 ng DNA template. The reaction conditions consisted of an initial 98°C for 30sec, followed by 30 cycles of 98°C for 10 sec, 55°C for 30 sec, and 72°C for 30 sec, as well as a final extension of 72°C for 5 min. The number of cycle for second PCR was 10.
Next, amplified products were checked by 2 % agarose gel electrophoresis and were purified using the AMPure XP PCR Purification Kit (Agencourt). Quantification was performed by using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific) on Qubit 2.0 Fluorometer (Thermo Fisher Scientific) and Light cycler 96 (Roche)- all according to respective manufacturer instructions. The final purified libraries were applied for cluster generation and sequencing on the MiSeq using V3 600 cycles kit.
2. Raw data process
3. Data analysis workflow
Third、Analysis results
1. QC results
SampleID | Raw.Reads | QC | QC.rate | Merge | Merge.rate | OTU.cluster | OTU.cluster.rate | Alignment | Alignment.rate | Represent.OTU | Mapped.OTU | Mapped.OTU.rate | Number.of.Tax |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5-2-F10 | 88539 | 74129 | 83.72% | 74057 | 99.90% | 39095 | 52.79% | 38774 | 99.18% | 12 | 11 | 91.67% | 9 |
5-2-F11 | 143479 | 132752 | 92.52% | 132317 | 99.67% | 118977 | 89.92% | 109463 | 92.00% | 67 | 59 | 88.06% | 44 |
5-2-F12 | 140652 | 122229 | 86.90% | 122116 | 99.91% | 73277 | 60.01% | 62604 | 85.43% | 40 | 32 | 80.00% | 27 |
5-2-F13 | 117505 | 102245 | 87.01% | 102057 | 99.82% | 52123 | 51.07% | 48877 | 93.77% | 23 | 20 | 86.96% | 19 |
5-2-F14 | 144270 | 124658 | 86.41% | 124546 | 99.91% | 73044 | 58.65% | 66811 | 91.47% | 24 | 21 | 87.50% | 17 |
5-2-F8 | 105788 | 97708 | 92.36% | 97180 | 99.46% | 83329 | 85.75% | 76551 | 91.87% | 67 | 62 | 92.54% | 51 |
5-2-F9 | 128612 | 110167 | 85.66% | 110033 | 99.88% | 71066 | 64.59% | 71066 | 100.00% | 12 | 12 | 100.00% | 10 |
5-2-M1 | 146199 | 125347 | 85.74% | 125208 | 99.89% | 70072 | 55.96% | 65419 | 93.36% | 24 | 19 | 79.17% | 18 |
5-2-M2 | 96796 | 82039 | 84.75% | 81904 | 99.84% | 41094 | 50.17% | 37850 | 92.11% | 22 | 17 | 77.27% | 15 |
5-2-M3 | 95485 | 71051 | 74.41% | 70918 | 99.81% | 34848 | 49.14% | 32115 | 92.16% | 21 | 19 | 90.48% | 15 |
5-2-M4 | 116560 | 100152 | 85.92% | 100057 | 99.91% | 51865 | 51.84% | 49116 | 94.70% | 22 | 19 | 86.36% | 15 |
5-2-M5 | 108369 | 93771 | 86.53% | 93684 | 99.91% | 51858 | 55.35% | 45594 | 87.92% | 41 | 37 | 90.24% | 30 |
5-2-M6 | 120432 | 102599 | 85.19% | 102411 | 99.82% | 49687 | 48.52% | 45650 | 91.88% | 29 | 25 | 86.21% | 20 |
5-2-M7 | 113999 | 97404 | 85.44% | 97335 | 99.93% | 61089 | 62.76% | 61083 | 99.99% | 19 | 18 | 94.74% | 15 |
5-20-F10 | 127201 | 108755 | 85.50% | 108668 | 99.92% | 57792 | 53.18% | 57191 | 98.96% | 23 | 21 | 91.30% | 20 |
5-20-F11 | 109253 | 93779 | 85.84% | 93698 | 99.91% | 50007 | 53.37% | 47572 | 95.13% | 26 | 22 | 84.62% | 18 |
5-20-F12 | 150373 | 128307 | 85.33% | 128164 | 99.89% | 75561 | 58.96% | 72493 | 95.94% | 33 | 30 | 90.91% | 29 |
5-20-F13 | 151224 | 128241 | 84.80% | 128136 | 99.92% | 71281 | 55.63% | 70059 | 98.29% | 32 | 30 | 93.75% | 27 |
5-20-F14 | 140689 | 119599 | 85.01% | 119479 | 99.90% | 63151 | 52.86% | 58574 | 92.75% | 34 | 30 | 88.24% | 26 |
5-20-F8 | 136456 | 117599 | 86.18% | 117525 | 99.94% | 62035 | 52.78% | 59418 | 95.78% | 31 | 25 | 80.65% | 25 |
5-20-F9 | 114879 | 97230 | 84.64% | 97131 | 99.90% | 48557 | 49.99% | 47724 | 98.28% | 18 | 16 | 88.89% | 14 |
5-20-M1 | 140154 | 119488 | 85.25% | 119396 | 99.92% | 70010 | 58.64% | 67861 | 96.93% | 20 | 16 | 80.00% | 15 |
5-20-M2 | 150905 | 130488 | 86.47% | 130413 | 99.94% | 77993 | 59.80% | 76477 | 98.06% | 21 | 18 | 85.71% | 18 |
5-20-M3 | 130723 | 113331 | 86.70% | 113263 | 99.94% | 59241 | 52.30% | 55331 | 93.40% | 21 | 17 | 80.95% | 15 |
5-20-M4 | 178332 | 153427 | 86.03% | 153324 | 99.93% | 99067 | 64.61% | 86607 | 87.42% | 45 | 41 | 91.11% | 34 |
5-20-M5 | 144602 | 124905 | 86.38% | 124816 | 99.93% | 67653 | 54.20% | 60245 | 89.05% | 33 | 27 | 81.82% | 26 |
5-20-M6 | 147064 | 125541 | 85.36% | 125405 | 99.89% | 70268 | 56.03% | 63354 | 90.16% | 40 | 36 | 90.00% | 31 |
5-20-M7 | 177935 | 153409 | 86.22% | 153298 | 99.93% | 92849 | 60.57% | 90826 | 97.82% | 24 | 21 | 87.50% | 17 |
QC: # of sequencing reads passed by the quality threshold
Merge: # of merged sequencing reads
OTU.cluster: # of clustered sequencing reads
Alignment: # of sequencing reads mapped to Silva database
Represent.OTU: # of clusters
Mapped.OTU: # of clusters mapped to Silva database
Number.of.Tax: # of Taxonomy
Path: ./result/OTU & QC/
2. Taxonomy results
Barplot
According to taxonomy annotation, top 15 bacteria with relative abundance in each level (Phylum, Class, Order, family, Genus, Species) are selected for each sample or group and generate a histogram of relative abundance. Bacteria with lower relative abundance are merged into “Others”.
Path: ./result/Abundance/
Case & Control
Phylum
Class
Order
Family
Genus
Species
Cas0_BF
Phylum
Class
Order
Family
Genus
Species
Cas1_BF
Phylum
Class
Order
Family
Genus
Species
Gender
Phylum
Class
Order
Family
Genus
Species
Heatmap
The heatmap is drawn according to the taxonomy annotation and relative abundance of all samples (groups) in each level. “Square root” is applied to numerical conversion for relative abundance.
Path: ./result/Heatmap/
Case & Control
Phylum
Species
Cas0_BF
Phylum
Species
Cas1_BF
Phylum
Species
Gender
Phylum
Species
3. Alpha Diversity
Alpha Diversity analysis is to evaluate the diversity of the microbial community in a sample or grouping. The diversity analysis of a single sample can reflect the microbial diversity (Chao index) and the main bacterial diversity (Shannon index, Inverse Simpson) of the microbial community in the sample. The definition of “Chao index” is regardless of the relative abundance of bacteria, whether they are present or not, the codes of 0 and 1 are used to evaluate the number of bacteria. Diversity of main bacteria, calculate the number of bacteria with high relative abundance, bacteria are ignored in the measurement due to low relative abundance, mainly used to evaluate the number of bacteria with high relative abundance.
Path: ./result/ALPHA/
Chao
Case & Control
Cas0_BF
Cas1_BF
Gender
Inverse-Simpson
Case & Control
Cas0_BF
Cas1_BF
Gender
Shannon
Case & Control
Cas0_BF
Cas1_BF
Gender
Rarefaction curve
The rarefaction curve randomly select a certain number of individuals among the samples, count the number of species represented by these individuals, and construct the curve with the number of individuals and species. It can be used to compare the abundance of species in samples with different amounts of sequencing data, and it can also be used to show whether the amount of sequencing data of a sample is reasonable. Using the method of random sampling of sequences, construct a rareaction curve based on the number of sequences drawn and the number of OTUs they can represent. When the curve tends to be flatten, it means that the amount of sequencing data is reasonable, and more data will only generate a small amount of new data. On the contrary, it indicates that more new OTUs may be generated by continuing the sequencing.
Venn-diagram
Venn diagrams can be used to count the number of common and unique species (such as OTU) in multiple groups or samples, and can more intuitively show the similarity and overlap of the number of species (such as OTU) in environmental samples. Usually,16S analysis uses a similar level of 97% OTU or other taxonomic level sample tables.
2groups
3groups
4groups
4. Beta Diversity
Beta diversity (Distance matrix between samples) is to compare the composition of microbial communities between different samples. The larger value represents the greater the difference in species distribution between samples. According to taxonomy annotations and OTUs abundance information, merge the OTUs information of the same classification to obtain an abundance table. The systematic evolution and relative abundance relationship between microbial sequences in each sample are used to calculate the distance matrix between samples or groups. Principal Coordinates Analysis (PCoA, Principal Coordinates Analysis) is used to reduce the dimensionality of the sample group.
Path: ./result/BETA/
Unweighted
Weighted
5. Differential analysis
Significant difference analysis
The Kruskal-Wallis test (also known as the H test method) in the non-matrix method is used for analysis to determine whether there is a difference in the median of two or more groups. If the chi-square value of the overall test is statistically significant, then the null hypothesis is rejected, which means that at least one pair of groups has an unequal average level. As for which pairs are different, a post-mortem comparison is required. The number of samples in each group for KW verification must be at least 5 or more.
In this analysis, three conditions were used for significance screening: (a) Select the H test method p.value <0.05 (b) Fold change more than doubled (c) Case group or Control group, the average relative abundance of at least one group exceeds 0.5% .
Path: ./result/DEG/Sig_Bac_Visual
Cas0_BF
Cas1_BF
Cas0 vs Cas1
Con0 vs Con1
Gender
Summarize_Table
X | Orientation | Mean_seperation | KW.P.value | KW.FDR | ANOVA.P.value | ANOVA.FDR | FC | FC_MEAN_UP | FC_MEAN_DOWN | FC_SEM_UP | FC_SEM_DOWN | UP | DOWN | Sign.KW | Sign.ANOVA | Sign.Both |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus animalis | Cas0_Cas1 | Overall | 0.1116108 | 0.5258346 | 0.0342102 | 0.5144687 | 0.2620754 | 0.042240 | 0.161175 | 0.0131833 | 0.0760204 | Cas1 | Cas0 | 0 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium disporicum | Cas0_Cas1 | Overall | 0.0344914 | 0.5258346 | 0.0376441 | 0.5144687 | Inf | 0.134360 | 0.000000 | 0.0354972 | 0.0000000 | Cas1 | Cas0 | 1 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium saudiense | Cas0_Cas1 | Overall | 0.0031599 | 0.1295557 | 0.0020703 | 0.0848838 | 0.0000000 | 0.000000 | 0.026950 | 0.0000000 | 0.0116491 | Cas1 | Cas0 | 1 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;gErysipelatoclostridium;s[Clostridium] cocleatum | Cas0_Cas1 | Overall | 0.0479527 | 0.5258346 | 0.0730814 | 0.5866132 | 0.2291892 | 0.001060 | 0.004625 | 0.0008207 | 0.0020878 | Cas1 | Cas0 | 1 | 0 | 1 |
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__Alistipes senegalensis | Cas0_BF | Overall | 0.0138744 | 0.2081161 | 0.0304154 | 0.4562312 | Inf | 0.041050 | 0.000000 | 0.0145647 | 0.0000000 | After | Before | 1 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium disporicum | Cas0_BF | Overall | 0.0138744 | 0.2081161 | 0.0003426 | 0.0102788 | 0.0000000 | 0.000000 | 0.156100 | 0.0000000 | 0.0214490 | After | Before | 1 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium saudiense | Cas0_BF | Overall | 0.0472209 | 0.2833254 | 0.0599814 | 0.4633202 | Inf | 0.026950 | 0.000000 | 0.0116491 | 0.0000000 | After | Before | 1 | 0 | 1 |
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;gLachnoclostridium;s[Clostridium] scindens | Cas0_BF | Overall | 0.0472209 | 0.2833254 | 0.0770693 | 0.4633202 | Inf | 0.008925 | 0.000000 | 0.0041876 | 0.0000000 | After | Before | 1 | 0 | 1 |
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Neglecta;s__Neglecta timonensis | Cas0_BF | Overall | 0.0472209 | 0.2833254 | 0.1092441 | 0.4633202 | Inf | 0.014250 | 0.000000 | 0.0075822 | 0.0000000 | After | Before | 1 | 0 | 1 |
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides;s__Parabacteroides goldsteinii | Cas1_BF | Overall | 0.0283060 | 0.4251487 | 0.0275870 | 0.4407532 | 0.2823173 | 0.045710 | 0.161910 | 0.0150630 | 0.0460742 | After | Before | 1 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus animalis | Cas1_BF | Overall | 0.1597867 | 0.4251487 | 0.0423044 | 0.4407532 | 0.2965043 | 0.042240 | 0.142460 | 0.0131833 | 0.0439188 | After | Before | 0 | 1 | 1 |
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;gErysipelatoclostridium;s[Clostridium] cocleatum | Cas1_BF | Overall | 0.0022695 | 0.0998583 | 0.0723346 | 0.4407532 | 0.0490968 | 0.001060 | 0.021590 | 0.0008207 | 0.0107229 | After | Before | 1 | 0 | 1 |
LEfSe analysis
LDA Effect Size (LEfSe) is an algorithm for discovering biomarkers and features in high-dimensional data. Firstly, the Kruskal-Wallis (KW) sumrank test was used to test the abundance difference of each species between the two groups, and then the Linear Discriminant Analysis (LDA) method was used to evaluate the classification utility of all the different species for the groups.
Cut off :
Kruskal-Wallis test p value > 0.05
|LDA score| < 2
Path: ./result/DEG/lefse
Barplot
Cladogram
6. Co-occurence network
Co-occurence analysis use FastSpar to calculate the correlation coefficient between different level species, and then use the absolute value of the correlation coefficient 0.6 to determine the positive correlation and the negative correlation to make a network diagram.
(This example uses a correlation coefficient of 0.1)
Co-occurence network Edge ( |correlation value| > 0.6)
◎ Green: positive
◎ Red: negitave
Node Size: Depend on degree number
Edge width: Depend on correlation coefficient value
Path: ./result/Network/
Con
Cas
7. Advanced analysis
Tax4fun_pathway
The KEGG (Kyoto Encyclopedia of Genes and Genomes) database was launched in 1995 and has since developed into one of the most representative biological metabolic pathway database in the biological world. The database divides metabolic pathways into six categories, namely Cellular Processes, Environmental Information Processing, Genetic Information Processing, Human Diseases, Metabolism, Organismal Systems.
This analysis is based on 16S sequencing data, Silva database and Tax4fun tools to predict the metabolic pathway function of KEGG database. Use Kruskal Wallis two-sample test for P.value, and use 1.5 times the Fold change of Pathway activity as the screening criteria.
Path: ./result/PATHWAY/
Cas0_BF
Cas1_BF
Gender
Summarize_Table
Pathway | Orientation | Mean_seperation | KW.P.value | KW.FDR | ANOVA.P.value | ANOVA.FDR | FC | FC_MEAN_UP | FC_MEAN_DOWN | FC_SEM_UP | FC_SEM_DOWN | N_UP | N_DOWN | UP | DOWN | Sign.ANOVA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ko00100; Steroid biosynthesis | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0034788 | 0.0111406 | 2.4802253 | 0.0000057 | 0.0000023 | 0.0000003 | 7.00e-07 | 4 | 4 | After | Before | 1 |
ko00140; Steroid hormone biosynthesis | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0008187 | 0.0084341 | 2.3645966 | 0.0007303 | 0.0003089 | 0.0000594 | 3.34e-05 | 4 | 4 | After | Before | 1 |
ko00540; Lipopolysaccharide biosynthesis | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0012049 | 0.0084341 | 2.4545654 | 0.0029277 | 0.0011927 | 0.0002913 | 7.87e-05 | 4 | 4 | After | Before | 1 |
ko00601; Glycosphingolipid biosynthesis - lacto and neolacto series | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0014072 | 0.0084341 | 2.9032122 | 0.0000895 | 0.0000308 | 0.0000103 | 1.90e-06 | 4 | 4 | After | Before | 1 |
ko03015; mRNA surveillance pathway | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0067704 | 0.0176030 | 2.1289147 | 0.0000097 | 0.0000046 | 0.0000012 | 3.00e-07 | 4 | 4 | After | Before | 1 |
ko04113; Meiosis - yeast | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0007715 | 0.0084341 | 3.2704205 | 0.0008542 | 0.0002612 | 0.0000792 | 5.19e-05 | 4 | 4 | After | Before | 1 |
ko04142; Lysosome | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0086779 | 0.0204942 | 2.1488534 | 0.0022379 | 0.0010414 | 0.0003053 | 6.70e-05 | 4 | 4 | After | Before | 1 |
ko04144; Endocytosis | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0089213 | 0.0204942 | 0.4286896 | 0.0000007 | 0.0000015 | 0.0000001 | 2.00e-07 | 4 | 4 | After | Before | 1 |
ko04210; Apoptosis | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0035505 | 0.0111406 | 11.2872505 | 0.0003950 | 0.0000350 | 0.0000775 | 5.00e-06 | 4 | 4 | After | Before | 1 |
ko04612; Antigen processing and presentation | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0019340 | 0.0084341 | 2.4579523 | 0.0000646 | 0.0000263 | 0.0000073 | 6.00e-07 | 4 | 4 | After | Before | 1 |
ko04723; Retrograde endocannabinoid signaling | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0010217 | 0.0084341 | 4.2206170 | 0.0001510 | 0.0000358 | 0.0000184 | 6.00e-06 | 4 | 4 | After | Before | 1 |
ko04912; GnRH signaling pathway | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0089213 | 0.0204942 | 0.4286896 | 0.0000007 | 0.0000015 | 0.0000001 | 2.00e-07 | 4 | 4 | After | Before | 1 |
ko04914; Progesterone-mediated oocyte maturation | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0019340 | 0.0084341 | 2.4579523 | 0.0000646 | 0.0000263 | 0.0000073 | 6.00e-07 | 4 | 4 | After | Before | 1 |
ko04974; Protein digestion and absorption | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0022972 | 0.0087833 | 4.2660718 | 0.0006629 | 0.0001554 | 0.0001001 | 4.60e-06 | 4 | 4 | After | Before | 1 |
ko04975; Fat digestion and absorption | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0028337 | 0.0103770 | 38.0761257 | 0.0000026 | 0.0000001 | 0.0000005 | 0.00e+00 | 4 | 4 | After | Before | 1 |
ko05215; Prostate cancer | Cas0_BF | Overall | 0.0209213 | 0.0415233 | 0.0021661 | 0.0087833 | 3.2311396 | 0.0000849 | 0.0000263 | 0.0000114 | 6.00e-07 | 4 | 4 | After | Before | 1 |
ko03015; mRNA surveillance pathway | Cas1_BF | Overall | 0.0081510 | 0.1640263 | 0.0260165 | 0.1878971 | 3.7760112 | 0.0000190 | 0.0000050 | 0.0000057 | 6.00e-07 | 10 | 10 | After | Before | 1 |
ko03015; mRNA surveillance pathway | Gender | Overall | 0.0308082 | 0.6791520 | 0.0337744 | 0.4082970 | 0.3676912 | 0.0000057 | 0.0000155 | 0.0000007 | 4.30e-06 | 14 | 14 | Male | Female | 1 |
Fourth、Software lists
QC
Merge & clustering
Alignment
R packages
– R v3.4.4
LEfSe
Network construction
Calculate corrlation
Pathway analysis