16S Metagenomics report

First、Project information

Second、Workflow

1. Sequencing process

16S metagenomics library was constructed using two rounds of PCR reactions. The PCR primers,

forward

(5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3’)

and reverse

(5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT-3’),

were designed to amplify the V3V4 domain of bacterial 16S ribosomal DNA as well as adding adaptor sequences. Illumina index sequences were subsequently attached into the amplicons in the second PCR.

PCR amplification was performed in a 50 μl reaction volume containing 25 μl 2X Phusion High-Fidelity PCR Master Mix (New Enagland Biolabs), 0.2 μM of each forward and reverse primer, and 100 ng DNA template. The reaction conditions consisted of an initial 98°C for 30sec, followed by 30 cycles of 98°C for 10 sec, 55°C for 30 sec, and 72°C for 30 sec, as well as a final extension of 72°C for 5 min. The number of cycle for second PCR was 10.

Next, amplified products were checked by 2 % agarose gel electrophoresis and were purified using the AMPure XP PCR Purification Kit (Agencourt). Quantification was performed by using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific) on Qubit 2.0 Fluorometer (Thermo Fisher Scientific) and Light cycler 96 (Roche)- all according to respective manufacturer instructions. The final purified libraries were applied for cluster generation and sequencing on the MiSeq using V3 600 cycles kit.

2. Raw data process

3. Data analysis workflow

Third、Analysis results

1. QC results

SampleID Raw.Reads QC QC.rate Merge Merge.rate OTU.cluster OTU.cluster.rate Alignment Alignment.rate Represent.OTU Mapped.OTU Mapped.OTU.rate Number.of.Tax
5-2-F10 88539 74129 83.72% 74057 99.90% 39095 52.79% 38774 99.18% 12 11 91.67% 9
5-2-F11 143479 132752 92.52% 132317 99.67% 118977 89.92% 109463 92.00% 67 59 88.06% 44
5-2-F12 140652 122229 86.90% 122116 99.91% 73277 60.01% 62604 85.43% 40 32 80.00% 27
5-2-F13 117505 102245 87.01% 102057 99.82% 52123 51.07% 48877 93.77% 23 20 86.96% 19
5-2-F14 144270 124658 86.41% 124546 99.91% 73044 58.65% 66811 91.47% 24 21 87.50% 17
5-2-F8 105788 97708 92.36% 97180 99.46% 83329 85.75% 76551 91.87% 67 62 92.54% 51
5-2-F9 128612 110167 85.66% 110033 99.88% 71066 64.59% 71066 100.00% 12 12 100.00% 10
5-2-M1 146199 125347 85.74% 125208 99.89% 70072 55.96% 65419 93.36% 24 19 79.17% 18
5-2-M2 96796 82039 84.75% 81904 99.84% 41094 50.17% 37850 92.11% 22 17 77.27% 15
5-2-M3 95485 71051 74.41% 70918 99.81% 34848 49.14% 32115 92.16% 21 19 90.48% 15
5-2-M4 116560 100152 85.92% 100057 99.91% 51865 51.84% 49116 94.70% 22 19 86.36% 15
5-2-M5 108369 93771 86.53% 93684 99.91% 51858 55.35% 45594 87.92% 41 37 90.24% 30
5-2-M6 120432 102599 85.19% 102411 99.82% 49687 48.52% 45650 91.88% 29 25 86.21% 20
5-2-M7 113999 97404 85.44% 97335 99.93% 61089 62.76% 61083 99.99% 19 18 94.74% 15
5-20-F10 127201 108755 85.50% 108668 99.92% 57792 53.18% 57191 98.96% 23 21 91.30% 20
5-20-F11 109253 93779 85.84% 93698 99.91% 50007 53.37% 47572 95.13% 26 22 84.62% 18
5-20-F12 150373 128307 85.33% 128164 99.89% 75561 58.96% 72493 95.94% 33 30 90.91% 29
5-20-F13 151224 128241 84.80% 128136 99.92% 71281 55.63% 70059 98.29% 32 30 93.75% 27
5-20-F14 140689 119599 85.01% 119479 99.90% 63151 52.86% 58574 92.75% 34 30 88.24% 26
5-20-F8 136456 117599 86.18% 117525 99.94% 62035 52.78% 59418 95.78% 31 25 80.65% 25
5-20-F9 114879 97230 84.64% 97131 99.90% 48557 49.99% 47724 98.28% 18 16 88.89% 14
5-20-M1 140154 119488 85.25% 119396 99.92% 70010 58.64% 67861 96.93% 20 16 80.00% 15
5-20-M2 150905 130488 86.47% 130413 99.94% 77993 59.80% 76477 98.06% 21 18 85.71% 18
5-20-M3 130723 113331 86.70% 113263 99.94% 59241 52.30% 55331 93.40% 21 17 80.95% 15
5-20-M4 178332 153427 86.03% 153324 99.93% 99067 64.61% 86607 87.42% 45 41 91.11% 34
5-20-M5 144602 124905 86.38% 124816 99.93% 67653 54.20% 60245 89.05% 33 27 81.82% 26
5-20-M6 147064 125541 85.36% 125405 99.89% 70268 56.03% 63354 90.16% 40 36 90.00% 31
5-20-M7 177935 153409 86.22% 153298 99.93% 92849 60.57% 90826 97.82% 24 21 87.50% 17

QC: # of sequencing reads passed by the quality threshold

Merge: # of merged sequencing reads

OTU.cluster: # of clustered sequencing reads

Alignment: # of sequencing reads mapped to Silva database

Represent.OTU: # of clusters

Mapped.OTU: # of clusters mapped to Silva database

Number.of.Tax: # of Taxonomy

Path: ./result/OTU & QC/

2. Taxonomy results

Barplot

According to taxonomy annotation, top 15 bacteria with relative abundance in each level (Phylum, Class, Order, family, Genus, Species) are selected for each sample or group and generate a histogram of relative abundance. Bacteria with lower relative abundance are merged into “Others”.

Path: ./result/Abundance/


Case & Control

Phylum

Class

Order

Family

Genus

Species

Cas0_BF

Phylum

Class

Order

Family

Genus

Species

Cas1_BF

Phylum

Class

Order

Family

Genus

Species

Gender

Phylum

Class

Order

Family

Genus

Species

Heatmap

The heatmap is drawn according to the taxonomy annotation and relative abundance of all samples (groups) in each level. “Square root” is applied to numerical conversion for relative abundance.

Path: ./result/Heatmap/

Case & Control

Phylum

Species

Cas0_BF

Phylum

Species

Cas1_BF

Phylum

Species

Gender

Phylum

Species

3. Alpha Diversity

Alpha Diversity analysis is to evaluate the diversity of the microbial community in a sample or grouping. The diversity analysis of a single sample can reflect the microbial diversity (Chao index) and the main bacterial diversity (Shannon index, Inverse Simpson) of the microbial community in the sample. The definition of “Chao index” is regardless of the relative abundance of bacteria, whether they are present or not, the codes of 0 and 1 are used to evaluate the number of bacteria. Diversity of main bacteria, calculate the number of bacteria with high relative abundance, bacteria are ignored in the measurement due to low relative abundance, mainly used to evaluate the number of bacteria with high relative abundance.

Path: ./result/ALPHA/

Chao

Case & Control

Cas0_BF

Cas1_BF

Gender

Inverse-Simpson

Case & Control

Cas0_BF

Cas1_BF

Gender

Shannon

Case & Control

Cas0_BF

Cas1_BF

Gender

Rarefaction curve

The rarefaction curve randomly select a certain number of individuals among the samples, count the number of species represented by these individuals, and construct the curve with the number of individuals and species. It can be used to compare the abundance of species in samples with different amounts of sequencing data, and it can also be used to show whether the amount of sequencing data of a sample is reasonable. Using the method of random sampling of sequences, construct a rareaction curve based on the number of sequences drawn and the number of OTUs they can represent. When the curve tends to be flatten, it means that the amount of sequencing data is reasonable, and more data will only generate a small amount of new data. On the contrary, it indicates that more new OTUs may be generated by continuing the sequencing.

Venn-diagram

Venn diagrams can be used to count the number of common and unique species (such as OTU) in multiple groups or samples, and can more intuitively show the similarity and overlap of the number of species (such as OTU) in environmental samples. Usually,16S analysis uses a similar level of 97% OTU or other taxonomic level sample tables.

2groups

3groups

4groups

4. Beta Diversity

Beta diversity (Distance matrix between samples) is to compare the composition of microbial communities between different samples. The larger value represents the greater the difference in species distribution between samples. According to taxonomy annotations and OTUs abundance information, merge the OTUs information of the same classification to obtain an abundance table. The systematic evolution and relative abundance relationship between microbial sequences in each sample are used to calculate the distance matrix between samples or groups. Principal Coordinates Analysis (PCoA, Principal Coordinates Analysis) is used to reduce the dimensionality of the sample group.

Path: ./result/BETA/

Unweighted

Weighted

5. Differential analysis

Significant difference analysis

The Kruskal-Wallis test (also known as the H test method) in the non-matrix method is used for analysis to determine whether there is a difference in the median of two or more groups. If the chi-square value of the overall test is statistically significant, then the null hypothesis is rejected, which means that at least one pair of groups has an unequal average level. As for which pairs are different, a post-mortem comparison is required. The number of samples in each group for KW verification must be at least 5 or more.

In this analysis, three conditions were used for significance screening: (a) Select the H test method p.value <0.05 (b) Fold change more than doubled (c) Case group or Control group, the average relative abundance of at least one group exceeds 0.5% .

Path: ./result/DEG/Sig_Bac_Visual

Cas0_BF

Cas1_BF

Cas0 vs Cas1

Con0 vs Con1

Gender

Summarize_Table

X Orientation Mean_seperation KW.P.value KW.FDR ANOVA.P.value ANOVA.FDR FC FC_MEAN_UP FC_MEAN_DOWN FC_SEM_UP FC_SEM_DOWN UP DOWN Sign.KW Sign.ANOVA Sign.Both
k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus animalis Cas0_Cas1 Overall 0.1116108 0.5258346 0.0342102 0.5144687 0.2620754 0.042240 0.161175 0.0131833 0.0760204 Cas1 Cas0 0 1 1
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium disporicum Cas0_Cas1 Overall 0.0344914 0.5258346 0.0376441 0.5144687 Inf 0.134360 0.000000 0.0354972 0.0000000 Cas1 Cas0 1 1 1
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium saudiense Cas0_Cas1 Overall 0.0031599 0.1295557 0.0020703 0.0848838 0.0000000 0.000000 0.026950 0.0000000 0.0116491 Cas1 Cas0 1 1 1
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;gErysipelatoclostridium;s[Clostridium] cocleatum Cas0_Cas1 Overall 0.0479527 0.5258346 0.0730814 0.5866132 0.2291892 0.001060 0.004625 0.0008207 0.0020878 Cas1 Cas0 1 0 1
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes;s__Alistipes senegalensis Cas0_BF Overall 0.0138744 0.2081161 0.0304154 0.4562312 Inf 0.041050 0.000000 0.0145647 0.0000000 After Before 1 1 1
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium disporicum Cas0_BF Overall 0.0138744 0.2081161 0.0003426 0.0102788 0.0000000 0.000000 0.156100 0.0000000 0.0214490 After Before 1 1 1
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium saudiense Cas0_BF Overall 0.0472209 0.2833254 0.0599814 0.4633202 Inf 0.026950 0.000000 0.0116491 0.0000000 After Before 1 0 1
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;gLachnoclostridium;s[Clostridium] scindens Cas0_BF Overall 0.0472209 0.2833254 0.0770693 0.4633202 Inf 0.008925 0.000000 0.0041876 0.0000000 After Before 1 0 1
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Neglecta;s__Neglecta timonensis Cas0_BF Overall 0.0472209 0.2833254 0.1092441 0.4633202 Inf 0.014250 0.000000 0.0075822 0.0000000 After Before 1 0 1
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides;s__Parabacteroides goldsteinii Cas1_BF Overall 0.0283060 0.4251487 0.0275870 0.4407532 0.2823173 0.045710 0.161910 0.0150630 0.0460742 After Before 1 1 1
k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus animalis Cas1_BF Overall 0.1597867 0.4251487 0.0423044 0.4407532 0.2965043 0.042240 0.142460 0.0131833 0.0439188 After Before 0 1 1
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;gErysipelatoclostridium;s[Clostridium] cocleatum Cas1_BF Overall 0.0022695 0.0998583 0.0723346 0.4407532 0.0490968 0.001060 0.021590 0.0008207 0.0107229 After Before 1 0 1

LEfSe analysis

LDA Effect Size (LEfSe) is an algorithm for discovering biomarkers and features in high-dimensional data. Firstly, the Kruskal-Wallis (KW) sumrank test was used to test the abundance difference of each species between the two groups, and then the Linear Discriminant Analysis (LDA) method was used to evaluate the classification utility of all the different species for the groups.

Cut off :

Kruskal-Wallis test p value > 0.05

|LDA score| < 2

Path: ./result/DEG/lefse

Barplot

Cladogram

6. Co-occurence network

Co-occurence analysis use FastSpar to calculate the correlation coefficient between different level species, and then use the absolute value of the correlation coefficient 0.6 to determine the positive correlation and the negative correlation to make a network diagram.

(This example uses a correlation coefficient of 0.1)

Co-occurence network Edge ( |correlation value| > 0.6)

◎ Green: positive

◎ Red: negitave

Node Size: Depend on degree number

Edge width: Depend on correlation coefficient value

Path: ./result/Network/

Con

Cas

7. Advanced analysis

Tax4fun_pathway

The KEGG (Kyoto Encyclopedia of Genes and Genomes) database was launched in 1995 and has since developed into one of the most representative biological metabolic pathway database in the biological world. The database divides metabolic pathways into six categories, namely Cellular Processes, Environmental Information Processing, Genetic Information Processing, Human Diseases, Metabolism, Organismal Systems.

This analysis is based on 16S sequencing data, Silva database and Tax4fun tools to predict the metabolic pathway function of KEGG database. Use Kruskal Wallis two-sample test for P.value, and use 1.5 times the Fold change of Pathway activity as the screening criteria.

Path: ./result/PATHWAY/

Cas0_BF

Cas1_BF

Gender

Summarize_Table

Pathway Orientation Mean_seperation KW.P.value KW.FDR ANOVA.P.value ANOVA.FDR FC FC_MEAN_UP FC_MEAN_DOWN FC_SEM_UP FC_SEM_DOWN N_UP N_DOWN UP DOWN Sign.ANOVA
ko00100; Steroid biosynthesis Cas0_BF Overall 0.0209213 0.0415233 0.0034788 0.0111406 2.4802253 0.0000057 0.0000023 0.0000003 7.00e-07 4 4 After Before 1
ko00140; Steroid hormone biosynthesis Cas0_BF Overall 0.0209213 0.0415233 0.0008187 0.0084341 2.3645966 0.0007303 0.0003089 0.0000594 3.34e-05 4 4 After Before 1
ko00540; Lipopolysaccharide biosynthesis Cas0_BF Overall 0.0209213 0.0415233 0.0012049 0.0084341 2.4545654 0.0029277 0.0011927 0.0002913 7.87e-05 4 4 After Before 1
ko00601; Glycosphingolipid biosynthesis - lacto and neolacto series Cas0_BF Overall 0.0209213 0.0415233 0.0014072 0.0084341 2.9032122 0.0000895 0.0000308 0.0000103 1.90e-06 4 4 After Before 1
ko03015; mRNA surveillance pathway Cas0_BF Overall 0.0209213 0.0415233 0.0067704 0.0176030 2.1289147 0.0000097 0.0000046 0.0000012 3.00e-07 4 4 After Before 1
ko04113; Meiosis - yeast Cas0_BF Overall 0.0209213 0.0415233 0.0007715 0.0084341 3.2704205 0.0008542 0.0002612 0.0000792 5.19e-05 4 4 After Before 1
ko04142; Lysosome Cas0_BF Overall 0.0209213 0.0415233 0.0086779 0.0204942 2.1488534 0.0022379 0.0010414 0.0003053 6.70e-05 4 4 After Before 1
ko04144; Endocytosis Cas0_BF Overall 0.0209213 0.0415233 0.0089213 0.0204942 0.4286896 0.0000007 0.0000015 0.0000001 2.00e-07 4 4 After Before 1
ko04210; Apoptosis Cas0_BF Overall 0.0209213 0.0415233 0.0035505 0.0111406 11.2872505 0.0003950 0.0000350 0.0000775 5.00e-06 4 4 After Before 1
ko04612; Antigen processing and presentation Cas0_BF Overall 0.0209213 0.0415233 0.0019340 0.0084341 2.4579523 0.0000646 0.0000263 0.0000073 6.00e-07 4 4 After Before 1
ko04723; Retrograde endocannabinoid signaling Cas0_BF Overall 0.0209213 0.0415233 0.0010217 0.0084341 4.2206170 0.0001510 0.0000358 0.0000184 6.00e-06 4 4 After Before 1
ko04912; GnRH signaling pathway Cas0_BF Overall 0.0209213 0.0415233 0.0089213 0.0204942 0.4286896 0.0000007 0.0000015 0.0000001 2.00e-07 4 4 After Before 1
ko04914; Progesterone-mediated oocyte maturation Cas0_BF Overall 0.0209213 0.0415233 0.0019340 0.0084341 2.4579523 0.0000646 0.0000263 0.0000073 6.00e-07 4 4 After Before 1
ko04974; Protein digestion and absorption Cas0_BF Overall 0.0209213 0.0415233 0.0022972 0.0087833 4.2660718 0.0006629 0.0001554 0.0001001 4.60e-06 4 4 After Before 1
ko04975; Fat digestion and absorption Cas0_BF Overall 0.0209213 0.0415233 0.0028337 0.0103770 38.0761257 0.0000026 0.0000001 0.0000005 0.00e+00 4 4 After Before 1
ko05215; Prostate cancer Cas0_BF Overall 0.0209213 0.0415233 0.0021661 0.0087833 3.2311396 0.0000849 0.0000263 0.0000114 6.00e-07 4 4 After Before 1
ko03015; mRNA surveillance pathway Cas1_BF Overall 0.0081510 0.1640263 0.0260165 0.1878971 3.7760112 0.0000190 0.0000050 0.0000057 6.00e-07 10 10 After Before 1
ko03015; mRNA surveillance pathway Gender Overall 0.0308082 0.6791520 0.0337744 0.4082970 0.3676912 0.0000057 0.0000155 0.0000007 4.30e-06 14 14 Male Female 1

Fourth、Software lists

QC

FASTX-Toolkit v0.0.13

Merge & clustering

Usearch v11

Alignment

BLAST v2.2.29+

R packages

R v3.4.4

gplots v3.0.1.1

ggplot2 v3.2.1

plotrix v3.7-1

vegan v2.5-5

vcd v1.4-4

plyr v1.8.4

multtest v2.40.0

Rmarkdown

knitr v1.24

kableExtra v1.1.0

LEfSe

lefse v1.0.7

Network construction

Cytoscape v3.7.1

Calculate corrlation

Fastspar v0.0.10

Pathway analysis

Tax4fun v0.3.1