The shotgun library was constructed with 500 ng of DNA using the GS Rapid Library Prep kit, as described by the manufacturer (Roche). DNA (5 µg) was mechanically fragmented on the Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized on the Agilent 2100 BioAnalyzer with a DNA LabChip 7500, with an optimal size of 3.563 kb. The library was constructed according to the manufacturer's 454_Titanium paired-end protocol. Circularization and nebulization were performed and generated a pattern with an optimum at 377 bp. After PCR amplification through 15 cycles followed by double size selection, the single-stranded paired-end library was quantified with the Quant-it Ribogreen kit (Invitrogen) on the Genios_Tecan fluorometer at 215 pg/µL.
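The conversion from a measured mass concentration to a molecule count can be sketched as follows. This is a minimal illustration and not the manufacturer's software: it assumes an average mass of 328.3 g/mol per nucleotide of single-stranded DNA and takes the 377 bp optimum as the mean fragment length. The function name and constants are hypothetical.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
NT_MASS_G_PER_MOL = 328.3     # assumed average mass of one ssDNA nucleotide


def molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Convert a ssDNA concentration (pg per microliter) to molecules per microliter."""
    grams_per_ul = conc_pg_per_ul * 1e-12                    # pg -> g
    grams_per_mole = mean_length_nt * NT_MASS_G_PER_MOL      # mass of one mole of fragments
    return grams_per_ul / grams_per_mole * AVOGADRO


if __name__ == "__main__":
    # 215 pg per microliter, fragments assumed to average 377 nt
    print(f"{molecules_per_ul(215, 377):.3e} molecules per microliter")
```

Under these assumptions, 215 pg/µL of 377-nt fragments corresponds to roughly 1.05 × 10^9 molecules/µL.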
The library concentration equivalence was calculated as 10.5E+08 molecules/µL. The library was stored at -20°C until use. The shotgun library was clonally amplified with 3 cpb in 3 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2, leading to a 13.93% yield of the emPCR. The paired-end library was clonally amplified with 1 cpb in 4 SV-emPCR reactions, leading to a 17.56% yield, which was within the range of 5 to 20% from the Roche procedure. 790,000 beads for a ? region and 340,000 beads for a ? region were loaded on the GS Titanium PicoTiterPlates PTP Kit 70×75 and sequenced with the GS Titanium Sequencing Kit XLR70. The runs were performed overnight and then analyzed on the cluster through the gsRunBrowser_Roche. Data from 78.55 Mb of passed filter wells were generated with an average length of 228 bp for the paired-end library, and 51.3 Mb with an average length of 417 bp were obtained from the shotgun library. The global passed filter sequences were assembled on the gsAssembler_Roche with 90% identity and 40 bp as overlap. The final assembly into 4 scaffolds and 40 large contigs (>1500 bp) generated a genome size of 4.01 Mb.
Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal with default parameters, but the predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScanSE tool was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer and BLASTn against the NR database.
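Excluding ORFs that span a sequencing gap amounts to an interval-overlap test. The sketch below is illustrative only: it assumes that gaps appear as runs of N in the scaffold sequence and that ORF coordinates are 1-based inclusive (start, end) pairs. The helper names are hypothetical and are not part of Prodigal.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def find_gaps(scaffold: str, min_run: int = 1) -> List[Interval]:
    """Return coordinates of every run of N/n that is at least min_run long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    # re match offsets are 0-based, end-exclusive; convert to 1-based inclusive
    return [(m.start() + 1, m.end()) for m in pattern.finditer(scaffold)]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def drop_gap_spanning(orfs: List[Interval], gaps: List[Interval]) -> List[Interval]:
    """Keep only the ORFs that do not touch any gap interval."""
    return [orf for orf in orfs if not any(overlaps(orf, gap) for gap in gaps)]


if __name__ == "__main__":
    scaffold = "ATGAAATAA" + "N" * 10 + "ATGCCCTAA"   # toy scaffold with one gap
    gaps = find_gaps(scaffold)
    orfs = [(1, 9), (5, 24), (20, 28)]                # toy ORF coordinates
    print(gaps)
    print(drop_gap_spanning(orfs, gaps))
```

In the toy example, the second ORF crosses the run of N and is removed, while the two ORFs on either side of the gap are kept.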
ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. To estimate the mean level of nucleotide sequence similarity at the genome level between Alistipes species, we compared the ORFs only using BLASTN and the following parameters: a query coverage of ≥ 70% and a minimum nucleotide length of 100 bp.
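The two threshold rules above can be sketched as simple filters. This is an illustration under stated assumptions, not the authors' pipeline: hits are plain dictionaries with hypothetical field names, query coverage is taken as the aligned query span divided by the query length, the 100 bp minimum is applied to the alignment length, and an alignment of exactly 80 amino acids (a case the text does not specify) is given the stricter cutoff.

```python
from statistics import mean
from typing import Dict, Iterable, Optional

MIN_QUERY_COVERAGE = 70.0    # percent, from the text
MIN_ALIGNMENT_LENGTH = 100   # nucleotides, from the text


def evalue_cutoff(alignment_length_aa: int) -> float:
    """BLASTP E-value threshold chosen by alignment length (in amino acids)."""
    # Assumption: a length of exactly 80 falls in the stricter (1e-05) class.
    return 1e-03 if alignment_length_aa > 80 else 1e-05


def query_coverage(hit: Dict[str, float]) -> float:
    """Percent of the query covered by the alignment (assumed definition)."""
    span = abs(hit["qend"] - hit["qstart"]) + 1
    return 100.0 * span / hit["qlen"]


def keep_hit(hit: Dict[str, float]) -> bool:
    """Apply the coverage and minimum-length criteria to one BLASTN hit."""
    return (query_coverage(hit) >= MIN_QUERY_COVERAGE
            and hit["length"] >= MIN_ALIGNMENT_LENGTH)


def mean_identity(hits: Iterable[Dict[str, float]]) -> Optional[float]:
    """Mean percent identity over retained hits, or None if none pass."""
    kept = [h["pident"] for h in hits if keep_hit(h)]
    return mean(kept) if kept else None


if __name__ == "__main__":
    hits = [  # toy values, not real data
        {"qstart": 1, "qend": 900, "qlen": 1000, "length": 900, "pident": 92.0},
        {"qstart": 1, "qend": 300, "qlen": 1000, "length": 300, "pident": 99.0},
        {"qstart": 11, "qend": 210, "qlen": 250, "length": 200, "pident": 88.0},
    ]
    print(mean_identity(hits))
```

Averaging percent identity over the retained hits is only one possible way to summarize similarity; the text does not state how the mean was computed.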