
On the ability to extract MLVA profiles of Vibrio cholerae isolates from WGS data generated with Oxford Nanopore Technologies

Abstract

Objective

Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) is widely used to subtype pathogens causing foodborne and waterborne disease outbreaks. The MLVAType shiny application was previously designed to extract MLVA profiles of Vibrio cholerae isolates from whole-genome sequencing (WGS) data and to provide backward compatibility with traditional MLVA typing methods. The previous development and validation work used short paired-end reads (300 and 150 nt long) from Illumina MiSeq and HiSeq sequencing. In this study, the MLVAType application was validated using long reads generated by Oxford Nanopore Technologies (ONT) sequencing platforms. In silico MLVA profiles of V. cholerae isolates (n = 9) from the Democratic Republic of the Congo were generated with the MLVAType application from Canu (v2.2) assemblies of Nanopore reads obtained on ONT MinION and GridION sequencers. The results were compared with those obtained from SPAdes (v3.13.0; k-mer 175) assemblies of short-read (paired-end 300 nt) reference data generated by Illumina MiSeq sequencing.

Results

For each isolate, the in silico MLVA profiles were concordant across all three sequencing platforms, demonstrating that the MLVAType application can accurately predict MLVA profiles from genomes assembled from long reads generated by ONT sequencers.


Background

Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) is widely used by laboratory-based surveillance networks for subtyping pathogens causing foodborne and waterborne disease outbreaks. We recently demonstrated that WGS data generated with short-read Illumina sequencing technology can be used to extract in silico MLVA profiles of V. cholerae isolates while maintaining backward compatibility with traditional MLVA typing methods [1]. The percentage of censored estimations in MLVA profiles generated from WGS data decreased as the k-mer size used during genome assembly increased. Censored estimations could nevertheless be avoided by using a longer k-mer size (e.g. 175), even though this k-mer size was not offered by the original SPAdes v3.13.0 software [2].

Both MinION and GridION ONT sequencers are quickly gaining popularity because their long reads enable the assembly of contiguous microbial genomes. However, their base-calling accuracy is lower than that obtained with Illumina short reads, although this gap is steadily narrowing. In particular, ONT sequencers are known to have difficulty accurately sequencing low-complexity regions such as homopolymers [3].

Recent studies have investigated methods for deriving in silico MLVA profiles from long-read sequencing data for several bacterial species. For multidrug-resistant organisms such as Klebsiella pneumoniae, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, Pseudomonas aeruginosa, Acinetobacter baumannii, and methicillin-resistant Staphylococcus aureus, perfect concordance was achieved between in silico MLVA profiles derived from long- and short-read data and conventional MLVA typing [4]. Lower concordance rates were observed for Bacillus anthracis [5], with 88% concordance for Nanopore and 83% for Illumina sequencing.

To the best of our knowledge, the accuracy of in silico MLVA typing using Nanopore data has not yet been assessed for V. cholerae. This species seems to be undergoing unprecedented genetic changes, with climate change possibly acting as a trigger factor [6, 7]. These changes pose an increasing threat to public health in cholera-affected regions. Therefore, this study aimed to compare and validate MLVA results obtained from V. cholerae WGS data generated with MinION, GridION and Illumina sequencing, in order to extend the scope of our previously developed MLVAtype shiny application. We analysed the in silico MLVA profiles derived from the three methods on a series of V. cholerae strains. Given that we had previously demonstrated the accuracy of MLVA profiles derived from Illumina MiSeq data on V. cholerae isolates [1], we compared the ONT results with the Illumina results, which served as the benchmark.

Method

Sample collection and sequencing technology

Nine V. cholerae isolates were selected from a collection of isolates characterised in a recent study conducted in the Democratic Republic of the Congo (DRC) between 2014 and 2017 [8].

Two technologies were used to sequence the whole genomes of the selected V. cholerae isolates: Illumina (MiSeq) and ONT (MinION and GridION). For Illumina sequencing, whole-genome assemblies were generated from paired-end 300 nt reads, as previously detailed [1]. Briefly, sequencing libraries were prepared from 70 ng of V. cholerae genomic DNA following the Illumina DNA Prep protocol (Illumina, San Diego, CA, USA): genomic DNA was simultaneously fragmented and tagged with sequencing adapters in a single step using the Nextera transposome (Nextera XT DNA Library Preparation Kit, Illumina, San Diego, CA, USA). Tagged DNA was then amplified with a 12-cycle polymerase chain reaction (PCR), cleaned up with AMPure beads, and loaded on a MiSeq for a paired-end 2 × 300 nt sequencing run using MiSeq Reagent Kit v3 (600 cycles) (Illumina, San Diego, CA, USA).

ONT long-read libraries were generated from 400 ng of high-molecular-weight genomic DNA (GQN > 8). The DNA was first fragmented to an average fragment length of 11.6 kb using Covaris g-TUBEs (Covaris, Woburn, MA, USA). Libraries were then prepared and barcoded according to ONT’s Ligation Sequencing genomic DNA – DNA Barcoding kit SQK-NBD112.24 protocol. The nine libraries were multiplexed and loaded onto two FLO-MIN112 (R10 version) flow cells. Sequencing ran for 72 h on a MinION Mk1C and a GridION.

Sanger-derived MLVA typing was used as the reference method for resolving MLVA discrepancies. Amplicons were sequenced on both strands on an ABI 3130 Genetic Analyzer using the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, USA). Motif repeats were counted manually and translated into MLVA profiles.

WGS assembly and MLVA profiling

WGS data from Illumina MiSeq were assembled into contigs using SPAdes v3.13.0 [2] with a k-mer size of 175 and otherwise default settings. WGS data from ONT MinION and GridION were assembled into contigs using Canu v2.2 [9] with a genome size of 4 Mb (genomeSize=4m) and otherwise default settings.
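To make these assembly settings concrete, the Python sketch below builds (but does not run) the two assembler command lines. The file names and output prefix are hypothetical placeholders, and every option not stated above is left at the tool's default. As we understand it, stock SPAdes builds accept only odd k-mer sizes below 128, so a k-mer size of 175 presumably requires a SPAdes build compiled with a larger maximum k, which is consistent with the remark in the Background that SPAdes v3.13.0 did not offer this size.

```python
from pathlib import Path


def spades_command(r1: Path, r2: Path, outdir: Path, k: int = 175) -> list[str]:
    """SPAdes call for paired-end Illumina reads with a single k-mer size.

    Assumption: stock SPAdes builds only accept odd k < 128, so k=175
    presumably needs a custom build with a larger maximum k.
    """
    return ["spades.py", "-1", str(r1), "-2", str(r2), "-k", str(k), "-o", str(outdir)]


def canu_command(reads: Path, outdir: Path, prefix: str) -> list[str]:
    """Canu call for Nanopore reads with a 4 Mb genome size estimate."""
    return ["canu", "-p", prefix, "-d", str(outdir), "genomeSize=4m", "-nanopore", str(reads)]


if __name__ == "__main__":
    # Hypothetical file names for a single isolate; replace with real paths.
    print(" ".join(spades_command(Path("isolate_R1.fastq.gz"),
                                  Path("isolate_R2.fastq.gz"),
                                  Path("spades_k175"))))
    print(" ".join(canu_command(Path("isolate_ont.fastq.gz"), Path("canu_out"), "isolate")))
```

On a machine where both tools are installed, each list can be passed directly to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell-quoting problems with unusual file names.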

For each isolate and each sequencing platform, the in silico MLVA profiles were extracted from the assembled contigs using the MLVAtype algorithm, implemented in an R shiny application freely available at https://ucl-irec-ctma.shinyapps.io/NGS-MLVA-TYPING/. The application lets users upload a set of draft genomes together with the nucleotide sequences of the repeat motifs. It was used to predict MLVA profiles for the V. cholerae loci listed in Table 1, as described in our previous study [1].
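The MLVAtype algorithm itself is not reproduced here. As a simplified, hypothetical illustration of what in silico VNTR counting involves, the sketch below finds the longest run of back-to-back exact copies of a motif on either strand of a contig, and flags the count as censored (a lower bound) when the run lies within one motif length of a contig end, which is one way a repeat array can be truncated by a fragmented assembly. Exact matching is an assumption of this sketch: it does not tolerate sequencing errors or imperfect repeat copies, and the function and class names are ours.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence (other symbols kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class RepeatCall:
    copies: int     # consecutive exact copies of the motif
    start: int      # 0-based start of the run on the strand where it was found
    reverse: bool   # True if the run was found on the reverse-complement strand
    censored: bool  # True if the run may continue past a contig end


def longest_tandem_run(contig: str, motif: str) -> RepeatCall:
    """Hypothetical helper: longest run of exact tandem motif copies on either strand."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must be non-empty")
    m = len(motif)
    best = RepeatCall(copies=0, start=-1, reverse=False, censored=False)
    forward = contig.upper()
    for reverse, strand in ((False, forward), (True, reverse_complement(forward))):
        pos = strand.find(motif)
        while pos != -1:
            # Count consecutive copies starting at this occurrence.
            copies = 0
            while strand.startswith(motif, pos + copies * m):
                copies += 1
            end = pos + copies * m
            if copies > best.copies:
                # Less than one motif length of flank on either side means the
                # array may be cut by the contig end, so the count is a lower bound.
                censored = pos < m or len(strand) - end < m
                best = RepeatCall(copies, pos, reverse, censored)
            pos = strand.find(motif, pos + 1)
    return best
```

For instance, a contig made of 8 flanking bases, four copies of a hypothetical motif `ACGTTG` and 8 more flanking bases yields a call of 4 copies that is not censored, whereas an array starting at the very first base of a contig is reported as censored.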

Table 1 Loci and motifs characterising the MLVA profiles of V. cholerae

Results

Table 2 summarises the quality of the Illumina MiSeq reads, and Table 3 that of the MinION and GridION reads. As expected, forward reads from Illumina MiSeq were of higher quality than reverse reads.

Table 2 Quality control metrics of Illumina MiSeq reads

Although of lower quality than the Illumina reads, the reads from both ONT platforms were substantially longer (Table 3).

Table 3 Quality control metrics of ONT MinION and GridION long reads
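The specific metrics reported in Tables 2 and 3 are not reproduced in this text. As a hedged illustration of two metrics commonly reported for long reads, the sketch below computes the read-length N50 and a mean read quality. The mean is taken over per-base error probabilities and then converted back to the Phred scale, rather than averaging Phred scores directly, because a few low-quality bases dominate the error rate: bases at Q10 and Q20 average to about Q12.6, not Q15. Phred+33 FASTQ quality encoding is assumed.

```python
import math


def phred_string_to_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality]


def mean_read_quality(scores: list[int]) -> float:
    """Mean quality computed in error-probability space, then converted back to Phred."""
    if not scores:
        raise ValueError("empty quality list")
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


def n50(lengths: list[int]) -> int:
    """Largest length L such that reads of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("empty length list")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches half the total")
```

With read lengths of 2, 3, 4, 5 and 6 kb, for example, the two longest reads already hold 11 of the 20 kb sequenced, so the N50 is 5 kb.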

As expected, genomes assembled by SPAdes from Illumina MiSeq reads were more fragmented (74 to 89 contigs) than those assembled by Canu from MinION and GridION reads (2 to 5 contigs). As shown in Table 4, MLVA profiles were generated using the MLVAType algorithm on WGS data from the nine previously reported isolates [1, 8]. The results were perfectly concordant across all sequencing platforms.

Table 4 MLVA profiles of V. cholerae isolates obtained from WGS data using the MLVAtype shiny application
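Concordance across platforms, as summarised in Table 4, can be expressed as the fraction of isolate-by-locus calls that match the reference platform. The sketch below assumes a hypothetical nested-dictionary layout (platform, then isolate, then locus, mapped to a repeat count) and counts missing calls as discordant. The toy values in the usage lines are illustrative only and are not taken from Table 4.

```python
from typing import Dict, List, Optional, Tuple

# profiles[platform][isolate][locus] -> repeat count (hypothetical layout)
Profiles = Dict[str, Dict[str, Dict[str, int]]]
Mismatch = Tuple[str, str, str, int, Optional[int]]


def discordant_calls(profiles: Profiles, reference: str) -> List[Mismatch]:
    """(platform, isolate, locus, reference count, observed count) for every mismatch."""
    out: List[Mismatch] = []
    for platform, calls in profiles.items():
        if platform == reference:
            continue
        for isolate, loci in profiles[reference].items():
            for locus, ref_count in loci.items():
                observed = calls.get(isolate, {}).get(locus)  # None if the call is missing
                if observed != ref_count:
                    out.append((platform, isolate, locus, ref_count, observed))
    return out


def concordance(profiles: Profiles, reference: str) -> Dict[str, float]:
    """Fraction of reference (isolate, locus) calls reproduced exactly by each other platform."""
    n_calls = sum(len(loci) for loci in profiles[reference].values())
    if n_calls == 0:
        raise ValueError("reference platform has no calls")
    mismatches = discordant_calls(profiles, reference)
    return {
        platform: 1 - sum(1 for d in mismatches if d[0] == platform) / n_calls
        for platform in profiles
        if platform != reference
    }


if __name__ == "__main__":
    # Illustrative toy values only, not data from Table 4.
    toy = {
        "MiSeq": {"iso1": {"locusA": 8, "locusB": 12}},
        "MinION": {"iso1": {"locusA": 8, "locusB": 12}},
        "GridION": {"iso1": {"locusA": 8, "locusB": 11}},
    }
    print(concordance(toy, "MiSeq"))  # {'MinION': 1.0, 'GridION': 0.5}
    print(discordant_calls(toy, "MiSeq"))
```

Listing the discordant calls alongside the rate makes it easy to send exactly those isolate and locus pairs for Sanger confirmation.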

Discussion

Owing to their low cost and rapid turnaround time, ONT sequencing platforms such as MinION and GridION are appealing to clinical laboratories and clearly have the potential to replace traditional typing methods. However, this type of analysis is not yet affordable for all institutions because it raises new challenges, including data storage, computing power, and bioinformatics expertise. Moreover, ONT sequencing still has lower base-calling accuracy than other platforms such as Illumina short-read sequencers [10]. Accordingly, the current study was designed to assess, on both ONT platforms, the impact of this lower sequencing accuracy on assembled regions of the V. cholerae genome characterised by variable numbers of tandem repeats.

Given that we had previously demonstrated the accuracy of Illumina MiSeq-derived MLVA profiles for V. cholerae isolates, we compared the ONT results with the Illumina MiSeq results, which served as the reference. Notably, in this study, MiSeq-derived MLVA profiles were free of censored values owing to two factors: (i) the limited number of repeats per motif, with a maximum of 26 for the fourth motif of CTMA-1426, and (ii) the use of a long k-mer size (175) during genome assembly with SPAdes v3.13.0.
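The role of the long k-mer can be made concrete with a simplified de Bruijn argument, offered as an assumption-laden sketch rather than a description of SPAdes internals. For a tandem array of n exact copies of a primitive motif of length p, k-mers starting one motif apart are identical whenever both fit inside the array, which happens only if k ≤ (n − 1)p. Assuming k is at least p and the flanks do not continue the repeat, the array therefore contains no repeated k-mer, and can in principle be traversed unambiguously, when (n − 1)p < k. The sketch ignores sequencing errors, coverage, other genomic copies of the motif and SPAdes' use of paired reads, and the 6-nt motif length in the usage line is hypothetical, since the content of Table 1 is not reproduced here.

```python
def max_copies_without_repeated_kmers(k: int, motif_len: int) -> int:
    """Largest copy number n of a primitive motif of length p such that no k-mer
    overlapping the array occurs twice, i.e. the largest n with (n - 1) * p < k.

    Simplified heuristic: assumes exact copies, k >= p, and flanks that do not
    continue the repeat.
    """
    if k < 1 or motif_len < 1:
        raise ValueError("k and motif_len must be positive")
    return (k - 1) // motif_len + 1


def has_repeated_kmer(seq: str, k: int) -> bool:
    """Brute-force check: True if any k-mer occurs more than once in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(kmers) != len(set(kmers))


if __name__ == "__main__":
    # Hypothetical 6-nt motif with the k-mer size used in this study.
    print(max_copies_without_repeated_kmers(175, 6))  # 30
```

The brute-force helper is only there to sanity-check the closed-form bound on small toy sequences; under these assumptions the bound grows linearly with k, which is why a longer k-mer can keep longer arrays uncollapsed.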

We used the same nine DRC V. cholerae isolates as in our previous study [1] and assembled the Illumina reads into contigs with the same assembler (SPAdes). Our MLVA analyses therefore spanned several phases, during which the V. cholerae strains were recultured between 2019 and 2023. Over this period, minor variations in MLVA profiles were observed across passages in two strains (CTMA-1424 and CTMA-1426) and confirmed by Sanger sequencing. The long-term stability of MLVA profiles across passages has been explored in large-scale studies by Kendall et al. [11] and Garcia et al. [12], the latter observing microevolution in V. parahaemolyticus after multiple passages. In contrast, the limited number of passages (≤ 3) in our study and the low mutation rate (on the order of 10⁻⁴ mutants per generation) observed during culture by Garcia et al. [12] make significant MLVA changes unlikely. We therefore believe that experimental conditions are more likely to be responsible for the observed MLVA variations: although the same initial colonies were used in our 2019 and 2023 studies, different glycerol stocks were employed. While we cannot demonstrate this conclusively, the few differences in MLVA profiles are more plausibly attributable to these technical factors than to microevolutionary processes. A large-scale study of V. cholerae involving many strains and numerous passages (e.g., n > 30) is beyond the scope of this study but is planned for the near future.

In conclusion, the perfect concordance of results across short- and long-read sequencing platforms in this study demonstrates the reliability and accuracy of in silico MLVA typing using Nanopore WGS data. While ongoing technological progress is expected to further improve ONT base-calling accuracy, this study shows that the lower accuracy currently reported for ONT long-read sequencing, compared with short-read Illumina sequencing, did not affect the MLVA typing results.

Limitations

As demonstrated in this study, using short- and long-read sequencing for backward comparison with historical MLVA profiles obtained through traditional methods can introduce bias due to unpredictable MLVA profile variations across passages of the same strain. These variations may result from technical factors (e.g., reculturing strains from different aliquots or randomly analysing different colonies from the same culture plate), genetic microevolution, or a combination of both. Consequently, further investigation is needed to assess the variability among multiple colonies from the same culture plate and the long-term stability of MLVA profiles across numerous passages in a larger-scale study.

Although the limited number of V. cholerae isolates in this study could be considered a genuine limitation, this is counterbalanced by the broad diversity of MLVA profiles included in the analysis and the perfect concordance observed across traditional, short-read, and long-read sequencing methods for MLVA profiling.

Data availability

All NGS data are available from the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena) under study accession number PRJEB55717.

Abbreviations

ONT:

Oxford Nanopore Technologies

WGS:

Whole Genome Sequencing

MLVA:

Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis

References

  1. Ambroise J, Irenge LM, Durant JF, Bearzatto B, Bwire G, Stine OC, et al. Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application for Vibrio cholerae. PLoS ONE. 2019;14(12):e0225848.

  2. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77.

  3. Delahaye C, Nicolas J. Sequencing DNA with nanopores: troubles and biases. PLoS ONE. 2021;16(10):e0257521.

  4. Landman F, Jamin C, de Haan A, Witteveen S, Bos J, van der Heide HG, et al. Genomic surveillance of multidrug-resistant organisms based on long-read sequencing. medRxiv. 2024:2024.02.18.24301916.

  5. Linde J, Brangsch H, Hölzer M, Thomas C, Elschner MC, Melzer F, et al. Comparison of Illumina and Oxford Nanopore Technology for genome analysis of Francisella tularensis, Bacillus anthracis, and Brucella suis. BMC Genomics. 2023;24(1):258.

  6. Irenge LM, Ambroise J, Bearzatto B, Durant J-F, Bonjean M, Wimba LK, et al. Genomic evolution and rearrangement of CTX-Φ prophage elements in Vibrio cholerae during the 2018–2024 cholera outbreaks in eastern Democratic Republic of the Congo. Emerg Microbes Infect. 2024:2399950.

  7. Chowdhury FR, Nur Z, Hassan N, von Seidlein L, Dunachie S. Pandemics, pathogenicity and changing molecular epidemiology of cholera in the era of global warming. Ann Clin Microbiol Antimicrob. 2017;16(1):10.

  8. Irenge LM, Ambroise J, Mitangala PN, Bearzatto B, Kabangwa RKS, Durant JF, et al. Genomic analysis of pathogenic isolates of Vibrio cholerae from eastern Democratic Republic of the Congo (2014–2017). PLoS Negl Trop Dis. 2020;14(4):e0007642.

  9. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36.

  10. Petersen LM, Martin IW, Moschetti WE, Kershaw CM, Tsongalis GJ. Third-generation sequencing in the Clinical Laboratory: exploring the advantages and challenges of Nanopore sequencing. J Clin Microbiol. 2019;58(1).

  11. Kendall EA, Chowdhury F, Begum Y, Khan AI, Li S, Thierer JH, et al. Relatedness of Vibrio cholerae O1/O139 isolates from patients and their household contacts, determined by multilocus variable-number tandem-repeat analysis. J Bacteriol. 2010;192(17):4367–76.

  12. Garcia K, Gavilan RG, Hofle MG, Martinez-Urtaza J, Espejo RT. Microevolution of pandemic Vibrio parahaemolyticus assessed by the number of repeat units in short sequence tandem repeat regions. PLoS ONE. 2012;7(1):e30823.

Acknowledgements

Not applicable.

Funding

This study was funded by the Belgian Cooperation Agency of the ARES (Académie de Recherche et d’Enseignement Supérieur) [grant COOP-CONV-20-022]. The funder did not play any role in the study design, collection, analysis, and interpretation of data, manuscript writing, or the decision to submit the paper for publication.

Author information

Contributions

J.A. and J.L.G. wrote the main manuscript text. B.B., J.F.D. and L.I. collected the isolates and performed short- and long-read sequencing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jérôme Ambroise.

Ethics declarations

Ethics approval and consent to participate

The study protocols, including the method for collecting rectal swab samples and conducting genomic analysis, were approved by the Ethical Review Board (ERB) of the Institut Supérieur des Techniques Médicales de Bukavu, RDC (ISTM-BUKAVU/CRPS/CIES/ML/0016/2023). The ERB explicitly granted a waiver of written informed consent, citing compliance with international ethical guidelines for research conducted during severe outbreaks (e.g., CIOMS 2016). These guidelines recommend the use of oral informed consent for participants with low literacy, particularly in public health emergencies where written consent may be impractical. In accordance with this waiver, all participants provided oral informed consent prior to sample collection. The study also adhered to national regulations of the Democratic Republic of Congo governing research ethics in public health emergencies. To ensure participant confidentiality, all collected samples were fully anonymised during the genomic analysis process.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ambroise, J., Bearzatto, B., Durant, JF. et al. On the ability to extract MLVA profiles of Vibrio cholerae isolates from WGS data generated with Oxford Nanopore Technologies. BMC Res Notes 18, 18 (2025). https://doi.org/10.1186/s13104-025-07093-7

  • DOI: https://doi.org/10.1186/s13104-025-07093-7
