How to assess the accuracy of SNP detection with DNA sequencing?

Comparison with Reference Data

The authors conditionally treated the Illumina HiSeq 2500 data as the reference standard for evaluating the accuracy of the MGISEQ-2000 data. They calculated error rates, namely the false positive and false negative rates, in the MGISEQ-2000 dataset (E704-M) using the Illumina dataset (E704-I) as the benchmark.
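The comparison against a reference call set can be sketched in a few lines of Python. This is only an illustration of the general idea, not the authors' pipeline: the variant keys, the function name `compare_call_sets`, and the example calls are all hypothetical, and real comparisons also need variant normalization and genotype matching.

```python
# Each variant is keyed by (chromosome, position, ref allele, alt allele).
# The calls below are made-up illustrative values, not data from the study.
reference_calls = {  # stands in for the benchmark dataset (e.g. E704-I)
    ("chr1", 10177, "A", "C"),
    ("chr1", 10352, "T", "A"),
    ("chr2", 20511, "G", "T"),
}
test_calls = {  # stands in for the dataset under evaluation (e.g. E704-M)
    ("chr1", 10177, "A", "C"),
    ("chr2", 20511, "G", "T"),
    ("chr3", 30923, "G", "A"),
}


def compare_call_sets(reference, test):
    """Split calls into true positives, false positives and false negatives."""
    true_positives = reference & test    # called in both datasets
    false_positives = test - reference   # called only in the test dataset
    false_negatives = reference - test   # present in the reference, missed by the test
    return true_positives, false_positives, false_negatives


tp, fp, fn = compare_call_sets(reference_calls, test_calls)
print(len(tp), len(fp), len(fn))  # prints: 2 1 1
```

Because the reference is only conditionally treated as the truth, a "false positive" here means a call absent from the reference dataset, not necessarily a sequencing error.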


Variant Calling Analysis

The authors used several variant-calling software packages, including Strelka2, to analyze the datasets generated by both platforms. They reported the total number of SNPs detected, the sensitivity of SNP detection, and the false positive rate (FPR) for the MGISEQ-2000 relative to the Illumina data. For the MGISEQ-2000 sample, SNP detection sensitivity was 99.51%, with an FPR of 0.000254%.
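For readers unfamiliar with these metrics, the sketch below shows the standard textbook definitions of sensitivity and FPR. The counts are hypothetical and chosen only to give round numbers. It also assumes that true negatives are the assessed positions where neither dataset reports a variant; the exact denominator the authors used for FPR is not stated in this summary.

```python
def sensitivity(tp, fn):
    """Fraction of reference variants that the test dataset also reports."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of reference-negative positions wrongly called as variants."""
    return fp / (fp + tn)


# Hypothetical counts, NOT taken from the paper.
tp_count = 9_950      # variants found in both datasets
fn_count = 50         # reference variants missed by the test dataset
fp_count = 25         # test-only calls
tn_count = 9_999_975  # assumed: positions with no variant in either dataset

print(f"sensitivity = {sensitivity(tp_count, fn_count):.2%}")      # 99.50%
print(f"FPR = {false_positive_rate(fp_count, tn_count):.6%}")      # 0.000250%
```

The very small FPR in this example comes from the size of the denominator: when most assessed positions carry no variant, even a sizable number of false positive calls yields a tiny rate.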


F1 Metrics Calculation

They also calculated the F1 metric, the harmonic mean of precision and recall, which summarizes a test's accuracy in a single value. For SNPs, the F1 metric was reported as 99.65%, indicating high accuracy in SNP detection for the MGISEQ-2000 platform.
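A minimal sketch of the F1 calculation from raw counts follows. The function name `f1_score` and the counts are hypothetical and serve only to show the arithmetic; they are not the study's figures.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)  # share of test calls confirmed by the reference
    recall = tp / (tp + fn)     # share of reference variants recovered (sensitivity)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT taken from the paper.
example_f1 = f1_score(tp=990, fp=10, fn=10)
print(f"F1 = {example_f1:.2%}")  # 99.00%
```

Unlike the FPR, F1 does not depend on the number of true negatives, which makes it a convenient summary for variant calling, where non-variant positions vastly outnumber variants.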


Indel Detection Accuracy

Similar assessments were made for indel detection, where the sensitivity was reported as 98.84% with an F1 metric of 98.81%. These results further support the reliability of the MGISEQ-2000 in detecting genomic variants.


Based on these comparisons, the authors concluded that the MGISEQ-2000 platform provides a high level of accuracy in SNP detection, comparable to that of the Illumina HiSeq 2500.

Source:

https://doi.org/10.1371/journal.pone.0230301
