Automated Checks for Allele Frequency Studies on leapdna Explore
Starting in February 2023, leapdna runs automated checks on the data in allele frequency studies available on leapdna Explore.
These checks are intended to give leapdna users more context so that they can more quickly find an allele frequency study that matches their needs. They do not necessarily reflect the quality of the data reported by the study authors. In other words, a frequency study which fails some or all of the checks run by leapdna can still be a high-quality population study, especially if the authors address some of the concerns raised by leapdna checks in the paper that originally reported the allele frequencies. Similarly, a study passing all checks need not be perfect or well-suited for all purposes: leapdna checks are far from exhaustive.
The rest of this page provides some details about each of the automated checks that leapdna runs on all allele frequency studies in the Explore database.
Normalized frequencies
This check flags studies where allele frequencies do not sum to 1 for one or more loci. When the frequencies for every locus add up to 1, this check passes. It is common to say that frequencies that satisfy this condition are normalized (although the term also refers to the procedure of scaling frequencies so that they sum to 1).
The original motivation for this check was to let users see whether a study on leapdna Explore is suitable for use with statistical software, since many of these programs expect normalized allele frequencies and will not work properly if the frequencies deviate significantly from this standard. We have since noticed that the check also surfaces outright mistakes: in some studies, allele frequencies add up to 1.6 or more for some loci.
To account for the limitations of representing real numbers in computers (floating point imprecision), this check has a tolerance currently set at 0.001. In other words, studies where the frequencies for each locus add up to anything between 0.999 and 1.001 will pass this check.
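The check described above can be sketched in a few lines of Python. This is only an illustration, not leapdna's actual implementation: the function name `non_normalized_loci`, the dictionary layout of the study, and the sample frequencies are all assumptions made for the example, and treating the boundaries 0.999 and 1.001 as passing is also an assumption.

```python
import math

TOLERANCE = 0.001  # tolerance value stated in the text above


def non_normalized_loci(study, tolerance=TOLERANCE):
    """Return the loci whose allele frequencies do not sum to 1 within the tolerance.

    `study` is a hypothetical layout: {locus name: {allele name: frequency}}.
    """
    flagged = []
    for locus, frequencies in study.items():
        # math.fsum reduces accumulated floating point error in the sum
        total = math.fsum(frequencies.values())
        if abs(total - 1.0) > tolerance:
            flagged.append(locus)
    return flagged


# Made-up data for illustration only
study = {
    "TH01": {"6": 0.25, "7": 0.25, "9.3": 0.5},  # sums to 1.0
    "VWA": {"16": 0.9, "17": 0.7},               # sums to 1.6
}

flagged = non_normalized_loci(study)
print(flagged)
print("check passes" if not flagged else "check fails")
```

With this made-up data only the second locus is flagged, so the study as a whole would fail the check.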
Misspelled locus names
This check looks for common misspellings in locus names, such as changing the number 0 for the letter O.
This check works with a list of common misspellings instead of comparing locus names reported by the authors against a database of valid locus names. This eliminates the possibility of producing false positives, but it may not catch all misspellings. In particular, this check does not enforce using the standard name of a locus (e.g. it won't flag vWA as a misspelling of VWA).
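A list-based lookup of this kind could look like the following Python sketch. The entries in `KNOWN_MISSPELLINGS` and the function name `find_misspelled_loci` are hypothetical and chosen only for illustration; the actual list used by leapdna is not given here. Exact, case-sensitive matching is also an assumption of this sketch.

```python
# Hypothetical list of known misspellings mapped to the intended locus name.
# These entries are illustrative, not leapdna's actual list.
KNOWN_MISSPELLINGS = {
    "THO1": "TH01",          # letter O typed instead of the digit 0
    "D1OS1248": "D10S1248",  # letter O typed instead of the digit 0
}


def find_misspelled_loci(locus_names):
    """Return {reported name: suggested name} for names found in the list."""
    return {
        name: KNOWN_MISSPELLINGS[name]
        for name in locus_names
        if name in KNOWN_MISSPELLINGS
    }


reported = ["THO1", "vWA", "D10S1248"]
print(find_misspelled_loci(reported))
```

Because a name is flagged only when it appears in the list, "vWA" passes through untouched even though it differs from "VWA", and a misspelling that is missing from the list goes unnoticed.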