Common questions about the data, methodology, and how to interpret the numbers
ClinVar is a free, public database maintained by the U.S. National Institutes of Health (NIH). Clinical laboratories, research groups, and expert panels submit reports about genetic variants they've observed in patients, along with their assessment of whether each variant causes disease.
Think of it as a shared library where labs around the world contribute their findings about genetic variants. When a lab identifies a variant in a patient and determines it's disease-causing, they can submit that finding to ClinVar so others can benefit from the knowledge.
No — it's an approximate lower bound, not a precise count. Each number on the dashboard represents the number of times a pathogenic variant has been reported in ClinVar. It is almost certainly an undercount of the true number of diagnosed patients for the following reasons:
Not every diagnosis is submitted to ClinVar. When a lab finds a well-known variant in a patient, they may not bother submitting it to ClinVar because it's already been reported many times. This is especially true for common variants in well-studied diseases.
ClinVar launched in 2012–2013. Patients diagnosed before then aren't captured unless their data was retroactively imported. Many historical diagnoses were never added.
One submission doesn't always equal one patient. A single submission might represent multiple patients from the same family or cohort. Conversely, the same patient could appear in multiple submissions if tested by different labs.
ClinVar was established by the NIH in 2012 and began accepting submissions in earnest in 2013. When the database launched, variant records from existing data sources such as OMIM (Online Mendelian Inheritance in Man) were bulk-imported. That's why you see a cluster of entries dated April 4, 2013: that was the import date, not the date those variants were originally discovered.
The upward trend after 2013 reflects the growing adoption of ClinVar as a standard repository for clinical genetic findings, as well as the broader expansion of genetic testing (including the rise of whole-exome and whole-genome sequencing).
Because different diseases require different numbers of genetic variants to cause illness, and we want the chart to reflect the closest approximation of patient counts rather than raw variant reports.
Dominant diseases (like SYNGAP1-related disorders) require only one pathogenic variant. So each submission roughly corresponds to one patient. In this case, estimated patients equals the number of submissions.
Recessive diseases (like cystic fibrosis / CFTR) require two pathogenic variants, one inherited from each parent. A patient with CF would therefore typically account for two variant submissions, so we divide the total submission count by two and round down to estimate the number of patients. For example, 7 submissions give an estimate of 3 patients.
This adjustment is based on the Human Phenotype Ontology (HPO) classification of each gene's inheritance pattern. See the Methods page for the full table of inheritance modes and formulas.
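The adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual code: the `Inheritance` enum and the `estimate_patients` function are hypothetical names, and only the two inheritance modes discussed here are covered (the Methods page lists the full set).

```python
from enum import Enum


class Inheritance(Enum):
    """Hypothetical labels for the two inheritance modes discussed above."""
    DOMINANT = "dominant"
    RECESSIVE = "recessive"


def estimate_patients(submissions: int, mode: Inheritance) -> int:
    """Estimate a patient count from a count of pathogenic submissions."""
    if submissions < 0:
        raise ValueError("submissions must be non-negative")
    if mode is Inheritance.RECESSIVE:
        # Two pathogenic variants per patient: halve and round down.
        return submissions // 2
    # Dominant: one pathogenic variant per patient.
    return submissions


print(estimate_patients(7, Inheritance.DOMINANT))   # 7
print(estimate_patients(7, Inheritance.RECESSIVE))  # 3
```

Integer division is used so that an odd submission count for a recessive gene rounds down, matching the description above.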
These are standardized terms used by clinical genetics labs to describe how confident they are that a variant causes disease:
Pathogenic means there is strong evidence that this variant causes the disease in question. This is the highest confidence level.
Likely Pathogenic means there is good evidence that this variant causes disease, but the evidence isn't quite as strong as for "Pathogenic." In clinical practice, both categories are typically treated the same way for patient care decisions.
We exclude variants classified as "Uncertain Significance" (VUS), "Likely Benign," or "Benign" because there isn't sufficient evidence to say they cause disease.
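As a rough illustration of this inclusion rule, the check below keeps only the two disease-causing categories. The function name `is_counted` and the exact label strings are assumptions made for this sketch; real ClinVar records may use other labels or combined terms, which this simple check would not count.

```python
# Classification labels counted on the dashboard (label spelling is assumed).
INCLUDED_CLASSIFICATIONS = {"pathogenic", "likely pathogenic"}


def is_counted(classification: str) -> bool:
    """Return True if a submission's classification is counted."""
    return classification.strip().lower() in INCLUDED_CLASSIFICATIONS


labels = ["Pathogenic", "Likely Pathogenic", "Uncertain Significance",
          "Likely Benign", "Benign"]
print([label for label in labels if is_counted(label)])
# ['Pathogenic', 'Likely Pathogenic']
```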
We apply two main filters to ClinVar data to ensure we're counting variants that are truly specific to each gene, rather than large chromosomal events that happen to overlap the gene:
Variant type filter: We include nearly all variant types: point mutations (SNVs), insertions, deletions, duplications, copy number gains and losses, microsatellites (repeat expansions), and indels. We exclude only the "Complex" and "Haplotype" types. Copy number variants are included because many diseases, such as Angelman syndrome, are primarily caused by large chromosomal deletions.
Size filter: We exclude any variant that spans more than 50 times the size of the gene, with the cutoff never set below 10 Mb. This removes whole-chromosome or chromosome-arm structural variants that overlap the gene but aren't gene-specific, while still capturing known pathogenic regional deletions (e.g., the ~5 Mb 15q11-q13 deletion in Angelman syndrome).
For full technical details, see the Methods page.
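The two filters can be sketched as follows. This is an illustrative reading rather than the dashboard's implementation: the function names are hypothetical, the type labels are assumed spellings, and "minimum 10 Mb" is interpreted here as a floor on the size cutoff, i.e. the cutoff is the larger of 50 times the gene length and 10 Mb.

```python
EXCLUDED_TYPES = {"complex", "haplotype"}   # assumed label spellings
SIZE_MULTIPLIER = 50                        # cutoff = 50 x gene length ...
MIN_CUTOFF_BP = 10_000_000                  # ... but never below 10 Mb (assumed reading)


def size_cutoff_bp(gene_length_bp: int) -> int:
    """Largest variant span (in base pairs) still treated as gene-specific."""
    return max(SIZE_MULTIPLIER * gene_length_bp, MIN_CUTOFF_BP)


def passes_filters(variant_type: str, variant_length_bp: int,
                   gene_length_bp: int) -> bool:
    """Apply the variant type filter, then the size filter."""
    if variant_type.strip().lower() in EXCLUDED_TYPES:
        return False
    return variant_length_bp <= size_cutoff_bp(gene_length_bp)


# Hypothetical 100 kb gene: 50 x 100 kb = 5 Mb, so the 10 Mb floor applies.
gene_length = 100_000
print(passes_filters("Deletion", 5_000_000, gene_length))    # True  (~5 Mb regional deletion)
print(passes_filters("Deletion", 60_000_000, gene_length))   # False (chromosome-arm scale)
print(passes_filters("Haplotype", 1, gene_length))           # False (excluded type)
```

Under this reading, the 10 Mb floor is what keeps a ~5 Mb regional deletion in scope even for a small gene, where 50 times the gene length alone would give a much tighter cutoff.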
The dashboard fetches fresh data from ClinVar every 168 hours (once a week). ClinVar itself is updated on a rolling basis as labs submit new findings.
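A weekly refresh like this is commonly implemented as a staleness check on the time of the last fetch. The sketch below assumes that approach; `needs_refresh` and `REFRESH_INTERVAL` are hypothetical names, and the dashboard's actual scheduling mechanism is not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REFRESH_INTERVAL = timedelta(hours=168)  # 168 hours = 7 days


def needs_refresh(last_fetched: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once at least 168 hours have passed since the last fetch."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_fetched >= REFRESH_INTERVAL


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_refresh(last, now=last + timedelta(days=6)))  # False
print(needs_refresh(last, now=last + timedelta(days=7)))  # True
```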
DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) is a valuable resource for clinical genomics, but access to individual-level variant data requires a formal data access agreement with the DECIPHER consortium.
While DECIPHER provides publicly available aggregate summary statistics, these do not include the per-variant detail needed for variant-level analysis. Adding DECIPHER aggregate counts without deduplication would risk double-counting patients who are already represented in ClinVar.
ClinVar is the most comprehensive, well-structured, and regularly updated public archive of clinically relevant genetic variants. Each submission (SCV record) includes structured metadata — variant coordinates, classification, submitting laboratory, and submission date — enabling robust temporal and submitter-level analysis.
Other databases such as LOVD and Geno2MP were considered but excluded because they lack the temporal granularity needed for trend analysis, have inconsistent update schedules, or require additional assumptions that reduce confidence in the resulting metrics.
Yes, but with appropriate caveats. We recommend noting that ClinVar submission counts represent an approximate lower bound of identified patients, not a comprehensive epidemiological measure. See the Methods page for language suitable for scientific publications.