Data release overview

Item Description
Study Title Genome-wide study of resistance to severe malaria in eleven worldwide populations
Release date December 2019
Release version 1
Study URL https://www.malariagen.net/resource/25
References https://doi.org/10.1038/s41467-019-13480-z

Background

This dataset contains summary statistics from association tests comparing severe malaria cases with population controls collected in eleven populations, as shown in the following table:

Table 1: The number of individuals from each population included in association tests
population acronym cases controls total
Gambia Gambia 2461 2518 4979
Mali Mali 259 163 422
Burkina Faso BurkinaFaso 711 583 1294
Ghana Ghana 391 315 706
Nigeria Nigeria 112 21 133
Cameroon Cameroon 583 634 1217
Malawi Malawi 1161 1310 2471
Tanzania Tanzania 410 388 798
Kenya Kenya 1529 1539 3068
Vietnam Vietnam 703 544 1247
Papua New Guinea PNG 379 342 721
TOTAL 8699 8357 17056

All cases were diagnosed as meeting WHO definitions of severe malaria (see reference [2]), while controls were sampled from the general population and from new births. Underlying genotypes for these samples have also been deposited in the European Genome-Phenome Archive (EGA) under study ID EGAS00001001311 (https://ega-archive.org/studies/EGAS00001001311).

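This README file covers the following topics relevant to the dataset: terms of use, references and citation, details of the data, a summary of methods, and descriptions of the columns in each results file.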

Terms of use

The summary statistics included here, including allele frequency estimates and association test effect size estimates, parameter standard errors and evidence measures, are made freely available for use. Please include the following text as an acknowledgement in any publication arising from the use of this data: “This study makes use of data generated by MalariaGEN. A full list of the investigators who contributed to the generation of the data is available from www.malariagen.net. Funding for this project was provided by Wellcome Trust (WT077383/Z/05/Z) and the Bill & Melinda Gates Foundation through the Foundation of the National Institutes of Health (566) as part of the Grand Challenges in Global Health Initiative.”

References and citation

This dataset was created for and described in the following manuscript:

[1] Malaria Genomic Epidemiology Network, “New insights into malaria susceptibility from the genomes of 17,000 individuals from Africa, Asia, and Oceania”, Nature Communications (2019). https://doi.org/10.1038/s41467-019-13480-z; bioRxiv link: https://doi.org/10.1101/535898

Please cite the above article if you make use of data from this release in disseminated work.

Details of data

This release includes summary statistics from frequentist and Bayesian meta-analysis of association tests with severe malaria (SM), as well as with severe malaria subphenotypes (CM, SMA and OTHER), as defined in the following table:

Table 2: phenotype abbreviations
Phenotype abbreviation Description
SM Severe malaria (equivalent to CM, SMA or OTHER)
CM Cerebral malaria
SMA Severe malarial anaemia
OTHER Other or nonspecific severe malaria

The following table summarises the available files:

Files included in this data release.
Filename Description
MalariaGEN_2019_summary_statistics_releasenote.html This release note
MalariaGEN_2019_summary_statistics_releasenote.md This release note as a Markdown document
MalariaGEN_combined_evidence.csv.gz (1.6Gb) Summary of meta-analysis, including overall P-values and Bayes factors
MalariaGEN_case-control:add:effects.csv.gz (666Mb) Fixed-effect meta-analysis of association with SM, assuming additive effect
MalariaGEN_case-control:dom:effects.csv.gz (664Mb) Fixed-effect meta-analysis of association with SM, assuming dominance effect of ‘B’ allele
MalariaGEN_case-control:rec:effects.csv.gz (390Mb) Fixed-effect meta-analysis of association with SM, assuming recessive effect of ‘B’ allele
MalariaGEN_case-control:het:effects.csv.gz (667Mb) Fixed-effect meta-analysis of association with SM, assuming heterozygote effect
MalariaGEN_subphenotype:add:effects.csv.gz (1.2Gb) Fixed-effect meta-analysis of association with SM subtypes, assuming additive effect
MalariaGEN_subphenotype:dom:effects.csv.gz (1.2Gb) Fixed-effect meta-analysis of association with SM subtypes, assuming dominance effect of ‘B’ allele
MalariaGEN_subphenotype:rec:effects.csv.gz (695Mb) Fixed-effect meta-analysis of association with SM subtypes, assuming recessive effect of ‘B’ allele
MalariaGEN_subphenotype:het:effects.csv.gz (1.2Gb) Fixed-effect meta-analysis of association with SM subtypes, assuming heterozygote effect
MalariaGEN_case-control:add:bfs.csv.gz (8.3Gb) Bayesian meta-analysis of association with SM, assuming additive effect
MalariaGEN_case-control:dom:bfs.csv.gz (7.8Gb) Bayesian meta-analysis of association with SM, assuming dominance effect of ‘B’ allele
MalariaGEN_case-control:rec:bfs.csv.gz (3.7Gb) Bayesian meta-analysis of association with SM, assuming recessive effect of ‘B’ allele
MalariaGEN_case-control:het:bfs.csv.gz (8.2Gb) Bayesian meta-analysis of association with SM, assuming heterozygote effect
MalariaGEN_subphenotype:add:bfs.csv.gz (2.1Gb) Bayesian meta-analysis of association with SM subtypes, assuming additive effect
MalariaGEN_subphenotype:dom:bfs.csv.gz (2.0Gb) Bayesian meta-analysis of association with SM subtypes, assuming dominance effect of ‘B’ allele
MalariaGEN_subphenotype:rec:bfs.csv.gz (1.1Gb) Bayesian meta-analysis of association with SM subtypes, assuming recessive effect of ‘B’ allele
MalariaGEN_subphenotype:het:bfs.csv.gz (2.0Gb) Bayesian meta-analysis of association with SM subtypes, assuming heterozygote effect
MalariaGEN_case-control:add:per_population.csv.gz (3.2Gb) Per-population results for association with SM, assuming additive effect
MalariaGEN_case-control:dom:per_population.csv.gz (3.1Gb) Per-population results for association with SM, assuming dominance effect of ‘B’ allele
MalariaGEN_case-control:rec:per_population.csv.gz (3.2Gb) Per-population results for association with SM, assuming recessive effect of ‘B’ allele
MalariaGEN_case-control:het:per_population.csv.gz (3.2Gb) Per-population results for association with SM, assuming heterozygote effect
MalariaGEN_subphenotype:add:per_population.csv.gz (6.3Gb) Per-population results for association with SM subtypes, assuming additive effect
MalariaGEN_subphenotype:dom:per_population.csv.gz (6.2Gb) Per-population results for association with SM subtypes, assuming dominance effect of ‘B’ allele
MalariaGEN_subphenotype:rec:per_population.csv.gz (3.2Gb) Per-population results for association with SM subtypes, assuming recessive effect of ‘B’ allele
MalariaGEN_subphenotype:het:per_population.csv.gz (6.3Gb) Per-population results for association with SM subtypes, assuming heterozygote effect

Below we describe the contents of each of the files provided in this dataset.

Methods summary

A full set of methods can be found in [1]. In brief, genotypes were obtained by typing each sample on the Illumina Omni 2.5M platform, followed by imputation into the 1000 Genomes Reference panel (1000GP) and into a custom panel (“combined panel”) obtained by adding additional whole-genome sequenced samples (https://ega-archive.org/studies/EGAS00001003648) to the 1000GP. Additionally, HLA alleles were imputed using HLA*IMP:02 and glycophorin region CNV alleles were imputed using a panel described previously (Leffler et al, https://doi.org/10.1126/science.aam6393).

Association tests were conducted by logistic regression in each of the eleven populations in Table 1, including five principal components as covariates, using SNPTEST (http://www.well.ox.ac.uk/~gav/snptest). Association test results were then meta-analysed using BINGWA (http://www.well.ox.ac.uk/~gav/bingwa), under both frequentist fixed-effect meta-analysis and a flexible Bayesian meta-analysis framework described in [1].

Additionally, we implemented a multinomial logistic regression method to test each genetic variant against severe malaria subtype as defined in our data (CM, SMA or other severe malaria; cf. Table 2). Case samples identified as having both CM and SMA were excluded from these tests. Subphenotype association test results were also meta-analysed using BINGWA in a multivariate meta-analysis framework.

Common columns

The meta-analysis results files have the following common columns:

Table 3: columns shared between analysis results files
Column Description
variant_id An internal identifier for this variant. This can be used to match between files.
analysis_id An internal identifier reflecting the imputation panel used for this variant, as described below. This can be used to match between files.
analysis A string identifier reflecting the imputation panel used for this variant, as described below.
chromosome The chromosome the variant maps to
position The position the variant maps to
rsid The rsid (or other identifier) of the variant
alleleA The reference allele of the variant
alleleB The non-reference allele of the variant
A The expected count of haploid reference calls (only nonzero for X chromosome variants).
B The expected count of haploid non-reference calls (only nonzero for X chromosome variants).
AA The expected count of diploid homozygous reference allele calls.
AB The expected count of diploid heterozygous calls.
BB The expected count of diploid homozygous non-reference allele calls.
N Total sample size included in the meta-analysis
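
As an illustration of how the genotype count columns relate to allele frequency, the sketch below estimates the frequency of allele B from the A, B, AA, AB and BB columns. It assumes that each diploid call contributes two alleles and each haploid call contributes one; the function name and the example counts are made up for illustration and are not part of the release.

```python
def b_allele_frequency(counts):
    """Estimate the frequency of allele B from expected genotype counts.

    `counts` maps the column names A, B, AA, AB, BB to numbers.
    Assumption: diploid calls carry two alleles, haploid calls carry one.
    """
    b_alleles = counts["B"] + counts["AB"] + 2.0 * counts["BB"]
    total_alleles = (counts["A"] + counts["B"]
                     + 2.0 * (counts["AA"] + counts["AB"] + counts["BB"]))
    if total_alleles == 0:
        return None  # no non-missing calls, so the frequency is undefined
    return b_alleles / total_alleles


# Made-up autosomal example: haploid counts are zero.
example = {"A": 0.0, "B": 0.0, "AA": 810.0, "AB": 180.0, "BB": 10.0}
freq = b_allele_frequency(example)
```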

Note: In all files the variant_id column refers to the specific variant (i.e. the specific combination of chromosome, position and alleles) being analysed. The analysis_id and analysis columns refer to the imputation reference panel from which the variant was imputed. To match results between files, users should match on both the variant_id and analysis_id columns. Imputation reference panels are detailed in the following table.

Table 4: analysis column values and imputation reference panels
analysis_id analysis description
1 gwas Combined 1000GP / MalariaGEN reference panel (autosomal variants only)
2 1000GP 1000GP reference panel imputation of autosomal variants
6 1000GP:X 1000GP reference panel imputation of X chromosome variants
3 hlaimp Imputation of HLA alleles using HLA*IMP:02
4 GYP.all Imputation of glycophorin region SNPs and INDELs, from the panel in [2]
5 GYP.cnvs Imputation of glycophorin region CNVs, from the panel in [2]
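
The matching rule described above can be sketched as follows, keying each row on the pair (variant_id, analysis_id). This is a minimal illustration using Python's standard library and small made-up in-memory tables; the variant identifiers and values are hypothetical. The real files are several gigabytes, so in practice a streaming or database-backed join is likely to be more appropriate than holding everything in memory.

```python
import csv
import io


def index_rows(handle):
    """Index CSV rows by the (variant_id, analysis_id) pair."""
    return {(row["variant_id"], row["analysis_id"]): row
            for row in csv.DictReader(handle)}


# Made-up stand-ins for two results files. The same variant appears under
# two imputation panels, so variant_id alone would not be a unique key.
effects_csv = io.StringIO(
    "variant_id,analysis_id,analysis,beta,se\n"
    "v1,1,gwas,0.12,0.03\n"
    "v1,2,1000GP,0.10,0.04\n"
)
evidence_csv = io.StringIO(
    "variant_id,analysis_id,analysis,mean_bf\n"
    "v1,1,gwas,250.0\n"
    "v1,2,1000GP,40.0\n"
)

effects = index_rows(effects_csv)
evidence = index_rows(evidence_csv)

# Keep only keys present in both tables and combine their columns.
merged = {key: {**effects[key], **evidence[key]}
          for key in effects.keys() & evidence.keys()}
```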

Combined meta-analysis results

An overview of meta-analysis results can be found in this file:

MalariaGEN_summary_statistics_combined_evidence.csv.gz

This is a gzipped comma-separated file which contains overall measures of evidence under additive, dominant, recessive and heterozygote modes of inheritance, as well as an overall model-averaged Bayes factor and an indication of the best-fitting model.

This file has the following columns:

Table 5: columns in the combined evidence file
Column Description
(common columns) As described above
effective_minor_allele_count The effective minor allele count, computed across all samples as described below
included_cohorts A string of eleven 1’s and 0’s indicating which per-population estimates were included in meta-analysis.
case-control:add:pvalue P-value under an additive model of association with case/control status
case-control:dom:pvalue P-value under a dominant model of association of the non-reference allele with case/control status
case-control:rec:pvalue P-value under a recessive model of association of the non-reference allele with case/control status
case-control:het:pvalue P-value under a heterozygote model of association with case/control status
case-control:add:mean_bf Model-averaged Bayes factor (BF) under an additive model of association with case/control status
case-control:dom:mean_bf Model-averaged BF under a dominant model of association of the non-reference allele with case/control status
case-control:rec:mean_bf Model-averaged BF under a recessive model of association of the non-reference allele with case/control status
case-control:het:mean_bf Model-averaged BF under a heterozygote model of association with case/control status
case-control:mean_bf Model-averaged BF under a model of association with case/control status, averaged over mode of inheritance using the weights specified in [1].
subphenotype:add:pvalue P-value under an additive model of association with malaria subtype
subphenotype:dom:pvalue P-value under a dominant model of association of the non-reference allele with malaria subtype
subphenotype:rec:pvalue P-value under a recessive model of association of the non-reference allele with malaria subtype
subphenotype:het:pvalue P-value under a heterozygote model of association with malaria subtype
subphenotype:add:mean_bf Model-averaged BF under an additive model of association with malaria subtype
subphenotype:dom:mean_bf Model-averaged BF under a dominant model of association of the non-reference allele with malaria subtype
subphenotype:rec:mean_bf Model-averaged BF under a recessive model of association of the non-reference allele with malaria subtype
subphenotype:het:mean_bf Model-averaged BF under a heterozygote model of association with malaria subtype
subphenotype:mean_bf Model-averaged BF under a model of association with malaria subtype, averaged over mode of inheritance.
add:mean_bf Model-averaged BF under an additive model of association, averaged over case-control and subphenotype effect models using the weights specified in [1]
dom:mean_bf Model-averaged BF under a dominant model of association of the non-reference allele, averaged over case-control and subphenotype effect models using the weights specified in [1]
rec:mean_bf Model-averaged BF under a recessive model of association of the non-reference allele, averaged over case-control and subphenotype effect models using the weights specified in [1]
het:mean_bf Model-averaged BF under a heterozygote model of association, averaged over case-control and subphenotype effect models using the weights specified in [1]
mean_bf Overall model-averaged BF, using weights specified in [1]. These values are presented in Figure 2 of [1].
best_posterior_model The model with the highest posterior weight amongst all those included in mean_bf.
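
Because the combined evidence file is large, it may be convenient to stream it rather than load it whole. The sketch below filters rows by the mean_bf column using only the standard library. The threshold, the function name and the tiny demonstration file are assumptions made for illustration; the demonstration file contains made-up values and only a few of the columns listed above.

```python
import csv
import gzip
import os
import tempfile


def variants_above(path, column="mean_bf", threshold=1000.0):
    """Yield rows of a gzipped CSV whose `column` value exceeds `threshold`."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                value = float(row[column])
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric entries such as "NA"
            if value > threshold:
                yield row


# Write a tiny stand-in file with made-up values, then filter it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_combined_evidence.csv.gz")
with gzip.open(demo_path, "wt", newline="") as out:
    out.write("variant_id,analysis_id,rsid,mean_bf\n")
    out.write("v1,1,rs_demo_1,25000.0\n")
    out.write("v2,1,rs_demo_2,3.5\n")
    out.write("v3,2,rs_demo_3,NA\n")

hits = [row["rsid"] for row in variants_above(demo_path)]
```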

Further notes:

Mode-specific frequentist meta-analysis results

Detailed results of fixed-effect meta-analysis under each specific mode of inheritance are available in these files:

`MalariaGEN_case-control:[mode]:effects.csv.gz`
`MalariaGEN_subphenotype:[mode]:effects.csv.gz`

where [mode] is ‘add’ (additive model), ‘dom’ (dominant effect of the non-reference allele), ‘rec’ (recessive effect of the non-reference allele), or ‘het’ (heterozygote effect). The case-control files contain results of meta-analysis of logistic regression against case-control status in each population, and the subphenotype files contain results of meta-analysis of multinomial logistic regression against malaria subtypes in each population.
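
The file naming pattern above can be expanded programmatically. The following sketch builds the eight effects file names from the pattern; the constant and function names are illustrative only.

```python
from itertools import product

ANALYSES = ("case-control", "subphenotype")
MODES = ("add", "dom", "rec", "het")


def effects_filename(analysis, mode):
    """Build an effects file name following the pattern given above."""
    if analysis not in ANALYSES:
        raise ValueError(f"unknown analysis: {analysis!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"MalariaGEN_{analysis}:{mode}:effects.csv.gz"


# All combinations of analysis type and mode of inheritance.
all_effects_files = [effects_filename(a, m) for a, m in product(ANALYSES, MODES)]
```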

The following tables list the columns of these files.

Table 6: columns in the case-control effects file
Column Description
(common columns) As described above
included_betas A string of eleven 1’s and 0’s indicating which per-population estimates were included in meta-analysis.
mode Assumed mode of inheritance; either “add”, “dom”, “rec”, or “het”.
beta The estimated log odds ratio for effect of the non-reference allele on SM, computed using fixed-effect inverse variance weighted meta-analysis across included cohorts.
se The estimated standard error for beta
pvalue Wald test P-value for beta
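
To show how the beta, se and pvalue columns relate, the sketch below computes an odds ratio, an approximate 95% confidence interval and a Wald test P-value from a single beta and se. It assumes a two-sided test based on the normal approximation; the function name and the input values are made up for illustration.

```python
import math


def wald_summary(beta, se):
    """Summarise a log odds ratio estimate and its standard error."""
    z = beta / se
    # Two-sided tail probability of a standard normal at |z|.
    pvalue = math.erfc(abs(z) / math.sqrt(2.0))
    half_width = 1.959964 * se  # 97.5% quantile of the standard normal
    return {
        "z": z,
        "pvalue": pvalue,
        "odds_ratio": math.exp(beta),
        "ci95": (math.exp(beta - half_width), math.exp(beta + half_width)),
    }


# Made-up example: a protective effect of the non-reference allele.
summary = wald_summary(beta=-0.2, se=0.05)
```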

Table 7: columns in the subphenotype effects file
Column Description
(common columns) As described above
included_betas A string of eleven 1’s and 0’s indicating which per-population estimates were included in meta-analysis.
mode Assumed mode of inheritance; either “add”, “dom”, “rec”, or “het”.
beta_1/CM Estimated log odds ratio for effect of the non-reference allele on CM, computed using fixed-effect inverse variance weighted meta-analysis across included cohorts.
se_1 Estimated standard error for beta_1/CM
wald_pvalue_1 Wald test P-value for beta_1/CM
beta_2/OTHER Estimated log odds ratio for effect of the non-reference allele on nonspecific severe malaria
se_2 Estimated standard error for beta_2/OTHER
wald_pvalue_2 Wald test P-value for beta_2/OTHER
beta_3/SMA Estimated log odds ratio for effect of the non-reference allele on severe malarial anaemia
se_3 Estimated standard error for beta_3/SMA
wald_pvalue_3 Wald test P-value for beta_3/SMA
cov_1,2 Estimated covariance between beta_1/CM and beta_2/OTHER
cov_1,3 Estimated covariance between beta_1/CM and beta_3/SMA
cov_2,3 Estimated covariance between beta_2/OTHER and beta_3/SMA
pvalue Overall P-value for beta_1..beta_3
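
One possible use of the covariance columns is to compare effects between subphenotypes. The sketch below tests whether two effect estimates differ, using the variance of a difference, Var(a - b) = se_a^2 + se_b^2 - 2 cov(a, b), and a normal approximation. This is an illustration rather than a description of how the pvalue column was computed, and the input values are made up.

```python
import math


def effect_difference(beta_a, se_a, beta_b, se_b, cov_ab):
    """Return (difference, standard error, two-sided P-value) for beta_a - beta_b."""
    diff = beta_a - beta_b
    variance = se_a ** 2 + se_b ** 2 - 2.0 * cov_ab
    if variance <= 0:
        raise ValueError("non-positive variance; check se and covariance inputs")
    se_diff = math.sqrt(variance)
    z = diff / se_diff
    pvalue = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se_diff, pvalue


# Made-up values standing in for beta_1/CM, se_1, beta_3/SMA, se_3 and cov_1,3.
diff, se_diff, p_diff = effect_difference(-0.30, 0.06, -0.05, 0.08, 0.002)
```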

Detailed Bayesian meta-analysis results

Detailed results of Bayesian meta-analysis under each specific mode of inheritance are available in these files:

`MalariaGEN_case-control:[mode]:bfs.csv.gz`
`MalariaGEN_subphenotype:[mode]:bfs.csv.gz`

where [mode] is one of: add (additive effect), dom (dominant effect of the non-reference allele), rec (recessive effect of the non-reference allele), or het (heterozygote effect).

Results are presented as a set of Bayes factors (BFs). All Bayes factors were computed using an asymptotic approximation and a Gaussian prior on the effect size with variance σ²; we used an equal mixture of σ = 0.2, 0.4, 0.6 and 0.8 throughout. Details of model assumptions and prior weights can be found in [1].
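
For intuition, the sketch below computes an asymptotic Bayes factor for a single effect estimate with a zero-centred Gaussian prior, then averages it over the four prior standard deviations with equal weight. It assumes the estimate is approximately normal with variance se², so that BF = sqrt(V / (V + σ²)) · exp(z² σ² / (2 (V + σ²))) with V = se² and z = beta / se. This is a univariate illustration only and does not reproduce the multi-population models implemented in BINGWA; the function names and input values are made up.

```python
import math

PRIOR_SDS = (0.2, 0.4, 0.6, 0.8)


def approximate_bf(beta, se, prior_sd):
    """Asymptotic BF for a nonzero effect versus the null, N(0, prior_sd^2) prior."""
    v = se * se                  # sampling variance of the estimate
    w = prior_sd * prior_sd      # prior variance of the true effect
    z_squared = beta * beta / v
    shrink = w / (v + w)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z_squared * shrink)


def mixture_bf(beta, se, prior_sds=PRIOR_SDS):
    """Average the BF over an equal-weight mixture of prior standard deviations."""
    return sum(approximate_bf(beta, se, sd) for sd in prior_sds) / len(prior_sds)


# Made-up estimate five standard errors from zero.
bf_example = mixture_bf(beta=0.5, se=0.1)
```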

The following table lists the columns of the case-control Bayesian analysis files.

Table 8: columns in the case-control Bayesian meta-analysis file
Column Description
(common columns) As described above
included_betas A string of eleven 1’s and 0’s indicating which per-population estimates were included in meta-analysis.
mode Assumed mode of inheritance; either “add”, “dom”, “rec”, or “het”.
Gambia:bf BF for association using only data from The Gambia
Mali:bf BF for association using only data from Mali
BurkinaFaso:bf BF for association using only data from Burkina Faso
Ghana:bf BF for association using only data from Ghana
Nigeria:bf BF for association using only data from Nigeria
Cameroon:bf BF for association using only data from Cameroon
Malawi:bf BF for association using only data from Malawi
Tanzania:bf BF for association using only data from Tanzania
Kenya:bf BF for association using only data from Kenya
Vietnam:bf BF for association using only data from Vietnam
PNG:bf BF for association using only data from Papua New Guinea
fix:[populations]:bf BF under fixed-effect model of effect across specified populations, where [populations] denotes a string of eleven 1’s and 0’s as described below
cor:[populations]:bf BF under correlated-effect model of effect across specified populations
ind:[populations]:bf BF under independent-effect model of effect across specified populations
str:bf BF under a structured effect model across populations
mean_bf Model-averaged BF for case-control effects for the specific mode, across a subset of models with weights as described in [1].
max_bf_model The model with the highest Bayes factor across all those tested
max_bf The highest BF
best_posterior_model The model with the highest posterior weight, given the weights specified in our manuscript [1]
best_posterior The highest posterior weight
2nd_best_posterior_model The model with the second highest posterior weight
2nd_best_posterior The second highest posterior weight
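
The relationship between per-model Bayes factors, prior weights, the model-averaged BF and posterior model weights can be sketched as follows. The prior weights and BF values below are placeholders and not those specified in [1]. The sketch also assumes that posterior weights are computed among the alternative models only, which is not stated in this document.

```python
def model_averaged_bf(bfs, prior_weights):
    """Prior-weighted average of per-model Bayes factors."""
    return sum(prior_weights[model] * bf for model, bf in bfs.items())


def posterior_weights(bfs, prior_weights):
    """Posterior weight of each model, proportional to prior weight times BF."""
    unnormalised = {model: prior_weights[model] * bf for model, bf in bfs.items()}
    total = sum(unnormalised.values())
    return {model: value / total for model, value in unnormalised.items()}


# Placeholder inputs keyed by BF column name.
bfs = {
    "fix:11111111111:bf": 100.0,
    "fix:11111111100:bf": 400.0,
    "fix:10000000000:bf": 10.0,
}
priors = {
    "fix:11111111111:bf": 0.5,
    "fix:11111111100:bf": 0.3,
    "fix:10000000000:bf": 0.2,
}

mean_bf = model_averaged_bf(bfs, priors)
posteriors = posterior_weights(bfs, priors)
best_model = max(posteriors, key=posteriors.get)
```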

Bayes factor columns for population groups are encoded using the [populations] indicator, a string of eleven 1’s and 0’s in which 1 indicates that the effect is assumed nonzero in the corresponding population and 0 that it is assumed zero. Populations are taken in the order shown in Table 1 (i.e. roughly west-east order). Examples are given below:

Table 9: examples of Bayes factor column names
Example Description
fix:11111111111:bf Fixed-effect model of effects across all populations
fix:11111111100:bf Fixed-effect model of effects restricted to African populations
fix:10000000000:bf Gambia-specific effect
cor:11111111100:bf Correlated-effect model of effects restricted to African populations
ind:00000011100:bf Independent-effect model of effects restricted to east African populations

See [1] for full details of models included.
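
The column naming scheme above can be decoded mechanically. The sketch below splits a Bayes factor column name into its model prefix and the list of populations with an assumed nonzero effect, using the population order from Table 1. The function name is illustrative only.

```python
POPULATIONS = ("Gambia", "Mali", "BurkinaFaso", "Ghana", "Nigeria", "Cameroon",
               "Malawi", "Tanzania", "Kenya", "Vietnam", "PNG")


def parse_bf_column(name):
    """Split e.g. 'cor:11111111100:bf' into (model, populations with nonzero effect)."""
    model, indicator, suffix = name.split(":")  # ValueError if not three parts
    if suffix != "bf":
        raise ValueError(f"not a Bayes factor column: {name!r}")
    if len(indicator) != len(POPULATIONS) or set(indicator) - {"0", "1"}:
        raise ValueError(f"bad populations indicator: {indicator!r}")
    included = [pop for pop, flag in zip(POPULATIONS, indicator) if flag == "1"]
    return model, included


model, included = parse_bf_column("ind:00000011100:bf")
```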

The following table lists the columns of the subphenotype Bayesian analysis files.

Table 10: columns in the subphenotype Bayesian meta-analysis file
Column Description
(common columns) As described above
included_betas A string of 1’s and 0’s (three for each of the eleven populations) indicating which per-population estimates were included in meta-analysis.
Gambia:bf BF using only data from The Gambia, assuming independent effects between phenotypes
Mali:bf BF using only data from Mali, assuming independent effects between phenotypes
BurkinaFaso:bf BF using only data from Burkina Faso, assuming independent effects between phenotypes
Ghana:bf BF using only data from Ghana, assuming independent effects between phenotypes
Nigeria:bf BF using only data from Nigeria, assuming independent effects between phenotypes
Cameroon:bf BF using only data from Cameroon, assuming independent effects between phenotypes
Malawi:bf BF using only data from Malawi, assuming independent effects between phenotypes
Tanzania:bf BF using only data from Tanzania, assuming independent effects between phenotypes
Kenya:bf BF using only data from Kenya, assuming independent effects between phenotypes
Vietnam:bf BF using only data from Vietnam, assuming independent effects between phenotypes
PNG:bf BF using only data from Papua New Guinea, assuming independent effects between phenotypes
cm_sma_other_cor:bf BF for correlated-effect model of effects on CM, SMA and OTHER cases
cm_sma_other_fix:bf BF for fixed-effect model of effects on CM, SMA and OTHER cases (similar to a case-control effect)
cm_sma_other_ind:bf BF for independent-effect model of effects on CM, SMA and OTHER cases
cm_sma_cor:bf BF for correlated-effect model of effects on CM and SMA cases
cm_sma_fix:bf BF for fixed-effect model of effects on CM and SMA cases
cm_sma_ind:bf BF for independent-effect model of effects on CM and SMA cases
cm_other_cor:bf BF for correlated-effect model of effects on CM and OTHER cases
cm_other_fix:bf BF for fixed-effect model of effects on CM and OTHER cases
cm_other_ind:bf BF for independent-effect model of effects on CM and OTHER cases
sma_other_cor:bf BF for correlated-effect model of effects on SMA and OTHER cases
sma_other_fix:bf BF for fixed-effect model of effects on SMA and OTHER cases
sma_other_ind:bf BF for independent-effect model of effects on SMA and OTHER cases
cm:bf BF for model of effects restricted to CM cases
other:bf BF for model of effects restricted to OTHER cases
sma:bf BF for model of effects restricted to SMA cases
mean_bf Model-averaged BF for subphenotype effects for the specific mode, across a subset of models with weights as described in [1].
max_bf_model The model with the highest BF across all those tested
max_bf The highest BF
best_posterior_model The model with the highest posterior weight, given the weights specified in our manuscript [1]
best_posterior The highest posterior weight
2nd_best_posterior_model The model with the second highest posterior weight
2nd_best_posterior The second highest posterior weight

Per-population results files

Per-population results, including allele frequency estimates, IMPUTE info scores, and per-population association test results, can be found in these files:

`MalariaGEN_case-control:[mode]:percohort.csv.gz`
`MalariaGEN_subphenotype:[mode]:percohort.csv.gz`

The following table lists columns common to these files in addition to those listed above:

Column Description
(common columns) As described above
mode Assumed mode of inheritance; either “add”, “dom”, “rec”, or “het”.
[population]:N Total sample size of non-missing genotypes in this population (computed as the sum of imputed genotype probabilities for non-missing genotypes)
[population]:B_allele_frequency Estimated frequency of the ‘B’ (non-reference) allele in this population
[population]:minor_predictor_count Minor predictor count in this population, as described below.
[population]:all_info IMPUTE info measure computed across all samples in this population
[population]:comment comment field, as output by SNPTEST. Values other than NA reflect potential model fit errors.
[population]:trusted Indicator of whether the estimate for this population was included in meta-analysis, as described below.

In the above, [population] refers to the acronym column in Table 1, i.e. is one of Gambia, Mali, BurkinaFaso, Ghana, Nigeria, Cameroon, Malawi, Tanzania, Kenya, Vietnam, or PNG. Results are provided for all eleven populations.
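
Since every per-population column follows the `[population]:[field]` pattern, values for one field can be gathered across populations as sketched below. The function name and the demonstration row are made up, and the treatment of non-numeric entries such as "NA" as missing is an assumption.

```python
POPULATIONS = ("Gambia", "Mali", "BurkinaFaso", "Ghana", "Nigeria", "Cameroon",
               "Malawi", "Tanzania", "Kenya", "Vietnam", "PNG")


def per_population(row, field):
    """Collect numeric `[population]:field` values from one row, keyed by population."""
    values = {}
    for pop in POPULATIONS:
        raw = row.get(f"{pop}:{field}")
        if raw is None:
            continue  # column absent from this row
        try:
            values[pop] = float(raw)
        except ValueError:
            values[pop] = None  # non-numeric entry, treated here as missing
    return values


# Made-up row with only a few of the columns present.
demo_row = {
    "Gambia:B_allele_frequency": "0.12",
    "Kenya:B_allele_frequency": "NA",
    "Gambia:all_info": "0.97",
}
frequencies = per_population(demo_row, "B_allele_frequency")
```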

The following table lists association test-related columns found in the case-control files:

Column Description
[population]:beta_1:SM Estimated log odds ratio for effect of the non-reference allele on SM in this population, computed using logistic regression
[population]:se_1 The estimated standard error for the effect size estimate in this population.
[population]:pvalue The Wald test P-value against the null that the effect is zero in this population.
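
As a worked illustration of fixed-effect inverse variance weighted meta-analysis, the sketch below combines per-population beta and se values into a single estimate. It shows the standard formula only: weights are 1 / se², the combined beta is the weighted mean, and the combined se is the square root of the reciprocal of the summed weights. It does not reproduce the selection of which populations were included in the released meta-analysis, and the input values are made up.

```python
import math


def fixed_effect_meta(estimates):
    """Combine (beta, se) pairs by inverse variance weighting."""
    estimates = list(estimates)
    if not estimates:
        raise ValueError("no estimates to combine")
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


# Made-up per-population estimates with equal precision.
meta_beta, meta_se = fixed_effect_meta([(-0.2, 0.1), (-0.4, 0.1)])
```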

The following table lists association test-related columns found in the subphenotype files:

Column Description
[population]:beta_1:CM Estimated log odds ratio for effect of the non-reference allele on CM in this population, computed using multinomial logistic regression
[population]:se_1 The estimated standard error for the effect size estimate in this population.
[population]:beta_2:OTHER Estimated log odds ratio for effect of the non-reference allele on OTHER in this population
[population]:se_2 The estimated standard error for the effect size estimate in this population.
[population]:beta_3:SMA Estimated log odds ratio for effect of the non-reference allele on SMA in this population
[population]:se_3 The estimated standard error for the effect size estimate in this population.
[population]:cov_1,2 The estimated covariance between beta_1 and beta_2 in this population.
[population]:cov_1,3 The estimated covariance between beta_1 and beta_3 in this population.
[population]:cov_2,3 The estimated covariance between beta_2 and beta_3 in this population.
[population]:pvalue P-value against the null that all three effects are zero in this population.

URLs