A Community Standard Reporting Format for KIR Genotyping Data
Martin Maiers1, Rebecca Cullen2, Raja Rajalingam3, Harriet Noreen4, Neng Yu5, Elaine Reed3, Steven GE Marsh6, Stephen Spellman2, Libby Guethlein7, Elisabeth Trachtenberg8, Sarah Cooley4
1. Bioinformatics, National Marrow Donor Program, Minneapolis, MN, USA
2. Scientific Services, National Marrow Donor Program, Minneapolis, MN, USA
3. UCLA Immunogenetics Center, University of California at Los Angeles, Los Angeles, CA
4. University of Minnesota, Minneapolis, MN, USA
5. HLA Laboratory, American Red Cross Blood Services, Northeast Division, Dedham, MA
6. Anthony Nolan Research Institute, London, England
7. School of Medicine, Stanford University, Stanford, CA, USA
8. Children’s Hospital Oakland Research Institute, Oakland, CA, USA
KIR genes encode activating and inhibitory receptors that regulate the function of natural killer cells, which may be important in donor selection for stem cell transplantation. Current KIR typing methodologies cannot resolve the extensive allelic variation of the 17 KIR genes. Although an XML standard for KIR genotype reporting is being developed, most laboratories need a reliable data format for sharing data electronically via spreadsheets. Our proposed format reports any level of allelic resolution, distinguishes between haploid and diploid ambiguities, reports the observed number of loci, and is easily translated into XML and parsed for downstream storage and analysis. By using a combination of four symbols with hierarchical precedence, as defined in the table, the typing result for each KIR gene is entered into a single spreadsheet cell with no ambiguity in interpretation. For example, “001+007|006+010” represents a 3DL2 result with two possible genotypes:
- heterozygous for the alleles 001 and 007 or
- heterozygous for the alleles 006 and 010
Alternatively, the string “001+002/008” represents a 3DL2 genotype result showing heterozygosity, with the allele 001 on one chromosome and either 002 or 008 on the other. If haplotype “phase” is known between two genes (based on the output of segregation analysis, haplotype estimation programs or typing methods that separate haplotypes), this can be represented with a specific “in cis” symbol. We have used this scheme successfully to transmit genotyping results for two large-scale KIR typing projects, finding that it balances the need for rich data representation standards with the accessibility and ubiquity of spreadsheets. We propose this format to the community as a standard for representing KIR allele typing data.
Spreadsheets have been used to report KIR typing results to the NMDP for the High Resolution KIR typing project. The current spreadsheet format for the NMDP KIR Typing project has one row per locus. The format uses the separator symbols described below, which allow a clear and unambiguous interpretation to be programmed easily for downstream storage and analysis.
| Symbol | Meaning | Precedence |
|---|---|---|
| \| | Genotype list separator | 1 |
| + | Gene (copy) separator | 2 |
| ~ | Gene separator (in cis) | 3 |
| / | Allele list separator | 4 |
In order to represent a typing with two possible genotypes (e.g. 001+002 or 004+006), the “|” character is used, resulting in the following string: 001+002|004+006.
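The precedence column can be read as a splitting order: a cell is divided first on “|”, then on “+”, then on “~”, and finally on “/”. The following is a minimal sketch of that reading, not the NMDP implementation; the function name `parse_typing` and the nested-list output shape are assumptions made for illustration.

```python
def parse_typing(cell):
    """Split a typing string by separator precedence.

    Returns nested lists: genotypes -> gene copies -> in-cis genes -> allele options.
    Assumes a non-empty, well-formed cell.
    """
    return [
        [
            [gene.split("/") for gene in copy.split("~")]  # precedence 3, then 4
            for copy in genotype.split("+")                # precedence 2
        ]
        for genotype in cell.split("|")                    # precedence 1
    ]


# Two alternative genotypes, each with two gene copies
print(parse_typing("001+002|004+006"))
# One genotype: 001 on one chromosome, 002 or 008 on the other
print(parse_typing("001+002/008"))
```

Splitting on the lowest-precedence symbol first means each later split only ever sees a fragment that no longer contains the outer separators, so no bracketing is needed in the string itself.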
| Project Name | Reporting Center | Sample ID | Center Code | Interpretation Date | Gene | Allele | Present | Comments |
|---|---|---|---|---|---|---|---|---|
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DL1 | 00102 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DL2 | 001+003 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DL3 | 002 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DL4 | | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DL5 | 001+003 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DS1 | | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DS2 | | N | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DS3 | 001 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DS4 | 001+002\|004+006 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 2DS5 | 003/004/005+001 | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 3DL1 | | Y | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 3DL2 | | N | |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 3DL3 | NEW | Y | D1 codon151 CAC>CAY; D1 codon147 ATT>RTT; TM codon300 AAC>CAC Asn>His |
| KIR P3 | 974 | 0082-7767-5 | 0058 | 20060310 | 3DS1 | | Y | |
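Because each row carries one locus, a spreadsheet in this layout can be read with ordinary tabular tools. The sketch below assumes the sheet has been exported as comma-separated text with the column headers shown above and that the file holds a single sample; the export format, the `SAMPLE` excerpt and the function name `load_typings` are illustrative assumptions, not part of the proposed standard.

```python
import csv
import io

# Three rows taken from the example table, written as a hypothetical CSV export
SAMPLE = """Project Name,Reporting Center,Sample ID,Center Code,Interpretation Date,Gene,Allele,Present,Comments
KIR P3,974,0082-7767-5,0058,20060310,2DL2,001+003,Y,
KIR P3,974,0082-7767-5,0058,20060310,2DS2,,N,
KIR P3,974,0082-7767-5,0058,20060310,2DS4,001+002|004+006,Y,
"""


def load_typings(text):
    """Map each gene to (present flag, allele string) for a single-sample export."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        result[row["Gene"]] = (row["Present"] == "Y", row["Allele"])
    return result


typings = load_typings(SAMPLE)
print(typings["2DS4"])  # (True, '001+002|004+006')
print(typings["2DS2"])  # (False, '')
```

Keeping the allele string intact at this stage leaves the separator-level interpretation to a later parsing step.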
Note: representing >2 copies of a gene
If three copies of a gene are observed (3 different alleles), this can be represented simply with two “+” symbols: 001+002+003
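Under this convention, the number of gene copies reported for a genotype is one more than the number of “+” symbols it contains. A minimal sketch follows; the function name `copy_counts` is hypothetical, and the function counts only the copies written in the string without inferring anything about unlisted copies.

```python
def copy_counts(cell):
    """Return the number of listed gene copies for each alternative genotype.

    Assumes a non-empty typing string; one count is returned per "|" alternative.
    """
    return [len(genotype.split("+")) for genotype in cell.split("|")]


print(copy_counts("001+002+003"))      # [3]
print(copy_counts("001+002|004+006"))  # [2, 2]
```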
Note: representing haplotype phase
If haplotype “phase” is known between two genes (based on the output of pedigree analysis, haplotype estimation programs or typing methods that separate haplotypes) this can be represented with the “~” symbol: 2DL2*001~2DS5*003
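A phased string of this kind can be broken into its genes by splitting on “~” and then separating each gene name from its alleles at the “*”. The sketch below assumes every in-cis item carries a gene prefix as in the example above; the function name `parse_phased` is hypothetical.

```python
def parse_phased(haplotype):
    """Split an in-cis string into (gene, allele options) pairs.

    Assumes each item has the form GENE*ALLELES, e.g. '2DL2*001'.
    """
    pairs = []
    for item in haplotype.split("~"):
        gene, _, alleles = item.partition("*")
        pairs.append((gene, alleles.split("/")))
    return pairs


print(parse_phased("2DL2*001~2DS5*003"))
# [('2DL2', ['001']), ('2DS5', ['003'])]
```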
Note: conversion to/from XML
During the past year, the NMDP began using the same symbol system proposed here (with “,” as a synonym for “+”) to represent HLA allele, gene and genotype lists. Tools have been developed for conversion between this “string” format and XML. Web tools and utilities for accepting this format have been developed and have been used successfully in a number of projects. Using this same system for KIR will allow reuse of these tools and methods and facilitate much easier data conversion, communication and database import/export.
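To illustrate how mechanical such a conversion can be, the sketch below turns a typing string into nested XML using the Python standard library. The element and attribute names (`typing`, `genotype`, `copy`, `allele`, `gene`) are invented for this example and do not reflect the NMDP tools or the XML standard under development; in-cis (“~”) links are not handled here.

```python
import xml.etree.ElementTree as ET


def typing_to_xml(gene, cell):
    """Serialize a typing string as nested XML (hypothetical element names)."""
    root = ET.Element("typing", gene=gene)
    for genotype in cell.split("|"):          # alternative genotypes
        g_el = ET.SubElement(root, "genotype")
        for copy in genotype.split("+"):      # gene copies
            c_el = ET.SubElement(g_el, "copy")
            for allele in copy.split("/"):    # allele options for this copy
                ET.SubElement(c_el, "allele").text = allele
    return ET.tostring(root, encoding="unicode")


print(typing_to_xml("3DL2", "001+002/008"))
```

Each separator level maps to one level of element nesting, which is why the string and XML forms can be converted in either direction without loss for the cases covered here.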
An example spreadsheet with annotations of the above points is available to download.
References
Maiers M, Spellman S, Marsh SGE, Parham P, Rajalingam R, Reed E, Noreen H, Yu N, Cooley S. A community standard reporting format for KIR genotyping data. Human Immunology (2007) 68 S105