A Community Standard Reporting Format for KIR Genotyping Data

Martin Maiers1, Rebecca Cullen2, Raja Rajalingam3, Harriet Noreen4, Neng Yu5, Elaine Reed3, Steven GE Marsh6, Stephen Spellman2, Libby Guethlein7, Elisabeth Trachtenberg8, Sarah Cooley4

  1. Bioinformatics, National Marrow Donor Program, Minneapolis, MN, USA
  2. Scientific Services, National Marrow Donor Program, Minneapolis, MN, USA
  3. UCLA Immunogenetics Center, University of California at Los Angeles, Los Angeles, CA
  4. University of Minnesota, Minneapolis, MN, USA
  5. HLA Laboratory, American Red Cross Blood Services, Northeast Division, Dedham, MA
  6. Anthony Nolan Research Institute, London, England
  7. School of Medicine, Stanford University, Stanford, CA, USA
  8. Children’s Hospital Oakland Research Institute, Oakland, CA, USA

KIR genes encode activating and inhibitory receptors that regulate the function of natural killer cells, which may be important in donor selection for stem cell transplantation. Current KIR typing methodologies cannot resolve the extensive allelic variation of the 17 KIR genes. Although an XML standard for KIR genotype reporting is being developed, most laboratories need a reliable data format for sharing data electronically via spreadsheets. Our proposed format reports any level of allelic resolution, distinguishes between haploid and diploid ambiguities, reports the observed number of loci, and is easily translated into XML and parsed for downstream storage and analysis. By using a combination of four symbols with hierarchical precedence, as defined in the table, the typing result for each KIR gene is entered into a single spreadsheet cell with no ambiguity in interpretation. For example, “001+007|006+010” represents a 3DL2 result with two possible genotypes:

  1. heterozygous for the alleles 001 and 007 or
  2. heterozygous for the alleles 006 and 010

Alternatively, the string “001+002/008” represents a 3DL2 genotype result showing heterozygosity, with the allele 001 on one chromosome and either 002 or 008 on the other. If haplotype “phase” is known between two genes (based on the output of segregation analysis, haplotype estimation programs or typing methods that separate haplotypes) this can be represented with a specific “in cis” symbol. We have used this scheme successfully to transmit genotyping results for two large-scale KIR typing projects, finding that it balances the need for rich data representation standards with the accessibility and ubiquity of spreadsheets. We propose this format to the community as a standard for representing KIR allele typing data.
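
The two examples above can be checked mechanically. The following is a minimal Python sketch, not part of the proposed standard, that lists every fully resolved genotype implied by a single-gene typing string. The function name expand_genotypes is hypothetical, and the sketch assumes the cell contains no “in cis” symbol.

```python
from itertools import product


def expand_genotypes(typing):
    """List every fully resolved genotype implied by a typing string.

    Assumes a single-gene cell that uses only '|', '+' and '/'.
    """
    genotypes = []
    # Precedence 1: '|' separates alternative genotypes.
    for genotype in typing.split("|"):
        # Precedence 2: '+' separates gene copies.
        # Precedence 4: '/' separates the candidate alleles of one copy.
        copies = [copy.split("/") for copy in genotype.split("+")]
        # Pick one candidate allele per copy, in every possible way.
        for combination in product(*copies):
            genotypes.append(combination)
    return genotypes


print(expand_genotypes("001+007|006+010"))
# [('001', '007'), ('006', '010')]
print(expand_genotypes("001+002/008"))
# [('001', '002'), ('001', '008')]
```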

Spreadsheets have been used to report KIR typing results to the NMDP for the High Resolution KIR typing project. The current spreadsheet format for the NMDP KIR Typing project has one row per locus. The format uses the separator symbols described below, which allow a clear and unambiguous interpretation to be programmed easily for downstream storage and analysis.


Symbol   Meaning                    Precedence
|        Genotype list separator    1
+        Gene (copy) separator      2
~        Gene separator (in cis)    3
/        Allele list separator      4

In order to represent a typing with two possible genotypes (e.g. 001+002 or 004+006), the “|” character is used, resulting in the following string: 001+002|004+006.
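
The precedence order means a typing string can be parsed by splitting on one symbol at a time, from precedence 1 down to precedence 4. A minimal Python sketch under that reading follows; the function name parse_typing and the nested-list output are illustrative choices, not part of the format.

```python
def parse_typing(typing):
    """Split a typing string by symbol precedence: '|', '+', '~', '/'.

    Returns nested lists:
    genotypes -> gene copies -> genes in cis -> candidate alleles.
    """
    return [
        [
            [in_cis.split("/") for in_cis in copy.split("~")]
            for copy in genotype.split("+")
        ]
        for genotype in typing.split("|")
    ]


parsed = parse_typing("001+002|004+006")
print(len(parsed))   # 2 possible genotypes
print(parsed[0])     # [[['001']], [['002']]]
print(parse_typing("003/004/005+001"))
# [[[['003', '004', '005']], [['001']]]]
```

Splitting in precedence order keeps each symbol scoped to the level above it, so “/” never reaches across a “+” and “+” never reaches across a “|”.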


Project Name Reporting Center Sample ID Center Code Interpretation Date Gene Allele Present Comments
KIR P3 974 0082-7767-5 0058 20060310 2DL1 00102 Y  
KIR P3 974 0082-7767-5 0058 20060310 2DL2 001+003 Y  
KIR P3 974 0082-7767-5 0058 20060310 2DL3 002 Y  
KIR P3 974 0082-7767-5 0058 20060310 2DL4   Y  
KIR P3 974 0082-7767-5 0058 20060310 2DL5 001+003 Y  
KIR P3 974 0082-7767-5 0058 20060310 2DS1   Y  
KIR P3 974 0082-7767-5 0058 20060310 2DS2   N  
KIR P3 974 0082-7767-5 0058 20060310 2DS3 001 Y  
KIR P3 974 0082-7767-5 0058 20060310 2DS4 001+002|004+006 Y  
KIR P3 974 0082-7767-5 0058 20060310 2DS5 003/004/005+001 Y  
KIR P3 974 0082-7767-5 0058 20060310 3DL1   Y  
KIR P3 974 0082-7767-5 0058 20060310 3DL2   N  
KIR P3 974 0082-7767-5 0058 20060310 3DL3 NEW Y D1 codon151 CAC>CAY; D1 codon147 ATT>RTT; TM codon300 AAC>CAC Asn>His
KIR P3 974 0082-7767-5 0058 20060310 3DS1   Y  
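
As an illustration of how a program might read the Gene, Allele and Present columns of rows like these, here is a minimal Python sketch. It rests on assumptions the table does not state explicitly: that Present is “Y” or “N” for gene presence, that a blank Allele cell with “Y” means the gene was detected without an allele-level result, and that “NEW” flags a novel allele described in Comments. The function name describe_row is hypothetical.

```python
def describe_row(gene, allele, present):
    """Summarize one spreadsheet row (assumed column meanings, see above)."""
    if present == "N":
        return f"{gene}: gene not detected"
    if not allele.strip():
        # Assumption: blank allele cell means no allele-level result.
        return f"{gene}: gene present, no allele-level result"
    if allele == "NEW":
        # Assumption: NEW marks a novel allele detailed in Comments.
        return f"{gene}: gene present, novel allele (see Comments)"
    return f"{gene}: gene present, typing {allele}"


# (Gene, Allele, Present) values taken from four of the rows above.
rows = [
    ("2DL1", "00102", "Y"),
    ("2DL4", "", "Y"),
    ("2DS2", "", "N"),
    ("3DL3", "NEW", "Y"),
]
for row in rows:
    print(describe_row(*row))
```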

Note: representing >2 copies of a gene

If three copies of a gene are observed (3 different alleles), this can be represented simply with two “+” symbols: 001+002+003
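
Because each “+” separates one reported gene copy from the next, the number of copies in a genotype can be read directly from the string. A minimal Python sketch follows; the function name copy_counts is hypothetical, and a cell with a single allele and no “+” is counted as one reported copy.

```python
def copy_counts(typing):
    """Return the number of reported gene copies for each possible genotype."""
    if not typing.strip():
        return []  # empty cell: nothing reported
    # '|' separates alternative genotypes; '+' separates copies within one.
    return [genotype.count("+") + 1 for genotype in typing.split("|")]


print(copy_counts("001+002+003"))      # [3]
print(copy_counts("001+002|004+006"))  # [2, 2]
```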

Note: representing haplotype phase

If haplotype “phase” is known between two genes (based on the output of pedigree analysis, haplotype estimation programs or typing methods that separate haplotypes) this can be represented with the “~” symbol: 2DL2*001~2DS5*003
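
A phased string of this kind can be broken into its genes and alleles with two splits. The Python sketch below assumes each item has the form gene*allele, as in the example above; the function name parse_haplotype is hypothetical.

```python
def parse_haplotype(haplotype):
    """Split an in-cis string into (gene, candidate alleles) pairs."""
    pairs = []
    # '~' joins genes known to lie on the same haplotype.
    for item in haplotype.split("~"):
        # '*' separates the gene name from its allele list.
        gene, _, allele_list = item.partition("*")
        pairs.append((gene, allele_list.split("/")))
    return pairs


print(parse_haplotype("2DL2*001~2DS5*003"))
# [('2DL2', ['001']), ('2DS5', ['003'])]
```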

Note: conversion to/from XML

During the past year, the NMDP began using the same symbol system proposed here (with “,” as a synonym for “+”) to represent HLA allele, gene and genotype lists.  Tools have been developed for conversion between this “string” format and XML.  Web tools and utilities for accepting this format have been developed and have been used successfully in a number of projects. Using this same system for KIR will allow reuse of these tools and methods and facilitate much easier data conversion, communication and database import/export.
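
To give a sense of what such a conversion involves, here is a minimal Python sketch that turns a single-gene typing string into XML. The element names (typing, genotype, copy, allele) are invented for illustration and are not the NMDP tools' schema or the XML standard under development. The sketch treats “,” as a synonym for “+” and does not handle the “in cis” symbol.

```python
import xml.etree.ElementTree as ET


def typing_to_xml(gene, typing):
    """Convert a typing string to XML using hypothetical element names."""
    typing = typing.replace(",", "+")  # ',' accepted as a synonym for '+'
    root = ET.Element("typing", gene=gene)
    for genotype in typing.split("|"):
        genotype_el = ET.SubElement(root, "genotype")
        for copy in genotype.split("+"):
            copy_el = ET.SubElement(genotype_el, "copy")
            for allele in copy.split("/"):
                ET.SubElement(copy_el, "allele").text = allele
    return ET.tostring(root, encoding="unicode")


print(typing_to_xml("2DS4", "001+002|004+006"))
```

The nesting of the elements mirrors the precedence of the symbols, which is why the string form and the XML form can be converted in either direction without loss.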

An example spreadsheet with annotations of the above points is available to download.

References

Maiers M, Spellman S, Marsh SGE, Parham P, Rajalingam R, Reed E, Noreen H, Yu N, Cooley S. A community standard reporting format for KIR genotyping data. Human Immunology (2007) 68 S105

 
