Release of the PDBe mmCIF Validator: VSCode extension and Python script for PDBx/mmCIF validation

PDBe mmCIF Validator

We are pleased to announce the release of the PDBe mmCIF Validator, a tool for validating mmCIF files against the official PDBx/mmCIF dictionary. It aims to make pre-deposition validation more accessible and to help ensure dictionary compliance and data quality before submission through the wwPDB OneDep system.

The PDBe mmCIF Validator is available from:

The tool is available in two complementary forms:

1. Visual Studio Code extension
  • Real-time validation as you edit (on open, on save, and debounced as you make changes)
  • Errors and warnings highlighted in the editor with precise positioning
  • Full syntax highlighting and hover information for mmCIF tags
  • Metadata completeness indicator: a percentage score in the status bar and a dedicated Metadata Completeness view in the sidebar listing missing categories and items, with detection of the experimental method (X-ray, EM or NMR)
  • Works out of the box, with automatic dictionary download from the official wwPDB repository

2. Standalone Python script
  • Command-line validation for single files or batch processing
  • No external dependencies (Python standard library only; Python 3.7+)
  • Machine-readable JSON output with line numbers, severity, and character positions for CI/CD integration
  • Exit codes (0/1) for automated workflows
  • Uses a local dictionary file or downloads the dictionary from a URL
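The JSON output and exit codes are intended for CI/CD pipelines. As an illustration only, here is a minimal sketch of a CI gate written with the Python standard library. It assumes the report is a JSON list of objects carrying `severity`, `line`, `column` and `message` fields; these field names and the `summarise`/`gate` helpers are hypothetical, so check the script's actual output format before adapting it.

```python
import json

# Hypothetical report shape (an assumption, not the validator's documented schema):
# [{"severity": "error", "line": 12, "column": 5, "message": "..."}]


def summarise(report_text):
    """Count diagnostics per severity in a JSON report (assumed schema)."""
    counts = {}
    for issue in json.loads(report_text):
        severity = issue.get("severity", "unknown")
        counts[severity] = counts.get(severity, 0) + 1
    return counts


def gate(report_text, fail_on=("error",)):
    """Return 0 if no diagnostic has a blocking severity, else 1 (mirrors 0/1 exit codes)."""
    counts = summarise(report_text)
    return 1 if any(counts.get(sev, 0) for sev in fail_on) else 0


if __name__ == "__main__":
    sample = json.dumps([
        {"severity": "warning", "line": 3, "column": 1, "message": "advisory range exceeded"},
        {"severity": "error", "line": 10, "column": 7, "message": "missing mandatory item"},
    ])
    print(summarise(sample))  # {'warning': 1, 'error': 1}
    print("exit code:", gate(sample))  # exit code: 1
```

In a real pipeline, the value returned by `gate()` would be passed to `sys.exit()` so that warnings alone do not fail the build while errors do.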

Both implementations share the same validation engine, ensuring consistent results.

The validator performs comprehensive checks, including:

  • presence of mandatory items (category-aware)
  • enumeration and data type validation
  • range constraints, distinguishing strictly allowed ranges from advisory ones
  • parent/child category relationships
  • foreign key and composite key integrity
  • operation expression validation (e.g. for virus assemblies)
  • detection of duplicate categories and items

It also applies regex-based validation from deposition-specific dictionary categories (e.g. email, ORCID ID, PDB ID), aligned with OneDep requirements.
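Operation expressions show why some of these checks go beyond simple type matching. In PDBx/mmCIF, `_pdbx_struct_assembly_gen.oper_expression` refers to operators defined in `_pdbx_struct_oper_list`, and for a virus capsid it is often a range such as `(1-60)`. The sketch below is a simplified parser written for illustration, not the validator's own code: the function names are hypothetical, and it reads the convention as parenthesised groups of comma-separated IDs or numeric ranges, with consecutive groups combined as a Cartesian product. Reporting operator IDs that have no entry in the operator list is essentially a foreign-key check.

```python
import itertools
import re

# Matches one parenthesised group, e.g. "(1-60)" or "(1,2,5)".
GROUP = re.compile(r"\(([^()]*)\)")


def parse_oper_expression(expr):
    """Expand e.g. '(1-3)(X0)' into [('1', 'X0'), ('2', 'X0'), ('3', 'X0')].

    Simplified reading of the PDBx convention (an assumption for this sketch).
    Raises ValueError for malformed expressions.
    """
    expr = expr.strip()
    if "(" not in expr:
        group_texts = [expr]  # bare form such as '1' or '1,2'
    else:
        group_texts = GROUP.findall(expr)
        # Reject stray characters outside groups, e.g. '(1-3)x'
        if "".join(f"({g})" for g in group_texts) != expr:
            raise ValueError(f"malformed expression: {expr!r}")
    groups = []
    for text in group_texts:
        ids = []
        for token in text.split(","):
            token = token.strip()
            rng = re.fullmatch(r"(\d+)-(\d+)", token)
            if rng:
                start, end = int(rng.group(1)), int(rng.group(2))
                if start > end:
                    raise ValueError(f"descending range: {token!r}")
                ids.extend(str(i) for i in range(start, end + 1))
            elif re.fullmatch(r"[A-Za-z0-9]+", token):
                ids.append(token)
            else:
                raise ValueError(f"bad token {token!r} in {expr!r}")
        groups.append(ids)
    # Consecutive groups combine as a Cartesian product of operator IDs.
    return list(itertools.product(*groups))


def undefined_operators(expr, known_ids):
    """Return operator IDs used in expr that are missing from the operator list."""
    used = {op for combo in parse_oper_expression(expr) for op in combo}
    return sorted(used - set(known_ids))


if __name__ == "__main__":
    print(parse_oper_expression("(1-3)(X0)"))
    # [('1', 'X0'), ('2', 'X0'), ('3', 'X0')]
    print(undefined_operators("(1-3)(X0)", ["1", "2", "X0"]))
    # ['3']
```

A production check would also need to validate the matrices behind each operator ID; this sketch only covers the expression syntax and the reference integrity.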

The tool is aimed at structural biologists preparing structures for deposition and at institutions implementing automated quality control pipelines.

The software is freely available under the MIT licence. It is developed by Deborah Harrus and maintained by the Protein Data Bank in Europe (PDBe) at EMBL-EBI. We acknowledge the wwPDB mmCIF working group and our wwPDB partners for maintaining the PDBx/mmCIF dictionary.

Demo videos are available here:

https://www.youtube.com/watch?v=CCkC9Bc6FY8

https://www.youtube.com/watch?v=li7ETeSA8FI

We welcome feedback and contributions. For questions or issues, please email pdbehelp@ebi.ac.uk or use the GitHub repository.