PDB Chemical Component Dictionary Format Description

The PDB Chemical Component Dictionary (formerly the HET Group Dictionary) is available in mmCIF format [68.7 Mb uncompressed or 15.97 Mb .gz format], and is updated weekly. This dictionary, created by the curation efforts of the RCSB PDB team, is under active development. Any comments and suggestions are greatly appreciated.

This page provides descriptions and examples of the contents of the PDB format and mmCIF format Chemical Component Dictionaries as well as a description of the contents of a Ligand Expo entry.

Ligand Expo [1], formerly the Ligand Depot, is a data warehouse that integrates databases, services, tools, and methods related to small molecules bound to macromolecules. Its purpose is to help users explore the PDB Chemical Component Dictionary and the small molecule contents of the PDB. In particular, it allows users to:

Introduction

The mmCIF format [~16 Mb] Chemical Component Dictionaries describe all residues in the PDB, both standard and non-standard, in addition to all the small molecule ligands. The overall format of the dictionaries is an alphabetical concatenation of all available groups.
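
Because the dictionary is a concatenation of component definitions, a reader can process it one component at a time. The following is a minimal Python sketch, assuming that each definition begins with a `data_` line (as in the example entry later on this page). The function name `split_components` and the two abbreviated sample blocks are illustrative, not part of the dictionary.

```python
def split_components(lines):
    """Yield (block_name, block_lines) for each data_ block in a dictionary."""
    name, block = None, []
    for line in lines:
        if line.startswith("data_"):
            # A new data_ line closes the previous block, if any.
            if name is not None:
                yield name, block
            name, block = line.strip()[len("data_"):], []
        elif name is not None:
            block.append(line.rstrip("\n"))
    if name is not None:
        yield name, block


# Two abbreviated, illustrative blocks (not complete dictionary entries).
SAMPLE = """\
data_174
_chem_comp.id 174
_chem_comp.name '4-CHLORO-BENZOIC ACID'
data_HYP
_chem_comp.id HYP
_chem_comp.name 4-HYDROXYPROLINE
""".splitlines()

components = dict(split_components(SAMPLE))
print(list(components))
```

Since the function only iterates over lines, it should also accept an open file handle, for example one returned by `gzip.open(path, "rt")` for the compressed download, where `path` is wherever the file was saved.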

Ligands are identified in the mmCIF format files as chemical components and PDB format files as HET groups. Residues such as prosthetic groups, inhibitors, solvent molecules, and ions for which coordinates are supplied are considered to be non-standard if they are:

Each chemical component is assigned an ID code of not more than three alphanumeric characters.

Each time a new chemical component is created, it is entered into both the mmCIF format and PDB format dictionaries. A new group is not released in the public version of the dictionary until the PDB entry containing the novel chemical component is released.

mmCIF Format

The mmCIF format combines collections of related data items (tokens) into categories. A category is essentially a table in which each token represents a column. A question mark (?) marks an item value as missing. A period (.) indicates that there is no appropriate value for the item or that a value has been intentionally omitted.
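
The following Python sketch shows one way to read non-looped items into per-category tables while keeping the two placeholders distinct. It is a simplification under stated assumptions: each item and its value sit on one line, and quoting is handled only for values wrapped in a single pair of quotes. The names `Placeholder`, `decode`, and `read_items` are hypothetical, and the `?` line in the sample was added for illustration.

```python
from enum import Enum


class Placeholder(Enum):
    UNKNOWN = "?"        # the value is missing
    INAPPLICABLE = "."   # no appropriate value, or intentionally omitted


def decode(raw):
    """Turn one raw mmCIF value into a string or a Placeholder."""
    raw = raw.strip()
    if raw in ("?", "."):
        return Placeholder(raw)
    # Strip one pair of matching quotes, if present.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    return raw


def read_items(lines):
    """Collect '_category.item value' lines into {category: {item: value}}."""
    tables = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("_"):
            continue  # skip comments, blank lines, and items without a value
        category, item = parts[0][1:].split(".", 1)
        tables.setdefault(category, {})[item] = decode(parts[1])
    return tables


SAMPLE = """\
_chem_comp.id               174
_chem_comp.name             '4-CHLORO-BENZOIC ACID'
_chem_comp.formula          'C7 H5 O2 CL1'
_chem_comp.formula_weight   156.568
_chem_comp.pdbx_synonyms    ?
""".splitlines()

items = read_items(SAMPLE)
print(items["chem_comp"]["name"])
print(items["chem_comp"]["pdbx_synonyms"])
```

All values are kept as text here; converting `formula_weight` to a number is left to the caller.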

Vectors and tables of data may be encoded in mmCIF using a loop_ directive. To build a table, the data item names corresponding to the table columns are preceded by the loop_ directive, and followed by the corresponding rows of data.
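
As a rough illustration of how a loop_ table can be read, the Python sketch below collects the column names that follow `loop_` and then groups the remaining values into rows. It assumes a single loop terminated by a `#` line and does not handle multi-line text values; `tokenize` and `read_loop` are hypothetical helper names. The sample is the chem_comp loop for entry 1t5d shown further down this page.

```python
import re

# A quoted value ends at a quote followed by whitespace or end of line.
_TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize(line):
    """Split one line into values, honouring single and double quotes."""
    return [single or double or bare
            for single, double, bare in _TOKEN.findall(line)]


def read_loop(lines):
    """Parse the first loop_ in lines into a list of {item_name: value} rows."""
    it = iter(lines)
    for line in it:
        if line.strip() == "loop_":
            break
    else:
        return []
    names, values = [], []
    for line in it:
        text = line.strip()
        if text.startswith("#") or (text.startswith("_") and values):
            break  # end of this loop
        if text.startswith("_"):
            names.append(text)
        elif text:
            values.extend(tokenize(text))
    if not names or len(values) % len(names):
        raise ValueError("value count does not match the number of columns")
    width = len(names)
    return [dict(zip(names, values[i:i + width]))
            for i in range(0, len(values), width)]


SAMPLE = """\
loop_
_chem_comp.id
_chem_comp.name
_chem_comp.ndb_synonyms
_chem_comp.formula
_chem_comp.formula_weight
_chem_comp.ndb_component_no
174 '4-CHLORO-BENZOIC ACID' ? 'C7 H5 O2 CL1'    156.568 ?
#
""".splitlines()

rows = read_loop(SAMPLE)
print(rows[0]["_chem_comp.name"])
```

Grouping by the number of column names, rather than by line, reflects that a row's values are simply whitespace-separated tokens.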

A detailed description of the mmCIF syntax and logic structure is available.

In an mmCIF format coordinate file the chem_comp category is used to describe the chemical components in an entry. The chemical name for the chemical component is given by chem_comp.name, the chemical formula by chem_comp.formula, and the molecular weight by chem_comp.formula_weight.

For example, entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174):

loop_
_chem_comp.id
_chem_comp.name
_chem_comp.ndb_synonyms
_chem_comp.formula
_chem_comp.formula_weight
_chem_comp.ndb_component_no
174 '4-CHLORO-BENZOIC ACID' ? 'C7 H5 O2 CL1'    156.568 ?
#

Further information describing each non-standard residue is then provided in the Chemical Component Dictionary.

Please see the mmCIF format dictionary for more information about the chem_comp category.

Dictionary Record Format

In the mmCIF format Chemical Component Dictionary, each chemical component is defined by sets of tokens in the five categories: chem_comp (Table 1), chem_comp_atom (Table 2), chem_comp_bond (Table 3), pdbx_chem_comp_descriptor (Table 4), and pdbx_chem_comp_identifier (Table 5).

Table 1: chem_comp category
Token | Definition | Example
_chem_comp.id | The alphanumeric code for the chemical component. | HYP
_chem_comp.name | The name of the chemical component. | 4-HYDROXYPROLINE
_chem_comp.type | The type of monomer. | L-peptide linking
_chem_comp.pdbx_type | A preliminary internal classification used by PDB. | ATOMP
_chem_comp.formula | The chemical formula of the chemical component. | C5 H9 N1 O3
_chem_comp.mon_nstd_parent_comp_id | The identifier for the parent component of the nonstandard component. May be a comma-separated list if this component is derived from multiple components. | PRO
_chem_comp.pdbx_synonyms | Synonym list for the non-standard residue. | HYDROXYPROLINE
_chem_comp.pdbx_formal_charge | The formal charge on the chemical component. | +1
_chem_comp.pdbx_initial_date | Date the chemical component was added to the database. | yyyy-mm-dd
_chem_comp.pdbx_modified_date | Date that the component was last modified. | yyyy-mm-dd
_chem_comp.pdbx_ambiguous_flag | Flags ligands with unconventional bonding (e.g. transition metal complexes). | code
_chem_comp.pdbx_release_status | Status of ligand (released, hold, obsoleted). | REL, HOLD, OBS
_chem_comp.pdbx_replaced_by | Identifies the _chem_comp.id of the new component that has replaced this component. | 3-letter identifier
_chem_comp.pdbx_replaces | Identifies the _chem_comp.id of the component this entry replaces. Converse of _chem_comp.pdbx_replaced_by. | 3-letter identifier
_chem_comp.formula_weight | Formula mass of the chemical component in daltons. | 131.131
_chem_comp.one_letter_code | Reports the one-letter code of the component, if applicable. | one-letter identifier
_chem_comp.three_letter_code | Reports the three-letter code of the component, if applicable. | ATP
_chem_comp.pdbx_model_coordinates_details | Provides additional details about the model coordinates in the component definition. | text
_chem_comp.pdbx_model_coordinates_missing_flag | Identifies if model coordinates are missing in this definition. | Y or N
_chem_comp.pdbx_ideal_coordinates_details | Identifies the source of the ideal coordinates in the component definition. | text
_chem_comp.pdbx_ideal_coordinates_missing_flag | Identifies if ideal coordinates are missing in this definition. | Y or N
_chem_comp.pdbx_model_coordinates_db_code | Identifies the PDB database code from which the heavy atom model coordinates were obtained. | PDB entry id
_chem_comp.pdbx_processing_site | Identifies the deposition site that processed this chemical component definition. | RCSB PDB, PDBj, PDBe

Table 2: chem_comp_atom category (the tokens in this section are looped over each atom in the chemical component)
Token | Definition | Example
_chem_comp_atom.comp_id | Same as _chem_comp.id. | HYP
_chem_comp_atom.atom_id | Identifier for each atom in the chemical component (new format). | CA
_chem_comp_atom.alt_atom_id | Previous format of the identifier for each atom in the chemical component. | CA
_chem_comp_atom.type_symbol | The element type for each atom in the chemical component. | C, O, N, etc.
_chem_comp_atom.charge | The formal charge assigned to each atom in the chemical component. | 0
_chem_comp_atom.pdbx_align | Determines the column in which the atom name starts in PDB coordinate files. The possible values are 0 and 1. | 0 or 1
_chem_comp_atom.pdbx_aromatic_flag | Defines atoms in an aromatic moiety. | Y or N
_chem_comp_atom.pdbx_leaving_atom_flag | Flags atoms with "leaving" capability. | Y or N
_chem_comp_atom.pdbx_stereo_config | Defines the stereochemical configuration of the chiral center atom. | R or S or N
_chem_comp_atom.model_Cartn_x | The x component of the coordinates for each atom, specified in orthogonal angstroms. | 26.056
_chem_comp_atom.model_Cartn_y | The y component of the coordinates for each atom, specified in orthogonal angstroms. | 5.609
_chem_comp_atom.model_Cartn_z | The z component of the coordinates for each atom, specified in orthogonal angstroms. | 5.594
_chem_comp_atom.pdbx_model_Cartn_x_ideal | Computed idealized coordinates, x component of the vector (in angstroms). | number
_chem_comp_atom.pdbx_model_Cartn_y_ideal | Computed idealized coordinates, y component of the vector (in angstroms). | number
_chem_comp_atom.pdbx_model_Cartn_z_ideal | Computed idealized coordinates, z component of the vector (in angstroms). | number
_chem_comp_atom.pdbx_ordinal | Ordinal index for the chemical component atom list. | 1 (integer)

Table 3: chem_comp_bond category (the tokens in this section are looped over each bond in the chemical component)
Token | Definition | Example
_chem_comp_bond.comp_id | Same as _chem_comp.id. | HYP
_chem_comp_bond.atom_id_1 | The ID of the first of the two atoms that define the bond. | N
_chem_comp_bond.atom_id_2 | The ID of the second of the two atoms that define the bond. | CA
_chem_comp_bond.value_order | The bond order of the chemical bond associated with the specified atoms. | SING
_chem_comp_bond.pdbx_aromatic_flag | Defines aromatic bonds. | Y or N
_chem_comp_bond.pdbx_stereo_config | Defines stereochemical bonds. | E or Z or N
_chem_comp_bond.pdbx_ordinal | Ordinal index for the component bond list. | 1 (integer)

Table 4: pdbx_chem_comp_descriptor category
Token | Definition | Example
_pdbx_chem_comp_descriptor.comp_id | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. | text
_pdbx_chem_comp_descriptor.type | The type of the descriptor. | SMILES, InChI, etc.
_pdbx_chem_comp_descriptor.program | The name of the program or library used to compute the descriptor. | text
_pdbx_chem_comp_descriptor.program_version | The version of the program or library used to compute the descriptor. | version number
_pdbx_chem_comp_descriptor.descriptor | The chemical descriptor value for this component. | code

Table 5: pdbx_chem_comp_identifier category
Token | Definition | Example
_pdbx_chem_comp_identifier.comp_id | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. | text
_pdbx_chem_comp_identifier.type | Contains the identifier type. | CAS Reg No. or PUBCHEM, etc.
_pdbx_chem_comp_identifier.program | The name of the program or library used to compute the identifier. | OpenEye OECHEM program, etc.
_pdbx_chem_comp_identifier.program_version | The version of the program or library used to compute the identifier. | v1.2 (numbers)
_pdbx_chem_comp_identifier.identifier | Contains the identifier value for this chemical component. | text

Example: 4-Chloro-benzoic Acid

Diagram of 4-Chloro-benzoic Acid

Note: Diagrams are not included in the Chemical Component Dictionary. This one is included here for illustrative purposes.

mmCIF Format Chemical Component Dictionary Entry for 4-Chloro-benzoic Acid


data_174
# 
_chem_comp.id               174 
_chem_comp.name             '4-CHLORO-BENZOIC ACID'
_chem_comp.type             non-polymer
_chem_comp.pdbx_type        HETAIN
_chem_comp.formula          'C7 H5 O2 CL1'
_chem_comp.mon_nstd_flag    n 
_chem_comp.formula_weight   156.568
# 
loop_
_chem_comp_atom.comp_id 
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol 
_chem_comp_atom.charge 
_chem_comp_atom.model_Cartn_x
_chem_comp_atom.model_Cartn_y 
_chem_comp_atom.model_Cartn_z 
_chem_comp_atom.pdbx_align 
174 CL4 CL 0 -19.787 95.862 18.541 0 
174 C4  C  0 -19.932 94.201 19.219 1 
174 C5  C  0 -18.817 93.715 19.901 1 
174 C6  C  0 -18.847 92.452 20.466 1 
174 C3  C  0 -21.099 93.428 19.089 1 
174 C2  C  0 -21.127 92.158 19.664 1 
174 C1  C  0 -19.996 91.681 20.342 1 
174 C   C  0 -19.962 90.330 20.989 1 
174 O1  O  0 -20.968 89.592 20.924 1 
174 O2  O  0 -18.919 89.991 21.597 1 
174 HO1 H  0 ?       ?      ?      1 
174 H2  H  0 ?       ?      ?      1 
174 H3  H  0 ?       ?      ?      1 
174 H5  H  0 ?       ?      ?      1 
174 H6  H  0 ?       ?      ?      1 
# 
loop_
_chem_comp_bond.comp_id 
_chem_comp_bond.atom_id_1 
_chem_comp_bond.atom_id_2 
_chem_comp_bond.value_order 
174 CL4 C4  SING 
174 C4  C5  AROM 
174 C4  C3  AROM 
174 C5  C6  AROM 
174 C5  H5  SING 
174 C6  C1  AROM 
174 C6  H6  SING 
174 C3  C2  AROM 
174 C3  H3  SING 
174 C2  C1  AROM 
174 C2  H2  SING 
174 C1  C   SING 
174 C   O1  SING 
174 C   O2  DOUB 
174 O1  HO1 SING 
# 
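
As a consistency check on the entry above, the atom loop can be compared against the stated formula. The Python sketch below counts the element symbols in the atom rows, compares them with 'C7 H5 O2 CL1', and lists the atoms whose model coordinates are marked missing with `?`. It relies on whitespace splitting, which works here because no value in these rows is quoted. The helper names are illustrative.

```python
import re
from collections import Counter

# Atom rows copied from the entry above. Columns: comp_id, atom_id,
# type_symbol, charge, model_Cartn_x, model_Cartn_y, model_Cartn_z, pdbx_align.
ATOM_ROWS = """\
174 CL4 CL 0 -19.787 95.862 18.541 0
174 C4  C  0 -19.932 94.201 19.219 1
174 C5  C  0 -18.817 93.715 19.901 1
174 C6  C  0 -18.847 92.452 20.466 1
174 C3  C  0 -21.099 93.428 19.089 1
174 C2  C  0 -21.127 92.158 19.664 1
174 C1  C  0 -19.996 91.681 20.342 1
174 C   C  0 -19.962 90.330 20.989 1
174 O1  O  0 -20.968 89.592 20.924 1
174 O2  O  0 -18.919 89.991 21.597 1
174 HO1 H  0 ?       ?      ?      1
174 H2  H  0 ?       ?      ?      1
174 H3  H  0 ?       ?      ?      1
174 H5  H  0 ?       ?      ?      1
174 H6  H  0 ?       ?      ?      1
"""
FORMULA = "C7 H5 O2 CL1"


def count_elements(rows, symbol_column=2):
    """Count element symbols in whitespace-separated atom rows."""
    return Counter(line.split()[symbol_column]
                   for line in rows.splitlines() if line.strip())


def parse_formula(formula):
    """Turn a formula such as 'C7 H5 O2 CL1' into {symbol: count}."""
    counts = {}
    for term in formula.split():
        symbol, number = re.fullmatch(r"([A-Za-z]+)(\d*)", term).groups()
        counts[symbol.upper()] = int(number or 1)
    return counts


counted = dict(count_elements(ATOM_ROWS))
print(counted == parse_formula(FORMULA))  # True

# Atoms whose x coordinate is the missing-value placeholder.
missing = [line.split()[1] for line in ATOM_ROWS.splitlines()
           if line.split()[4] == "?"]
print(missing)
```

In this entry only the hydrogen atoms lack model coordinates.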

PDB Format

The Heterogen List is also available for download [13.2 Mb]. However, the RCSB PDB recommends using the mmCIF format Chemical Component Dictionary: the Heterogen List omits stereochemistry and other chemical properties, which limits its usefulness as a ligand library. Many users nevertheless still find it useful.

The heterogen section of a PDB coordinate file describes ligands in the entry. The chemical name of the ligand is given in the HETNAM record and the chemical formula is given in the FORMUL record. Any synonyms for the chemical name are given in the HETSYN records.

For example, entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code 174):

HET    174             15
HETNAM     174 4-CHLORO-BENZOIC ACID
FORMUL      174    C7 H5 O2 CL1

Further information describing each non-standard residue is then provided in the Chemical Component Dictionary.

Please refer to the PDB File Format Contents Guide for additional information about the Heterogen Section within PDB format coordinate files.
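
The heterogen records shown above can be gathered per ligand with a few lines of Python. This is only a sketch: it splits on whitespace and assumes one line per record, whereas real coordinate files use fixed columns and continuation lines as described in the Contents Guide. The function name `read_heterogens` and the dictionary keys are illustrative choices.

```python
RECORD_KEYS = {"HETNAM": "name", "HETSYN": "synonyms", "FORMUL": "formula"}


def read_heterogens(lines):
    """Group HET, HETNAM, HETSYN and FORMUL records by ligand ID code."""
    ligands = {}
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        record, het_id, text = parts
        if record == "HET":
            # The last field of the HET record is the count.
            ligands.setdefault(het_id, {})["count"] = int(text.split()[-1])
        elif record in RECORD_KEYS:
            ligands.setdefault(het_id, {})[RECORD_KEYS[record]] = text.strip()
    return ligands


# Records copied from the 1t5d example above.
RECORDS = [
    "HET    174             15",
    "HETNAM     174 4-CHLORO-BENZOIC ACID",
    "FORMUL      174    C7 H5 O2 CL1",
]

ligands = read_heterogens(RECORDS)
print(ligands["174"]["name"])
print(ligands["174"]["formula"])
```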

Dictionary Record Format

Each entry in the PDB format Chemical Component Dictionary is represented by a series of fields:

Field | Definition | Example
RESIDUE | Contains the ID code of the chemical component followed by the number of CONECT records in the dictionary entry. | HYP 18
CONECT | For each atom in the chemical component, lists the number of atoms bonded to it, followed by the names of those atoms. The list of CONECT records is concluded with an END record. | N 3 CA CD H
HET | This is the same as the RESIDUE field. | HYP 18
HETSYN | Any synonyms for the chemical component. This field may occupy more than one line and may not appear for each dictionary entry. | HYP HYDROXYPROLINE
HETNAM | The name of the chemical component. This field may occupy more than one line. | HYP 4-HYDROXYPROLINE
FORMUL | The chemical formula of the chemical component. | HYP C5 H9 N1 O3

Example: 4-Chloro-benzoic Acid

Diagram of 4-Chloro-benzoic Acid

Note: Diagrams are not included in the Chemical Component Dictionary. This one is included here for illustrative purposes.

PDB Format Chemical Component Dictionary Entry for 4-Chloro-benzoic Acid

RESIDUE   174     15
CONECT     CL4     1 C4  
CONECT      C4     3CL4   C5   C3  
CONECT      C5     3 C4   C6   H5  
CONECT      C6     3 C5   C1   H6  
CONECT      C3     3 C4   C2   H3  
CONECT      C2     3 C3   C1   H2  
CONECT      C1     3 C6   C2   C   
CONECT      C      3 C1   O1   O2  
CONECT      O1     2 C    HO1 
CONECT      O2     1 C   
CONECT      HO1    1 O1  
CONECT      H2     1 C2  
CONECT      H3     1 C3  
CONECT      H5     1 C5  
CONECT      H6     1 C6  
END
HET    174             15
HETNAM     174 4-CHLORO-BENZOIC ACID
FORMUL      174    C7 H5 O2 CL1
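
The CONECT records above can be read into a bond table and checked against the counts they declare. The Python sketch below is an illustration only: the column positions (atom name in columns 12-15, count ending in column 20, neighbour names in 4-character fields every 5 columns from column 21) are inferred from this example, not taken from a format specification, and the function names are hypothetical. Fixed columns are used because a count and a neighbour name can touch, as in `3CL4`.

```python
# Records copied from the example entry above; whitespace is significant.
ENTRY = [
    "RESIDUE   174     15",
    "CONECT     CL4     1 C4  ",
    "CONECT      C4     3CL4   C5   C3  ",
    "CONECT      C5     3 C4   C6   H5  ",
    "CONECT      C6     3 C5   C1   H6  ",
    "CONECT      C3     3 C4   C2   H3  ",
    "CONECT      C2     3 C3   C1   H2  ",
    "CONECT      C1     3 C6   C2   C   ",
    "CONECT      C      3 C1   O1   O2  ",
    "CONECT      O1     2 C    HO1 ",
    "CONECT      O2     1 C   ",
    "CONECT      HO1    1 O1  ",
    "CONECT      H2     1 C2  ",
    "CONECT      H3     1 C3  ",
    "CONECT      H5     1 C5  ",
    "CONECT      H6     1 C6  ",
    "END",
]


def parse_conect(line):
    """Return (atom, declared_count, neighbours) for one CONECT record."""
    atom = line[11:15].strip()
    declared = int(line[15:20])
    fields = line[20:]
    neighbours = [fields[i:i + 4].strip() for i in range(0, len(fields), 5)]
    return atom, declared, [name for name in neighbours if name]


def read_entry(lines):
    """Return (residue_id, {atom: [neighbours]}) and verify declared counts."""
    residue_id, n_records = lines[0].split()[1:3]
    bonds = {}
    for line in lines[1:]:
        if line.startswith("END"):
            break
        atom, declared, neighbours = parse_conect(line)
        if declared != len(neighbours):
            raise ValueError(f"{atom}: expected {declared} neighbours")
        bonds[atom] = neighbours
    if len(bonds) != int(n_records):
        raise ValueError("RESIDUE count does not match the CONECT records")
    return residue_id, bonds


residue_id, bonds = read_entry(ENTRY)
print(residue_id, bonds["C4"])

# Every bond should be listed from both of its atoms.
symmetric = all(a in bonds[b] for a, partners in bonds.items() for b in partners)
print(symmetric)
```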

Ligand Expo

For each entry in the Ligand Expo the following information is provided:

To see an example, the Ligand Expo entry for 4-Chloro-benzoic Acid is at http://ligand-expo.rcsb.org/reports/1/174/index.html

Additional Resources

Reference

1. Z Feng, L Chen, H Maddula, O Akcan, HM Berman, J Westbrook, ACA Program and Abstract Book Series 2 Vol 30 ISSN 0569-4221, Northern Kentucky Convention Center, July 26-31, 2003.

Questions, comments, and suggestions should be sent to info@rcsb.org.