The PDB Chemical Component Dictionary (formerly the HET Group Dictionary) is available in mmCIF format [68.7 Mb uncompressed or 15.97 Mb .gz format], and is updated weekly. This dictionary, created by the curation efforts of the RCSB PDB team, is under active development. Any comments and suggestions are greatly appreciated.
This page provides descriptions and examples of the contents of the PDB format and mmCIF format Chemical Component Dictionaries as well as a description of the contents of a Ligand Expo entry.
Ligand Expo [1], formerly the Ligand Depot, is a data warehouse that integrates databases, services, tools, and methods related to small molecules bound to macromolecules. Its purpose is to help users explore the PDB Chemical Component Dictionary and the small molecule contents of the PDB. In particular, it allows users to:
The mmCIF format [~16 Mb] Chemical Component Dictionaries describe all residues in the PDB, both standard and non-standard, in addition to all the small molecule ligands. The overall format of the dictionaries is an alphabetical concatenation of all available groups.
Ligands are identified in the mmCIF format files as chemical components and PDB format files as HET groups. Residues such as prosthetic groups, inhibitors, solvent molecules, and ions for which coordinates are supplied are considered to be non-standard if they are:
Each chemical component is assigned an ID code of not more than three alphanumeric characters.
Each time a new chemical component is created, it is entered into both the mmCIF format and PDB format dictionaries. A new group is not released in the public version of the dictionary until the PDB entry containing the novel chemical component is released.
The mmCIF format combines collections of related data items (tokens) into categories. A category is essentially a table in which each token represents a column of the table. The question mark (?) marks an item value as missing. A period (.) indicates that there is no appropriate value for the item or that a value has been intentionally omitted.
Vectors and tables of data may be encoded in mmCIF using a loop_ directive. To build a table, the data item names corresponding to the table columns are preceded by the loop_ directive, and followed by the corresponding rows of data.
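The loop_ layout and the two placeholder values described above can be illustrated with a short Python sketch. It is a simplification, not a full mmCIF reader: it assumes values are separated by whitespace, borrows `shlex` for quote handling (which does not cover every mmCIF quoting rule, such as multi-line text fields), and treats any `?` or `.` value as absent. The function name `parse_loop` is hypothetical.

```python
import shlex

# "?" marks a missing value; "." marks an inapplicable or omitted value.
MISSING = {"?", "."}

EXAMPLE = """
loop_
_chem_comp.id
_chem_comp.name
_chem_comp.ndb_synonyms
_chem_comp.formula
_chem_comp.formula_weight
_chem_comp.ndb_component_no
174 '4-CHLORO-BENZOIC ACID' ? 'C7 H5 O2 CL1' 156.568 ?
#
"""


def parse_loop(text):
    """Turn one loop_ block into a list of {item name: value} rows."""
    # Simplification: drop bare "#" separator tokens after splitting.
    tokens = [t for t in shlex.split(text) if t != "#"]
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected a loop_ directive")
    # The item names that follow loop_ are the table columns.
    names = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("_"):
        names.append(tokens[i])
        i += 1
    values = tokens[i:]
    if not names or len(values) % len(names) != 0:
        raise ValueError("value count is not a multiple of the column count")
    rows = []
    for start in range(0, len(values), len(names)):
        chunk = values[start:start + len(names)]
        rows.append({n: (None if v in MISSING else v)
                     for n, v in zip(names, chunk)})
    return rows


rows = parse_loop(EXAMPLE)
```

Each row comes back as a dictionary keyed by the full item name, with missing values mapped to `None`. All values stay as strings; converting `_chem_comp.formula_weight` to a number is left to the caller.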
| A detailed description of the mmCIF syntax and logic structure is available. |
In an mmCIF format coordinate file the chem_comp category is used to describe the chemical components in an entry. The chemical name for the chemical component is given by chem_comp.name, the chemical formula by chem_comp.formula, and the molecular weight by chem_comp.formula_weight.
For example, entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174):
loop_
_chem_comp.id
_chem_comp.name
_chem_comp.ndb_synonyms
_chem_comp.formula
_chem_comp.formula_weight
_chem_comp.ndb_component_no
174 '4-CHLORO-BENZOIC ACID' ? 'C7 H5 O2 CL1' 156.568 ?
#
Further information describing each non-standard residue is then provided in the Chemical Component Dictionary.
| Please see the mmCIF format dictionary for more information about the chem_comp category. |
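As a cross-check on how chem_comp.formula and chem_comp.formula_weight relate, the weight can be recomputed from the formula string. The Python sketch below is illustrative only: the function name `formula_weight` is hypothetical, the small `ATOMIC_MASS` table holds rounded standard atomic weights chosen for this example (it is not taken from the dictionary), and the parser assumes the space-separated "symbol plus count" layout shown on this page.

```python
import re

# Rounded atomic weights in Daltons; an assumption of this sketch,
# covering only the elements needed for the examples on this page.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "CL": 35.453}


def formula_weight(formula):
    """Sum atomic masses for a formula such as 'C7 H5 O2 CL1'."""
    total = 0.0
    for term in formula.split():
        match = re.fullmatch(r"([A-Za-z]+)(\d*)", term)
        if match is None:
            raise ValueError(f"unrecognized formula term: {term!r}")
        symbol = match.group(1).upper()
        count = int(match.group(2) or 1)  # a bare symbol means one atom
        total += ATOMIC_MASS[symbol] * count
    return round(total, 3)


weight_174 = formula_weight("C7 H5 O2 CL1")
weight_hyp = formula_weight("C5 H9 N1 O3")
```

With these masses the sketch reproduces the values quoted on this page: 156.568 for 4-chloro-benzoic acid (7 × 12.011 + 5 × 1.008 + 2 × 15.999 + 35.453) and 131.131 for 4-hydroxyproline. Other mass tables may differ in the last decimal place.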
In the mmCIF format Chemical Component Dictionary, each chemical component is defined by sets of tokens in the five categories: chem_comp (Table 1), chem_comp_atom (Table 2), chem_comp_bond (Table 3), pdbx_chem_comp_descriptor (Table 4), and pdbx_chem_comp_identifier (Table 5).
| Table 1: chem_comp category | ||
| Token | Definition | Example |
| _chem_comp.id | The alphanumeric code for the chemical component. | HYP |
| _chem_comp.name | The name of the chemical component. | 4-HYDROXYPROLINE |
| _chem_comp.type | The type of monomer. | L-peptide linking |
| _chem_comp.pdbx_type | A preliminary internal classification used by PDB. | ATOMP |
| _chem_comp.formula | The chemical formula of the chemical component. | C5 H9 N1 O3 |
| _chem_comp.mon_nstd_parent_comp_id | The identifier for the parent component of the nonstandard component. May be a comma-separated list if this component is derived from multiple components. | PRO |
| _chem_comp.pdbx_synonyms | Synonym list for the non-standard residue. | HYDROXYPROLINE |
| _chem_comp.pdbx_formal_charge | The formal charge on the chemical component. | +1 |
| _chem_comp.pdbx_initial_date | Date the chemical component was added to the database. | yyyy-mm-dd |
| _chem_comp.pdbx_modified_date | Date that the component was last modified. | yyyy-mm-dd |
| _chem_comp.pdbx_ambiguous_flag | Flags ligands with unconventional bonding (e.g. transition metal complexes). | code |
| _chem_comp.pdbx_release_status | Status of ligand (released, hold, obsoleted). | REL |
| _chem_comp.pdbx_replaced_by | Identifies the _chem_comp.id of the new component that has replaced this component. | 3-letter identifier |
| _chem_comp.pdbx_replaces | Identifies the _chem_comp.id of the component this entry replaces. Converse of _chem_comp.pdbx_replaced_by. | 3-letter identifier |
| _chem_comp.formula_weight | Formula mass of the chemical component in Daltons. | 131.131 |
| _chem_comp.one_letter_code | Reports the one-letter code of the component, if applicable. | one-letter identifier |
| _chem_comp.three_letter_code | Reports the three-letter code of the component, if applicable. | ATP |
| _chem_comp.pdbx_model_coordinates_details | Provides additional details about the model coordinates in the component definition. | text |
| _chem_comp.pdbx_model_coordinates_missing_flag | Identifies if model coordinates are missing in this definition. | Y or N |
| _chem_comp.pdbx_ideal_coordinates_details | Identifies the source of the ideal coordinates in the component definition. | text |
| _chem_comp.pdbx_ideal_coordinates_missing_flag | Identifies if ideal coordinates are missing in this definition. | Y or N |
| _chem_comp.pdbx_model_coordinates_db_code | Identifies the PDB database code from which the heavy atom model coordinates were obtained. | PDB entry id |
| _chem_comp.pdbx_processing_site | Identifies the deposition site that processed this chemical component definition. | RCSB PDB, PDBj, PDBe |
| Table 2: chem_comp_atom category: tokens in this section are looped through for each atom in the chemical component | ||
| Token | Definition | Example |
| _chem_comp_atom.comp_id | Same as _chem_comp.id | HYP |
| _chem_comp_atom.atom_id | Identifier for each atom in the chemical component (new format). | CA |
| _chem_comp_atom.alt_atom_id | Previous format of identifier for each atom in the chemical component. | CA |
| _chem_comp_atom.type_symbol | The element type for each atom in the chemical component. | C, O, N, etc. |
| _chem_comp_atom.charge | The formal charge assigned to each atom in the chemical component. | 0 |
| _chem_comp_atom.pdbx_align | Determines the column in which the atom name appears within the PDB coordinate files. The possible values are 0 or 1. | 0 or 1 |
| _chem_comp_atom.pdbx_aromatic_flag | Defines atoms in an aromatic moiety. | Y or N |
| _chem_comp_atom.pdbx_leaving_atom_flag | Flags atoms with "leaving" capability. | Y or N |
| _chem_comp_atom.pdbx_stereo_config | Defines the stereochemical configuration of the chiral center atom. | R or S or N |
| _chem_comp_atom.model_Cartn_x | The x component of the coordinates for each atom, specified in orthogonal angstroms. | 26.056 |
| _chem_comp_atom.model_Cartn_y | The y component of the coordinates for each atom, specified in orthogonal angstroms. | 5.609 |
| _chem_comp_atom.model_Cartn_z | The z component of the coordinates for each atom, specified in orthogonal angstroms. | 5.594 |
| _chem_comp_atom.pdbx_model_Cartn_x_ideal | Computed idealized coordinates, x component of the vector (in angstroms). | number |
| _chem_comp_atom.pdbx_model_Cartn_y_ideal | Computed idealized coordinates, y component of the vector (in angstroms). | number |
| _chem_comp_atom.pdbx_model_Cartn_z_ideal | Computed idealized coordinates, z component of the vector (in angstroms). | number |
| _chem_comp_atom.pdbx_ordinal | Ordinal index for the chemical component atom list. | 1 (integer) |
| Table 3: chem_comp_bond category: tokens in this section are looped through for each bond in the chemical component | ||
| Token | Definition | Example |
| _chem_comp_bond.comp_id | Same as _chem_comp.id | HYP |
| _chem_comp_bond.atom_id_1 | The ID of the first of the two atoms that define the bond. | N |
| _chem_comp_bond.atom_id_2 | The ID of the second of the two atoms that define the bond. | CA |
| _chem_comp_bond.value_order | The bond order of the chemical bond associated with the specified atoms. | SING |
| _chem_comp_bond.pdbx_aromatic_flag | Defines aromatic bonds. | Y or N |
| _chem_comp_bond.pdbx_stereo_config | Defines stereochemical bonds. | Y or N |
| _chem_comp_bond.pdbx_ordinal | Ordinal index for the component bond list. | 1 (integer) |
| Table 4: pdbx_chem_comp_descriptor category | ||
| Token | Definition | Example |
| _pdbx_chem_comp_descriptor.comp_id | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. | text |
| _pdbx_chem_comp_descriptor.type | The type of the descriptor (e.g. SMILES or InChI). | text |
| _pdbx_chem_comp_descriptor.program | The name of the program or library used to compute the descriptor. | text |
| _pdbx_chem_comp_descriptor.program_version | The version of the program or library used to compute the descriptor. | version number |
| _pdbx_chem_comp_descriptor.descriptor | The chemical descriptor value for this component. | code |
| Table 5: pdbx_chem_comp_identifier category | ||
| Token | Definition | Example |
| _pdbx_chem_comp_identifier.comp_id | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. | text |
| _pdbx_chem_comp_identifier.type | Contains the identifier type. | CAS Reg No. or PUBCHEM, etc. |
| _pdbx_chem_comp_identifier.program | The name of the program or library used to compute the identifier. | OpenEye OECHEM program, etc. |
| _pdbx_chem_comp_identifier.program_version | The version of the program or library used to compute the identifier. | v1.2 (numbers) |
| _pdbx_chem_comp_identifier.identifier | Contains the identifier value for this chemical component. | text |
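Every token in Tables 1 to 5 follows the same naming pattern: a leading underscore, the category name, a period, and the item name. The Python sketch below shows how a list of tokens can be sorted into their categories on that basis. The function name `group_by_category` is hypothetical, and the token list is a small sample taken from the tables above.

```python
from collections import defaultdict

SAMPLE_TOKENS = [
    "_chem_comp.id",
    "_chem_comp.name",
    "_chem_comp_atom.atom_id",
    "_chem_comp_bond.value_order",
    "_pdbx_chem_comp_descriptor.descriptor",
    "_pdbx_chem_comp_identifier.identifier",
]


def group_by_category(tokens):
    """Map each category name to the item names that belong to it."""
    grouped = defaultdict(list)
    for token in tokens:
        if not token.startswith("_") or "." not in token:
            raise ValueError(f"not an mmCIF data item name: {token!r}")
        # Split once: everything before the first period is the category.
        category, item = token[1:].split(".", 1)
        grouped[category].append(item)
    return dict(grouped)


categories = group_by_category(SAMPLE_TOKENS)
```

For the sample list this yields the five categories named above, with `chem_comp` holding the items `id` and `name`.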

| Note: Diagrams are not included in the Chemical Component Dictionary. They are shown here for illustrative purposes only. |
data_174
#
_chem_comp.id 174
_chem_comp.name '4-CHLORO-BENZOIC ACID'
_chem_comp.type non-polymer
_chem_comp.pdbx_type HETAIN
_chem_comp.formula 'C7 H5 O2 CL1'
_chem_comp.mon_nstd_flag n
_chem_comp.formula_weight 156.568
#
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.charge
_chem_comp_atom.model_Cartn_x
_chem_comp_atom.model_Cartn_y
_chem_comp_atom.model_Cartn_z
_chem_comp_atom.pdbx_align
174 CL4 CL 0 -19.787 95.862 18.541 0
174 C4 C 0 -19.932 94.201 19.219 1
174 C5 C 0 -18.817 93.715 19.901 1
174 C6 C 0 -18.847 92.452 20.466 1
174 C3 C 0 -21.099 93.428 19.089 1
174 C2 C 0 -21.127 92.158 19.664 1
174 C1 C 0 -19.996 91.681 20.342 1
174 C C 0 -19.962 90.330 20.989 1
174 O1 O 0 -20.968 89.592 20.924 1
174 O2 O 0 -18.919 89.991 21.597 1
174 HO1 H 0 ? ? ? 1
174 H2 H 0 ? ? ? 1
174 H3 H 0 ? ? ? 1
174 H5 H 0 ? ? ? 1
174 H6 H 0 ? ? ? 1
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
174 CL4 C4 SING
174 C4 C5 AROM
174 C4 C3 AROM
174 C5 C6 AROM
174 C5 H5 SING
174 C6 C1 AROM
174 C6 H6 SING
174 C3 C2 AROM
174 C3 H3 SING
174 C2 C1 AROM
174 C2 H2 SING
174 C1 C SING
174 C O1 SING
174 C O2 DOUB
174 O1 HO1 SING
#
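The chem_comp_bond rows in the example above amount to an edge list for the molecular graph. As a hedged illustration, the Python sketch below turns those rows into a neighbor table and a bond-order lookup. The names `BOND_ROWS` and `build_neighbors` are hypothetical, and the rows are assumed to be whitespace-separated with exactly four fields, as in this example.

```python
from collections import defaultdict

# The 15 chem_comp_bond rows for component 174, copied from the example.
BOND_ROWS = """
174 CL4 C4 SING
174 C4 C5 AROM
174 C4 C3 AROM
174 C5 C6 AROM
174 C5 H5 SING
174 C6 C1 AROM
174 C6 H6 SING
174 C3 C2 AROM
174 C3 H3 SING
174 C2 C1 AROM
174 C2 H2 SING
174 C1 C SING
174 C O1 SING
174 C O2 DOUB
174 O1 HO1 SING
"""


def build_neighbors(bond_rows):
    """Return (neighbors per atom, bond order per unordered atom pair)."""
    neighbors = defaultdict(list)
    orders = {}
    for line in bond_rows.strip().splitlines():
        comp_id, atom_1, atom_2, order = line.split()
        # A bond is undirected, so record it from both ends.
        neighbors[atom_1].append(atom_2)
        neighbors[atom_2].append(atom_1)
        orders[frozenset((atom_1, atom_2))] = order
    return dict(neighbors), orders


neighbors, orders = build_neighbors(BOND_ROWS)
```

In the resulting table the ring carbon C4 is bonded to CL4, C5, and C3, and the carboxyl carbon C carries the only DOUB bond (to O2). All 15 atoms of the chem_comp_atom loop appear as keys, which is a quick consistency check between the two loops.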
The Heterogen List is also available for download [13.2 Mb]. However, the RCSB PDB recommends using the mmCIF format Chemical Component Dictionary: the Heterogen List does not capture stereochemistry, among other chemical properties, which limits its functionality as a ligand library. Many users nevertheless still find it useful.
The heterogen section of a PDB coordinate file describes ligands in the entry. The chemical name of the ligand is given in the HETNAM record and the chemical formula is given in the FORMUL record. Any synonyms for the chemical name are given in the HETSYN records.
For example, entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code 174):
HET 174 15
HETNAM 174 4-CHLORO-BENZOIC ACID
FORMUL 174 C7 H5 O2 CL1
Further information describing each non-standard residue is then provided in the Chemical Component Dictionary.
| Please refer to the PDB File Format Contents Guide for additional information about the Heterogen Section within PDB format coordinate files. |
Each entry in the PDB format Chemical Component Dictionary is represented by a series of fields:
| Field | Definition | Example |
| RESIDUE | Contains the ID code of the chemical component followed by the number of CONECT record lines in the dictionary entry. | HYP 18 |
| CONECT | For each atom in the chemical component, lists the number of atoms it is bonded to, followed by the names of those atoms. The list of CONECT records is concluded with an END record. | N 3 CA CD H |
| HET | This is the same as the RESIDUE field. | HYP 18 |
| HETSYN | Any synonyms for the chemical component. This field may occupy more than one line and may not appear for each dictionary entry. | HYP HYDROXYPROLINE |
| HETNAM | The name of the chemical component. This field may occupy more than one line. | HYP 4-HYDROXYPROLINE |
| FORMUL | The chemical formula of the chemical component. | HYP C5 H9 N1 O3 |

| Note: Diagrams are not included in the Chemical Component Dictionary. They are shown here for illustrative purposes only. |
RESIDUE 174 15
CONECT CL4 1 C4
CONECT C4 3CL4 C5 C3
CONECT C5 3 C4 C6 H5
CONECT C6 3 C5 C1 H6
CONECT C3 3 C4 C2 H3
CONECT C2 3 C3 C1 H2
CONECT C1 3 C6 C2 C
CONECT C 3 C1 O1 O2
CONECT O1 2 C HO1
CONECT O2 1 C
CONECT HO1 1 O1
CONECT H2 1 C2
CONECT H3 1 C3
CONECT H5 1 C5
CONECT H6 1 C6
END
HET 174 15
HETNAM 174 4-CHLORO-BENZOIC ACID
FORMUL 174 C7 H5 O2 CL1
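The field definitions above imply a few checks that a reader of the PDB format dictionary can run on an entry: the count on the RESIDUE line should equal the number of CONECT records, and each CONECT record should list as many neighbors as it declares. The Python sketch below applies them to the entry for 174. It is a rough illustration rather than a fixed-column parser: `parse_entry` is a hypothetical name, records are split on whitespace, and a regular expression separates a bond count that runs into the next atom name (as in `3CL4`), which would be ambiguous for atom names that begin with a digit.

```python
import re

ENTRY = """\
RESIDUE 174 15
CONECT CL4 1 C4
CONECT C4 3CL4 C5 C3
CONECT C5 3 C4 C6 H5
CONECT C6 3 C5 C1 H6
CONECT C3 3 C4 C2 H3
CONECT C2 3 C3 C1 H2
CONECT C1 3 C6 C2 C
CONECT C 3 C1 O1 O2
CONECT O1 2 C HO1
CONECT O2 1 C
CONECT HO1 1 O1
CONECT H2 1 C2
CONECT H3 1 C3
CONECT H5 1 C5
CONECT H6 1 C6
END
HET 174 15
HETNAM 174 4-CHLORO-BENZOIC ACID
FORMUL 174 C7 H5 O2 CL1
"""

# Atom name, declared neighbor count, then the neighbor names.
CONECT_RE = re.compile(r"^CONECT\s+(\S+)\s+(\d+)\s*(.*)$")


def parse_entry(text):
    """Collect ID, name, formula, and connectivity from one entry."""
    entry = {"connect": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "RESIDUE":
            entry["id"], entry["n_connect"] = fields[1], int(fields[2])
        elif tag == "CONECT":
            match = CONECT_RE.match(line)
            if match is None:
                raise ValueError(f"malformed CONECT record: {line!r}")
            atom, count = match.group(1), int(match.group(2))
            bonded = match.group(3).split()
            if count != len(bonded):
                raise ValueError(f"{atom}: declared {count}, found {len(bonded)}")
            entry["connect"][atom] = bonded
        elif tag in ("HETNAM", "FORMUL"):
            # Everything after the ID code is the name or the formula.
            entry[tag.lower()] = line.split(None, 2)[2].strip()
    if entry.get("n_connect") != len(entry["connect"]):
        raise ValueError("RESIDUE count does not match CONECT records")
    return entry


entry = parse_entry(ENTRY)
```

The parsed connectivity matches the mmCIF example for the same component: each bond appears once from each end, so the 30 neighbor references correspond to the 15 rows of the chem_comp_bond loop.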
To see an example, the Ligand Expo entry for 4-Chloro-benzoic Acid is at http://ligand-expo.rcsb.org/reports/1/174/index.html
[1] Z Feng, L Chen, H Maddula, O Akcan, HM Berman, J Westbrook, ACA Program and Abstract Book Series 2 Vol 30 ISSN 0569-4221, Northern Kentucky Convention Center, July 26-31, 2003.
Questions, comments, and suggestions should be sent to info@rcsb.org.