Properties
When reading files, chemfiles reads multiple kinds of data: positions, velocities, atomic information (name, type, mass, charge, …), bonds, and other connectivity elements. This model covers the data most commonly available across formats. Sometimes, however, a format defines additional information. Instead of adding a new field or function for every kind of data a file can contain, chemfiles defines a generic interface to read and store this additional data, which is stored inside properties.
A property has a name and a value. The value can be a real number, a string, a boolean value (true/false), or a 3-dimensional vector. A property is stored inside, and associated with, an atom (for example the total atomic force), a residue, or a frame. Frame properties are used for general information, such as the temperature of the system or the author of the file.
This section documents which formats set and use properties.
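As a rough illustration of this model, the sketch below stores named properties in a plain Python container and rejects values outside the four supported kinds. It is not the chemfiles API: `PropertyStore` and its methods are hypothetical names used only for this example, and representing a 3-dimensional vector as a 3-element tuple or list is an assumption.

```python
from numbers import Real


class PropertyStore:
    """Hypothetical container mapping property names to typed values."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def kind(value):
        """Return the property kind of `value`, or raise TypeError."""
        # bool is tested first because bool is a subclass of int in Python
        if isinstance(value, bool):
            return "bool"
        if isinstance(value, Real):
            return "number"
        if isinstance(value, str):
            return "string"
        if (
            isinstance(value, (tuple, list))
            and len(value) == 3
            and all(isinstance(x, Real) and not isinstance(x, bool) for x in value)
        ):
            return "vector3d"
        raise TypeError(f"unsupported property value: {value!r}")

    def set(self, name, value):
        self.kind(value)  # validate before storing
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)


# One store per owner: here an atom-level force and two frame-level properties
atom_properties = PropertyStore()
atom_properties.set("force", (0.0, -1.5, 2.0))

frame_properties = PropertyStore()
frame_properties.set("temperature", 300.0)
frame_properties.set("author", "unknown")
```

Keeping one store per atom, residue, or frame mirrors the idea that a property belongs to the object it describes.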
Atomic properties
Name | Type | Format | Description |
---|---|---|---|
altloc | string | MMTF | On reading, this property is set to the alternative location character stored in both of these formats. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used. |
altloc | string | mmCIF | On reading, this property is set to the alternative location character stored in both of these formats. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used. |
sybyl | string | MOL2 | The sybyl atom type is typically stored in a MOL2 file in a column between the Z coordinate and the residue id. These types typically consist of an element ID, a period, then a hybridization state (e.g. C.2 or O.3). When reading, if the sybyl type column contains a period or is a valid element ID, this property is set to the value in the column. Otherwise, the property is not set and the atom type is guessed from the atom name. When writing, if this property is set, then this value is written between the Z coordinate and the residue number. Otherwise, the atom’s type is written in place of the sybyl type. |
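The reading and writing rules for the sybyl property can be sketched in a few lines. This is a minimal illustration of the rule as described in the table, not the chemfiles implementation: the function names are hypothetical, the element set is deliberately abridged, and exact, case-sensitive matching of element IDs is an assumption.

```python
# Abridged element list for the example (assumption); a real reader
# would check against the full periodic table.
ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def sybyl_property(type_column):
    """Return the value for the sybyl property, or None if it is not set."""
    if "." in type_column or type_column in ELEMENTS:
        return type_column
    # Not a sybyl type: the property stays unset and the caller
    # falls back to guessing the atom type from the atom name.
    return None


def written_type(properties, atom_type):
    """Pick the text for the MOL2 type column when writing."""
    # Use the sybyl property when present, otherwise the atom's type
    return properties.get("sybyl", atom_type)
```

For instance, `sybyl_property("C.2")` keeps the column value, while a column such as `CA1` leaves the property unset.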
Residue properties
Name | Type | Format | Description |
---|---|---|---|
chainid | string | mmCIF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure. |
chainid | string | PDB | For PDB files, the chainid is identical to chainname. |
chainid | string | MMTF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure. |
assembly | string | MMTF | The assembly property defines the assignment of biologically relevant groupings of residues in a crystal structure. For example, the PDB ID 4XUF contains two biologically identical copies of the protein FLT3, labeled bioA and bioB, as this protein does not function as a homodimer. The PDB ID 3OGF, however, is a homodimer and therefore only contains one biological assembly with two chains. This property is not used for writing. |
insertion_code | string | PDB | On reading, insertion_code is set to the insertion code of the residue. This code is stored as a single character in the PDB file after the residue id. If this character is a space character, the property is not set. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used. |
composition_type | string | MMTF | The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of L-peptide linking residues, RNAs consist of L-RNA linking, DNAs consist of L-DNA linking, and saccharides consist of D-saccharide. See chemCompType in the mmCIF dictionary for more details. Since this is a required group property, the value other is used when writing unless one is specified. |
composition_type | string | mmCIF | The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of L-peptide linking residues, RNAs consist of L-RNA linking, DNAs consist of L-DNA linking, and saccharides consist of D-saccharide. See chemCompType in the mmCIF dictionary for more details. |
secondary_structure | string | PDB | On reading, the secondary_structure is assigned via HELIX, SHEET, and TURN records. If a residue is listed in a SHEET record, then the secondary_structure is set to extended. Similarly, the TURN record will set residues as turn. The HELIX record is more complex, as the PDB standard allows for multiple types of helices, including alpha, pi, and 3-10 helices. This property is not used for writing. |
chainname | string | mmCIF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file where it may not be unique. This name is unique to a given biological assembly, however. |
chainname | string | PDB | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file where it may not be unique. |
chainname | string | MMTF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file where it may not be unique. This name is unique to a given biological assembly, however. |
is_standard_pdb | bool | MMTF | When reading, is_standard_pdb is set to true when the composition_type for the group is related to peptide or nucleotide linkage. See the composition_type property for residues for more information. This property is ignored while writing. |
is_standard_pdb | bool | PDB | When reading, is_standard_pdb is set to true for residues defined with an ATOM record, and false for residues defined with an HETATM record. When writing, is_standard_pdb is used to determine whether to emit an HETATM or an ATOM record. If the property is not set, HETATM is used. |
is_standard_pdb | bool | mmCIF | When reading, is_standard_pdb is set to true when _atom_site.group_PDB is ATOM, false when it is HETATM, and is unset in the absence of this field. When writing, is_standard_pdb is used to determine whether to use ATOM or HETATM for _atom_site.group_PDB. If the property is not set, HETATM is used. |
chainindex | number | MMTF | The chainindex is a numeric representation of the chainid and it may be removed in a future release. |
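The PDB rows for is_standard_pdb and insertion_code can be sketched as follows. This is a simplified illustration rather than the chemfiles implementation: the function names are hypothetical, and the column positions (record name in columns 1 to 6, insertion code in column 27) are taken from the fixed-width PDB file format, not from the table above.

```python
def residue_properties(line):
    """Extract residue-level properties from one ATOM/HETATM line."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {record!r}")

    properties = {"is_standard_pdb": record == "ATOM"}
    # Column 27 (index 26) holds the insertion code; a space means "not set"
    code = line[26:27]
    if code.strip():
        properties["insertion_code"] = code
    return properties


def record_name(properties):
    """Choose the record to emit; HETATM is the fallback when unset."""
    return "ATOM" if properties.get("is_standard_pdb") is True else "HETATM"


def insertion_code_column(properties):
    """Text for column 27 when writing; a space when the property is unset."""
    return properties.get("insertion_code", " ")


# A made-up line, assembled field by field to keep the columns explicit
line = (
    "ATOM  "   # columns 1-6: record name
    "    1"    # columns 7-11: atom serial number
    " "        # column 12: blank
    " N  "     # columns 13-16: atom name
    " "        # column 17: alternate location
    "ALA"      # columns 18-20: residue name
    " "        # column 21: blank
    "A"        # column 22: chain identifier
    "  12"     # columns 23-26: residue sequence number
    "A"        # column 27: insertion code
)
properties = residue_properties(line)
```

Reading and then writing with these helpers round-trips both values, and a residue without either property is written as HETATM with a blank insertion code.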
Frame properties
Name | Type | Format | Description |
---|---|---|---|
deposition_date | string | MMTF | Date (YYYY-MM-DD format) of the deposition in the PDB. Read from the depositionDate field. |
deposition_date | string | PDB | Date (DD-MMM-YY format) of the deposition in the PDB. Read from the HEADER record. |
name | string | MMTF | The text in the title field is used as the frame name when reading. |
name | string | MOL2 | The first line after @<TRIPOS>MOLECULE is used as the frame name when reading. |
name | string | SDF | The first line of an SDF file is used as the frame name when reading. |
name | string | mmCIF | The text in the _struct.title field is used as the frame name when reading. |
name | string | GRO | The first line of a GRO file is used as the frame name when reading. |
name | string | PDB | The text described by the TITLE record is used as the frame name when reading. |
classification | string | PDB | The classification of a structure assigned by the PDB. Read from the HEADER record. |
pdb_idcode | string | MMTF | Four-letter code for structures deposited in the PDB. Read from the structureId field. |
pdb_idcode | string | PDB | Four-letter code for structures deposited in the PDB. Read from the HEADER record. |
pdb_idcode | string | mmCIF | Four-letter code for structures deposited in the PDB. Read from the _entry.id field. |
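Three of the PDB frame properties above come from the same HEADER record. The sketch below shows one way to slice that record; it is an illustration, not the chemfiles implementation. The function name is hypothetical, the column ranges (classification in columns 11 to 50, date in columns 51 to 59, ID code in columns 63 to 66) come from the fixed-width PDB file format, and leaving a property unset when its columns are blank is an assumption.

```python
def header_properties(line):
    """Extract frame-level properties from a PDB HEADER record."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")

    fields = {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59, DD-MMM-YY
        "pdb_idcode": line[62:66].strip(),       # columns 63-66
    }
    # Assumption: a blank field leaves the property unset
    return {name: value for name, value in fields.items() if value}


# A made-up HEADER record, padded so that every field starts on its column
header = "HEADER".ljust(10) + "HYDROLASE".ljust(40) + "01-JAN-00" + "   " + "1ABC"
frame_properties = header_properties(header)
```

A truncated HEADER line simply yields fewer properties, since slicing past the end of a Python string returns an empty string.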
Additionally, the SDF format reads any property formatted as > <...>, using the value inside the angle brackets (... here) as the property name.
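A minimal sketch of this rule is shown below, assuming it is fed the lines of one SDF record that follow the molecule block. It is not the chemfiles implementation: the function name is hypothetical, and keeping every value as a string, joining multi-line values with a newline, and ending a value at the first blank line are assumptions of the example.

```python
import re

# A data header line: ">" followed by the property name in angle brackets
DATA_HEADER = re.compile(r"^>\s*<([^>]+)>")


def sdf_properties(lines):
    """Collect `> <name>` data items from the tail of one SDF record."""
    properties = {}
    name, values = None, []

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):  # end of the record
            break
        match = DATA_HEADER.match(line)
        if match:
            if name is not None:  # previous item had no blank terminator
                properties[name] = "\n".join(values)
            name, values = match.group(1), []
        elif name is not None:
            if line.strip():
                values.append(line)
            else:  # a blank line ends the current item
                properties[name] = "\n".join(values)
                name = None

    if name is not None:
        properties[name] = "\n".join(values)
    return properties


record_tail = [
    "> <MELTING_POINT>",
    "42.5",
    "",
    ">  <COMMENT>",
    "first line",
    "second line",
    "",
    "$$$$",
]
properties = sdf_properties(record_tail)
```

Here the names `MELTING_POINT` and `COMMENT` are made up for the example; any text inside the angle brackets becomes the property name.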