Properties
When reading files, chemfiles reads several kinds of data: positions, velocities, atomic information (name, type, mass, charge, …), bonds and other connectivity elements. This model covers the data most commonly available across formats. Sometimes, however, a format defines additional information. Instead of adding a new field or function for every kind of data a file may contain, chemfiles defines a generic interface to read and store this additional data, which is stored in properties.
A property has a name and a value. The value can be a real number, a string, a Boolean value (true/false) or a three-dimensional vector. A property is either stored inside an atom and associated with that atom (for example the total atomic force), or stored in and associated with a frame. The latter case is used for general properties, such as the temperature of the system or the author of the file.
This section documents which formats set and use which properties.
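
To make the model above concrete, here is a minimal sketch in plain Python. It is not the chemfiles API: `PropertyHolder` and `property_kind` are hypothetical names introduced only for this illustration, and the sketch simply assumes that a property is a name mapped to one of the four value kinds listed above.

```python
def property_kind(value):
    """Return the kind of a property value, or raise TypeError."""
    # bool is tested first because, in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if (
        isinstance(value, (tuple, list))
        and len(value) == 3
        and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value)
    ):
        return "vector3"
    raise TypeError(f"unsupported property value: {value!r}")


class PropertyHolder:
    """Hypothetical stand-in for anything carrying properties (atom or frame)."""

    def __init__(self):
        self.properties = {}

    def set(self, name, value):
        property_kind(value)  # reject values outside the four kinds
        self.properties[name] = value

    def get(self, name, default=None):
        return self.properties.get(name, default)


# A per-atom property: data attached to one atom.
atom = PropertyHolder()
atom.set("force", (0.0, -1.5, 2.0))

# Frame properties: data about the whole system or file.
frame = PropertyHolder()
frame.set("temperature", 300.0)
frame.set("author", "unknown")
```

The same container type serves both atoms and frames here, which mirrors the idea of one generic interface instead of one dedicated field per kind of data.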
Atomic properties

| Name | Type | Format | Description |
|---|---|---|---|
| altloc | string | MMTF | On reading, this property is set to the alternative location character stored in both of these formats. On writing, this character is stored with the … |
| altloc | string | mmCIF | On reading, this property is set to the alternative location character stored in both of these formats. On writing, this character is stored with the … |
| hydrogen_count | number | CML | The number of hydrogens attached to the atom. The property is only set if the attribute is given and is non-zero. |
| hydrogen_count | number | SMI | The number of hydrogens attached to the atom. The property is only set if the property ‘H’ is given. |
| sybyl | string | MOL2 | The sybyl atom type is typically stored in a MOL2 file in a column between the Z coordinate and the residue id. These types typically consist of an element ID, a period, then a hybridization state (e.g. …) |
| is_aromatic | bool | SMI | Describes whether the atom is flagged as aromatic. This flag is set if the atom type is given as ‘b’, ‘c’, ‘o’, ‘p’, or ‘s’. This flag is also set when the atom is lowercase and in a property bracket. If the flag is set when the SMILES string is written, then the atom will be written in lowercase. |
| smiles_class | number | SMI | The class of the atom given in the SMILES string. This property is set if a ‘:’ character followed by a number is found in a property bracket while reading. If this property is set when the SMILES string is written, then the atom will be written in a property bracket with the ‘:’ character. |
| chirality | string | SMI | The chirality tag of the atom. If the chirality given in an atom property bracket is ‘@@’, then this string is set to ‘CW’. If the chirality is given as ‘@’, followed by ‘TH’, ‘AL’, ‘SP’, ‘TB’, or ‘OH’, which is in turn followed by a number, then this string is set to ‘CCW <character tag><number>’. When writing, if the string begins with ‘CW’, then ‘@@’ will be added to the property bracket. Otherwise, ‘@’ will be written followed by the remaining string. At the moment, no attempt is made to ensure that the chirality of the atom is valid. |
| wildcard | bool | SMI | Set if the atom was defined as a wildcard atom. |
| atom_type | number | Tinker | The Tinker atom type. |
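
The mapping described for the chirality property can be sketched as a pair of small functions. This is only an illustration in Python of the rule as stated in the table, not the chemfiles implementation. The function names are hypothetical, and two points are assumptions because the text leaves them open: a bare ‘@’ is mapped to ‘CCW’, and "the remaining string" on writing is taken to mean whatever follows the ‘CCW’ prefix.

```python
import re

# Chirality tags listed in the table; the number after the tag is kept verbatim.
TAGS = ("TH", "AL", "SP", "TB", "OH")
TAGGED = re.compile(r"^@(" + "|".join(TAGS) + r")(\d+)$")


def chirality_from_smiles(token):
    """Map the chirality part of a bracket atom to a property string."""
    if token == "@@":
        return "CW"
    match = TAGGED.match(token)
    if match:
        return f"CCW {match.group(1)}{match.group(2)}"
    if token == "@":
        # Assumption: the table does not say what a bare '@' maps to.
        return "CCW"
    raise ValueError(f"unrecognized chirality token: {token!r}")


def chirality_to_smiles(value):
    """Map a property string back to the text written in the bracket."""
    if value.startswith("CW"):
        return "@@"
    # Assumption: "the remaining string" is what follows the 'CCW' prefix.
    rest = value[3:].strip() if value.startswith("CCW") else value
    return "@" + rest


print(chirality_from_smiles("@TH1"))
print(chirality_to_smiles("CCW TH1"))
```

As in the description above, neither function checks that the chirality makes sense for the atom it is attached to.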
Residue properties

| Name | Type | Format | Description |
|---|---|---|---|
| is_standard_pdb | bool | PDB | When reading, is_standard_pdb is set to … |
| is_standard_pdb | bool | MMTF | When reading, is_standard_pdb is set to … |
| is_standard_pdb | bool | mmCIF | When reading, is_standard_pdb is set to … |
| chainname | string | PDB | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues that are linked together in a crystallographic file, where it may not be unique. |
| chainname | string | mmCIF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues that are linked together in a crystallographic file, where it may not be unique. This name is unique to a given biological assembly, however. |
| chainname | string | MMTF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues that are linked together in a crystallographic file, where it may not be unique. This name is unique to a given biological assembly, however. |
| chainid | string | PDB | For … |
| chainid | string | mmCIF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues that are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure. |
| chainid | string | MMTF | The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues that are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure. |
| chainindex | number | MMTF | The chainindex is a numeric representation of the chainindex and it may be removed in a future release. It will be negative if the residue is created through a symmetry operation. |
| composition_type | string | MMTF | The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of … |
| composition_type | string | mmCIF | The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of … |
| assembly | string | MMTF | The assembly property defines the assignment of biologically relevant groupings of residues in a crystal structure. For example, the PDBID … |
| insertion_code | string | PDB | On reading, insertion_code is set to the insertion code of the residue. This code is stored as a single character in the PDB file after the residue id. If this character is a space character, the property is not set. On writing, this character is stored with the … |
| secondary_structure | string | PDB | On reading, the secondary_structure is assigned via … |
| secondary_structure | string | MMTF | On reading, the secondary_structure is assigned via the … |
| resname | string | PDB | |
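
The reading rule for insertion_code can be illustrated with a short Python sketch. It assumes the fixed-column layout of PDB `ATOM`/`HETATM` records, where the residue sequence number occupies columns 23 to 26 and the insertion code column 27. The function name `insertion_code` is hypothetical and this is not the chemfiles implementation.

```python
def insertion_code(line):
    """Return the insertion code of an ATOM/HETATM record, or None if unset."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    # Column 27 in the 1-based PDB layout is index 26 in Python.
    code = line[26:27]
    # A space (or a line too short to hold the column) means "no insertion code",
    # in which case the property would not be set.
    if code in ("", " "):
        return None
    return code


# Residue 10 with insertion code 'A', then residue 11 without one.
with_code = "ATOM      1  N   ALA A  10A     11.104   6.134  -6.504"
without_code = "ATOM      2  N   ALA A  11      12.560   7.221  -5.913"

print(insertion_code(with_code))
print(insertion_code(without_code))
```

Returning `None` for a space mirrors the statement above that the property is simply absent in that case, rather than set to an empty string.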
Frame properties

| Name | Type | Format | Description |
|---|---|---|---|
| name | string | CML | The text in the … |
| name | string | PDB | The text described by the … |
| name | string | GRO | The first line of a GRO file is used as the frame name when reading. |
| name | string | SDF | The first line of an SDF file is used as the frame name when reading. |
| name | string | MOL2 | The first line after … |
| name | string | MMTF | The text in the … |
| name | string | mmCIF | The text in the … |
| name | string | SMI | Any string after a terminating (blank) character in a SMILES string. |
| classification | string | PDB | The classification of a structure assigned by the PDB. Read from the … |
| pdb_idcode | string | PDB | Four letter code for structures deposited in the PDB. Read from the … |
| pdb_idcode | string | MMTF | Four letter code for structures deposited in the PDB. Read from the … |
| pdb_idcode | string | mmCIF | Four letter code for structures deposited in the PDB. Read from the … |
| deposition_date | string | PDB | Date (DD-MMM-YY format) of the deposition in the PDB. Read from the … |
| deposition_date | string | MMTF | Date (YYYY-MM-DD format) of the deposition in the PDB. Read from the … |
| title | string | CML | The text of the … |
| time | number | TRR | The time of the frame in picoseconds. |
| time | number | XTC | The time of the frame in picoseconds. |
| xtc_precision | number | XTC | The precision used to compress the coordinates. Only used for ten or more atoms. Default is 1000. |
| has_positions | bool | TRR | Set to … |
| trr_lambda | number | TRR | This is usually the free energy coupling parameter. |

Additionally, the SDF format reads any property formatted as `> <...>`, using the value inside the angle brackets (`...` here) as the property name.
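
As a rough sketch of how such `> <...>` entries can be collected, the Python function below scans the text of one SDF record. It assumes the usual SDF layout, in which a header line starting with `>` is followed by one or more value lines and closed by a blank line, with `$$$$` ending the record. The function name `read_sdf_properties` is hypothetical, values are kept as plain strings (how chemfiles types them is not stated here), and value lines that themselves start with `>` are not handled.

```python
import re

# A data header starts with '>' and carries the property name in angle brackets.
HEADER = re.compile(r"^>.*<([^<>]+)>")


def read_sdf_properties(text):
    """Collect the '> <NAME>' data items of one SDF record into a dict."""
    properties = {}
    name, value_lines = None, []
    for line in text.splitlines():
        if line.startswith("$$$$"):  # end of the record
            break
        header = HEADER.match(line)
        if header:
            name, value_lines = header.group(1), []
        elif name is not None:
            if line.strip() == "":
                # A blank line closes the current data item.
                properties[name] = "\n".join(value_lines)
                name = None
            else:
                value_lines.append(line)
    if name is not None:  # item not closed by a blank line
        properties[name] = "\n".join(value_lines)
    return properties


example = "\n".join([
    "M  END",
    "> <MELTING.POINT>",
    "179.0",
    "",
    "> <NOTE>",
    "made-up value for illustration",
    "",
    "$$$$",
])
print(read_sdf_properties(example))
```

The name inside the angle brackets becomes the dictionary key, which corresponds to the property name in the description above.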