Properties

When reading files, chemfiles reads multiple kinds of data: positions, velocities, atomic information (name, type, mass, charge, …), bonds and other connectivity elements. This model makes it possible to read the most commonly available data in various formats. Sometimes, however, a format defines additional information. Instead of adding a new field or function for every kind of data that can appear in a file, chemfiles defines a generic interface to read and store this additional data. The additional data is stored inside properties.

A property has a name and a value. The value can be a real number, a string, a Boolean value (true/false) or a three-dimensional vector. A property is either stored inside an atom and associated with this atom (for example the total atomic force), or stored in and associated with a frame. The latter case is used for general properties, such as the temperature of the system or the author of the file.
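As a rough illustration of this data model, properties can be pictured as a name-to-value mapping attached to either an atom or a frame, with values restricted to the four kinds listed above. The sketch below is not the chemfiles API: `PropertyHolder`, `Atom`, `Frame`, `check_property` and the property names `temperature` and `force` are all hypothetical and only serve the example.

```python
def check_property(value):
    """Return the kind of a property value, or raise TypeError."""
    # bool is tested first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if (
        isinstance(value, (tuple, list))
        and len(value) == 3
        and all(
            isinstance(x, (int, float)) and not isinstance(x, bool)
            for x in value
        )
    ):
        return "vector3d"
    raise TypeError(f"unsupported property value: {value!r}")


class PropertyHolder:
    """Hypothetical base class for objects carrying named properties."""

    def __init__(self):
        self._properties = {}

    def set(self, name, value):
        check_property(value)  # reject anything outside the four kinds
        self._properties[name] = value

    def get(self, name, default=None):
        return self._properties.get(name, default)


class Atom(PropertyHolder):
    def __init__(self, name):
        super().__init__()
        self.name = name


class Frame(PropertyHolder):
    def __init__(self):
        super().__init__()
        self.atoms = []


frame = Frame()
frame.set("temperature", 300.0)      # frame-level property
atom = Atom("O")
atom.set("force", (0.0, 0.1, -0.2))  # atom-level property
frame.atoms.append(atom)
```

Keeping the value kinds in one small validation function mirrors the idea of a single generic interface instead of one dedicated field per kind of data.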

This section documents which formats set and use properties.

Atomic properties

Name

Type

Format

Description

altloc

string

MMTF

On reading, this property is set to the alternative location character stored in both of these formats. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used.

mmCIF

On reading, this property is set to the alternative location character stored in both of these formats. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used.

hydrogen_count

number

CML

The number of hydrogens attached to the atom. The property is only set if the attribute is given and is non-zero.

SMI

The number of hydrogens attached to the atom. The property is only set if the property ‘H’ is given.

sybyl

string

MOL2

The sybyl atom type is typically stored in a MOL2 file in a column between the Z coordinate and the residue id. These types typically consist of an element ID, a period, then a hybridization state (e.g. C.2 or O.3). When reading, if the sybyl type column contains a period or is a valid element ID, this property is set to the value in the column. Otherwise, the property is not set and the atom type is guessed from the atom name. When writing, if this property is set, then this value is written between the Z coordinate and the residue number. Otherwise, the atom’s type is used to replace the sybyl type.

is_aromatic

bool

SMI

Describes if the atom is flagged as aromatic. This flag is set if the atom type is given as ‘b’, ‘c’, ‘o’, ‘p’, or ‘s’. This flag is also set when the atom is lowercase and in a property bracket. If the flag is set when the SMILES string is written, then the atom will be written in lowercase.

smiles_class

number

SMI

The class of the atom given in the SMILES string. This property is set if a ‘:’ character followed by a number is found in a property bracket while reading. If this property is set when the SMILES string is written, then the atom will be written in a property bracket with the ‘:’ character.

chirality

string

SMI

The chirality tag of the atom. If the chirality given in an atom property bracket is ‘@@’, then this string is set to ‘CW’. If the chirality is given as ‘@’, followed by ‘TH’, ‘AL’, ‘SP’, ‘TB’, or ‘OH’, which is in turn followed by a number, then this string is set to ‘CCW <character tag><number>’. When writing, if the string begins with ‘CW’, then ‘@@’ will be added to the property bracket. Otherwise, ‘@’ will be written, followed by the remaining string. At the moment, no attempt is made to ensure that the chirality of the atom is valid.

wildcard

bool

SMI

Set if the atom was defined as a wildcard atom.

atom_type

number

Tinker

The Tinker atom type.
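The mapping described for the chirality property in the table above can be sketched as a pair of small conversion functions. This is a minimal illustration, not the chemfiles implementation: the function names are hypothetical, the handling of a bare ‘@’ (mapped to ‘CCW’) is an assumption, and so is the reading of "the remaining string" as whatever follows the ‘CCW’ prefix.

```python
import re

# Chirality classes named in the description, followed by a number
_CLASS_RE = re.compile(r"^@(TH|AL|SP|TB|OH)(\d+)$")


def chirality_from_smiles(token):
    """Map a SMILES chirality token to the property string."""
    if token == "@@":
        return "CW"
    match = _CLASS_RE.match(token)
    if match:
        return f"CCW {match.group(1)}{match.group(2)}"
    if token == "@":
        # Assumption: a bare '@' becomes 'CCW' without a class suffix
        return "CCW"
    raise ValueError(f"unrecognized chirality token: {token!r}")


def chirality_to_smiles(value):
    """Map the property string back to a SMILES chirality token."""
    if value.startswith("CW"):
        return "@@"
    # Assumption: the "remaining string" is what follows the 'CCW' prefix
    rest = value[3:] if value.startswith("CCW") else value
    return "@" + rest.strip()
```

As in the description, nothing here checks that the resulting chirality is chemically valid for the atom.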

Residue properties

Name

Type

Format

Description

is_standard_pdb

bool

PDB

When reading, is_standard_pdb is set to true for residues defined with an ATOM record, and false for residues defined with an HETATM record. When writing, is_standard_pdb is used to determine whether to emit an HETATM or an ATOM record. If the property is not set, HETATM is used.

MMTF

When reading, is_standard_pdb is set to true when the composition_type for the group is related to peptide or nucleotide linkage. See the composition_type property for residues for more information. This property is ignored while writing.

mmCIF

When reading, is_standard_pdb is set to true when _atom_site.group_PDB is ATOM, false when it is HETATM, and is unset in the absence of this field. When writing, is_standard_pdb is used to determine whether to use ATOM or HETATM for _atom_site.group_PDB. If the property is not set, HETATM is used.

chainname

string

PDB

The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file, where it may not be unique.

mmCIF

The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file, where it may not be unique. This name is unique to a given biological assembly, however.

MMTF

The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file, where it may not be unique. This name is unique to a given biological assembly, however.

chainid

string

PDB

For PDB files, the chainid is identical to the chainname.

mmCIF

The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure.

MMTF

The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure.

chainindex

number

MMTF

The chainindex is a numeric representation of the chainid and it may be removed in a future release. It will be negative if the residue is created through a symmetry operation.

composition_type

string

MMTF

The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of L-peptide linking residues, RNAs consist of L-RNA linking, DNAs consist of L-DNA linking, and saccharides consist of D-saccharide. See chemCompType in the mmCIF dictionary for more details. Since this is a required group property, other is used when writing unless specified.

mmCIF

The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of L-peptide linking residues, RNAs consist of L-RNA linking, DNAs consist of L-DNA linking, and saccharides consist of D-saccharide. See chemCompType in the mmCIF dictionary for more details.

assembly

string

MMTF

The assembly property defines the assignment of biologically relevant groupings of residues in a crystal structure. For example, the PDBID 4XUF contains two biologically identical copies of the protein FLT3, labeled bioA and bioB, as this protein does not function as a homodimer. The PDBID 3OGF, however, is a homodimer and therefore only contains one biological assembly with two chains. This property is not used for writing.

insertion_code

string

PDB

On reading, insertion_code is set to the insertion code of the residue. This code is stored as a single character in the PDB file after the residue id. If this character is a space character, the property is not set. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used.

secondary_structure

string

PDB

On reading, the secondary_structure is assigned via HELIX, SHEET, and TURN records. If a residue is listed in a SHEET record, then the secondary_structure is set to extended. Similarly, the TURN record will set residues as turn. The HELIX record is more complex, as the PDB standard allows for multiple types of helices, including alpha, pi, and 3-10 helices. This property is not used for writing.

MMTF

On reading, the secondary_structure is assigned via the secStructList field in the MMTF standard. The values assigned match the descriptions by the Define Secondary Structure of Proteins (DSSP) algorithm. Examples include extended, alpha helix, pi helix, turn and coil. This property is not set for undefined secondary structures and is not used for writing.

segname

string

PDB

PDB files used with CHARMM, NAMD and a few other software packages can contain a non-standard segment name associated with a residue. If set when writing a frame, this property is written to the file.
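The PDB rules given in this table for is_standard_pdb and insertion_code can be sketched with plain dictionaries standing in for a residue's properties. The function names below are hypothetical and the code is only an illustration of the stated rules, not the chemfiles implementation.

```python
def pdb_record_name(residue_properties):
    """Pick the record name used when writing a residue's atoms."""
    # Only an explicit True selects ATOM; False or a missing
    # property both fall back to HETATM.
    if residue_properties.get("is_standard_pdb") is True:
        return "ATOM"
    return "HETATM"


def read_insertion_code(character, residue_properties):
    """Store the insertion code unless the column holds a space."""
    if character != " ":
        residue_properties["insertion_code"] = character


def write_insertion_code(residue_properties):
    """Return the single character placed after the residue id."""
    return residue_properties.get("insertion_code", " ")
```

Reading a space leaves the property unset and writing an unset property produces a space, so a read followed by a write reproduces the original column.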

Frame properties

Name

Type

Format

Description

name

string

CML

The text in the name node of a molecule. Only set if the node is present.

PDB

The text described by the TITLE record is used as the frame name when reading.

GRO

The first line of a GRO file is used as the frame name when reading.

SDF

The first line of an SDF file is used as the frame name when reading.

MOL2

The first line after @<TRIPOS>MOLECULE is used as the frame name when reading.

MMTF

The text in the title field is used as the frame name when reading.

mmCIF

The text in the _struct.title field is used as the frame name when reading.

SMI

Any string after a terminating (blank) character in a SMILES string.

classification

string

PDB

The classification of a structure assigned by the PDB. Read from the HEADER record.

pdb_idcode

string

PDB

Four letter code for structures deposited in the PDB. Read from the HEADER record.

MMTF

Four letter code for structures deposited in the PDB. Read from the structureId field.

mmCIF

Four letter code for structures deposited in the PDB. Read from the _entry.id field.

deposition_date

string

PDB

Date (DD-MMM-YY format) of the deposition in the PDB. Read from the HEADER record.

MMTF

Date (YYYY-MM-DD format) of the deposition in the PDB. Read from the depositionDate field.

title

string

CML

The text of the title attribute in a molecule node. Only set if the attribute is present.

time

number

TRR

The time of the frame in picoseconds.

XTC

The time of the frame in picoseconds.

xtc_precision

number

XTC

The precision used to compress the coordinates. Only used for ten or more atoms. Default is 1000.

has_positions

bool

TRR

Set to true if the frame contains positions. All positions are zero if not.

trr_lambda

number

TRR

This is usually the free energy coupling parameter.

Additionally, the SDF format reads any property formatted as > <...>, using the value inside the angle brackets (... here) as the property name.
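To make the SDF rule concrete, here is a minimal sketch of how data items of the form `> <name>` could be collected from the lines of one SDF record. It is not the chemfiles reader: the function name is hypothetical, and storing each value as a string with multi-line values joined by newlines is an assumption made for this example.

```python
def read_sdf_properties(lines):
    """Collect '> <name>' data items from the lines of one SDF record."""
    properties = {}
    i = 0
    # '$$$$' marks the end of the record
    while i < len(lines) and not lines[i].startswith("$$$$"):
        line = lines[i]
        i += 1
        start, end = line.find("<"), line.rfind(">")
        if not line.startswith(">") or start == -1 or end < start:
            continue  # not a data header line
        # The text inside the angle brackets is the property name
        name = line[start + 1:end]
        value = []
        # The value runs until the next blank line or the record end
        while (
            i < len(lines)
            and lines[i].strip()
            and not lines[i].startswith("$$$$")
        ):
            value.append(lines[i].rstrip("\n"))
            i += 1
        properties[name] = "\n".join(value)
    return properties
```

For example, a record containing the line `> <melting_point>` followed by `42.5` would yield a property named `melting_point` with the string value `42.5`.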