Properties

When reading files, chemfiles reads multiple kinds of data: positions, velocities, atomic information (name, type, mass, charge, …), bonds and other connectivity elements. This model covers the data most commonly available across formats. Sometimes, however, a format defines additional information. Instead of adding a new field or function for every kind of data a file can contain, chemfiles defines a generic interface to read and store this additional data, which is stored inside properties.

A property has a name and a value. The value can be a real number, a string, a boolean value (true/false), or a 3-dimensional vector. A property is either stored inside an atom and associated with that atom (for example the total atomic force), or stored in and associated with a frame. The latter case is used for general properties, such as the temperature of the system or the author of the file.
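
The name/value model described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the chemfiles API: the `PropertyHolder`, `Atom` and `Frame` classes and the `check_value` helper are hypothetical stand-ins, and the choice to normalise integers to floats is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# The four value kinds named in the text: real number, string, boolean, 3D vector.
Vector3 = Tuple[float, float, float]
PropertyValue = Union[float, str, bool, Vector3]


def check_value(value):
    """Return `value` normalised to one of the four supported kinds."""
    # bool is tested before numbers because bool is a subclass of int in Python.
    if isinstance(value, (bool, str)):
        return value
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, (tuple, list)) and len(value) == 3:
        return tuple(float(x) for x in value)
    raise TypeError(f"unsupported property value: {value!r}")


@dataclass
class PropertyHolder:
    """Hypothetical base class: anything that can carry named properties."""
    properties: Dict[str, PropertyValue] = field(default_factory=dict)

    def set(self, name, value):
        self.properties[name] = check_value(value)

    def get(self, name, default=None):
        return self.properties.get(name, default)


class Atom(PropertyHolder):
    """Stand-in for an atom; holds per-atom properties."""


class Frame(PropertyHolder):
    """Stand-in for a frame; holds system-wide properties."""


# Per-atom data goes on the atom, general data goes on the frame.
atom = Atom()
atom.set("force", (0.0, 1.5, -2.0))

frame = Frame()
frame.set("temperature", 300)
frame.set("author", "Jane Doe")
```

A property that was never set is simply absent, which is why `get` takes a default; several formats below rely on this "not set" state to choose a fallback when writing.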

This section documents which formats set and use properties.

Atomic properties

Name Type Format Description
altloc string MMTF On reading, this property is set to the alternative location character stored in both of these formats (MMTF and mmCIF). On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used.
    mmCIF On reading, this property is set to the alternative location character stored in both of these formats (MMTF and mmCIF). On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used.
sybyl string MOL2 The sybyl atom type is typically stored in a MOL2 file in a column between the Z coordinate and the residue id. These types typically consist of an element ID, a period, then a hybridization state (e.g. C.2 or O.3). When reading, if the sybyl type column contains a period or is a valid element ID, this property is set to the value in the column. Otherwise, the property is not set and the atom type is guessed from the atom name. When writing, if this property is set, its value is written between the Z coordinate and the residue id. Otherwise, the atom’s type is written in place of the sybyl type.
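
The reading and writing rules for sybyl can be summarised with a small sketch. It is not the chemfiles implementation: the function names are hypothetical, and `KNOWN_ELEMENTS` is a deliberately tiny subset of element symbols used only for illustration.

```python
# Illustrative subset of element symbols; a real reader would know the full periodic table.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def read_sybyl_column(column):
    """Return the value to store in the `sybyl` property, or None to leave it unset."""
    # The property is set when the column looks like a sybyl type ("C.2")
    # or is a plain element symbol ("O").
    if "." in column or column in KNOWN_ELEMENTS:
        return column
    # Otherwise the property stays unset and the atom type is guessed elsewhere.
    return None


def write_sybyl_column(properties, atom_type):
    """Return the text to write between the Z coordinate and the residue id."""
    # Fall back to the atom type when the property was never set.
    return properties.get("sybyl", atom_type)
```

For example, `read_sybyl_column("C.2")` keeps the value, while a column such as `"XX1"` leaves the property unset.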

Residue properties

Name Type Format Description
chainid string mmCIF The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure.
    PDB For PDB files, the chainid is identical to chainname.
    MMTF The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a biological assembly. It is unique to both the biological assembly and the crystal structure.
assembly string MMTF The assembly property defines the assignment of biologically relevant groupings of residues in a crystal structure. For example, the PDB ID 4XUF contains two biologically identical copies of the protein FLT3, labeled bioA and bioB, as this protein does not function as a homodimer. The PDB ID 3OGF, however, is a homodimer and therefore only contains one biological assembly with two chains. This property is not used for writing.
insertion_code string PDB On reading, insertion_code is set to the insertion code of the residue. This code is stored as a single character in the PDB file after the residue id. If this character is a space character, the property is not set. On writing, this character is stored with the ATOM or HETATM record. If the property is not set, a space character is used.
composition_type string MMTF The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of L-peptide linking residues, RNAs consist of L-RNA linking, DNAs consist of L-DNA linking, and saccharides consist of D-saccharide. See chemCompType in the mmCIF dictionary for more details. Since this is a required group property, other is used when writing unless specified.
    mmCIF The composition_type defines how residues are chemically bonded. For example, the majority of proteins consist of L-peptide linking residues, RNAs consist of L-RNA linking, DNAs consist of L-DNA linking, and saccharides consist of D-saccharide. See chemCompType in the mmCIF dictionary for more details.
secondary_structure string PDB On reading, the secondary_structure is assigned via HELIX, SHEET, and TURN records. If a residue is listed in a SHEET record, then the secondary_structure is set to extended. Similarly, the TURN record will set residues as turn. The HELIX record is more complex, as the PDB standard allows for multiple types of helices, including alpha, pi, and 3-10 helices. This property is not used for writing.
chainname string mmCIF The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file where it may not be unique. This name is unique to a given biological assembly, however.
    PDB The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file where it may not be unique.
    MMTF The chainname defines the name assigned to a protein chain by biologists. It is a single character used to group residues which are linked together in a crystallographic file where it may not be unique. This name is unique to a given biological assembly, however.
is_standard_pdb bool MMTF When reading, is_standard_pdb is set to true when the composition_type for the group is related to peptide or nucleotide linkage. See the composition_type property for residues for more information. This property is ignored while writing.
    PDB When reading, is_standard_pdb is set to true for residues defined with an ATOM record, and false for residues defined with an HETATM record. When writing, is_standard_pdb is used to determine whether to emit an HETATM or an ATOM record. If the property is not set, HETATM is used.
    mmCIF When reading, is_standard_pdb is set to true when _atom_site.group_PDB is ATOM, false when it is HETATM, and is unset in the absence of this field. When writing, is_standard_pdb is used to determine whether to use ATOM or HETATM for _atom_site.group_PDB. If the property is not set, HETATM is used.
chainindex number MMTF The chainindex is a numeric representation of the chainid and it may be removed in a future release.
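
The PDB rules for is_standard_pdb in the table above amount to a three-state flag (true, false, or not set). A minimal sketch follows, with hypothetical function names and a plain dictionary standing in for the residue's properties.

```python
def is_standard_from_record(line):
    """Reading: map a PDB record name to the value of `is_standard_pdb`."""
    # The record name occupies the first six columns of a PDB line.
    record = line[:6].strip()
    if record == "ATOM":
        return True
    if record == "HETATM":
        return False
    return None  # not an atom record, so nothing to set


def pdb_record_name(residue_properties):
    """Writing: choose the record name from the residue's properties."""
    # Only an explicit True selects ATOM; False or a missing property gives HETATM.
    if residue_properties.get("is_standard_pdb") is True:
        return "ATOM"
    return "HETATM"
```

Treating "not set" the same as false on output matches the HETATM default described for the PDB and mmCIF formats.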

Frame properties

Name Type Format Description
deposition_date string MMTF Date (YYYY-MM-DD format) of the deposition in the PDB. Read from the depositionDate field.
    PDB Date (DD-MMM-YY format) of the deposition in the PDB. Read from the HEADER record.
name string MMTF The text in the title field is used as the frame name when reading.
    MOL2 The first line after @<TRIPOS>MOLECULE is used as the frame name when reading.
    SDF The first line of an SDF file is used as the frame name when reading.
    mmCIF The text in the _struct.title field is used as the frame name when reading.
    GRO The first line of a GRO file is used as the frame name when reading.
    PDB The text described by the TITLE record is used as the frame name when reading.
classification string PDB The classification of a structure assigned by the PDB. Read from the HEADER record.
pdb_idcode string MMTF Four-letter code for structures deposited in the PDB. Read from the structureId field.
    PDB Four-letter code for structures deposited in the PDB. Read from the HEADER record.
    mmCIF Four-letter code for structures deposited in the PDB. Read from the _entry.id field.
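
For the PDB format, three of the frame properties above come from the single HEADER record. The sketch below shows one way to split that record, assuming the fixed column layout of the PDB file format (classification in columns 11 to 50, deposition date in columns 51 to 59, ID code in columns 63 to 66). The function name is hypothetical and the example line is made up for illustration.

```python
def parse_pdb_header(line):
    """Split a HEADER record into the three frame properties it carries."""
    # Pad short lines so the slices below never run past the end.
    padded = line.ljust(66)
    return {
        "classification": padded[10:50].strip(),   # columns 11-50
        "deposition_date": padded[50:59].strip(),  # columns 51-59, DD-MMM-YY
        "pdb_idcode": padded[62:66].strip(),       # columns 63-66
    }


# A made-up HEADER line built column by column, not taken from a real entry.
example = "HEADER    " + "OXYGEN TRANSPORT".ljust(40) + "13-AUG-81" + "   " + "1ABC"
header_properties = parse_pdb_header(example)
```

Note that the date is kept as the string found in the file, so the PDB and MMTF readers yield different textual formats for the same property.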

Additionally, the SDF format reads any property formatted as > <...>, using the value inside the angle brackets (... here) as the property name.
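
As an illustration of the > <...> convention, here is a small stand-alone parser for the data items of one SDF record. It is a sketch rather than the chemfiles reader: the function name is hypothetical, and storing every value as a string (joining multi-line values with newlines) is an assumption.

```python
import re

# A data header line such as "> <temperature>"; the captured group is the property name.
_DATA_HEADER = re.compile(r"^> *<([^>]+)>")


def read_sdf_properties(lines):
    """Collect `> <name>` data items from the lines of a single SDF record."""
    properties = {}
    name = None
    value_lines = []

    def flush():
        # Store the pending item, if any, under its name.
        if name is not None:
            properties[name] = "\n".join(value_lines)

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("$$$$"):  # end of the record
            break
        match = _DATA_HEADER.match(line)
        if match:
            flush()
            name, value_lines = match.group(1), []
        elif name is not None:
            if line.strip() == "":  # a blank line ends the current item
                flush()
                name = None
            else:
                value_lines.append(line)
    flush()
    return properties


record = [
    "M  END",
    "> <temperature>",
    "300.0",
    "",
    "> <comment>",
    "first line",
    "second line",
    "",
    "$$$$",
]
sdf_properties = read_sdf_properties(record)
```

Here `sdf_properties` maps `temperature` to `"300.0"` and `comment` to the two comment lines joined by a newline.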