Tutorials

This section present some hand-on tutorials to the chemfiles Fortran API. These tutorial do not check for errors. When using chemfiles you might want to check for error using the status parameter that all functions take.

All the code here is under the CC-0 Universal Licence which means that you are free to do whatever you want with it (i.e. it is Public Domain code)

Read a single frame

In this tutorials we will read a frame from a trajectory, and print the indexes of all the atom in the half-space x < 5.

We start by use``ing the modules we will need: ``chemfiles, and iso_fortran_env. iso_fortran_env provides fixed width types that are very usefull when using a C++ library like chemfiles from Fortran. Here, we only need 64-bits integers and real.

use iso_fortran_env, only: int64, real64
use chemfiles
implicit none

We can then declare all the variables we will need. All chemfiles specific types are user defined type that should be declared with type(chfl_xxx). In this example we will also need a few other values: an array for positions and an array for the indexes of atoms with x < 5. The positions array is a pointer because we will not allocate memory for it. Instead we will directly access some memory inside chemfiles internal data structures.

type(chfl_trajectory) :: file
type(chfl_frame) :: frame
real(real64), dimension(:, :), pointer :: positions
integer(int64), dimension(:), allocatable :: less_than_five

Then we open a chfl_trajectory in read ('r') mode and read the first frame. We need to initialize memory for the chfl_frame before calling chfl_trajectory%read().

call file%open("filename.xyz", 'r')
call frame%init()
call file%read(frame)

We can now and get the positions of the atoms and the number of atoms in the frame with the chfl_frame%positions() subroutine, as well as the number of atoms in the frame.

positions => frame%positions()
natoms = size(positions, 2)

Knowning the total number of atoms in the frame, we can allocate memory to store the indices of the atoms with x < 5:

allocate(less_than_five(natoms))

Iterating through the atoms in the frame, we get the ones matching our condition. We need to track the number of matched atoms to know where to add them in the less_than_five array.

matched = 0
do i = 1, natoms
    if (positions(0, i) .lt. 5) then
        less_than_five(matched) = i
        matched = matched + 1
    end if
end do

At the end we can print our results

print*, "Atoms with x < 5:"
do i=1,matched
    print*, "  - ", less_than_five(i)
end do

And free all allocated memory. We don’t need to free positions, as it points into memory allocated and controlled by the frame.

deallocate(less_than_five)
call frame%free()
call file%close()
Click here to see the whole program
program indexes
    use iso_fortran_env, only: int64, real64
    use chemfiles
    implicit none

    type(chfl_trajectory) :: file
    type(chfl_frame) :: frame
    real(real64), dimension(:, :), pointer :: positions
    integer(int64), dimension(:), allocatable :: less_than_five
    integer(int64) :: i, natoms
    integer :: matched

    call file%open("filename.xyz", 'r')
    call frame%init()
    call file%read(frame)

    positions => frame%positions()
    natoms = size(positions, 2)
    allocate(less_than_five(natoms))

    matched = 0
    do i = 1, natoms
        if (positions(0, i) .lt. 5) then
            less_than_five(matched) = i
            matched = matched + 1
        end if
    end do

    print*, "Atoms with x < 5:"
    do i=1,matched
        print*, "  - ", less_than_five(i)
    end do

    deallocate(less_than_five)
    call frame%free()
    call file%close()
end program

For more information about reading frame in a trajectory, see the following functions:

Generating a structure

Now that we know how to read frames from a trajectory, let’s try to create a new structure and write it to a file. As previsouly, we start by use``ing the chemfiles and ``iso_fortran_env modules:

use iso_fortran_env, only: int64, real64
use chemfiles
implicit none

Again, we declare everything we will need: a topology, some atoms, a frame, a cell and a trajectory.

type(chfl_topology) :: topology
type(chfl_atom) :: O, H, C
type(chfl_frame) :: frame
type(chfl_cell) :: cell
type(chfl_trajectory) :: trajectory
real(real64), dimension(:, :), pointer :: positions

Everything starts with a chfl_topology. This is the type that defines the atoms and the connectivity in a system. Here, we initialize three chfl_atom and add them to the topology.

call H%init("H")
call O%init("O")
call C%init("C")

call topology%add_atom(H)
call topology%add_atom(O)
call topology%add_atom(H)

Then we can add bonds between these atoms. It is worth noting two things here: first the atomic number are converted to int64 before passing them to the function (one could also change the default size of integers with a global compiler flag). Second, the atomic numbers starts at 0, not 1. This might be source of confusion when using chemfiles, as indexes usually start at 1 in Fortran.

call topology%add_bond(int(0, int64), int(1, int64))
call topology%add_bond(int(2, int64), int(1, int64))

We can then create a chfl_frame and set its topology. We free the topology right after, because we no longer need it.

call frame%init()
call frame%resize(int(3, int64))
call frame%set_topology(topology)
call topology%free()

Once we set the topology, we can set the positions of the atoms. Notice how the indexes starts at 1 here, as we are using a standard fortran array.

positions => frame%positions()
positions(:, 1) = [1d0, 0d0, 0d0]
positions(:, 2) = [0d0, 0d0, 0d0]
positions(:, 3) = [0d0, 1d0, 0d0]

Another possibility is to directly add atoms and bonds to the frame. Here we define a second molecule representing carbon dioxyde.

call frame%add_atom(O, [5d0, 0d0, 0d0])
call frame%add_atom(C, [6d0, 0d0, 0d0])
call frame%add_atom(O, [7d0, 0d0, 0d0])
call frame%add_bond(int(3, int64), int(4, int64))
call frame%add_bond(int(4, int64), int(5, int64))

Finally, we can set the chfl_cell associated with this frame. We also free the cell memory, as it is no longer needed.

call cell%init([10d0, 10d0, 10d0])
call frame%set_cell(cell)
call cell%free()

Now that our frame is constructed, it is time to write it to a file. For that, we open a trajectory in write ('w') mode, and write to it.

call trajectory%open("water-co2.pdb", 'w')
call trajectory%write(frame)
call trajectory%close()

And free all remaining memory with the right functions.

call frame%free()
call C%free()
call H%free()
call O%free()
Click here to see the whole program
program generate
    use iso_fortran_env, only: int64, real64
    use chemfiles
    implicit none

    type(chfl_topology) :: topology
    type(chfl_atom) :: O, H, C
    type(chfl_frame) :: frame
    type(chfl_cell) :: cell
    type(chfl_trajectory) :: trajectory
    real(real64), dimension(:, :), pointer :: positions

    call topology%init()
    call H%init("H")
    call O%init("O")
    call C%init("C")

    call topology%add_atom(H)
    call topology%add_atom(O)
    call topology%add_atom(H)

    call topology%add_bond(int(0, int64), int(1, int64))
    call topology%add_bond(int(2, int64), int(1, int64))

    call frame%init()
    call frame%resize(int(3, int64))
    call frame%set_topology(topology)
    call topology%free()

    positions => frame%positions()
    positions(:, 1) = [1d0, 0d0, 0d0]
    positions(:, 2) = [0d0, 0d0, 0d0]
    positions(:, 3) = [0d0, 1d0, 0d0]

    call frame%add_atom(O, [5d0, 0d0, 0d0])
    call frame%add_atom(C, [6d0, 0d0, 0d0])
    call frame%add_atom(O, [7d0, 0d0, 0d0])
    call frame%add_bond(int(3, int64), int(4, int64))
    call frame%add_bond(int(4, int64), int(5, int64))

    call cell%init([10d0, 10d0, 10d0])
    call frame%set_cell(cell)
    call cell%free()

    call trajectory%open("water-co2.pdb", 'w')
    call trajectory%write(frame)
    call trajectory%close()

    call frame%free()
    call C%free()
    call H%free()
    call O%free()
end program

Using selections

Now that we know how to read and write frame from trajectories, how about we do a bit a filtering? In this tutorial, we will read all the frames from a file, and use selections to filter which atoms we will write back to another file. This example will also show how chemfiles can be used to convert from a file format to another one.

We start by declaring all the variables we need: two trajectories, a frame, a selection and an array of matches.

type(chfl_trajectory) :: input, output
type(chfl_frame) :: frame
type(chfl_selection) :: selection
type(chfl_match), allocatable, dimension(:) :: matches
integer(int64), allocatable, dimension(:) :: to_remove
integer(int64) :: count, i, step

Then we can open the two trajectories we need.

call input%open("input.arc", 'r')
call output%open("output.pdb", 'w')

We create a chfl_frame and a chfl_selection object to filter the atoms we want to keep.

call frame%init()
call selection%init("name Zn or name N")

Then we can iterate over the frames in the trajectory

do step=1,input%nsteps()
    call input%read(frame)

From here, we need to use the selection to get the atoms we want to remove. This is a two steps process: first we evaluate the selection and get the number of matches

count = 0
call selection%evaluate(frame, count)

Second we allocate some memory and get all the matches (represented as chfl_match):

allocate(matches(count))
call selection%matches(matches)

We can get the index of atoms in a to_remove array

allocate(to_remove(count))
do i = 1, count
    to_remove(i) = matches(i)%atoms(1)
end do

In order to remove the atoms from the frame, we need to sort to_remove in descending order: removing the atom at index i will shift the index of all the atoms after i. So we need start from the end and work toward the start of the frame.

call sort(to_remove)
do i = count - 1, 0, -1
    call frame%remove(to_remove(i))
end do

Finally, we can write the cleaned frame to the output file, and free the memory we allocated:

call output%write(frame)
deallocate(matches, to_remove)

The sort function we used to sort the matches is defined as follow:

subroutine sort(array)
    integer(int64), intent(inout) :: array(:)
    integer(int64) :: i, j, min, pos, tmp
    do i = 1, size(array) - 1
        min = array(i)
        pos = i
        do j = i, size(array)
            if (array(j) < min) then
                min = array(j)
                pos = j
            end if
        end do
        tmp = array(i)
        array(i) = min
        array(pos) = tmp
    end do
end subroutine
Click here to see the whole program
program select
    use iso_fortran_env, only: int64
    use chemfiles
    implicit none

    type(chfl_trajectory) :: input, output
    type(chfl_frame) :: frame
    type(chfl_selection) :: selection
    type(chfl_match), allocatable, dimension(:) :: matches
    integer(int64), allocatable, dimension(:) :: to_remove
    integer(int64) :: count, i, step

    call input%open("input.arc", 'r')
    call output%open("output.pdb", 'w')

    call frame%init()
    call selection%init("name Zn or name N")

    do step=1,input%nsteps()
        call input%read(frame)

        count = 0
        call selection%evaluate(frame, count)
        allocate(matches(count))
        call selection%matches(matches)

        allocate(to_remove(count))
        do i = 1, count
            to_remove(i) = matches(i)%atoms(1)
        end do

        call sort(to_remove)
        do i = count - 1, 0, -1
            call frame%remove(to_remove(i))
        end do

        call output%write(frame)
        deallocate(matches, to_remove)
    end do

    call selection%free()
    call frame%free()
    call input%close()
    call output%close()

contains
    ! A very simple and ineficient sorting routine
    subroutine sort(array)
        integer(int64), intent(inout) :: array(:)
        integer(int64) :: i, j, min, pos, tmp
        do i = 1, size(array) - 1
            min = array(i)
            pos = i
            do j = i, size(array)
                if (array(j) < min) then
                    min = array(j)
                    pos = j
                end if
            end do
            tmp = array(i)
            array(i) = min
            array(pos) = tmp
        end do
    end subroutine
end program