Tutorials¶
This section present some hand-on tutorials to the chemfiles Fortran API. These
tutorial do not check for errors. When using chemfiles you might want to check
for error using the status
parameter that all functions take.
All the code here is under the CC-0 Universal Licence which means that you are free to do whatever you want with it (i.e. it is Public Domain code)
Read a single frame¶
In this tutorials we will read a frame from a trajectory, and print the indexes
of all the atom in the half-space x < 5
.
We start by use``ing the modules we will need: ``chemfiles
, and
iso_fortran_env
. iso_fortran_env
provides fixed width types that are
very usefull when using a C++ library like chemfiles from Fortran. Here, we only
need 64-bits integers and real.
use iso_fortran_env, only: int64, real64
use chemfiles
implicit none
We can then declare all the variables we will need. All chemfiles specific types
are user defined type that should be declared with type(chfl_xxx)
. In this
example we will also need a few other values: an array for positions and an
array for the indexes of atoms with x < 5
. The positions array is a pointer
because we will not allocate memory for it. Instead we will directly access some
memory inside chemfiles internal data structures.
type(chfl_trajectory) :: file
type(chfl_frame) :: frame
real(real64), dimension(:, :), pointer :: positions
integer(int64), dimension(:), allocatable :: less_than_five
Then we open a chfl_trajectory
in read ('r'
) mode and read the
first frame. We need to initialize memory for the chfl_frame
before
calling chfl_trajectory%read()
.
call file%open("filename.xyz", 'r')
call frame%init()
call file%read(frame)
We can now and get the positions of the atoms and the number of atoms in the
frame with the chfl_frame%positions()
subroutine, as well as the number
of atoms in the frame.
positions => frame%positions()
natoms = size(positions, 2)
Knowning the total number of atoms in the frame, we can allocate memory to store
the indices of the atoms with x < 5
:
allocate(less_than_five(natoms))
Iterating through the atoms in the frame, we get the ones matching our
condition. We need to track the number of matched
atoms to know where to add
them in the less_than_five
array.
matched = 0
do i = 1, natoms
if (positions(0, i) .lt. 5) then
less_than_five(matched) = i
matched = matched + 1
end if
end do
At the end we can print our results
print*, "Atoms with x < 5:"
do i=1,matched
print*, " - ", less_than_five(i)
end do
And free all allocated memory. We don’t need to free positions
, as it points
into memory allocated and controlled by the frame.
deallocate(less_than_five)
call frame%free()
call file%close()
Click here to see the whole program
program indexes
use iso_fortran_env, only: int64, real64
use chemfiles
implicit none
type(chfl_trajectory) :: file
type(chfl_frame) :: frame
real(real64), dimension(:, :), pointer :: positions
integer(int64), dimension(:), allocatable :: less_than_five
integer(int64) :: i, natoms
integer :: matched
call file%open("filename.xyz", 'r')
call frame%init()
call file%read(frame)
positions => frame%positions()
natoms = size(positions, 2)
allocate(less_than_five(natoms))
matched = 0
do i = 1, natoms
if (positions(0, i) .lt. 5) then
less_than_five(matched) = i
matched = matched + 1
end if
end do
print*, "Atoms with x < 5:"
do i=1,matched
print*, " - ", less_than_five(i)
end do
deallocate(less_than_five)
call frame%free()
call file%close()
end program
For more information about reading frame in a trajectory, see the following functions:
chfl_trajectory%nsteps()
to know when to stop readingchfl_trajectory%read_step()
to directlty read a given step.chfl_trajectory%set_cell()
andchfl_trajectory%set_topology()
to specify an unit cell or a topology for all frames in a trajectory.
Generating a structure¶
Now that we know how to read frames from a trajectory, let’s try to create a new
structure and write it to a file. As previsouly, we start by use``ing the
chemfiles and ``iso_fortran_env
modules:
use iso_fortran_env, only: int64, real64
use chemfiles
implicit none
Again, we declare everything we will need: a topology, some atoms, a frame, a cell and a trajectory.
type(chfl_topology) :: topology
type(chfl_atom) :: O, H, C
type(chfl_frame) :: frame
type(chfl_cell) :: cell
type(chfl_trajectory) :: trajectory
real(real64), dimension(:, :), pointer :: positions
Everything starts with a chfl_topology
. This is the type that defines
the atoms and the connectivity in a system. Here, we initialize three
chfl_atom
and add them to the topology.
call H%init("H")
call O%init("O")
call C%init("C")
call topology%add_atom(H)
call topology%add_atom(O)
call topology%add_atom(H)
Then we can add bonds between these atoms. It is worth noting two things here:
first the atomic number are converted to int64
before passing them to the
function (one could also change the default size of integers with a global
compiler flag). Second, the atomic numbers starts at 0, not 1. This might be
source of confusion when using chemfiles, as indexes usually start at 1 in
Fortran.
call topology%add_bond(int(0, int64), int(1, int64))
call topology%add_bond(int(2, int64), int(1, int64))
We can then create a chfl_frame
and set its topology. We free the
topology right after, because we no longer need it.
call frame%init()
call frame%resize(int(3, int64))
call frame%set_topology(topology)
call topology%free()
Once we set the topology, we can set the positions of the atoms. Notice how the indexes starts at 1 here, as we are using a standard fortran array.
positions => frame%positions()
positions(:, 1) = [1d0, 0d0, 0d0]
positions(:, 2) = [0d0, 0d0, 0d0]
positions(:, 3) = [0d0, 1d0, 0d0]
Another possibility is to directly add atoms and bonds to the frame. Here we define a second molecule representing carbon dioxyde.
call frame%add_atom(O, [5d0, 0d0, 0d0])
call frame%add_atom(C, [6d0, 0d0, 0d0])
call frame%add_atom(O, [7d0, 0d0, 0d0])
call frame%add_bond(int(3, int64), int(4, int64))
call frame%add_bond(int(4, int64), int(5, int64))
Finally, we can set the chfl_cell
associated with this frame. We
also free the cell memory, as it is no longer needed.
call cell%init([10d0, 10d0, 10d0])
call frame%set_cell(cell)
call cell%free()
Now that our frame is constructed, it is time to write it to a file. For that,
we open a trajectory in write ('w'
) mode, and write to it.
call trajectory%open("water-co2.pdb", 'w')
call trajectory%write(frame)
call trajectory%close()
And free all remaining memory with the right functions.
call frame%free()
call C%free()
call H%free()
call O%free()
Click here to see the whole program
program generate
use iso_fortran_env, only: int64, real64
use chemfiles
implicit none
type(chfl_topology) :: topology
type(chfl_atom) :: O, H, C
type(chfl_frame) :: frame
type(chfl_cell) :: cell
type(chfl_trajectory) :: trajectory
real(real64), dimension(:, :), pointer :: positions
call topology%init()
call H%init("H")
call O%init("O")
call C%init("C")
call topology%add_atom(H)
call topology%add_atom(O)
call topology%add_atom(H)
call topology%add_bond(int(0, int64), int(1, int64))
call topology%add_bond(int(2, int64), int(1, int64))
call frame%init()
call frame%resize(int(3, int64))
call frame%set_topology(topology)
call topology%free()
positions => frame%positions()
positions(:, 1) = [1d0, 0d0, 0d0]
positions(:, 2) = [0d0, 0d0, 0d0]
positions(:, 3) = [0d0, 1d0, 0d0]
call frame%add_atom(O, [5d0, 0d0, 0d0])
call frame%add_atom(C, [6d0, 0d0, 0d0])
call frame%add_atom(O, [7d0, 0d0, 0d0])
call frame%add_bond(int(3, int64), int(4, int64))
call frame%add_bond(int(4, int64), int(5, int64))
call cell%init([10d0, 10d0, 10d0])
call frame%set_cell(cell)
call cell%free()
call trajectory%open("water-co2.pdb", 'w')
call trajectory%write(frame)
call trajectory%close()
call frame%free()
call C%free()
call H%free()
call O%free()
end program
Using selections¶
Now that we know how to read and write frame from trajectories, how about we do a bit a filtering? In this tutorial, we will read all the frames from a file, and use selections to filter which atoms we will write back to another file. This example will also show how chemfiles can be used to convert from a file format to another one.
We start by declaring all the variables we need: two trajectories, a frame, a selection and an array of matches.
type(chfl_trajectory) :: input, output
type(chfl_frame) :: frame
type(chfl_selection) :: selection
type(chfl_match), allocatable, dimension(:) :: matches
integer(int64), allocatable, dimension(:) :: to_remove
integer(int64) :: count, i, step
Then we can open the two trajectories we need.
call input%open("input.arc", 'r')
call output%open("output.pdb", 'w')
We create a chfl_frame
and a chfl_selection
object to
filter the atoms we want to keep.
call frame%init()
call selection%init("name Zn or name N")
Then we can iterate over the frames in the trajectory
do step=1,input%nsteps()
call input%read(frame)
From here, we need to use the selection to get the atoms we want to remove. This is a two steps process: first we evaluate the selection and get the number of matches
count = 0
call selection%evaluate(frame, count)
Second we allocate some memory and get all the matches (represented as
chfl_match
):
allocate(matches(count))
call selection%matches(matches)
We can get the index of atoms in a to_remove array
allocate(to_remove(count))
do i = 1, count
to_remove(i) = matches(i)%atoms(1)
end do
In order to remove the atoms from the frame, we need to sort to_remove
in
descending order: removing the atom at index i will shift the index of all the
atoms after i. So we need start from the end and work toward the start of the
frame.
call sort(to_remove)
do i = count - 1, 0, -1
call frame%remove(to_remove(i))
end do
Finally, we can write the cleaned frame to the output file, and free the memory we allocated:
call output%write(frame)
deallocate(matches, to_remove)
The sort
function we used to sort the matches is defined as follow:
subroutine sort(array)
integer(int64), intent(inout) :: array(:)
integer(int64) :: i, j, min, pos, tmp
do i = 1, size(array) - 1
min = array(i)
pos = i
do j = i, size(array)
if (array(j) < min) then
min = array(j)
pos = j
end if
end do
tmp = array(i)
array(i) = min
array(pos) = tmp
end do
end subroutine
Click here to see the whole program
program select
use iso_fortran_env, only: int64
use chemfiles
implicit none
type(chfl_trajectory) :: input, output
type(chfl_frame) :: frame
type(chfl_selection) :: selection
type(chfl_match), allocatable, dimension(:) :: matches
integer(int64), allocatable, dimension(:) :: to_remove
integer(int64) :: count, i, step
call input%open("input.arc", 'r')
call output%open("output.pdb", 'w')
call frame%init()
call selection%init("name Zn or name N")
do step=1,input%nsteps()
call input%read(frame)
count = 0
call selection%evaluate(frame, count)
allocate(matches(count))
call selection%matches(matches)
allocate(to_remove(count))
do i = 1, count
to_remove(i) = matches(i)%atoms(1)
end do
call sort(to_remove)
do i = count - 1, 0, -1
call frame%remove(to_remove(i))
end do
call output%write(frame)
deallocate(matches, to_remove)
end do
call selection%free()
call frame%free()
call input%close()
call output%close()
contains
! A very simple and ineficient sorting routine
subroutine sort(array)
integer(int64), intent(inout) :: array(:)
integer(int64) :: i, j, min, pos, tmp
do i = 1, size(array) - 1
min = array(i)
pos = i
do j = i, size(array)
if (array(j) < min) then
min = array(j)
pos = j
end if
end do
tmp = array(i)
array(i) = min
array(pos) = tmp
end do
end subroutine
end program