Phenix developers doc

Install and use line_profiler

It is a good idea to profile code to speed up runtime. line_profiler is a module for doing line-by-line profiling of functions: https://github.com/rkern/line_profiler

Installation and set up

It is preferable to install line_profiler with mamba because it is faster and it is less likely to make unnecessary changes to your dependencies.

If you don't have mamba, you'll first have to install mamba. To do this, activate your conda base environment:

conda activate base

Then install mamba:

conda install -c conda-forge mamba

Deactivate your conda environment:

conda deactivate

Then go to your phenix installation which contains the conda_base folder. If you install line_profiler in conda_base, then it will be available in phenix scripts.

mamba install -p ./conda_base -c conda-forge line_profiler

Now you can use line_profiler in your scripts.

Using line_profiler in the program template

Let's say we want to profile the run() method of a class that is created in a ProgramTemplate, such as in the second line of the code snippet below:

reduce_add_h_obj = reduce_hydrogen.place_hydrogens(model = self.model)
reduce_add_h_obj.run()

To profile the run() function, you can wrap it with the following lines of code:

reduce_add_h_obj = reduce_hydrogen.place_hydrogens(model = self.model)
import line_profiler
lp = line_profiler.LineProfiler(reduce_add_h_obj.run)
lp.enable()
reduce_add_h_obj.run()
lp.disable()
lp.print_stats()

Note that the argument of LineProfiler, reduce_add_h_obj.run, does not include the parentheses; so it is .run and not run().

line_profiler output

Below an example for the output from line_profiler. To save space, some lines were omitted.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    82                                             def run(self):
    83                                               '''
    84                                               Function that places H atoms
    85                                               '''
    86         1          4.0      4.0      0.0      model_has_bogus_cs = False
    87
    90         1         28.0     28.0      0.0      cs = self.model.crystal_symmetry()
    91         1          1.0      1.0      0.0      if (cs is None) or (cs.unit_cell() is None):
    92                                                 self.model = shift_and_box_model(model = self.model)
    93                                                 model_has_bogus_cs = True
    94
    96         1       3367.0   3367.0      0.1      self.n_H_initial = self.model.get_hd_selection().count(True)
    97         1          2.0      2.0      0.0      if not self.keep_existing_H:
    98         1       4562.0   4562.0      0.1        self.model = self.model.select(~self.model.get_hd_selection())
    99
   100         1          5.0      5.0      0.0      t0 = time.time()
   101                                               # Add H atoms and place them at center of coordinates
   102         1    1479990.0 1479990.0     31.3      pdb_hierarchy = self.add_missing_H_atoms_at_bogus_position()
   103         1          2.0      2.0      0.0      if print_time:
   104                                                 print("add_missing_H_atoms_at_bogus_position:", round(time.time()-t0, 2))
   128
   129         1       1379.0   1379.0      0.0      pdb_hierarchy.sort_atoms_in_place()
   130         1         19.0     19.0      0.0      pdb_hierarchy.atoms().reset_serial()
   131                                           #    f = open("intermediate2.pdb","w")
   132                                           #    f.write(self.model.model_as_pdb())
   133
   134         1     513441.0 513441.0     10.9      p = mmtbx.model.manager.get_default_pdb_interpretation_params()

Here we can see that the functions add_missing_H_atoms_at_bogus_position() and get_default_pdb_interpretation_params() use 30% and 10 % of time respectively. To find out what slows down these functions, we would have to profile them separately, because line_profiler does not recursively analyse the calls.

Furthermore, if the code does not execute a statement, such as those in conditional parts, then it will not show up in the list. For example, line 92 was not executed in this call, so we don't know how much time it would need.

Running line_profiler in a script

This example is from stackoverflow:

from line_profiler import LineProfiler
import random

def do_stuff(numbers):
    s = sum(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

line_profiler in the cctbx

An example of the line_profiler in cctbx is here:
/simtbx/command_line/stage_two.py#L84

The output is printed here
/simtbx/command_line/stage_two.py#L112

using this simple function
/simtbx/diffBragg/hopper_utils.py#L1321