It is a good idea to profile code to speed up runtime.
line_profiler is a module for doing line-by-line profiling of functions: https://github.com/rkern/line_profiler
It is preferable to install line_profiler with mamba because it is faster and it is less likely to make unnecessary changes to your dependencies.
If you don't have mamba, you'll first have to install mamba. To do this, activate your conda base environment:
conda activate base
Then install mamba:
conda install -c conda-forge mamba
Deactivate your conda environment:
Then go to your phenix installation which contains the
conda_base folder. If you install line_profiler in conda_base, then it will be available in phenix scripts.
mamba install -p ./conda_base -c conda-forge line_profiler
Now you can use line_profiler in your scripts.
Let's say we want to profile the run() method of a class that is created in a ProgramTemplate, such as in the second line of the code snippet below:
reduce_add_h_obj = reduce_hydrogen.place_hydrogens(model = self.model) reduce_add_h_obj.run()
To profile the
run() function, you can wrap it with the following lines of code:
reduce_add_h_obj = reduce_hydrogen.place_hydrogens(model = self.model) import line_profiler lp = line_profiler.LineProfiler(reduce_add_h_obj.run) lp.enable() reduce_add_h_obj.run() lp.disable() lp.print_stats()
Note that the argument of LineProfiler,
reduce_add_h_obj.run, does not include the parentheses; so it is .run and not run().
Below an example for the output from line_profiler. To save space, some lines were omitted.
Line # Hits Time Per Hit % Time Line Contents ============================================================== 82 def run(self): 83 ''' 84 Function that places H atoms 85 ''' 86 1 4.0 4.0 0.0 model_has_bogus_cs = False 87 90 1 28.0 28.0 0.0 cs = self.model.crystal_symmetry() 91 1 1.0 1.0 0.0 if (cs is None) or (cs.unit_cell() is None): 92 self.model = shift_and_box_model(model = self.model) 93 model_has_bogus_cs = True 94 96 1 3367.0 3367.0 0.1 self.n_H_initial = self.model.get_hd_selection().count(True) 97 1 2.0 2.0 0.0 if not self.keep_existing_H: 98 1 4562.0 4562.0 0.1 self.model = self.model.select(~self.model.get_hd_selection()) 99 100 1 5.0 5.0 0.0 t0 = time.time() 101 # Add H atoms and place them at center of coordinates 102 1 1479990.0 1479990.0 31.3 pdb_hierarchy = self.add_missing_H_atoms_at_bogus_position() 103 1 2.0 2.0 0.0 if print_time: 104 print("add_missing_H_atoms_at_bogus_position:", round(time.time()-t0, 2)) 128 129 1 1379.0 1379.0 0.0 pdb_hierarchy.sort_atoms_in_place() 130 1 19.0 19.0 0.0 pdb_hierarchy.atoms().reset_serial() 131 # f = open("intermediate2.pdb","w") 132 # f.write(self.model.model_as_pdb()) 133 134 1 513441.0 513441.0 10.9 p = mmtbx.model.manager.get_default_pdb_interpretation_params()
Here we can see that the functions
get_default_pdb_interpretation_params() use 30% and 10 % of time respectively. To find out what slows down these functions, we would have to profile them separately, because line_profiler does not recursively analyse the calls.
Furthermore, if the code does not execute a statement, such as those in conditional parts, then it will not show up in the list. For example, line 92 was not executed in this call, so we don't know how much time it would need.
This example is from stackoverflow:
from line_profiler import LineProfiler import random def do_stuff(numbers): s = sum(numbers) l = [numbers[i]/43 for i in range(len(numbers))] m = ['hello'+str(numbers[i]) for i in range(len(numbers))] numbers = [random.randint(1,100) for i in range(1000)] lp = LineProfiler() lp_wrapper = lp(do_stuff) lp_wrapper(numbers) lp.print_stats()
An example of the line_profiler in cctbx is here:
The output is printed here
using this simple function