Using LABELIT to Analyze Pseudotranslation

This document is part of the Usage Primer for LABELIT's command-line interface. Here we focus on pseudotranslation (a non-crystallographic translation in which the translation vector is a rational fraction of the unit cell). The resulting diffraction patterns contain alternating strong and weak Bragg spots. Since indexing relies on choosing candidate spots that represent the lattice, there is always a chance that the weak spots will be missed. LABELIT now performs an extra search step to hunt for these weak spots, as described in Acta Cryst D. The general workflow is as follows, with (*) indicating the new steps that are targeted at analyzing pseudotranslation:
  (1)         (2)         (*3)           (4)          (5)            (*6)
            Indexing    Targeted        Print       Print          Analyze
raw data -> to get   -> search for   -> possible -> preliminary -> integrated
            triclinic   weak spots;     Bravais     integration    signal for
            basis       correct         lattices    results        alternating
            vectors     basis                                      strong/weak
                        vectors                                    pattern
                        accordingly 
  
Within this framework datasets give rise to four different types of outcome:

The following two sections give a detailed explanation of the program output, and the relevant command line parameters.

Program Output

We will examine the LABELIT output generated upon indexing the 2qyv dataset, available from the Joint Center for Structural Genomics.
% labelit.index 52009_4_###.img 1 90
This command requests the indexing of two 1-degree rotation images, taken at rotational settings 89 degrees apart. The targeted search for weak spots (*3) produces this output:

Code             Transformation N_off  Z_on  Z_off Overlap Overlap2 Position N_off  Z_on  Z_off Overlap Overlap2 Position
 1.0  1  0  0  0  1  0  0  0  1  1022  95.6            ---      ---      ---  1034  56.2            ---      ---      ---
 3.0  1  0  0  0  1  0  0  0  3   325   0.0             No      ---      ---    86   0.0             No      ---      ---
 3.1  1  0  0  0  3  0  0 -1  1  1244  93.8  1.14G     ---      ---      ---  1504  73.5  1.36G     ---      ---      ---
 3.2  1  0  0  0  3  0  0  0  1    18   0.0             No      ---      ---   999   0.0             No      ---      ---
 3.3  1  0  0  0  3  0  0  1  1   356   0.0             No      ---      ---  1125  86.4  2.20E     ---       No   54% Ok
 3.4  3  0  0 -1  1  0 -1  0  1  2088  95.6  1.28G     ---      ---      ---  2065  58.1  1.19G     ---      ---      ---
 3.5  3  0  0 -1  1  0  0  0  1  2031  98.4  1.15G     ---      ---      ---  2068  57.8  1.11G     ---      ---      ---
 3.6  3  0  0 -1  1  0  1  0  1  2102  96.6  1.19G     ---      ---      ---  2071  57.1  1.18G     ---      ---      ---
 3.7  3  0  0  0  1  0 -1  0  1  2051  96.2  1.15G     ---      ---      ---  2091  57.0  1.14G     ---      ---      ---
 3.8  3  0  0  0  1  0  0  0  1  1706 114.6  1.30G     ---      ---      ---  1383  70.5  1.22G     ---      ---      ---
 3.9  3  0  0  0  1  0  1  0  1  1002   0.0             No      ---      ---   783   0.0             No      ---      ---
3.10  3  0  0  1  1  0 -1  0  1  1930 100.7  1.17G     ---      ---      ---  2061  56.2  1.15G     ---      ---      ---
3.11  3  0  0  1  1  0  0  0  1  1616 118.1  1.30G     ---      ---      ---   226   0.0             No      ---      ---
3.12  3  0  0  1  1  0  1  0  1  1249 121.4  1.21G     ---      ---      ---   196   0.0             No      ---      ---
 2.0  1  0  0  0  1  0  0  0  2   782 106.0  1.10G     ---      ---      ---   663  23.7  1.30G     ---      ---      ---
 2.1  1  0  0  0  2  0  0  0  1  1014  95.0  1.05G     ---      ---      ---   893  67.5  1.26G     ---      ---      ---
 2.2  1  0  0  0  2  0  0  1  1   734 101.5  4.95E     ---      New   77% Ok   405  63.7  6.71E     ---      New   82% Ok *
 2.3  2  0  0  0  1  0  0  0  1  1072  95.6  1.20G     ---      ---      ---  1055  56.2  1.07G     ---      ---      ---
 2.4  2  0  0  0  1  0  1  0  1  1030  97.1  1.18G     ---      ---      ---  1029  58.0  1.13G     ---      ---      ---
 2.5  2  0  0  1  1  0  0  0  1  1043  96.0  1.28G     ---      ---      ---  1007  56.2  1.03G     ---      ---      ---
 2.6  2  0  0  1  1  0  1  0  1  1097  95.6  1.20G     ---      ---      ---  1036  56.2  1.07G     ---      ---      ---

Transforming the lattice and unit cell to account for the discovered pseudotranslation indicated by (*)
        See http://cci.lbl.gov/labelit/html/sublattice.html for a detailed explanation of this output.
The Code column identifies the potential sublattice transformation being considered. The value is of the form n.m. The n is the index of the sublattice as defined in the paper; for example n=2 identifies the possibility that the true asymmetric unit has twice the volume of the a.s.u. originally identified. Billiet and Rolley-Le Coz (Acta Cryst. A36, 242-248, 1980) showed that for n=2 there are exactly seven unique transformations leading to cell doubling, thus the subindex m ranges from 0 to 6 in this listing. The Transformation column gives the matrix M-transpose, which figures in equations (3) and (5) of the paper.
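
The Code and Transformation columns of the listing above can be reproduced with a short enumeration. The following is a minimal sketch, not LABELIT's own implementation: it assumes the matrices are lower-triangular integer matrices of determinant n, with each sub-diagonal entry reduced modulo the diagonal element of its column, and with residues centered on zero. The function names are hypothetical, and the ordering has only been checked against the n=2 and n=3 rows printed here.

```python
from itertools import product


def centered_residues(d):
    """Residues modulo d, centered on zero: d=2 -> 0, 1 and d=3 -> -1, 0, 1."""
    return range(-((d - 1) // 2), d // 2 + 1)


def sublattice_transformations(n):
    """Return lower-triangular matrices of determinant n as row-major 9-tuples."""
    matrices = []
    for d1 in range(1, n + 1):
        if n % d1:
            continue
        for d2 in range(1, n // d1 + 1):
            if (n // d1) % d2:
                continue
            d3 = n // (d1 * d2)
            # Entries below the diagonal are reduced modulo the diagonal
            # element of their own column (an assumed convention).
            for a21, a31, a32 in product(centered_residues(d1),
                                         centered_residues(d1),
                                         centered_residues(d2)):
                matrices.append((d1, 0, 0, a21, d2, 0, a31, a32, d3))
    return matrices


for n in (2, 3):
    for m, matrix in enumerate(sublattice_transformations(n)):
        print(f"{n}.{m}", *matrix)
```

Under these assumptions the sketch yields seven matrices for n=2 and thirteen for n=3, in the same order as codes 2.0 to 2.6 and 3.0 to 3.12 in the listing.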

The remainder of the listing contains a group of six columns to analyze each diffraction image. N_off gives the size of the sample of predicted Bragg spots under consideration. In the first row (code=1.0) N_off represents the number of spots on the main lattice, produced by the original indexing solution. For the remaining rows, N_off gives the number of spots on the potential coset lattice, which may or may not have a weak signal present. Notice that if the main lattice has about 1000 spots, then a tripling of the unit cell leads to about 2000 coset spots, and a doubling of the unit cell leads to about 1000 coset spots.

An exception to this rule of thumb occurs if the Overlap column contains the word "No". This indicates that there is significant spot overlap between the main lattice and the potential coset lattice, rendering any further analysis suspect. A "No" listing immediately disqualifies a potential lattice.

The Z_on column lists the signal strength of the main lattice. The value is given as a Z-score as described in section 3.1 of the paper, where the relevant variance describes the distribution of local background pixels around the local background plane. Slightly different Z-scores are reported for each potential transformation; this is because in the cell-doubling and -tripling cases, some main lattice Bragg spots are disqualified due to overlap with the coset lattice.

The Z_off value is a similar calculation of the signal strength on the coset lattice. Here, however, an analysis is done as in Fig. 2 of the paper, to determine how the signal is distributed. A letter "G" indicates that the distribution is best described as a Gaussian, implying that the signal on the coset lattice is really just noise. A letter "E" indicates an exponential distribution, meaning the signal deserves further analysis.

In the Overlap2 column, the second overlap-detection algorithm of section 3.3 of the paper is used to determine whether any of the coset spots might be suspect. If so, the word "New" appears in this column, indicating that some of the coset spots have been thrown out and the N_off and Z_off values recalculated.

Finally, the Position column reports the analysis of section 3.2 and Figure 3b of the paper. After analyzing the offset of the main lattice spots away from their ideal predicted positions, we determine the ellipse bounding 95% of main-lattice positions. Then turning to the observed signal on the coset lattice, we report the percentage of spots that fall within this bounding ellipse. If the percentage drops below 50%, the transformation is disqualified.
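
The column-by-column tests described above can be condensed into a small decision function. This is only a sketch of the logic as this document describes it, not code taken from LABELIT. The ImageResult container and both function names are hypothetical. The 2.0 cutoff is the default of the sublattice_significance_cutoff parameter documented under Command Line Parameters. A candidate is treated as accepted only if every image passes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageResult:
    """One image's column group for a single candidate (hypothetical container)."""
    overlap_ok: bool                   # False when the Overlap column reads "No"
    z_off: Optional[float]             # None when no Z_off value is printed
    distribution: Optional[str]        # "G" (Gaussian) or "E" (exponential)
    position_percent: Optional[float]  # None when the Position column reads "---"


def image_passes(r, significance_cutoff=2.0, position_minimum=50.0):
    """Apply the Overlap, Z_off and Position tests to one image."""
    if not r.overlap_ok:
        return False
    if r.distribution != "E" or r.z_off is None:
        return False
    if r.z_off <= significance_cutoff:
        return False
    return r.position_percent is not None and r.position_percent >= position_minimum


def candidate_accepted(results):
    """A transformation is accepted only if all images pass all tests."""
    return len(results) > 0 and all(image_passes(r) for r in results)


# Values transcribed from the listing above (codes 2.2, 3.3 and 2.0).
code_2_2 = [ImageResult(True, 4.95, "E", 77.0), ImageResult(True, 6.71, "E", 82.0)]
code_3_3 = [ImageResult(False, None, None, None), ImageResult(True, 2.20, "E", 54.0)]
code_2_0 = [ImageResult(True, 1.10, "G", None), ImageResult(True, 1.30, "G", None)]

for label, rows in (("2.2", code_2_2), ("3.3", code_3_3), ("2.0", code_2_0)):
    print(label, candidate_accepted(rows))
```

With these inputs only code 2.2 is accepted, which agrees with the (*) marker in the listing. Code 3.3 shows an exponential signal on the second image but is disqualified by the overlap on the first.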

Any transformation that passes all tests for both images is accepted as a positive result, and is applied to the unit cell basis. LABELIT then executes step (4) to list the possible Bravais lattices and step (5) to show preliminary resolution limits based on spot integration:

LABELIT Indexing results:
Beam center x   94.07mm, y   93.94mm, distance  180.02mm ; 80% mosaicity=0.25 deg.

Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume

:)   5     0.0638 dg 0.136    564  orthorhombic oP   84.59 123.41 174.40  90.00  90.00  90.00  1820614
:)   4     0.0571 dg 0.134    563    monoclinic mP   84.58 123.41 174.41  90.00  90.03  90.00  1820568
:)   3     0.0638 dg 0.134    563    monoclinic mP   84.59 174.40 123.41  90.00  90.01  90.00  1820544
:)   2     0.0311 dg 0.137    567    monoclinic mP  123.40  84.59 174.40  90.00  90.06  90.00  1820352
:)   1     0.0000 dg 0.120    569     triclinic aP   84.55 123.36 174.46  90.06  90.03  90.01  1819788

MOSFLM Integration results:
Solution  SpaceGroup Beam x   y  distance  Resolution Mosaicity RMS
:)   5         P222  94.09  94.12  180.05       2.24    0.250000    0.041
     1           P1  94.07  94.13  180.05       2.23    0.250000    0.041
Now in step (*6), the integrated signals are interpreted to reveal the alternating strong and weak spots:
MOSFLM analysis of partials & fulls:
            Potential Sublattice    Index Lattice                        Ratio
 Resolution Intensity I/sigma    N  Intensity I/sigma    N  ssth/lambda
       7.26    607.1   16.45    170   7699.0   24.14    158 0.009480      12.7
 7.26  5.14    691.7   11.37    332   2966.7   16.42    272 0.028440       4.3
 5.14  4.19    941.2   11.06    368   4490.7   16.16    376 0.047400       4.8
 4.19  3.63    693.2    8.81    452   2683.2   12.48    446 0.066361       3.9
 3.63  3.25    445.1    5.75    495   1616.0    9.87    488 0.085321       3.6
 3.25  2.96    235.7    3.22    544    843.9    6.56    535 0.104281       3.6
 2.96  2.74    138.5    2.06    581    439.1    4.42    577 0.123241       3.2
 2.74  2.57     86.9    1.45    447    270.7    3.28    422 0.142201       3.1
 2.57  2.42     47.6    0.82    289    160.0    2.31    292 0.161162       3.4
 2.42  2.30     37.3    0.63    212    115.0    1.78    197 0.180122       3.1
 2.30  2.19     26.3    0.45    112    112.7    1.29    102 0.199082       4.3
 2.19  2.10     36.2    0.61     42     41.2    0.69     39 0.218042       1.1
 
This listing is similar to the one in Table 2 of the paper. For each resolution bin, we analyze the coset lattice (Potential Sublattice) and the main lattice (Index Lattice) as to integrated intensity, I/sigma, and number of spots. The final column is the ratio of the Index Lattice intensity to the Potential Sublattice intensity; as in many cases, the ratio here is most pronounced in the very low-resolution range. All possible transformations are analyzed for this listing, but the listing is only printed if there is a significant Ratio, and then only for the transformation giving the largest ratio.
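
As a quick check, the Ratio column can be recomputed from the two Intensity columns of the listing above. The sketch below simply transcribes the printed intensities and divides the Index Lattice value by the Potential Sublattice value for each bin; the variable and function names are our own.

```python
# (Potential Sublattice intensity, Index Lattice intensity) for each
# resolution bin, transcribed from the listing above, lowest resolution first.
bins = [
    (607.1, 7699.0),
    (691.7, 2966.7),
    (941.2, 4490.7),
    (693.2, 2683.2),
    (445.1, 1616.0),
    (235.7, 843.9),
    (138.5, 439.1),
    (86.9, 270.7),
    (47.6, 160.0),
    (37.3, 115.0),
    (26.3, 112.7),
    (36.2, 41.2),
]


def intensity_ratio(sublattice_intensity, index_intensity):
    """Ratio of main-lattice intensity to coset-lattice intensity."""
    return index_intensity / sublattice_intensity


ratios = [round(intensity_ratio(s, i), 1) for s, i in bins]
print(ratios)
```

The rounded values match the printed Ratio column, from 12.7 in the lowest-resolution bin down to 1.1 in the highest.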

Command Line Parameters

It is worth considering how parameters can be added to the command line, for activating specific program features. [Important: at present these parameters must be added to the command line, rather than to the dataset_preferences.py file described elsewhere in this primer. The sole exception is "distl_maximum_number_spots_for_indexing" which must be placed in dataset_preferences.py. More flexibility will be worked into future program development.]

sublattice_allow=True|False (default=True)
Switch on steps (*3) and (*6) for the detection of sublattices. Normally, if there is no sublattice, the silent execution of these steps should not interfere with indexing. Exceptions (false positives) should be reported to the authors, but an immediate workaround is to set this flag to False.
sublattice_verbose=False|True (default=False)
Print the results from step (*3), even if there are no accepted sublattice transformations.
sublattice_maximum_modulus=3
Specifies the maximum cell multiple to consider; for example n=3 considers both cell doubling and tripling.
sublattice_force_index=n.m
The possible values are the same identifiers listed in the Code column from step (*3) above. If a specific code is given, LABELIT is forced to apply the specified transformation to the indexing solution. This allows the user to index a diffraction pattern in the specified way, after examining sublattice plots, even if the Z-statistics do not reveal an acceptable signal.
sublattice_significance_cutoff=2.0 (default=2.0 standard deviations)
The mean Z-score (Z_off, from step *3) from a coset class of candidate reflections must exceed this value to be considered significant.
distl_maximum_number_spots_for_indexing = 300
Even though many hundreds of Bragg spot candidates may be considered "good" for autoindexing, DISTL chooses the brightest 300 by default. (For very large unit cells, it instead chooses the brightest 15%.) The rationale for using the brightest spots is laid out in the paper: it is empirically found to give the best indexing results. However, it is evident that this approach can limit the inclusion of weak spots that might be evidence of pseudotranslation. THE PURPOSE OF STEP (*3) IS TO PROVIDE AN OPTIMAL ALTERNATIVE TO AVOID THIS DILEMMA. However, if the user wishes to reproduce the results in section 4 of the paper, it is useful to manipulate this parameter. This parameter must be placed in the file dataset_preferences.py, not on the command line.
sublattice_pdf_file=filename.pdf
If given, write a picture of the sublattice model to this file path, as in Fig. 1 of the paper.
sublattice_pdf_render_all=False|True (default=False)
Normally the sublattice_pdf_file shows the spot predictions only for the accepted sublattice candidate. However, if the render_all flag is set True, spot predictions are shown for all possible sublattices, on separate pages.
sublattice_pdf (numerous other tweaks)
When producing pdf files showing lattice predictions, numerous other parameters can be set; they are documented in the labelit/phil_preferences.py file.
         sublattice_pdf_box_selection=all|index|coset
         sublattice_pdf_enable_legend=False|True
         sublattice_pdf_enable_legend_font_size=10
         sublattice_pdf_enable_legend_ink_color=black
         sublattice_pdf_enable_legend_vertical_offset=10
         sublattice_pdf_box_linewidth=0.04
         sublattice_pdf_window_fraction=0.666666
         sublattice_pdf_window_offset_x=0.16667
         sublattice_pdf_window_offset_y=0.16667
         sublattice_pdf_markup_inliers=True
         sublattice_pdf_profile_shrink=0
         image_brightness=1.0

mosaicity= (default = None)
If a value is given in degrees, alter the mosaicity model used in the calculation of spot overlap, affecting for example, Figure 4 of the paper.
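
To illustrate how several of these parameters might be combined, the sketch below assembles a labelit.index command line as a list of arguments. It is an illustration only. The helper name is hypothetical, the output file name sublattice.pdf is made up, and placing the name=value pairs after the image arguments is an assumption rather than something this document specifies.

```python
def labelit_index_args(image_template, first, last, **parameters):
    """Build an argument list for labelit.index (hypothetical helper)."""
    args = ["labelit.index", image_template, str(first), str(last)]
    for name, value in parameters.items():
        # Booleans render as True/False, matching the values listed above.
        args.append(f"{name}={value}")
    return args


args = labelit_index_args(
    "52009_4_###.img", 1, 90,
    sublattice_verbose=True,
    sublattice_maximum_modulus=3,
    sublattice_pdf_file="sublattice.pdf",
)
print(" ".join(args))

# distl_maximum_number_spots_for_indexing is deliberately absent here:
# as noted above, it belongs in dataset_preferences.py instead.
```

The resulting list can be passed to subprocess.run, or printed and pasted into a shell.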