Usage & syntax for LABELIT command-line programs

This document explains how to use the command-line version of LABELIT.  It is assumed that the user has downloaded and installed the program, and configured the environment according to the README file.  LABELIT can now be used to autoindex a set of two image frames from an oscillation dataset.  Refer to the user manual to understand the program output.

Working Directory

A separate current working directory should be created specifically for LABELIT.  This is because LABELIT writes intermediate results to a number of files (some of them in python-pickle format), and because flat files are used for MOSFLM input and output.  

The images do not necessarily have to be in the current working directory.  However, it can be advantageous to set up a separate current working directory for each dataset, where the working files will permanently reside.  There are several reasons for this:
  1. Keeping the intermediate files allows image overlay pictures to be immediately generated by a unix command, as shown below.
  2. The user may want to keep the MOSFLM input & output files.
  3. Dataset-specific parameters can be defined in a separate file called dataset_preferences.py; see below.

Specific LABELIT commands:

-----------------Autoindexing--------------------


labelit.index


Index one or two diffraction images.  Image names can be specified on the command line in one of four ways:
  1. Implicit search for one or two images:
    labelit.index                      #search current working directory
    labelit.index directory_name     #search named directory

    Image files are identified by name where the name obeys one of these two lexical templates:
    prefix_001.img    # allowable extensions include ["img","tif","tiff","image"]
                      # type labelit.extensions for the full list
    prefix.0001       # 3 or 4 numerical digits allowed


    The program will exit if too many lexical matches are found in the requested directory, such as several datasets being present, or more than two numbered files from a single dataset.  Also, the program will exit if the files are not in a supported file format (see user manual) or if two images are requested having inconsistent experimental parameters (such as different detector distances) in the file header.

  2. Explicit path names for one or two images:
    labelit.index ./prefix_0*.img
    Note:  as shown in this example, wildcards are permissible under Unix as they are expanded to a lexical list before being passed to LABELIT.

  3. Combined directory and image number:
    labelit.index directory 1 90   #specify a directory path followed by one or two image numbers

  4. Combined filename template and image number:
    labelit.index directory/template_###.img 1 90   #specify template followed by one or more image numbers
The --index_only flag can always be specified to prevent the MOSFLM step from running.  In this case, LABELIT indexing solutions will be printed to the screen and the program will stop there.  MOSFLM orientation matrix files will not be written.
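
The naming rules above can be illustrated with a short Python sketch. This is not LABELIT's own code: the function names and regular expressions are assumptions made for this example, only the four extensions quoted above are included, and each # in a template is assumed to stand for one zero-padded digit.

```python
import re

# Extensions quoted in this document; labelit.extensions prints the full list.
EXTENSIONS = ("img", "tif", "tiff", "image")

# Template 1: prefix_001.img      Template 2: prefix.0001 (3 or 4 digits)
TEMPLATE_1 = re.compile(
    r"^(?P<prefix>.+)_(?P<number>\d+)\.(?P<ext>%s)$" % "|".join(EXTENSIONS))
TEMPLATE_2 = re.compile(r"^(?P<prefix>.+)\.(?P<number>\d{3,4})$")


def parse_image_name(filename):
    """Return (prefix, image_number) if the name fits either template, else None."""
    for pattern in (TEMPLATE_1, TEMPLATE_2):
        match = pattern.match(filename)
        if match:
            return match.group("prefix"), int(match.group("number"))
    return None


def expand_template(template, image_number):
    """Replace the run of '#' characters with the zero-padded image number."""
    run = re.search(r"#+", template)
    if run is None:
        raise ValueError("template contains no '#' placeholder")
    width = len(run.group(0))
    padded = str(image_number).zfill(width)
    return template[:run.start()] + padded + template[run.end():]
```

For example, expand_template("tm1347_mpd_1_###.img", 90) gives the file name that way 4 would look for under these assumptions.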

File Outputs: lookat

In the event that the user wants to do further analysis with MOSFLM, appropriate control files are provided.  If the user wishes to circumvent the indexing results produced by LABELIT, the lookat file is a good starting point for interactive indexing.  The resolution and mosaicity values are meant to be altered according to user preference.

File Outputs: labelit.mosflm_script

A more common scenario is that the user will want to take the indexing solution from LABELIT, and use it for further analysis of the dataset.  This analysis will not be limited to the two images used for indexing; it may include postrefinement based on multiple images, and integration of the entire dataset.  labelit.mosflm_script outputs a command script for starting an interactive MOSFLM session.  Once started, the user can press the "Predict" button to show the LABELIT model.  Refinement and integration can then be started.  Note that the user must specify which Bravais symmetry solution to use, since each solution is represented by a different orientation matrix.  For example, to choose solution 2, issue the command "labelit.mosflm_script 2" which produces a command script called integration02.csh. If no solution number is specified, a list of all solutions is printed.

File Outputs: labelit.mosflm_scripts

Generates MOSFLM command scripts for all possible Bravais lattice settings, all at once.

File Outputs: index##

The MOSFLM integration results internally used by LABELIT are also available, and can be found in files numerically coded by Bravais solution.  For example, from solution 2 we have the following files:

  index02        Actual MOSFLM input file used to produce the results
  index02.mat    Input orientation matrix for solution 2, from LABELIT
  index02.out    MOSFLM screen output
  index02.sum    MOSFLM summary file
  index02.mtz    CCP4-formatted integrated spot file
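
A script that post-processes these files may want to build the names from the solution number. The following Python sketch does so; it is an illustration only, the function names are hypothetical, and the two-digit zero padding is an assumption generalized from the index02 example.

```python
import os


def solution_files(solution_number):
    """Map each file name for one Bravais solution to a short description."""
    tag = "index%02d" % solution_number  # assumed two-digit padding, as in index02
    return {
        tag: "MOSFLM input file",
        tag + ".mat": "input orientation matrix from LABELIT",
        tag + ".out": "MOSFLM screen output",
        tag + ".sum": "MOSFLM summary file",
        tag + ".mtz": "CCP4-formatted integrated spot file",
    }


def missing_solution_files(solution_number, directory="."):
    """List the expected files that are not present in the given directory."""
    return sorted(
        name for name in solution_files(solution_number)
        if not os.path.exists(os.path.join(directory, name)))
```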


The user is cautioned that the resolution limit of index02 is meant to be an outside value so that MOSFLM integrates more spots than are actually expected to have good signal-to-noise ratios.

Informational Output: labelit.stats_distl

LABELIT uses the SSRL program DISTL to find candidate Bragg spots prior to indexing. The entire process is silent: no printout is produced, and the results are deposited in the file DISTL_pickle. To see a summary of the results after the calculation is finished, use the command labelit.stats_distl.

Informational Output: labelit.stats_index

Print again the screen output from the last run of labelit.index. Results are extracted from the pickle-format files in the current working directory. Note to beamline software developers: A command-line option, labelit.stats_index --api, prints a python dictionary containing the same information. The underlying Python call [ SummaryPrinter().get_api() ] is intended to make this data available within a Python application server.
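
For developers who prefer to call the command-line program rather than the Python API, the printed dictionary can be read back into Python. The sketch below assumes that the --api output consists solely of a plain Python literal on standard output, which this document does not guarantee; the function names and the key used in the usage note are hypothetical.

```python
import ast
import subprocess


def parse_api_text(text):
    """Convert the printed dictionary back into a Python dict.

    ast.literal_eval accepts only Python literals, so nothing in the
    text is executed as code.
    """
    result = ast.literal_eval(text.strip())
    if not isinstance(result, dict):
        raise ValueError("expected a dictionary, got %s" % type(result).__name__)
    return result


def read_stats_index(working_directory="."):
    """Run labelit.stats_index --api in the LABELIT working directory."""
    completed = subprocess.run(
        ["labelit.stats_index", "--api"],
        cwd=working_directory, capture_output=True, text=True, check=True)
    return parse_api_text(completed.stdout)
```

For instance, parse_api_text("{'example_key': [1, 2.5]}") returns an ordinary dictionary; the real keys are whatever labelit.stats_index --api prints.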

Special Parameter Input:  When Things Go Wrong

Normally, critical model parameters such as the beam position are taken from the file header.  If for some reason these parameters are wrong, it may be impossible to index the dataset unless LABELIT is instructed with override values.  On the command line, specify any or all of the following items:

autoindex_override_beam=93.2,100.4   #your xy beam coordinates in mm
autoindex_override_distance=250.00   #your detector distance in mm
autoindex_override_wavelength=0.9793 #your X-ray wavelength in Angstroms
autoindex_override_twotheta=0.5      #your two-theta swing in degrees; can be + or -
autoindex_override_deltaphi=1.0      #your rotation deltaphi in degrees

...Now labelit.reset followed by labelit.index will use these new parameters.
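
When LABELIT is driven from a script, the override words can be assembled programmatically. The Python sketch below is an illustration only: the function name is hypothetical, and it simply formats the five override keys listed above as key=value words containing no whitespace.

```python
def override_arguments(beam=None, distance=None, wavelength=None,
                       twotheta=None, deltaphi=None):
    """Build key=value command-line words for the override parameters."""
    words = []
    if beam is not None:
        x, y = beam  # beam coordinates in mm
        words.append("autoindex_override_beam=%s,%s" % (x, y))
    for name, value in (("distance", distance), ("wavelength", wavelength),
                        ("twotheta", twotheta), ("deltaphi", deltaphi)):
        if value is not None:
            words.append("autoindex_override_%s=%s" % (name, value))
    return words


# The words are appended to the usual labelit.index argument list.
command = ["labelit.index", "directory", "1", "90"] + override_arguments(
    beam=(93.2, 100.4), distance=250.0)
```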

Any key=value parameter from the command line can alternately be defined in one of three file locations where configurable parameters may be set. Parameters take higher precedence when defined in files further down this list, with command-line values taking the highest precedence:
  1. labelit_sources/labelit/site_preferences.py. A location where the system administrator can set parameters shared by all users.
  2. $HOME/.labelit_preferences.py. A place where a particular user can define settings applicable to all work done under that unix account.
  3. dataset_preferences.py. Placed in the current working directory, this file contains parameters applicable only to the particular dataset being analyzed.
  Beginning in April, 2011, there is no distinction between parameters definable in files vs. the command line; any parameter can go in either location, with the proviso that whitespace is not allowed on the command line.
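
The precedence rule can be pictured as successive dictionary updates. The Python sketch below is only a model of the rule as stated above, not LABELIT's implementation; the function name is hypothetical and each source is represented as a plain dictionary of key=value settings.

```python
def merge_preferences(site, user, dataset, command_line):
    """Combine settings so that later sources override earlier ones.

    Returns (merged, origin), where origin records which source
    supplied the final value of each parameter.
    """
    merged, origin = {}, {}
    sources = (("site_preferences.py", site),
               (".labelit_preferences.py", user),
               ("dataset_preferences.py", dataset),
               ("command line", command_line))
    for label, settings in sources:
        for key, value in settings.items():
            merged[key] = value
            origin[key] = label
    return merged, origin
```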

Most oscillation datasets can be indexed without overriding any parameters. Indeed, LABELIT has been designed without a graphical user interface in an effort to create a command-line autoindexing program to work behind the scenes at synchrotron beamlines. However, there will always be some cases where the output can be improved if the user carefully inspects the results (see Graphical Output below) and adjusts certain things in response. In addition to the autoindex_override items listed above, here are the other optionally configurable values, listed with their defaults:

beamplot_pdf_file = None
This is the most useful parameter that can be set. It is used to create a color contour map showing probable positions of the direct beam on the detector face, as was done for Figure 3d of the LABELIT paper. The most probable candidate beam positions are also ranked and listed. If there is more than one highly ranked candidate, LABELIT always autoindexes with each candidate, and chooses the one which gives the lowest rmsd score (observed - predicted) when predicting the diffraction pattern. If there is close contention between different candidate beam centers, the user should use the contour map to judge whether the autoindexing solution can be believed, or if it is necessary to manually experiment with other beam centers (see next paragraph). The name of an output file can be given, e.g., beamplot_pdf_file = '/home/user/public_html/plot.pdf' creates a pdf document in the user's html document directory.
beam_search_scope = 4.0
Bayesian beam search radius in mm. After examining the probability map above, the user may want to manually select a particular beam candidate. This is useful in rare cases (<2% of the time) where LABELIT's rigorous beam search fails to rank the candidate beam centers in the correct order. To choose a different beam center, use autoindex_override_beam to set the coordinates of the desired beam candidate, and reduce the size of the beam_search_scope, so only one beam candidate is within the radius. The radius must be >= 0.1 mm. LABELIT has not been tested with radius values greater than 4 mm.
overlapping_spot_criterion = 1.2
As of LABELIT v0.974, it is no longer necessary to fiddle with this parameter. If more than a certain percentage of spots overlap, LABELIT now uses a special procedure to check if the pattern of spots seems like it arises from a large unit cell axis. The spots are accepted for autoindexing. Special MOSFLM keywords are given to adjust for close spots. A check is also run to reject spots that look like they are unresolved aggregates of several Bragg reflections. In versions of LABELIT prior to v0.974, this parameter is useful for judging whether two candidate Bragg spots are too close together to be safely used for autoindexing. Normally the centers of the two spots must be more than 1.2 times the major axis of the largest spot, when the spots are modelled as ellipses. This conservative default becomes problematic for samples with very large unit cells, where the detector has not been pulled back far enough to cleanly separate the spots. If all pairs of close spots have been thrown out, autoindexing will never correctly identify the largest of the three unit cell dimensions. This situation can be detected by carefully looking at the pattern of colored spots in the overlay_distl plot (see Graphical Output below). If close-lying spots are never colored blue or green, it is perfectly reasonable in such cases to set the criterion as low as 0.7 or 0.8.
distl_lowres_limit = 50.0
Low resolution limit, in Angstroms, used by DISTL when choosing spots for autoindexing. Reasons the user might want to set this to a lower number include: 1) to eliminate artifacts due to scatter off the beamstop; 2) to eliminate extremely bright low-resolution spots from well diffracting crystals such as lysozyme; and 3) to eliminate low resolution diffuse scatter which interferes with autoindexing. Alternatively, the user might need to set this to a higher number to include extremely low resolution spots from crystals with unusually large unit cells. In most cases the default value will work.
distl_highres_limit = None
High resolution limit, in Angstroms, used by DISTL when choosing spots for autoindexing; by default there is no limit. However, in a small percentage of cases the user will notice that at high resolutions Bragg spots seem to form a distorted lattice (this can be seen by inspecting the overlay_distl plot; see Graphical Output below). In such cases, use this parameter to place a limit on spots that are used for autoindexing.
distl_maximum_number_spots_for_indexing = 300
Even though many hundreds of Bragg spot candidates may be considered "good" for autoindexing, DISTL chooses the brightest 300 by default. In certain images taken from crystals with large unit cells (see overlapping_spot_criterion above), closely spaced spots may not be among the brightest reflections. In extreme cases, the largest unit cell dimension can be missed by the autoindexing procedure. This situation can be recognized by inspecting the overlay_distl plot (see Graphical Output below). If there are closely spaced blue spots (the weaker ones) but no closely spaced green spots (the brightest ones selected for autoindexing), then this parameter must be increased.
distl_minimum_number_spots_for_indexing = 40
Normally, DISTL requires that each image contain 40 good Bragg spot candidates, or autoindexing will be aborted. However, in a small percentage of cases, otherwise well-behaved crystals may have small unit cell dimensions and diffract out only to low resolution, resulting in fewer than 40 spots on an image. This parameter may be decreased in such circumstances.
lepage_max_delta = 1.4
Angular tolerance in degrees for detecting symmetry elements to list out the candidate Bravais lattices. The user is referred to the LABELIT paper and to LePage J. Appl. Cryst. 15:255(1982) for a full description. The concern here is that if this parameter is too small, potential symmetry may be missed. We have another web site where the user can experiment with different values. The 1.4 degree default has proven to be high enough in all cases examined so far; other parameters such as the beam position are more important for determining correct Bravais symmetry. The user should remember that it is always possible that candidate Bravais symmetry will turn out to be false once the data are integrated and scaled; therefore LABELIT always outputs a list of all possible subgroups.
rmsd_tolerance = 2.0
For listing subgroups (see paragraph above) this parameter gives the permitted ratio between model rmsd and the triclinic rmsd. For example, if the tetragonal model has rmsd=0.9mm and the triclinic model has rmsd=0.1mm the tetragonal candidate model will not be listed, even if its LePage delta falls within the permitted range. It is unlikely that the user will be interested in changing this parameter.
difflimit_sigma_cutoff = 0.75
After MOSFLM integration, the intensities of unmerged fulls and partials are analyzed to determine the resolution limit. The I/sigma cutoff used for this calculation is 0.75 by default.
distl_profile_bumpiness = 2
When DISTL chooses candidate Bragg spots, this is the maximum number of local maxima allowed in a diffraction peak which is to be considered a valid spot. For almost all synchrotron-based datasets, the spot profile is smooth and bell-shaped, so it is fine to filter on 1 or 2 local maxima. However, some images collected on home sources equipped with nickel mirrors show very detailed structure within the spot profile, requiring this parameter to be set as high as 10.
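
Two of the numeric criteria in the list above, overlapping_spot_criterion and rmsd_tolerance, can be written as simple predicates. The Python sketch below only restates the rules as described in this document; the function names are hypothetical, the behavior exactly at the boundary value is an assumption, and spot positions and axis lengths are assumed to share the same units.

```python
import math


def spots_too_close(center1, center2, largest_major_axis,
                    overlapping_spot_criterion=1.2):
    """True if two spot centers are not separated by more than
    criterion x the major axis of the largest spot."""
    separation = math.hypot(center1[0] - center2[0], center1[1] - center2[1])
    return separation <= overlapping_spot_criterion * largest_major_axis


def passes_rmsd_tolerance(model_rmsd, triclinic_rmsd, rmsd_tolerance=2.0):
    """True if the model rmsd is within the permitted ratio of the triclinic rmsd."""
    return model_rmsd <= rmsd_tolerance * triclinic_rmsd
```

With the figures quoted under rmsd_tolerance, passes_rmsd_tolerance(0.9, 0.1) is False because the ratio of 9 exceeds the default of 2.0, so the tetragonal model would not be listed.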

Compatible Indexing

The need for consistent indexing. How many times have you wanted to index a new dataset so that the unit cell is aligned with a previously solved isomorphous structure, or a previously collected dataset? Perhaps you are screening 100 crystals for bound ligands or heavy atoms. Routine autoindexing procedures do not guarantee consistency. Here are two common pitfalls that might lead to a need for reindexing:
  1. The indexing of a P1 cell where a~b and alpha~beta. Alignment with a pre-existing dataset might require reindexing as -b,-a,-c, leading to a non-standard setting. This could actually happen with any cell where the cell dimensions are close to matching a higher-symmetry metric.
  2. An ambiguous axis direction. For example, in a tetragonal or hexagonal cell (point groups 4/m or 6/m), the macromolecule is aligned along the c-axis in a particular direction, which is not known ahead of time from the diffracted lattice.
Even if there is no reindexing to be done, the autoindexing process only gives the metric (the cell dimensions), and the user needs to type in the space group separately. Why can't all this information be provided to the autoindexing program in the form of command line input?

A new LABELIT feature. Now there is help. LABELIT versions 1.000wcpcw and higher give a command-line option to input a PDB structure file prior to indexing. Structure factors are calculated, taking the PHENIX bulk solvent correction into account. Labelit 1.1.8/Phenix 1.6.1 and higher also allow the input of observed X-ray intensities or structure factor amplitudes instead of a PDB file.

Integration is done with one or two frames of the raw data, using a preliminary triclinic model. Afterwards, all possible reindexing transformations are considered to produce the best scaling between data and model. This is all done automatically before the final indexing solution is printed out.

Example output. Using the normal indexing procedure, analysis of an example dataset produces the Patterson group with the highest possible symmetry, I 4/mmm, and lists all metrics that are subgroups of I 4/mmm. The actual space group is I 4:

% labelit.index tm1347_mpd_1_###.img 1 90

LABELIT Indexing results:
Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
:)   9     0.1804 dg 0.194    460    tetragonal tI  120.18 120.18 144.29  90.00  90.00  90.00  2083945
:)   8     0.1804 dg 0.188    435  orthorhombic oF  144.33 169.80 170.03  90.00  90.00  90.00  4166840
;(   7     0.1804 dg 0.263    453    monoclinic mC  144.38 169.71 111.50  90.00 130.31  90.00  2083292
:)   6     0.1588 dg 0.148    440    monoclinic mC  144.28 170.05 111.44  90.00 130.26  90.00  2086598
:)   5     0.0857 dg 0.184    459  orthorhombic oI  120.24 120.29 144.18  90.00  90.00  90.00  2085478
;(   4     0.0857 dg 0.250    466    monoclinic mC  170.18 144.11 120.17  90.00 134.93  90.00  2086454
:)   3     0.0627 dg 0.159    454    monoclinic mC  187.74 120.19 120.44  90.00 129.77  90.00  2088961
:)   2     0.0669 dg 0.170    456    monoclinic mC  187.73 120.23 120.23  90.00 129.80  90.00  2084759
:)   1     0.0000 dg 0.153    440     triclinic aP  111.46 111.56 111.57  99.39 114.61 114.81  1044409

MOSFLM Integration results:
Solution  SpaceGroup Beam x   y  distance  Resolution Mosaicity RMS
:)   9           I4  94.12  99.54  199.94       2.43    0.700000    0.057
     1           P1  94.12  99.55  200.02       2.43    0.700000    0.058
With the new command line option, eight reindexing possibilities are considered before picking the correct direction of the fourfold. Also, all subsequent processing steps (such as labelit.mosflm_scripts for data integration) will explicitly use the provided space group, I4. Only three space groups are printed in the output, I4 and its two proper subgroups:
% labelit.index tm1347_mpd_1_###.img 1 90 compatibility_file=1vrd.pdb

LABELIT Indexing results:

Comparing observed data (Frames 1, 90)...
to reference model 1vrd.pdb

There are 8 reindexing choices with compatible cells, of which 4 cluster around the lowest Rscale
Best overall Rscale factor is 30.06% out to resolution  3.70 Angstroms

Solution  Metric fit  rmsd  #spots  space_group      unit_cell                                  volume
:)   3     0.1804 dg 0.187    463   tI          I 4 120.19 120.19 144.32  90.00  90.00  90.00  2084927
:)   2     0.0857 dg 0.255    460   mC      C 1 2 1 169.84 144.33 120.07  90.00 134.91  90.00  2084708
:)   1     0.0000 dg 0.292    446   aP          P 1 111.57 111.62 111.46  99.37 114.61 114.87  1044409

MOSFLM Integration results:
Solution  SpaceGroup Beam x   y  distance  Resolution Mosaicity RMS
:)   3           I4  94.12  99.54  199.96       2.43    0.700000    0.053
     1           P1  94.12  99.55  200.02       2.43    0.700000    0.058
Note: as presently implemented, the compatible indexing algorithm requires back-to-back images when comparing raw data to input model. In the above example, indexing is requested on frames 1 and 90, so the comparison algorithm actually requires frames 1,2,90 and 91 to be present on disk. An error message is generated if frames 2 and 91 are not present.
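
The frame requirement in this note can be checked before launching the job. The Python sketch below is an illustration with hypothetical function names; it assumes that "back-to-back" means the next image number in the same series.

```python
def frames_required(requested_frames):
    """Each requested frame plus its immediate successor, sorted, no repeats."""
    needed = set()
    for frame in requested_frames:
        needed.update((frame, frame + 1))
    return sorted(needed)


def frames_missing(requested_frames, frames_on_disk):
    """Frames that compatible indexing would need but that are not available."""
    available = set(frames_on_disk)
    return [frame for frame in frames_required(requested_frames)
            if frame not in available]
```

For frames 1 and 90, frames_required([1, 90]) lists 1, 2, 90 and 91, matching the example above.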

Input of a structure factor file:

% labelit.index tm1347_mpd_1_###.img 1 90 compatibility_file=truncated.mtz [compatibility_column_label=label]
% # column labels IMEAN I F are OK by default, without specifying the label.

Matching the Indexing Solution to a Known Orientation

Consistent indexing on the same crystal. The above section discusses consistent indexing across multiple crystals. This section deals with the problem of deriving consistent orientation matrices from multiple datasets from the same crystal. Experiments giving rise to same-crystal multiple datasets include:
  1. Repeating the phi rotation at different wavelengths
  2. Repeating the rotation with inverse beam geometry
  3. Assembling a complete dataset from wedges covering a discontinuous phi range
  4. Minimizing radiation damage by collecting small wedges from different crystal positions
  5. Intentionally inducing radiation damage and recollecting the dataset
Motivation. There are many reasons why we might want to assign the exact same unit cell basis vectors (a, b, and c) over multiple datasets. Possible scenarios include:
  1. Consistent application of an absorption correction
  2. If the symmetry of the metric is higher than the symmetry of the space group, then it may be necessary to use a non-standard setting for the second data wedge, in order to merge correctly with the first dataset. For example if a triclinic crystal has a~b and alpha~beta, reindexing as -b,-a,-c may be required. A similar situation would apply for an orthorhombic cell with a~b.
  3. For polar space groups (such as tetragonal or hexagonal) the c-axis direction must be chosen consistently in order to merge two data wedges.
General Approach. The approach taken is to index the first dataset normally, and to store the resulting orientation matrix in the file 'crystal_orientation' in the current working directory. The second dataset is indexed normally, then the new orientation matrix is aligned with the stored solution (by briefly considering every possible reindexing matrix). The alignment is done at the level of the triclinic basis, before the metric symmetry is added. For crystals of higher symmetry, we first align the triclinic (aP) setting. Once this triclinic orientation is aligned with the same-crystal reference wedge, all possible subgroup settings are guaranteed to align with the reference wedge solutions.
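
The idea of aligning a new triclinic basis with a stored one by trying reindexing matrices can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by LABELIT: it assumes that only the 24 axis permutations with sign changes that preserve handedness need to be tried, it scores each candidate by a plain sum of squared differences between basis vectors, and all function names are hypothetical.

```python
from itertools import permutations, product

AXES = "abc"


def determinant(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def proper_signed_permutations():
    """Yield (label, matrix) for signed axis permutations with determinant +1."""
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            matrix = [[signs[i] if j == perm[i] else 0 for j in range(3)]
                      for i in range(3)]
            if determinant(matrix) == 1:
                label = ",".join(("-" if signs[i] < 0 else "") + AXES[perm[i]]
                                 for i in range(3))
                yield label, matrix


def reindex(matrix, basis):
    """Rows of basis are the a, b, c vectors; return the reindexed rows."""
    return [[sum(matrix[i][k] * basis[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]


def misfit(basis1, basis2):
    """Sum of squared differences between corresponding vector components."""
    return sum((x - y) ** 2
               for row1, row2 in zip(basis1, basis2)
               for x, y in zip(row1, row2))


def best_reindexing(new_basis, stored_basis):
    """Return the (label, matrix) that best maps new_basis onto stored_basis."""
    return min(proper_signed_permutations(),
               key=lambda item: misfit(reindex(item[1], new_basis), stored_basis))
```

The label is written in the same a',b',c' convention as the program output shown below, so a result of "-b,-a,-c" means a' = -b, b' = -a, c' = -c.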

Usage:

   labelit.index [first wedge dataset]
   labelit.store_crystal_orientation
   labelit.index [second wedge dataset, to be aligned with the first]
   labelit.reset # erases the alignment matrix

Example output. We consider two possible scenarios. The first involves a single triclinic crystal used for PDB entry 2rh0 (Joint Center for Structural Genomics), which has pseudo-mC metric symmetry.

 
% labelit.index 2rh0/data/jcsg/aps3/GMCA_23ID_D/20070819/collection/13542905/58046/58046_1.#### 1 90

LABELIT Indexing results:
Beam center x  145.06mm, y  148.58mm, distance  250.04mm ; 80% mosaicity=0.80 deg.

Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
:)   2     0.2494 dg 0.314    328    monoclinic mC  101.77  77.56  44.23  90.00 101.01  90.00   342734
:)   1     0.0000 dg 0.394    379     triclinic aP   44.26  63.81  64.04  74.62  81.31  81.19   171168

% labelit.store_crystal_orientation

% labelit.index 2rh0/data/jcsg/aps3/GMCA_23ID_D/20070819/collection/13542905/58046/58046_2.#### 1 90
Based on stored crystal orientation, reindexing a',b',c' = -a,-c,-b
New dataset shifted 0.00 degrees around the rotation axis with respect to the stored orientation

LABELIT Indexing results:
Beam center x  145.10mm, y  148.62mm, distance  249.88mm ; 80% mosaicity=0.90 deg.

Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
:)   2     0.0951 dg 0.292    427    monoclinic mC  102.05  77.60  44.21  90.00 100.87  90.00   343808
:)   1     0.0000 dg 0.386    438     triclinic aP   44.21  64.17  64.07  74.50  81.37  81.36   171978

...Notice that the second dataset has a non-standard setting, which nonetheless is aligned with the first indexing solution. The small differences in the cell dimensions between the two datasets are attributable to experimental uncertainty in the cell measurement.

Note the statement that the new dataset is shifted "0.00" degrees around the rotation axis. This indicates that the orientation is known very precisely (because two images are used for indexing, related by a 90-degree rotation). But the second dataset can still be aligned even if the orientation is less-precisely known. For example, if only one oscillation image is used for indexing, it is not uncommon to see the second dataset shifted as much as +/- 1.0 degrees from the first dataset.

Our second scenario involves a synchrotron beamline with a robot automounter. The crystal is mounted once, two images are acquired, the crystal is placed back in the storage Dewar, and finally it is remounted at a later time. The goniometer is set to the same position for each crystal mounting, but the mounting pin has slipped the second time with respect to the first trial. A different algorithm is used for aligning the orientation matrices; this time we try each possible reindexing transformation, and at the same time allow a free rotation around the axis (usually phi). It is not possible to know the alignment with certainty (consider, for example, a tetragonal crystal with its c axis aligned perfectly with phi, giving four possible positions). However, if the crystal is randomly oriented, then usually there is only one solution possible, and this is reported as follows:

% labelit.index 1vrd/data/jcsg/als1/5.0.3/20011020/TM1347/T3879/tm1347_mpd_1_###.img 1 301

LABELIT Indexing results:
Beam center x   94.14mm, y   99.27mm, distance  199.97mm ; 80% mosaicity=1.10 deg.

Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
:)   9     0.1420 dg 0.141    508    tetragonal tI  120.39 120.39 144.32  90.00  90.00  90.00  2091740
:)   8     0.1390 dg 0.141    508  orthorhombic oI  120.40 120.38 144.32  90.00  90.00  90.00  2091745
:)   7     0.1420 dg 0.154    514  orthorhombic oF  144.33 170.24 170.26  90.00  90.00  90.00  4183629
:)   6     0.1390 dg 0.190    507    monoclinic mC  170.25 144.33 120.39  90.00 135.00  90.00  2091817
:)   5     0.1420 dg 0.122    502    monoclinic mC  144.23 170.30 111.41  90.00 130.20  90.00  2090137
:)   4     0.1184 dg 0.132    507    monoclinic mC  187.86 120.40 120.41  90.00 129.77  90.00  2093408
:)   3     0.0905 dg 0.240    522    monoclinic mC  187.68 120.45 120.36  90.00 129.79  90.00  2090832
:)   2     0.0417 dg 0.188    507    monoclinic mC  144.36 170.23 111.54  90.00 130.23  90.00  2092834
:)   1     0.0000 dg 0.145    494     triclinic aP  111.52 111.59 111.63  99.39 114.71 114.65  1046659

% labelit.store_crystal_orientation

% labelit.index 1vrd/data/jcsg/als1/5.0.3/20011020/TM1347/T3879/tm1347_mpd_1_###.img 2 302

Based on stored crystal orientation, reindexing a',b',c' = -b,-a,-c
New dataset matches the known orientation with a -4.88 degree slip around the rotation axis.
This is a possible outcome if the crystal has been remounted (see LABELIT documentation).

LABELIT Indexing results:
Beam center x   94.15mm, y   99.29mm, distance  199.94mm ; 80% mosaicity=1.20 deg.

Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
:)   9     0.0632 dg 0.155    505    tetragonal tI  120.41 120.41 144.19  90.00  90.00  90.00  2090412
:)   8     0.0632 dg 0.154    503  orthorhombic oI  120.40 120.41 144.18  90.00  90.00  90.00  2090312
:)   7     0.0563 dg 0.155    507  orthorhombic oF  144.25 170.21 170.29  90.00  90.00  90.00  4181375
:)   6     0.0562 dg 0.197    505    monoclinic mC  187.83 120.42 120.40  90.00 129.86  90.00  2090506
:)   5     0.0632 dg 0.215    509    monoclinic mC  187.76 120.40 120.40  90.00 129.84  90.00  2089772
:)   4     0.0546 dg 0.185    502    monoclinic mC  170.22 144.24 120.39  90.00 134.99  90.00  2090543
:)   3     0.0563 dg 0.188    501    monoclinic mC  144.25 170.21 111.58  90.00 130.26  90.00  2090585
:)   2     0.0175 dg 0.125    496    monoclinic mC  144.18 170.31 111.44  90.00 130.23  90.00  2089360
:)   1     0.0000 dg 0.153    495     triclinic aP  111.56 111.47 111.57  99.46 114.71 114.65  1044661
It is not necessary to specify at the command line which scenario applies. If no alignment is found with the first scenario, then the search is automatically repeated allowing for slippage around the phi axis.

Frequently Asked Questions About labelit.index

I have some extremely weak images, with definite albeit small (3-6 pixel) spots, for which DISTL reports:

No_Indexing_Solution: Too few unimodal Bragg spots (20) in image 1

Answer: Here are a few things to try (try them one at a time, not necessarily together).
  1. On the command line, specify distl_permit_binning=False. Normally, images larger than 4000x4000 pixels are binned 2x2 to save time, but this will ruin your spot detection if the spots are very weak.
  2. On the command line, override the default setting for minimum spot area:

    distl.minimum_spot_area=10

    By default DISTL ignores spots less than 10 pixels in area (for image plates and CCD detectors), or less than 5 pixels in area (for the Pilatus pixel array detector). You might want to try a smaller cutoff to pick more spots. If you choose a number too low it will just pick noise.
  3. Also on the command line, you can fine-adjust the cutoff level that distinguishes signal from background. By default, DISTL calculates the mean and sigma of the background, and then treats anything as signal that is more than 1.5 sigmas over background (CCDs) or 2.5 sigmas over background (Pilatus):

    distl.minimum_signal_height=1.5

    This level can carefully be adjusted downward if you want more sensitive spotpicking, but it is futile to pick values below about 1.2 sigmas.
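
The effect of lowering the signal cutoff can be seen in a small Python sketch. It is a simplified model, not DISTL's code: DISTL fits a local background, whereas this example assumes a flat background described by the mean and sigma of a list of pixel values, and the function names are hypothetical.

```python
import statistics


def signal_threshold(background_pixels, minimum_signal_height=1.5):
    """Pixel value above which a pixel is treated as signal."""
    mean = statistics.mean(background_pixels)
    sigma = statistics.pstdev(background_pixels)
    return mean + minimum_signal_height * sigma


def count_signal_pixels(pixels, background_pixels, minimum_signal_height=1.5):
    """Number of pixels lying above the signal threshold."""
    threshold = signal_threshold(background_pixels, minimum_signal_height)
    return sum(1 for value in pixels if value > threshold)
```

With a background of mean 10 and sigma 2, the threshold is 13.0 at the CCD default of 1.5 and 12.4 at 1.2, so a pixel of value 12.5 is picked up only at the lower setting.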

My crystals have one fairly long cell axis (c = 506 A). Although the spots do run into each other, the maxima are far enough that it should be possible to pick enough good Bragg candidates to autoindex. I tried with MOSFLM and it worked, but the spotfinder component of labelit.index rejects most spots as having multiple maxima and I get No_Indexing_Solution.

Answer: First, you are using a MarCCD 4096 x 4096 pixel array. By default, anything greater than 4Kx4K gets 2x2 binned by the spotfinder. Although this does not harm you in this case, binning could conceivably hurt you for large cell work. So I recommend turning it off for virus work by adding the command line argument, distl_permit_binning=False.

This can go on the command line, or in the dataset_preferences.py file.

Having done that, the key trick for your dataset is to notice that the spots on the 500A c-axis aren't really separated down to the baseline (as you mentioned), leading spotfinder to reject all spots that are closely spaced on a c-axis line. The solution is to raise the bar as to what spotfinder considers to be an acceptable baseline. By default the cutoff is 1.5 sigma (CCDs) or 2.5 sigma (Pilatus), where sigma is the rms deviation of local pixels away from the best-fit background plane. If the cutoff is changed to 5.0, your pattern can be indexed. On the command line:

distl.minimum_signal_height=5.0 ...this raises the background cutoff.
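
Why a higher cutoff helps can be seen in a one-dimensional toy example. The sketch below is not the spotfinder algorithm (which works on two-dimensional images); it assumes only that a spot is a run of connected pixels above the cutoff and that a spot with more than one maximum is rejected. The function names and the intensity profile, given in sigmas above background, are hypothetical.

```python
def regions_above(profile, cutoff):
    """Split a 1-D intensity profile into runs of consecutive pixels above cutoff."""
    regions, current = [], []
    for value in profile:
        if value > cutoff:
            current.append(value)
        elif current:
            regions.append(current)
            current = []
    if current:
        regions.append(current)
    return regions

def count_maxima(region):
    """Count strict local maxima in one region; pixels outside count as lower."""
    padded = [float("-inf")] + region + [float("-inf")]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i] > padded[i - 1] and padded[i] > padded[i + 1])

# Two neighboring spots that do not separate down to the baseline (made-up values).
profile = [0, 1, 6, 20, 7, 3, 8, 22, 5, 1, 0]

low = regions_above(profile, 1.5)    # one merged region containing two maxima
high = regions_above(profile, 5.0)   # two regions with one maximum each
```

At 1.5 sigma the two spots form a single region with two maxima; at 5.0 sigma they separate into two single-maximum regions.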

Can LABELIT use more than two images for indexing, and if so is there a significant advantage to doing so?

Answer: For most image plate and CCD data, two images give an optimal indexing solution. The original LABELIT paper recommends using images spaced widely apart in rotation angle, e.g., 90 degrees.

In rare cases, two images may not be enough to completely sample the lattice. The user can specify more than two images on the command line, but LABELIT must be instructed to accept this input by the placement of a keyword on the command line or in the dataset_preferences.py file:
wedgelimit = [maximum_permissible_number_of_images]
By default, the maximum permissible number of images that can be specified on the command line is two.
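
As a concrete instance, a dataset_preferences.py fragment permitting up to three images might look like the following. The value 3 is chosen only for illustration.

```python
# dataset_preferences.py (fragment)
# Allow up to three images to be named on the labelit.index command line.
wedgelimit = 3
```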

The wedgelimit keyword should not be necessary in most cases. If a clear example is found requiring more than two images for indexing, this should be brought to the attention of the program authors.

LABELIT intentionally prohibits the use of images that are back-to-back in rotation angle, such as a 0-to-1 and a 1-to-2 degree image. This is because the program does not have any algorithm to distinguish between the same Bragg spot partially recorded on consecutive images, versus two different Bragg spots from adjacent lattice positions. Generally things work fine with this restriction. Possible exceptions might include data sets that are finely sliced in phi. In some such datasets, there may not be sufficient numbers of Bragg spots on each image for indexing. Such datasets should be brought to the attention of the program authors for further investigation.
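
A user script that selects images for indexing can apply the same restriction before calling labelit.index. The sketch below is an assumption about how such a check could be written, not LABELIT's own code: each image is represented by a hypothetical (start, end) rotation range in degrees, and wrap-around at 360 degrees is ignored.

```python
def are_back_to_back(range_a, range_b, tolerance=1e-6):
    """Return True if one oscillation range ends where the other begins."""
    (start_a, end_a), (start_b, end_b) = range_a, range_b
    return abs(end_a - start_b) < tolerance or abs(end_b - start_a) < tolerance

adjacent = are_back_to_back((0.0, 1.0), (1.0, 2.0))      # consecutive images
separated = are_back_to_back((0.0, 1.0), (90.0, 91.0))   # images 90 degrees apart
```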

What does the "xx" in front of the solution number in the indexing table mean? I used to see only ";(" or ":)". Is this something new? The program output is:

LABELIT Indexing results:
Beam center x  162.65mm, y  162.38mm, distance  299.94mm ; 80% mosaicity=0.70 deg.

Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
xx   9     0.8778 dg 0.238    137    tetragonal tP   83.28  83.28  81.36  90.00  90.00  90.00   564268
xx   8     0.8778 dg 0.243    135  orthorhombic oC  117.75 117.87  81.37  90.00  90.00  90.00  1129382
xx   7     0.8778 dg 0.241    135    monoclinic mC  117.75 117.86  81.34  90.00  89.98  90.00  1128850
xx   6     0.8770 dg 0.241    136    monoclinic mC  117.86 117.75  81.33  90.00  89.96  90.00  1128741
:)   5     0.0730 dg 0.105    137  orthorhombic oP   81.27  82.65  83.91  90.00  90.00  90.00   563641
:)   4     0.0730 dg 0.105    135    monoclinic mP   81.27  82.64  83.91  90.00  89.98  90.00   563528
:)   3     0.0488 dg 0.098    136    monoclinic mP   82.64  81.27  83.94  90.00  89.94  90.00   563776
:)   2     0.0587 dg 0.094    133    monoclinic mP   81.28  83.91  82.65  90.00  89.95  90.00   563745
:)   1     0.0000 dg 0.092    133     triclinic aP   81.29  82.65  83.92  89.94  89.98  89.95   563827

MOSFLM Integration results:
Solution  SpaceGroup Beam x   y  distance  Resolution Mosaicity RMS
:)   5                 P222 162.73 162.76  300.36       3.41    0.700000    0.201
     1                   P1 162.71 162.75  300.48       3.42    0.700000    0.194
Answer: The Bravais lattice solutions are classified as likely ":)", unlikely ";(", or very unlikely "xx". The critical thing to understand is that lattice symmetry is not measured; it is an abstract constraint that is imposed if the unit cell measurements and integrated intensities fall within a close enough tolerance. For labelit.index we do not have reduced intensities yet; the output table presents best guesses based solely on Bragg spot positions. Previously, the "very unlikely" lattice settings were not printed at all; these are defined as settings for which
rmsd(setting) > rmsd_tolerance * rmsd(triclinic) 
where rmsd (in mm) is root-mean-squared-deviation (observed vs. predicted position) of well-fitting spots, and rmsd_tolerance is a parameter that can be defined on the command line or in the "dataset_preferences.py" file (default=3.5; prior to Nov 2010 the default was 2.0). It was reluctantly realized that no single cutoff value could adequately decide the true lattice symmetry in all cases, thus it was decided to print the "very unlikely" lattice settings along with the others. In the case shown, it is likely that the lattice is orthorhombic (oP), but a tetragonal solution cannot be ruled out at this stage. Indeed, it is possible to impose cubic symmetry by distorting the observed lattice by 1.83 degrees. This can be discovered with the command
iotbx.lattice_symmetry --unit-cell=81.29,82.65,83.92,89.94,89.98,89.95 P
However, from experience with macromolecular indexing, no observed lattice is ever distorted from the true geometry by more than 1.4 degrees. Therefore, for printing out the table we impose a strict limit on the metric fit:
Metric fit (degrees) < lepage_max_delta
where lepage_max_delta is another "dataset_preferences.py" parameter, 1.4 degrees by default.
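
The two criteria can be written out as a short Python sketch. It is not LABELIT's code; it applies only the two inequalities given above to the metric fit and rmsd columns of the example table, and it assumes that solution 1 is the triclinic setting, as in that table. The function names are hypothetical.

```python
# (solution number, metric fit in degrees, rmsd in mm), copied from the table above.
solutions = [
    (9, 0.8778, 0.238), (8, 0.8778, 0.243), (7, 0.8778, 0.241),
    (6, 0.8770, 0.241), (5, 0.0730, 0.105), (4, 0.0730, 0.105),
    (3, 0.0488, 0.098), (2, 0.0587, 0.094), (1, 0.0000, 0.092),
]

def very_unlikely(solutions, rmsd_tolerance=3.5):
    """Solutions with rmsd(setting) > rmsd_tolerance * rmsd(triclinic)."""
    triclinic_rmsd = next(rmsd for number, _, rmsd in solutions if number == 1)
    return [number for number, _, rmsd in solutions
            if rmsd > rmsd_tolerance * triclinic_rmsd]

def printable(solutions, lepage_max_delta=1.4):
    """Solutions whose metric fit is below lepage_max_delta."""
    return [number for number, metric_fit, _ in solutions
            if metric_fit < lepage_max_delta]

flagged_old_default = very_unlikely(solutions, rmsd_tolerance=2.0)
flagged_new_default = very_unlikely(solutions)
listed = printable(solutions)
```

Applied to the rmsd column of the table above, a tolerance of 2.0 flags solutions 6 through 9, while a tolerance of 3.5 flags none. All nine solutions pass the metric fit limit, whereas a setting requiring a 1.83 degree distortion would not.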


Filing a bug report: labelit.bugreporter

If autoindexing fails and there is a suspicion that there may be a program bug, a detailed bug report can be prepared and emailed to the program authors.  This command has the same syntax as labelit.index, so for example labelit.bugreporter /home/data 1 90 will try to index frames 1 and 90 in the directory /home/data.  A printout detailing every line of executed python code will be redirected to the file bugreport.  This very large file (typically tens of megabytes) can be gzip'ed and mailed to the developers.
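
For users who prefer to compress the report from Python rather than with the gzip command, a minimal sketch using only the standard library follows. The function name compress_report is hypothetical; the default file name bugreport is the one given above.

```python
import gzip
import shutil

def compress_report(path="bugreport"):
    """Write a gzip-compressed copy of the report and return its file name."""
    compressed = path + ".gz"
    with open(path, "rb") as source, gzip.open(compressed, "wb") as target:
        shutil.copyfileobj(source, target)
    return compressed
```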

-----------------Graphical Output--------------------

labelit.overlay_distl
labelit.overlay_index
labelit.overlay_mosflm


Overlay_distl: Produce a PNG file showing the input spots for indexing. Blue and green spots are both considered good Bragg spot candidates. Greens are the brightest candidates selected for autoindexing. This command works only after the user runs labelit.index, because it requires the DISTL_pickle file containing spot information.

Overlay_index: On top of the overlay_distl image, show an additional overlay of spots predicted by the LABELIT indexing model. This command works only after the user runs labelit.index. If labelit.index is run with the --index_only flag, the resolution limit is taken from the DISTL analysis; otherwise it is taken from MOSFLM integration. Purple spots are predicted fulls; yellow spots are predicted partials.

Overlay_mosflm: Instead of showing the LABELIT model, show the spots actually integrated during the MOSFLM step. Purple spots are predicted fulls; yellow spots are predicted partials.

Syntax for all three commands is as follows:

labelit.overlay_#### [-large] <single input image file name> <output png file name>

By default the original image is binned in 2x2 blocks to produce the picture, thus saving computational time. This will generally yield useful pictures. However, in cases with very large unit cells, it is advisable to disable binning by giving the -large qualifier, so that close Bragg spots can be easily seen.
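
When overlays are generated from a script, the argument list can be assembled following the syntax line above. The helper below is a hypothetical convenience, not part of LABELIT; it only builds the argument list, and the file names are placeholders.

```python
def overlay_command(kind, image, png, large=False):
    """Build the argument list for labelit.overlay_distl, _index or _mosflm."""
    if kind not in ("distl", "index", "mosflm"):
        raise ValueError("kind must be 'distl', 'index' or 'mosflm'")
    argv = ["labelit.overlay_" + kind]
    if large:
        argv.append("-large")   # disable the 2x2 binning
    argv += [image, png]
    return argv

argv = overlay_command("index", "lysozyme_1_001.img", "overlay.png", large=True)
```

The resulting list can be passed to subprocess.run(argv, check=True) from the working directory in which labelit.index was run.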

The -vector qualifier for overlay_index shows the difference vector from observed to calculated spot coordinates. The length of the vector is exaggerated 200x.

Colors used in these overlays may be changed by customizing the site-wide preferences file located in src/labelit/labelit/site_preferences.py.  The contents are self-explanatory.  Note that separate colors may be specified for fully-recorded and partially-recorded reflections, as is the custom in MOSFLM.

-----------------Housekeeping--------------------

labelit.reset

All temporary and result files will be erased.

-----------------Scaling and Space Group Determination--------------------

labelit.rsymop

Bravais symmetry determined with labelit.index is based only on the unit cell dimensions. Finding the true symmetry requires subsequent scaling of the intensity data. Once a partial dataset has been collected, the labelit.rsymop script can determine the Laue class and Patterson symmetry. Data integration and scaling are delegated to MOSFLM and SCALA. The facility is only available in the downloaded version, not as a web service, primarily because it would be impractical for remote users to upload entire datasets to the LABELIT web server.
Requirements:
LABELIT installation
CCP4 
MOSFLM (aliased to "ipmosflm")
SCALA
REINDEX

Synopsis:
The user collects a dataset, or a partial dataset.  The diffraction 
pattern is indexed, and data are integrated assuming a P1 spacegroup.  The
integrated Bragg spots are then analyzed under all Patterson symmetries to 
find the highest symmetry consistent with the data.  To avoid deducing the 
wrong symmetry under some circumstances, we use the Rsymop statistic (Sauter,
Grosse-Kunstleve & Adams [2006], J. Appl. Cryst., 39, 158-168) instead 
of the traditional Rmerge.  Finally, systematic extinctions are inspected 
to determine if screw axes can be found.

Primer:
As with other labelit commands, a current working directory should be 
set up which may or may not contain the detector data.  To  
index the diffraction pattern use a command like this:

labelit.index /net/adder/raid1/sauter/rawdata/alpha_lytic/ 1 91

This syntax asks labelit to look in the directory 
/net/adder/raid1/sauter/rawdata/alpha_lytic/ for any diffraction images with
image numbers 1 and 91, and use these to index the diffraction pattern.  It
is generally recommended to use frames 90-degrees apart.  FOR SYNCHROTRON 
DEVELOPERS, it would make sense for the data collection controls to have a
default where the 90-to-91 degree image is collected first, e.g., in file
alpha_lytic_1_091.img, and then revert back to phi=0 to start collecting image
number 1.  

Next, a script is called to determine the spacegroup:

labelit.rsymop 1 24

This syntax assumes that the previously determined indexing solution is to be 
used as a starting point, but that frames will only be integrated from 
numbers 1 to 24.  This allows symmetry to be determined while the data are
still being acquired, and it should take on the order of 2 minutes.  

The following steps are performed by labelit.rsymop:
1. An attempt is made to postrefine the indexing solution using MOSFLM.
   This is done in the triclinic setting since at this point the spacegroup
   is not known.  However, with the lack of higher symmetry it is difficult 
   to postrefine, particularly if the angular wedge is much smaller than
   90 degrees.  Therefore the following provisions are made:  
   Postrefinement is attempted using two 3-degree data wedges (containing
   at least 3 frames each).  If the dataset contains more than 93 degrees of
   rotation then wedges are chosen 90 degrees apart; otherwise, 
   data are chosen from the beginning and end of the oscillation range.  
   If there are insufficient frames or degrees of data, the LABELIT model
   is used without further refinement.  Also, if the model diverges more 
   than 2% away from its starting point, we again revert to the LABELIT model.
   
2. MOSFLM is used to integrate the partial dataset, assuming a triclinic 
   setting.  For integration, the high-resolution cutoff is set to a value
   beyond the expected resolution limit of the data, so that weak reflections
   can be properly included.  Then a new analysis of the resolution cutoff
   is performed based on the integrated intensities, so that scaling can 
   be done with a conservative cutoff of 5-sigma.  The 5-sigma cutoff is
   determined on resolution bins combining partials and fulls.
   
3. The data are scaled with the program SCALA in the highest-possible Patterson 
   group consistent with the unit cell dimensions (the "metric supergroup").
   For the example given below this is P 6/m m m. The data are not merged after 
   scaling.  The Rsymop statistics are then calculated (Sauter, Grosse-Kunstleve
   & Adams [2006]).  Also, presumptive Rmerge statistics are calculated for each
   possible Patterson group ("presumptive" means that we presume to calculate
   merging statistics for a subgroup even though the data are scaled in the
   supergroup; this is usually justified).  The presumptive Rmerge is listed in
   the "Rmp" column in the example below.
   
4. The likeliest Patterson group is chosen. The data are rescaled and merged in
   this group, and then the merged intensities are checked for systematic
   absences to indicate screw axes.
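
The wedge selection described in step 1 can be sketched in Python as follows.
This is an illustration of the stated rules, not the labelit.rsymop source:
the function name and its arguments are hypothetical, frames are assumed to
be numbered consecutively with a constant oscillation width, and the 2%
divergence check is not modeled.

```python
import math

def choose_wedges(first_frame, last_frame, osc_width,
                  wedge_degrees=3.0, min_frames=3):
    """Pick two postrefinement wedges as (first, last) frame-number pairs.

    Returns None when there are too few frames, in which case the LABELIT
    model would be used without further refinement.
    """
    n_frames = last_frame - first_frame + 1
    # Frames needed to span one wedge; the epsilon guards against round-off.
    frames_per_wedge = max(min_frames,
                           math.ceil(wedge_degrees / osc_width - 1e-9))
    if n_frames < 2 * frames_per_wedge:
        return None
    first_wedge = (first_frame, first_frame + frames_per_wedge - 1)
    second_start = first_frame + round(90.0 / osc_width)
    second_end = second_start + frames_per_wedge - 1
    if n_frames * osc_width > 90.0 + wedge_degrees and second_end <= last_frame:
        second_wedge = (second_start, second_end)       # 90 degrees apart
    else:
        second_wedge = (last_frame - frames_per_wedge + 1, last_frame)
    return first_wedge, second_wedge

short_run = choose_wedges(1, 24, 1.0)    # wedges at beginning and end
long_run = choose_wedges(1, 120, 1.0)    # wedges 90 degrees apart
too_few = choose_wedges(1, 4, 1.0)       # not enough frames
```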
   
A typical final output is shown here:

Symmetry Operator     Nobs Rsymop
          x,y,z       3520   3.0%
     -x,-x+y,-z       1158   3.1%
         y,x,-z      30214   3.3%
      -x+y,-x,z       2288   3.5%
       -y,x-y,z       2288   3.5%
      x-y,-y,-z       5116   3.9%
        x-y,x,z       2541  46.6%
       y,-x+y,z       2541  46.6%
       x,x-y,-z       2588  47.6%
      -x+y,y,-z       3998  48.6%
       -y,-x,-z      31860  50.0%
        -x,-y,z      33744  50.7%

Soln  Patterson       Operators_in_group max(Rsymop)    Rmp SpaceGroups
  12       P -3             S..SS.......        3.5%   3.1% P3 P31 P32 
  12   P -3 m 1             SSSSSS......        3.9%   3.5% ;(P321 :)P3121 :)P3221 
  12   P -3 1 m             S..SS...SSS.       50.0%  42.4%
  12      P 6/m             S..SS.SS...S       50.7%  45.5%
  12  P 6/m m m             SSSSSSSSSSSS       50.7%  44.1%
  11    C m m m             S.S.......SS       50.7%  44.1%
  10  C 1 2/m 1             S.S.........        3.3%   3.5% C121 
  09    C m m m             SS......S..S       50.7%  47.8%
  08  C 1 2/m 1             S.......S...       47.6%  10.7%
  07    C m m m             S....S...S.S       50.7%  46.4%
  06  P 1 2/m 1             S..........S       50.7%  48.5%
  05  C 1 2/m 1             SS..........        3.1%   3.0% C121 
  04  C 1 2/m 1             S........S..       48.6%  10.6%
  03  C 1 2/m 1             S.........S.       50.0%  43.3%
  02  C 1 2/m 1             S....S......        3.9%   3.2% C121 
  01       P -1             S...........        3.0%   3.0% P1 
  
The Solution numbers correspond to those Bravais groups previously identified
by LABELIT.  Note in this example that for the hexagonal-primitive Bravais 
setting (solution 12), there are 5 possible Patterson symmetries.  The 
"Operators_in_group" column shows which symmetry operators are present in a
particular subgroup.  For example, the P -3 subgroup has operators #1,#4 & #5
that are listed in the upper table.  These three operators have Rsymop values
of 3.0%, 3.5% and 3.5%, so "3.5%" is listed in the "max(Rsymop)" column.  
Subgroups with a reasonably low max(Rsymop) are considered to be potential
Patterson symmetries, and these are annotated under the "SpaceGroups" column,
where all possible enantiomorphic space groups are listed. 
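
The relationship between the two tables can be checked with a few lines of
Python.  This sketch is not part of LABELIT; it takes the Rsymop values from
the upper table and an "Operators_in_group" string from the lower table, and
the function names are hypothetical.

```python
# Rsymop (%) for operators #1 to #12, in the order of the upper table.
rsymop_percent = [3.0, 3.1, 3.3, 3.5, 3.5, 3.9,
                  46.6, 46.6, 47.6, 48.6, 50.0, 50.7]

def operators_in(mask):
    """Return the 1-based operator numbers marked 'S' in the mask."""
    return [i + 1 for i, flag in enumerate(mask) if flag == "S"]

def max_rsymop(mask, values=rsymop_percent):
    """Largest Rsymop among the operators present in the subgroup."""
    return max(values[i - 1] for i in operators_in(mask))

p3bar_ops = operators_in("S..SS.......")    # the P -3 row
p3bar_max = max_rsymop("S..SS.......")
p3barm1_max = max_rsymop("SSSSSS......")    # the P -3 m 1 row
```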

Furthermore, it is reasonable to conclude that the true Patterson 
symmetry is the highest symmetry consistent with the data.  Here, the 
"P -3 m 1" symmetry is the highest group with a reasonable max(Rsymop), and 
therefore the 3 spacegroups P321, P3121, and P3221 have flag-notations 
next to them.  The SCALA output is scanned to determine if enough systematic
absences have been detected to make definitive conclusions about screw axes.
In this particular case, it is possible to conclude that the space group is 
either P3121 or P3221 --thus the smiley face :)-- and not P321, thus the 
frowning face ;(.  Note that it is not possible to choose between P3121 
and P3221 until a molecular model is fit to the electron density.
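
The "highest symmetry consistent with the data" rule can be sketched as
follows.  This is not the labelit.rsymop implementation: the 10% acceptance
threshold is an assumption made for this example (the text above says only
"reasonably low"), and the symmetry order is taken to be the number of
operators in the group.

```python
# (solution, Patterson group, Operators_in_group, max(Rsymop) in %),
# copied from the example output above.
groups = [
    (12, "P -3",      "S..SS.......",  3.5),
    (12, "P -3 m 1",  "SSSSSS......",  3.9),
    (12, "P -3 1 m",  "S..SS...SSS.", 50.0),
    (12, "P 6/m",     "S..SS.SS...S", 50.7),
    (12, "P 6/m m m", "SSSSSSSSSSSS", 50.7),
    (11, "C m m m",   "S.S.......SS", 50.7),
    (10, "C 1 2/m 1", "S.S.........",  3.3),
    (9,  "C m m m",   "SS......S..S", 50.7),
    (8,  "C 1 2/m 1", "S.......S...", 47.6),
    (7,  "C m m m",   "S....S...S.S", 50.7),
    (6,  "P 1 2/m 1", "S..........S", 50.7),
    (5,  "C 1 2/m 1", "SS..........",  3.1),
    (4,  "C 1 2/m 1", "S........S..", 48.6),
    (3,  "C 1 2/m 1", "S.........S.", 50.0),
    (2,  "C 1 2/m 1", "S....S......",  3.9),
    (1,  "P -1",      "S...........",  3.0),
]

def likeliest_group(groups, threshold_percent=10.0):
    """Among groups with max(Rsymop) below the (assumed) threshold,
    return the one containing the most symmetry operators."""
    acceptable = [g for g in groups if g[3] < threshold_percent]
    return max(acceptable, key=lambda g: g[2].count("S"))

best = likeliest_group(groups)
```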

A list of all possible Patterson symmetries, along with space groups and 
corresponding systematic extinctions can be printed with the command

labelit.rsymop --reference

Note that in some cases, the crucial axis for determining systematic absences
may lie near the spindle, and therefore these data will be missing 
from the current partial dataset.  Indeed, it may be the case that no amount 
of phi rotation will bring this axis into the sphere of reflection; 
it may be necessary to re-orient the crystal using a different camera axis.
If axial data are missing, it will not be possible to distinguish whether
the crystal symmetry contains a screw axis.  In cases such as this, 
all space groups that are still consistent with the partial dataset 
are indicated by question marks as follows:  ??P121 ??P1211
This means that the true space group is either P2 or P21.

In cases where only a few frames are scaled, there may be insufficient 
angular coverage to calculate an Rsymop for a particular symmetry operator.
This can be recognized by Nobs values <5, in which case the Rsymop will be
listed as "0.0%".  Under the "Operators_in_group" column, operators with
too few observations are listed with lower case "s".  
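
The flag convention can be summarized in a short Python sketch.  The function
is hypothetical and only restates the rule given above (fewer than 5
observations gives a lower case "s").

```python
def operator_flag(in_group, nobs, min_obs=5):
    """Return '.', 'S' or 's' for one position of Operators_in_group."""
    if not in_group:
        return "."
    return "S" if nobs >= min_obs else "s"

flags = "".join(operator_flag(present, nobs)
                for present, nobs in [(True, 3520), (False, 1158), (True, 3)])
```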

The rsymop command takes several minutes to execute.  The final
output table is printed to the screen, not to a file.  To regenerate this
table (using the output files from the previously run job) one can issue
the command

labelit.stats_rsymop

Note that all results from indexing and space group determination can 
be permanently deleted from the directory by typing

labelit.reset
   

Frequently Asked Questions About labelit.rsymop

I want more control over the resolution cutoff used to calculate Rsymop.

Answer: The following defaults can be overridden in the dataset_preferences.py file:
rsymop_integration_permissible_resolution = None
Before calculating Rsymop statistics, the dataset is postrefined and integrated with MOSFLM. The outer resolution cutoff used for this step is normally the outer resolution determined by labelit.index (defined as the resolution shell where the average I/sigma ratio for all fulls and partials drops to 0.75). If the user wishes to limit the Rsymop command to a lower resolution, then the rsymop_integration_permissible_resolution can be specified in Angstroms.
rsymop_statistics_sigma_cutoff = 5.0
As described above, a resolution cutoff is applied prior to scaling and determining the Rsymop statistics. Resolution shells are thrown out where the average I/sigma ratio drops below 5.0. The rsymop_statistics_sigma_cutoff parameter overrides this value.
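
How a sigma cutoff translates into a resolution limit can be sketched in Python. This is an assumption about the general approach, not the LABELIT code: shells are taken in order from low to high resolution, and the limit is the last shell before the average I/sigma first drops below the cutoff. The shell values are made up for illustration.

```python
def resolution_cutoff(shells, sigma_cutoff=5.0):
    """shells: list of (d_min in Angstroms, average I/sigma), ordered from
    low to high resolution.  Returns the d_min of the last shell kept, or
    None if even the first shell falls below the cutoff."""
    limit = None
    for d_min, mean_i_over_sigma in shells:
        if mean_i_over_sigma < sigma_cutoff:
            break
        limit = d_min
    return limit

# Made-up resolution shells, for illustration only.
shells = [(6.0, 25.0), (4.0, 18.2), (3.2, 9.6), (2.8, 5.4), (2.5, 3.1), (2.3, 1.2)]

default_limit = resolution_cutoff(shells)          # sigma cutoff 5.0
strict_limit = resolution_cutoff(shells, 10.0)     # more conservative
```

With these values the default cutoff of 5.0 keeps data to 2.8 Angstroms, and a cutoff of 10.0 keeps data to 4.0 Angstroms.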


Other usage


The user should note that since LABELIT is a toolbox, there are functions and scenarios that can be exposed at the Python script level beyond what is shown here, but these are not yet documented.