LABELIT User Manual

This document describes the function of LABELIT in more detail than the paper does, to help the user understand the program output. Most information here applies to both the web version and the downloadable command-line version; more details on the command-line version may be found in the usage primer.
 

Input Files

The best autoindexing can be obtained by collecting two images at phi settings that are widely spaced.  At ALS, we typically use 1° oscillation shots collected at 0° and 90°.  The program will not currently accept images closer than 4° in phi.
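A minimal sketch of the phi-separation rule, assuming separation is measured as the smaller angle between the two settings (modulo 360°). The function names are hypothetical and not part of LABELIT:

```python
def phi_separation(phi1_deg, phi2_deg):
    """Smaller angle between two phi settings, in degrees (0 to 180)."""
    diff = abs(phi1_deg - phi2_deg) % 360.0
    return min(diff, 360.0 - diff)


def check_image_pair(phi1_deg, phi2_deg, min_sep_deg=4.0):
    """Raise ValueError if two images are closer in phi than min_sep_deg."""
    sep = phi_separation(phi1_deg, phi2_deg)
    if sep < min_sep_deg:
        raise ValueError(f"images are only {sep:.1f} deg apart in phi")
    return sep


print(check_image_pair(0.0, 90.0))  # typical ALS pair: 90.0
```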

Here are the presently supported detector formats:

  • ADSC Quantum x
  • Mar CCD
  • MarIP
  • Raxis Image Plate (Raxis-HTC; Raxis-IV; partial support for Raxis-II)
  • Rigaku Saturn 92 CCD
  • MacScience DIP 2030b
  • Crystallographic Binary Format
  • SLS Pilatus-6M miniCBF
  • Help in supporting other formats would be welcomed

  • All parameter input is obtained from the file header.  It is therefore important for synchrotron facilities and equipment vendors to write correct & complete information (see FAQ).

    Program Overview: Initial Spot-picking

    LABELIT begins by identifying probable Bragg spots. From these candidates, the brightest ones are chosen for the subsequent autoindexing step. Initial spot-picking tallies are not normally output to the screen, but are cached and can be viewed afterward with the command labelit.stats_distl. Here is a sample listing:

                         File : LNiGl_a3_1_001.img
                   Spot Total :   1111
          In-Resolution Total :   1069
        Good Bragg Candidates :    908
                    Ice Rings :      0
          Method 1 Resolution :   1.27
          Method 2 Resolution :   1.20
            Maximum unit cell :   75.6
    %Saturation, Top 50 Peaks :  33.66
    In-Resolution Ovrld Spots :      6
    
    Bin population cutoff for method 2 resolution: 20%
    

    Spot Total is the initial spot tally, as outlined in Zhang et al., 2006. After a correction for the background slope is applied, local maxima are flagged if they lie more than 2.5 standard deviations above the noise and the peak consists of more than 10 pixels. Peaks are rejected if they lie on possible ice-ring artifacts, identified as sharp peaks in a histogram of pixel intensity vs. scattering angle.
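    The thresholding step can be sketched as follows. This is a simplified stand-in, not the Zhang et al. algorithm: it uses one global mean and standard deviation instead of a slope-corrected local background, and it keeps connected regions of above-threshold pixels rather than searching for local maxima. The name find_spots is hypothetical.

```python
from statistics import mean, pstdev


def find_spots(image, n_sigma=2.5, min_pixels=10):
    """Return a list of (row, col) pixel lists, one per accepted spot.

    Simplification: a single global mean/sigma replaces a local,
    slope-corrected background estimate.
    """
    pixels = [v for row in image for v in row]
    threshold = mean(pixels) + n_sigma * pstdev(pixels)
    nrows, ncols = len(image), len(image[0])
    seen = [[False] * ncols for _ in range(nrows)]
    spots = []
    for r in range(nrows):
        for c in range(ncols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill the 4-connected region of above-threshold pixels.
            region, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < nrows and 0 <= nx < ncols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(region) > min_pixels:  # "more than 10 pixels"
                spots.append(region)
    return spots
```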

    In-Resolution Total is the number of peaks remaining after two additional filters. First, a histogram of peak count vs. scattering angle is created, and any sharp cusps in it are considered possible ice-ring artifacts. The number of Ice Rings reported is the combined number of rings detected in both the pixel-intensity and peak-count histograms. Second, the histogram of peak count vs. scattering angle is analyzed to determine the limiting resolution of the macromolecular diffraction. This "Method 2 Resolution" limit (in Å) is defined in Zhang et al., 2006. The cutoff point is where the histogram count falls below the Bin population cutoff, generally 20% of the bin population at low resolution.
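    A rough sketch of a Method 2-style cutoff follows. The binning (equal width in 1/d³, i.e. equal reciprocal-space volume), the bin count, and the use of the innermost bin as the low-resolution reference are all assumptions; see Zhang et al., 2006 for the actual definition. The function name is hypothetical.

```python
def method2_resolution(d_spacings, n_bins=20, cutoff_fraction=0.20):
    """Estimate a limiting resolution (Å) from the d-spacings (Å) of candidate spots.

    Bins have equal width in 1/d**3 (equal reciprocal-space volume), so an evenly
    populated reciprocal lattice gives a roughly flat histogram.  The innermost
    bin serves as the low-resolution reference population (an assumption).
    """
    s3 = [1.0 / d ** 3 for d in d_spacings]
    lo, hi = min(s3), max(s3)
    if hi == lo:
        return min(d_spacings)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in s3:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    cutoff = cutoff_fraction * counts[0]
    for i, count in enumerate(counts):
        if count < cutoff:
            # Report the inner edge of the first under-populated bin, in Å.
            return (lo + i * width) ** (-1.0 / 3.0)
    return min(d_spacings)  # never fell below the cutoff: outermost spot
```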

    Several additional filters are applied to produce the list of Good Bragg Candidates: a) the spot profile must be smooth; b) outliers in skewness are rejected; c) outliers of high intensity (beyond 5 standard deviations) are rejected; d) nearest-neighbor vectors between spots are analyzed. In addition, a best-fit ellipse is computed for each spot, and spots whose semi-major axis is larger than the common nearest-neighbor vector are rejected.
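    The ellipse and nearest-neighbor filter might be sketched like this, assuming an intensity-weighted second-moment ellipse (semi-axis taken as twice the square root of the largest covariance eigenvalue) and using the median nearest-neighbor distance as the length of the "common" nearest-neighbor vector. Both conventions, and all function names, are hypothetical.

```python
import math
from statistics import median


def semi_major_axis(pixels):
    """Semi-major axis (pixels) of a spot's moment ellipse.

    pixels: list of (x, y, intensity).  2*sqrt(largest eigenvalue) of the
    intensity-weighted covariance is an assumed convention.
    """
    w = sum(i for _, _, i in pixels)
    mx = sum(x * i for x, _, i in pixels) / w
    my = sum(y * i for _, y, i in pixels) / w
    sxx = sum((x - mx) ** 2 * i for x, _, i in pixels) / w
    syy = sum((y - my) ** 2 * i for _, y, i in pixels) / w
    sxy = sum((x - mx) * (y - my) * i for x, y, i in pixels) / w
    # Largest eigenvalue of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return 2.0 * math.sqrt(lam)


def nearest_neighbor_distances(centers):
    """Distance from each (x, y) center to its closest neighbor (brute force)."""
    return [min(math.dist(c, o) for j, o in enumerate(centers) if j != i)
            for i, c in enumerate(centers)]


def reject_oversized(spots):
    """Keep spots whose semi-major axis does not exceed the median NN distance."""
    centers = []
    for pix in spots:
        w = sum(i for _, _, i in pix)
        centers.append((sum(x * i for x, _, i in pix) / w,
                        sum(y * i for _, y, i in pix) / w))
    common_nn = median(nearest_neighbor_distances(centers))
    return [pix for pix in spots if semi_major_axis(pix) <= common_nn]
```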

    The Maximum unit cell (in Å) is calculated from the nearest-neighbor analysis and the sample-to-detector distance, scaled by a reasonable fudge factor.
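    The geometry behind such an estimate can be written down under the small-angle approximation: spots a distance Δ apart on a detector at distance D correspond to a reciprocal-space separation of about Δ/(λD), i.e. a real-space repeat of about λD/Δ. The sketch below assumes that relationship and uses a hypothetical default fudge value of 2; LABELIT's actual factor is not given here.

```python
def max_unit_cell(nn_spacing_mm, distance_mm, wavelength_A, fudge=2.0):
    """Rough upper bound on a cell edge (Å) from the common spot spacing.

    Small-angle approximation: a spot spacing of nn_spacing_mm at distance_mm
    corresponds to a real-space repeat of wavelength_A * distance_mm / nn_spacing_mm.
    The default fudge value is hypothetical.
    """
    return fudge * wavelength_A * distance_mm / nn_spacing_mm


print(max_unit_cell(2.0, 100.0, 1.0, fudge=1.0))  # 50.0
```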

    %Saturation, Top 50 Peaks refers to the intensity of the brightest 50 good Bragg candidate spots, as a percentage of the detector's dynamic range. This helps the user decide whether the exposure time is appropriate. Likewise, In-Resolution Ovrld Spots gives the number of overloaded spots among the In-Resolution Total.
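    Both quantities could be computed roughly as follows, assuming the top-50 figure is the mean of the 50 brightest peak values and that a spot counts as overloaded when its peak pixel reaches the detector's overload value. These definitions and the function names are assumptions.

```python
def percent_saturation(candidate_peaks, overload_value, top_n=50):
    """Mean of the top_n brightest peak values, as a % of the overload value."""
    top = sorted(candidate_peaks, reverse=True)[:top_n]
    return 100.0 * sum(top) / len(top) / overload_value


def count_overloads(in_resolution_peaks, overload_value):
    """Number of spots whose peak pixel reaches the overload value."""
    return sum(1 for v in in_resolution_peaks if v >= overload_value)
```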

    Finally, it should be stressed that from the Good Bragg Candidates, only the brightest are selected for autoindexing. The exact number is determined by an ad hoc formula that is currently defined as follows:

    # of good Bragg candidates   # selected for autoindexing
    
               < 40              Indexing aborted; see Primer to override
               40 - 300          All spots are selected
               301 - 1500        300 spots are selected
               > 1500            20% of spots are selected
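
    The selection rule in the table translates directly into code. Rounding the 20% figure down is an assumption, and the function name is hypothetical.

```python
def n_spots_for_indexing(n_good):
    """Number of brightest Good Bragg Candidates passed to autoindexing."""
    if n_good < 40:
        raise ValueError("fewer than 40 good Bragg candidates; indexing aborted")
    if n_good <= 300:
        return n_good       # all spots are selected
    if n_good <= 1500:
        return 300          # fixed cap
    return n_good // 5      # 20% of spots, rounded down (assumed)
```

    For the sample listing above (908 Good Bragg Candidates), this returns 300.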
    

    Program Overview: Autoindexing

    Each image is first analyzed to select bright, well-shaped Bragg spots.  A maximum of 300 are chosen from each image for autoindexing, for a total of 600.  If fewer than 40 good spots are found on either image, no indexing is attempted.

    After autoindexing, the lattice is analyzed to correct the beam center.  If a correction of more than 1 mm is required, a message is printed and the data are re-examined with the new beam center:
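    The check-and-reindex step might look like the following, where index_fn stands in for a hypothetical autoindexing routine that returns the suggested beam correction in mm. The manual does not say whether the data are re-examined once or repeatedly, so the max_cycles cap is an assumption.

```python
import math


def index_with_beam_check(index_fn, beam_xy, threshold_mm=1.0, max_cycles=3):
    """Re-index while the suggested beam-center correction exceeds threshold_mm.

    index_fn(beam_xy) is a hypothetical callable that autoindexes with the
    given beam center and returns the suggested (dx, dy) correction in mm.
    """
    for _ in range(max_cycles):
        dx, dy = index_fn(beam_xy)
        shift = math.hypot(dx, dy)
        if shift <= threshold_mm:
            return beam_xy
        print(f"Correcting beam by {shift:.1f} mm and reindexing")
        beam_xy = (beam_xy[0] + dx, beam_xy[1] + dy)
    return beam_xy
```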

    Correcting beam by 1.7 mm and reindexing

    Once a lattice basis has been chosen, the following twelve parameters are refined by the quasi-Newton L-BFGS method:  beam x & y, detector distance, and the x, y, z components of each of the three basis vectors.  No symmetry is imposed at this stage.  Furthermore, mosaicity values are sampled at 0.05° increments to determine the best value that successfully models 80% of the 600 spots chosen for indexing.  If no mosaicity value below 1.5° works, the requirement is relaxed to modeling 60% of the spots; however, needing this relaxation usually indicates a poor diffraction pattern.  As currently implemented, the program does not account for detector tilt or a non-zero two-theta angle.  The results of parameter refinement are reported to the screen:
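    The mosaicity search can be sketched as below. It assumes each indexing spot can be summarized by the smallest mosaic spread at which the refined model would predict it, takes the "best value" to be the smallest sampled mosaicity that reaches the target fraction, and assumes the 1.5° ceiling still applies when the target is relaxed to 60%. These assumptions and the function name are hypothetical.

```python
def choose_mosaicity(required_mosaicity, step=0.05, limit=1.5, percents=(80, 60)):
    """Return (mosaicity_deg, percent_modeled) or None if no value works.

    required_mosaicity: per-spot minimum mosaic spread (deg) needed for the
    model to predict that spot (an assumed representation).
    """
    n = len(required_mosaicity)
    n_steps = round(limit / step)       # with the defaults, 30 steps of 0.05 deg
    for pct in percents:                # try 80% first, then relax to 60%
        for k in range(1, n_steps):     # sampled values stay strictly below limit
            m = round(k * step, 10)
            modeled = sum(1 for r in required_mosaicity if r <= m)
            if 100 * modeled >= pct * n:
                return m, pct
    return None
```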

    LABELIT Indexing results:
    Beam center x   94.11mm, y   99.40mm, distance  109.74mm ; 80% mosaicity=0.05 deg.

    Next we attempt to detect metric symmetry, as described in the paper.  At this stage, Bravais symmetry is only a hypothesis, so a list is produced of all Bravais types consistent with the basis vectors within an angular tolerance of 1.4°.  Solutions are listed in order of decreasing angular distortion, given in degrees in the Metric fit column:

    Solution  Metric fit  rmsd  #spots  crystal_system   unit_cell                                  volume
    ;(   4     0.8494 dg 0.315    363    monoclinic mC  101.30  54.74  69.00  90.00 132.98  90.00   279907
    :)   2     0.0383 dg 0.181    503    monoclinic mC   87.36  73.84  54.70  90.00 127.99  90.00   278075
    :)   1     0.0000 dg 0.166    504     triclinic aP   54.68  57.12  57.14  80.50  61.97  62.01   138741

    Each of these Bravais choices is subjected to further parameter refinement with the added unit cell restraints arising from symmetry.  The rmsd column gives the root mean squared deviation of observed vs. modelled spot positions in mm, while #spots gives the number of Bragg spots correctly predicted by the model.  The maximum possible value is always 600 (the same spots that went into indexing).  The model never predicts 100% of the observed spots correctly, because spots near the spindle are discounted.  Note that the refined unit_cell parameters are given in Å and degrees, while the unit cell volume is given in cubic Å.
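    For reference, the rmsd column corresponds to the usual definition, sketched here for matched lists of observed and predicted detector positions. The function name is hypothetical.

```python
import math


def rmsd_mm(observed, predicted):
    """Root-mean-square distance (mm) between matched observed/predicted spots.

    observed, predicted: equal-length lists of (x, y) detector positions in mm.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of matched positions")
    sq = [(ox - px) ** 2 + (oy - py) ** 2
          for (ox, oy), (px, py) in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))
```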

    In the output above, the Solution column omits solution #3, which in this case was a different C-centered monoclinic setting.  It was omitted because its final rmsd was more than twice that of the triclinic setting, making it highly improbable.  Also, note that the rmsd for solution #4 is much higher than that for solutions 1 & 2, which cluster around 0.17 mm.  By this simple heuristic, solution #4 garners a frowning face ;(, while solutions 1 & 2 merit a smiley face :).  The highest-numbered smiley face is LABELIT's best guess.
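    The omission and smiley-face rules can be sketched as follows. The 2× omission threshold comes from the text above; the 1.5× boundary between :) and ;( is a hypothetical value chosen only for illustration, since the manual does not state one. In the usage line, the 0.35 mm rmsd for solution #3 is an assumed placeholder (the manual says only that it exceeded twice the triclinic value).

```python
def rate_solutions(rmsd_by_solution, drop_factor=2.0, smile_factor=1.5):
    """Return ([(solution, face), ...] in listing order, best-guess solution).

    rmsd_by_solution: {solution_number: rmsd_mm}; solution 1 is triclinic.
    smile_factor is a hypothetical cut between :) and ;(.
    """
    ref = rmsd_by_solution[1]  # triclinic aP reference
    kept = {n: r for n, r in rmsd_by_solution.items() if r <= drop_factor * ref}
    rated = [(n, ":)" if r <= smile_factor * ref else ";(")
             for n, r in sorted(kept.items(), reverse=True)]
    best = max(n for n, face in rated if face == ":)")
    return rated, best


# rmsd values (mm) from the table above; 0.35 for #3 is an assumed placeholder
rated, best = rate_solutions({1: 0.166, 2: 0.181, 3: 0.35, 4: 0.315})
print(rated, best)  # [(4, ';('), (2, ':)'), (1, ':)')] 2
```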

    Next we turn to MOSFLM v6.2.3 to report statistics about resolution.  Bragg spots on both images are integrated within different Bravais settings.  Statistics are listed as follows:

    MOSFLM Integration results:
    Solution  SpaceGroup Beam x   y  distance  Resolution Mosaicity RMS
    :)   2           C2  94.10  99.50  109.79       1.42    0.050000    0.040
         1           P1  94.10  99.45  109.73       1.41    0.050000    0.039

    The Solution column corresponds to the Solution number in the previous table.  The SpaceGroup is the simplest group within this Bravais setting.  Beam x & y, distance, and Mosaicity are further-refined values from MOSFLM.  RMS is the root mean squared spot deviation in mm, and is invariably lower than that reported in the first table because MOSFLM's model is more sophisticated than that of LABELIT.  The Resolution is derived from an analysis of the integrated but unmerged spot output from MOSFLM.  The value reported is the resolution limit where the average I/sigma drops below 0.75.  Values are extrapolated from a log plot if the crystal diffracts past the image edge.
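    One way to locate the I/sigma = 0.75 crossing is sketched below. Treating ln(I/sigma) as linear in 1/d², for both interpolation and extrapolation, is an assumption of this sketch; the manual says only that a log plot is used. Shell values are supplied directly rather than computed from MOSFLM output, and the function name is hypothetical.

```python
import math


def resolution_limit(shells, threshold=0.75):
    """Resolution (Å) at which mean I/sigma falls to threshold.

    shells: (d_spacing_A, mean_I_over_sigma) pairs ordered from low to high
    resolution; at least two shells, all with mean I/sigma > 0.
    Assumption: ln(I/sigma) is linear in 1/d**2 between and beyond shells.
    """
    pts = [(1.0 / d ** 2, math.log(i_sig)) for d, i_sig in shells]
    target = math.log(threshold)
    if pts[0][1] < target:
        return shells[0][0]  # already below threshold in the first shell
    for (s0, y0), (s1, y1) in zip(pts, pts[1:]):
        if y1 < target <= y0:
            # Interpolate the crossing point in (1/d**2, ln I/sigma) space.
            s = s0 + (y0 - target) / (y0 - y1) * (s1 - s0)
            return 1.0 / math.sqrt(s)
    # Never dropped below threshold: extrapolate from the last two shells.
    (s0, y0), (s1, y1) = pts[-2], pts[-1]
    slope = (y1 - y0) / (s1 - s0)
    if slope >= 0:
        return shells[-1][0]  # no downward trend to extrapolate
    return 1.0 / math.sqrt(s1 + (target - y1) / slope)
```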

    The integration results from the triclinic setting are always reported, as are the results from the best-guess smiley-face setting from the first table.  RMS values are compared for these two settings.  In this case the mC setting gives 0.040 mm and aP gives 0.039 mm, so the C-centered monoclinic setting is awarded a second smiley face.  If the top solution has an RMS more than twice that of the triclinic setting, it is still listed, but further integrations are done, proceeding down the solution list until one is found with a reasonable RMS.
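    The fallback can be sketched as a walk down the solution list, assuming "reasonable" means no more than twice the triclinic RMS. In LABELIT each RMS comes from a separate MOSFLM integration; here the values are simply looked up in a dictionary, and the function name is hypothetical.

```python
def accept_setting(candidates, rms_by_solution, factor=2.0):
    """Return the first candidate whose MOSFLM RMS is within factor x triclinic.

    candidates: solution numbers from the best guess downward, e.g. [2, 1].
    rms_by_solution: {solution_number: rms_mm}; key 1 is the triclinic setting.
    """
    ref = rms_by_solution[1]
    for n in candidates:
        if rms_by_solution[n] <= factor * ref:
            return n
    return 1  # the triclinic setting always passes its own test
```

    With the values in the table above, accept_setting([2, 1], {1: 0.039, 2: 0.040}) returns 2.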

    The MarIP user will note that the MOSFLM beam center differs from the LABELIT beam center, reflecting coordinate systems related by a simple transformation.

    File Output

    In the event that the user wants to do further analysis with MOSFLM, appropriate control files are provided.  Hyperlinks are provided to download unix scripts for controlling interactive MOSFLM sessions.  A different script is provided for each of the numbered Bravais solutions.  For the example above, scripts integration01.csh and integration02.csh are generated.

    Parameters given in the MOSFLM input files represent the model as presented in the "LABELIT Indexing" table described above.  Mosaicity is the 80% value.  Resolution is the limiting value given for the :) solution in the "MOSFLM Integration" table.  X-ray source parameters such as polarization, dispersion and divergence are set to dummy values appropriate for the Advanced Light Source; these may require modification if the data were collected at other sources.

    Running any of these scripts starts an interactive MOSFLM session.  The user may then re-estimate the mosaicity, alter parameters if desired, post-refine the model, and integrate the dataset.

    Image Outputs

    Three types of PNG-format image files are produced to visually summarize the results for the most likely indexing setting.  The original image is 2x2 binned (4x4 for Quantum 315) and brightness/contrast is adjusted.
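    A minimal sketch of the binning step, assuming simple block averaging (the manual does not say whether pixel values are averaged or summed) and ignoring the brightness/contrast adjustment. The function name is hypothetical.

```python
def bin_image(image, factor=2):
    """Average non-overlapping factor x factor blocks of a 2-D list of pixels.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    return [[sum(image[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(cols)]
            for r in range(rows)]
```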

    Frequently Asked Questions

    My beam coordinates are wrong.

    When using the web interface, the "Verify the data collection parameters" form lets you override the beam coordinates supplied in the file header: one simply clicks on the correct beam center in the image shown.  This has been extensively tested for ADSC images, but less so for Mar images, where there may be problems.  Also, the detector is assumed to be roughly centered on the direct beam, so the web form shows only a close-up view of the image center.  If the beam is not located in this part of the image, the only way to determine the coordinates is to use the graphical viewer in MOSFLM, after which they can be typed into the web form.

    The MOSFLM step always fails.

    Here is one less-obvious thing to check.  LABELIT monitors the idle time of the MOSFLM child process by comparing the modification time of the MOSFLM log file against the current system time.  This procedure can only succeed if the system clocks on the file server and the compute host agree to within a few seconds.  If the file server clock runs significantly ahead of the compute host's clock, the MOSFLM step will invariably fail.  It is therefore important to enable the Network Time Protocol (NTP), or an equivalent, on all machines involved.
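    To check for this problem on your own system, you can compare a freshly written file's modification time with the local clock, mirroring the idle-time test described above. This is an illustrative sketch, not LABELIT code; the function names and the 2-second tolerance are arbitrary.

```python
import os
import tempfile
import time


def idle_seconds(path):
    """Seconds since path was last modified, measured by the local clock.

    A negative value means the file was stamped "in the future", i.e. the
    file server's clock runs ahead of this machine's clock.
    """
    return time.time() - os.path.getmtime(path)


def check_clock_skew(directory, tolerance_s=2.0):
    """Write a scratch file in directory and return the apparent skew (s)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x")  # the file server sets the modification time
        skew = -idle_seconds(path)
    finally:
        os.remove(path)
    if skew > tolerance_s:
        print(f"file server clock appears {skew:.1f} s ahead of this host")
    return skew
```

    Run it against the directory where LABELIT writes its MOSFLM log; a result of several seconds or more points to the clock problem described above.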