Beamline Client/Server Spotfinder

An Extremely Fast Spotfinder for Real-Time Beamline Applications

Please see the Newsletter Article for initial details worked out in July 2010. A new multiprocessing Spotfinder for beamlines was released in July 2011 (see announcement), and the purpose of this page is to give current installation instructions.

Licensing

As of 19 July 2011, LBNL Spotfinder is released under the CCTBX License. Open-source downloads are available here; see the instructions below.

Quick Synopsis

Spotfinder has been adapted to process Bragg spots as fast as diffraction images can be generated by present X-ray detectors (e.g., >= 12 Hz for the Pilatus-6M). The main job is done by a multiprocessing server that divides the analysis of consecutive images among all available CPU cores, so that work can be done in parallel. The server's client is the data acquisition software written by the beamline group, such as a low-dose crystal rastering program that highlights the well-diffracting spots of the sample on top of a videomicrograph image. Client-server communication uses the standard HTTP protocol, so the server's function can easily be tested in a standard Web browser by typing in a URL like the following:

http://localhost:8125/spotfinder/distl.signal_strength?distl.image=/data/raster_1_0319.cbf&distl.res.outer=4.9&distl.bins.verbose=True

The query string in the above URL is constructed with field=value pairs that exactly match the options available for the command-line version of the program. Therefore the above URL will return the same text as this command:

distl.signal_strength distl.image=/data/raster_1_0319.cbf distl.res.outer=4.9 distl.bins.verbose=True
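Because the query string mirrors the command-line options one-to-one, a client can build both forms from the same set of options. The following is a minimal Python 3 sketch of that idea, not code from the Spotfinder distribution: the helper names build_url and build_command are hypothetical, and the host, port, and path are simply copied from the example URL above.

```python
from urllib.parse import urlencode


def build_url(options, host="localhost", port=8125):
    """Build a request URL from field=value pairs (hypothetical helper)."""
    # safe="/" keeps the slashes of the image path unescaped, as in the example URL.
    query = urlencode(options, safe="/")
    return "http://%s:%d/spotfinder/distl.signal_strength?%s" % (host, port, query)


def build_command(options):
    """Build the equivalent command line from the same field=value pairs."""
    pairs = ["%s=%s" % (field, value) for field, value in options.items()]
    return " ".join(["distl.signal_strength"] + pairs)


# The options used in the example above.
options = {
    "distl.image": "/data/raster_1_0319.cbf",
    "distl.res.outer": 4.9,
    "distl.bins.verbose": True,
}

print(build_url(options))
print(build_command(options))
```

Keeping the options in one dictionary means the browser test, the command-line test, and the client request cannot drift apart.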

One important detail: since it takes 1-2 seconds to analyze any given image, the client must get on with other work while waiting for the HTTP reply; in other words, the HTTP requests must be issued asynchronously. This is handled efficiently by making the client multithreaded, with each HTTP request handled by a separate thread in the client process. A straightforward example of how this can be done in Python (with just 40 lines of code) is given here.
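To illustrate the thread-per-request idea, here is a minimal Python 3 sketch. It is not the example script referred to above; the function names fetch and analyze_all are hypothetical, and the fetch function is passed in as a parameter so that the threading logic can be exercised without a running server.

```python
import threading
import urllib.request


def fetch(url, timeout=30):
    """Blocking HTTP GET that returns the server's text reply."""
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return reply.read().decode("utf-8", errors="replace")


def analyze_all(urls, fetch_func=fetch):
    """Issue each request in its own thread; return replies keyed by URL."""
    results = {}
    lock = threading.Lock()

    def worker(url):
        try:
            text = fetch_func(url)
        except Exception as exc:
            # Record the failure instead of losing it inside the thread.
            text = "ERROR: %s" % exc
        with lock:
            results[url] = text

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    # The main thread is free to do other work here while replies are pending.
    for thread in threads:
        thread.join()
    return results
```

In a real acquisition program the threads would typically be started as each image is written, rather than all at once from a prepared list.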

Expected Performance

Our multiprocessing benchmarks give the following general results:

    64-bit Fedora 8   Intel Xeon (2.93 GHz)  16-cores 8.9 frames/sec (Pilatus-6M)
    64-bit Fedora 13  AMD Opteron(2.20 GHz)  48-cores 25  frames/sec (Pilatus-6M)

These tests were run with both data and code residing on a local disk (not NFS-mounted) on the server machine. Bragg spots were processed out to the corners of the Pilatus detector. For the Fedora 13 server, these optimal throughput rates could be obtained only when the multithreaded client was executed on a separate machine on the local subnet, indicating that the client itself uses significant resources.
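As a rough consistency check, the benchmark figures can be converted into an effective analysis time per image. The sketch below assumes ideal scaling, with every core busy on exactly one image at a time; that is an assumption made for illustration, not a measured property of the server.

```python
# (cores, frames per second) taken from the benchmark table above.
benchmarks = {
    "Intel Xeon, 16 cores": (16, 8.9),
    "AMD Opteron, 48 cores": (48, 25.0),
}


def seconds_per_image(cores, frames_per_sec):
    """Effective time one core spends on one image, assuming ideal scaling."""
    return cores / frames_per_sec


for name, (cores, fps) in benchmarks.items():
    print("%s: %.2f s per image" % (name, seconds_per_image(cores, fps)))
```

Under this assumption both machines come out at about 1.8 to 1.9 seconds per image, in line with the 1-2 seconds per image quoted in the Quick Synopsis.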

Implementation Choices

Two Spotfinder servers (Apache/Mod-Python and Python) have been implemented, each giving identical data analysis with similar performance, but involving different tradeoffs related to their installation and maintenance. The Python server is easier to download and install, but is more difficult to tune for peak performance. The situation is explained here:

Architecture

Apache/Mod-Python Server: The Apache httpd Web server is used to implement a multiprocess server. The mod-python package is dynamically linked in to provide a Python interpreter within the Apache process. The LBL code (Python/C++) is executed within this mod-python/Python interpreter.

Python Server: Built-in code from Python's BaseHTTPServer module is used to implement a multiprocess server.

Starting and Stopping the Server

Apache/Mod-Python Server: The server process is started and stopped with the apachectl command distributed with Apache. However, since the CCTBX environment must first be sourced, a compound command is used:

  /bin/sh build/env_run httpd/bin/apachectl [start|stop]

The exact form of the command is printed out by the installer. Configuration details such as the port number are predefined by the installer, but can be edited later in httpd/conf/httpd.conf.

Python Server: The CCTBX environment is sourced & the server process is started from the command line:

  source cctbx_build/setpaths.csh
  distl.mp_spotfinder_server_read_file distl.bins.verbose=True distl.port=8125 distl.processors=8

^C stops the server from within this shell; alternatively the command

  distl.thin_client EXIT localhost 8125

kills the server from a remote shell.

Parallel Processes

Apache/Mod-Python Server: The Apache server automatically creates new child processes to handle the computational load in response to client requests, so as to fully utilize available CPU cores. Unused child processes are then terminated automatically when the load is reduced.

Python Server: The number of child processes is fixed and must be specified explicitly at run time with the command line option shown above (e.g., distl.processors=8).

Data Processing Options

Apache/Mod-Python Server: For each client request, special processing directives must be built in to the URL, such as giving an outer resolution limit with distl.res.outer=2.0. The syntax is explained in the Quick Synopsis section above.

Python Server: In addition to giving special processing directives in the URL for each request, global processing directives can be given on the command line at run time. For example, running

  distl.mp_spotfinder_server_read_file distl.res.outer=5

will impose a general resolution cutoff of 5 Angstroms, unless an override value is given in the URL of a particular client request.

Timing and Robustness

Apache/Mod-Python Server: The Apache server automatically handles performance tuning. Any number of client requests can be issued, and requests will be queued until the server's CPU resources are available. Client code should use asynchronous requests, as discussed in the Synopsis section above.

Python Server: Performance tuning is the responsibility of the client. If client requests are generated faster than the server can handle, the server will hang in an unpredictable way, severely impairing the throughput. As a demonstration the example client includes a "sleep()" command whose duration can be tuned. Decreasing the sleep will improve performance up to a point, beyond which the throughput rate is dramatically degraded. Experimentation is required for proper tuning.
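The client-side tuning described for the Python server can be sketched as a paced submission loop. This is an illustrative Python 3 sketch under stated assumptions, not the distributed example client: paced_submit and its delay parameter are hypothetical names, and fetch_func stands for whatever function issues the HTTP request and returns the reply.

```python
import threading
import time


def paced_submit(urls, fetch_func, delay=0.1):
    """Start one thread per URL, sleeping `delay` seconds between starts.

    Returns the list of (url, reply) pairs and the achieved request rate
    in requests per second, which can be compared across delay settings.
    """
    results = []
    lock = threading.Lock()

    def worker(url):
        reply = fetch_func(url)
        with lock:
            results.append((url, reply))

    threads = []
    start = time.time()
    for url in urls:
        thread = threading.Thread(target=worker, args=(url,))
        thread.start()
        threads.append(thread)
        time.sleep(delay)  # throttle so requests do not outrun the server
    for thread in threads:
        thread.join()
    elapsed = time.time() - start
    rate = (len(urls) / elapsed) if elapsed > 0 else float("inf")
    return results, rate
```

Running the same batch of images with progressively smaller delay values, and noting where the returned rate stops improving, is one way to carry out the experimentation mentioned above.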

Important Information About the Client

Optimal performance is achieved by running the server and client on separate hosts on the same local area network.
Use the Python script general_client_example.py as an example of how to construct the multithreaded client. Modify the script as needed for the particular installation.
How can I write the client results to a file rather than to the screen? The example in specific_client_example.py shows how to place a threading lock around the print statement for correct output.
If, in addition, XML-formatted output is needed (as used at Berkeley Lab), use the example bcsb_client_example.py.
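The locking idea mentioned above can be sketched as follows. This is a minimal Python 3 illustration, not the content of specific_client_example.py; the class name LockedWriter and its methods are hypothetical.

```python
import threading


class LockedWriter:
    """Serialize writes from many client threads into one output file."""

    def __init__(self, path):
        self._file = open(path, "w")
        self._lock = threading.Lock()

    def write_result(self, text):
        # Only one thread at a time may write, so replies are never interleaved.
        with self._lock:
            self._file.write(text.rstrip("\n") + "\n")
            self._file.flush()

    def close(self):
        with self._lock:
            self._file.close()
```

Each worker thread would call write_result() with the server's reply in place of printing it, and the main thread would call close() after joining all workers.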

Installing the Python Server

Create a suitable installation directory (empty!) and switch to that directory.
Obtain and extract the last official CCTBX source installer for Unix:

  wget http://cci.lbl.gov/cctbx_build/results/last_published/cctbx_python_273_bundle.selfx
  perl cctbx_python_273_bundle.selfx

Take note of the message at the end giving the exact command for sourcing the CCTBX environment.

Installing the Apache/Mod-Python Server

Create a suitable installation directory (empty!) and switch to that directory.
Obtain and extract the last official CCTBX source installer for Unix. Note this is a different source bundle than is used for the Python Server, above:

  wget http://cci.lbl.gov/cctbx_build/results/last_published/cctbx_python_273_bundle.tar.gz
  tar zxf cctbx_python_273_bundle.tar.gz

Second, get the underlying Apache & mod-python services.

  wget http://cci.lbl.gov/apache_services/apache_services.tar.gz
  tar zxf apache_services.tar.gz
  apache/install.csh

Take note of the message at the end giving the exact command for starting and stopping the Apache server, as well as for running the example client. This message is also saved in the file README_customized.

Using Bleeding-Edge Software

If you want to test the very latest spotfinder code from the last nightly build, rather than the most recent standard distribution, substitute the directory current for last_published in the wget commands listed above.