CCI Research

The Computational Crystallography Initiative came into being in late 1999 when Paul Adams moved to Berkeley Lab from Yale University. The goal of the group has been to develop new computational algorithms and tools for structural biology, in particular crystallography. This has led to the Phenix software for automated macromolecular crystallography, new tools for neutron crystallography, and methods for automated diffraction data analysis. Members of the CCI group are also involved in a number of other research projects spanning a range of topics.

Automated Crystallography

One of the primary goals of the CCI group has been the development of a software package called PHENIX (Python-based Hierarchical Environment for Integrated Xtallography). It provides the algorithms needed to proceed from reduced intensity data to a refined molecular model, and facilitates structure solution for both the novice and the expert crystallographer. The development of PHENIX involves the extended international community of crystallographers through both workshops and formal collaborations. The project is an international collaboration, funded by NIH and headed by the CCI group. The collaborators currently involved are Tom Terwilliger (Los Alamos National Laboratory), Randy Read (University of Cambridge, U.K.), and Jane and David Richardson (Duke University). PHENIX is designed with an open and flexible architecture to encourage its use by other developers and to promote easy incorporation into home-lab and synchrotron beamline environments. It is supported on UNIX, Windows, and Mac OS X platforms and is openly distributed to non-profit users.

The PHENIX architecture has been designed from the ground up as a hybrid system of tightly integrated interpreted ('scripted') and compiled software modules. A mix of scripted and compiled components is found in all major successful crystallographic packages, but often the scripting is added as an afterthought in an ad hoc fashion using tools that predate the object-oriented programming era. While such ad hoc systems are quickly established, they tend to become a severe maintenance burden as they grow. In addition, users are often forced into many time-consuming routine tasks such as manually converting file formats. In PHENIX, the scripting layer is the heart of the system. With only a few exceptions, all major functionality is implemented as modules that are accessed exclusively via the scripting interfaces. The object-oriented Python scripting language is used for this purpose. Over about two decades, the large Python developer and user community has produced millions of lines of highly uniform, interoperable, mature and openly available source code covering all aspects of programming, ranging from simple file handling to highly sophisticated network communication and fully featured cross-platform graphical interfaces. Embedding crystallographic methods into this environment enables an unprecedented degree of automation, stability and portability. By design, the object-oriented programming model fosters shared collaborative development by multiple groups. It is routine practice to hierarchically recombine modules written by different groups into ever more complex procedures that appear uniform from the outside.
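
The idea of hierarchical recombination can be illustrated with a minimal Python sketch. It is not PHENIX code: the names make_step, compose, and the step labels are hypothetical, and the placeholder steps only record that they ran, where a real module would call into compiled code. The point is only that a combination of steps has the same interface as a single step, so it can itself be combined again.

```python
from typing import Callable, Dict

# A step is any callable that takes a state dictionary and returns a new one.
# This interface is an assumption made for illustration only.
Step = Callable[[Dict[str, list]], Dict[str, list]]


def make_step(name: str) -> Step:
    """Build a hypothetical placeholder step that records its own name."""
    def step(state: Dict[str, list]) -> Dict[str, list]:
        # Copy the state so that earlier results are never modified in place.
        return {**state, "log": state.get("log", []) + [name]}
    return step


def compose(*steps: Step) -> Step:
    """Combine several steps into one object with the same interface."""
    def combined(state: Dict[str, list]) -> Dict[str, list]:
        for step in steps:
            state = step(state)
        return state
    return combined


# Two levels of composition: 'solve' is built from two steps, and
# 'solve_and_refine' reuses 'solve' as if it were a single step.
solve = compose(make_step("read_data"), make_step("build_model"))
solve_and_refine = compose(solve, make_step("refine"))

result = solve_and_refine({})
print(result["log"])
```

From the outside, solve_and_refine cannot be distinguished from a single step, which is the uniformity the paragraph above describes.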

Computational Crystallography Toolbox

Recent software design concepts have revolutionized the way large applications are written. Modern programs typically feature object-oriented design, databases, graphical user interfaces, distributed computing, and platform independence. For use in the PHENIX software package, we have developed an open-source toolbox for fundamental crystallographic data structures and algorithms. This Computational Crystallography Toolbox (cctbx) is also intended to allow other developers to efficiently implement crystallographic applications that exploit modern programming techniques.

We have implemented algorithms in C++ for the handling and manipulation of unit cell parameters, space group symmetry, and reciprocal-space arrays; for the calculation of statistical quantities binned by resolution range; for the conversion between reciprocal-space and real-space arrays by Fourier transformation; and for the calculation of structure factors and gradients and of geometric restraints, along with a wealth of other algorithms required for crystallographic calculations. The cctbx is available as an open-source package at SourceForge. The code is designed with an open and flexible architecture to promote extensibility and easy incorporation into other software environments.
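
Two of the operations listed above, unit cell handling and statistics binned by resolution, can be sketched in plain Python. This is an illustration using only the standard library, not the cctbx API; the function names reciprocal_metric, d_spacing, and mean_intensity_by_shell are hypothetical. The d-spacing of a reflection h follows from the reciprocal metric tensor G* (the inverse of the direct metric tensor) as 1/d^2 = h G* h.

```python
import math


def reciprocal_metric(a, b, c, alpha, beta, gamma):
    """Return the reciprocal metric tensor G* as 3x3 nested lists.

    Cell lengths are in angstroms, cell angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # Direct metric tensor G built from the cell parameters.
    g = [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]
    det = (
        g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
        - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
        + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0])
    )
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # Inverse element (i, j) is the cofactor of element (j, i) over det.
            rows = [k for k in range(3) if k != j]
            cols = [k for k in range(3) if k != i]
            minor = (
                g[rows[0]][cols[0]] * g[rows[1]][cols[1]]
                - g[rows[0]][cols[1]] * g[rows[1]][cols[0]]
            )
            inv[i][j] = (-1) ** (i + j) * minor / det
    return inv


def d_spacing(hkl, gstar):
    """Return the d-spacing in angstroms of Miller index hkl."""
    inv_d2 = sum(hkl[i] * gstar[i][j] * hkl[j] for i in range(3) for j in range(3))
    return 1.0 / math.sqrt(inv_d2)


def mean_intensity_by_shell(reflections, cell, n_bins):
    """Split (hkl, intensity) pairs into shells of roughly equal count.

    Returns a list of (d_max, d_min, mean_intensity), low resolution first.
    """
    gstar = reciprocal_metric(*cell)
    data = sorted(((d_spacing(hkl, gstar), i) for hkl, i in reflections), reverse=True)
    size = math.ceil(len(data) / n_bins)
    shells = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        mean_i = sum(i for _, i in chunk) / len(chunk)
        shells.append((chunk[0][0], chunk[-1][0], mean_i))
    return shells
```

For example, with the orthorhombic cell (10, 20, 30, 90, 90, 90), d_spacing((1, 0, 0), reciprocal_metric(10, 20, 30, 90, 90, 90)) evaluates to 10 angstroms up to floating-point rounding. Space group symmetry, Fourier transforms, and structure factor calculation are deliberately left out of this sketch.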

Neutron Crystallography

Approximately 85% of the structures deposited in the Protein Data Bank have been solved using X-ray crystallography, making it the leading method for three-dimensional structure determination of macromolecules. One of the limitations of the method is that the typical data quality (resolution) does not allow the direct determination of H-atom positions. Most hydrogen positions can be inferred from the positions of other atoms and can therefore be readily included in the structure model as a priori knowledge. However, this may not be the case in the biologically active sites of macromolecules, where the presence and position of hydrogen is crucial to the enzymatic mechanism. This makes the application of neutron crystallography in biology particularly important, as H atoms can be clearly located in experimental neutron scattering density maps. Without exception, when a neutron structure is determined the corresponding X-ray structure is also known, making it possible to derive the complete structure using both data sets. We have implemented crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system.
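
The reason hydrogen is hard to see with X-rays but visible with neutrons can be made concrete with a small Python sketch. It compares the scattering strength of an element relative to carbon for both kinds of radiation, using the number of electrons as a rough measure of X-ray scattering power and approximate tabulated coherent neutron scattering lengths. The function name relative_strength is hypothetical, and the comparison ignores resolution, atomic displacement, and incoherent scattering.

```python
# Approximate coherent neutron scattering lengths in femtometres.
NEUTRON_B_FM = {"H": -3.739, "D": 6.671, "C": 6.646, "N": 9.36, "O": 5.803}

# X-ray scattering power scales roughly with the number of electrons.
ELECTRONS = {"H": 1, "D": 1, "C": 6, "N": 7, "O": 8}


def relative_strength(element, reference="C"):
    """Return (x-ray ratio, neutron ratio) of an element versus a reference.

    The neutron ratio uses absolute values, since a negative scattering
    length changes the sign of the density peak but not its visibility.
    """
    xray = ELECTRONS[element] / ELECTRONS[reference]
    neutron = abs(NEUTRON_B_FM[element]) / abs(NEUTRON_B_FM[reference])
    return xray, neutron


for element in ("H", "D", "N", "O"):
    xray, neutron = relative_strength(element)
    print(f"{element}: X-ray {xray:.2f}, neutron {neutron:.2f} (relative to C)")
```

With these values, hydrogen scatters X-rays at about one sixth the strength of carbon but neutrons at more than half the strength of carbon. The negative scattering length of hydrogen means that it appears as negative density in neutron maps, whereas deuterium scatters about as strongly as carbon with a positive sign.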

These computational tools have been developed as part of the Macromolecular Neutron Crystallography (MNC) consortium, a collaboration between Oak Ridge National Laboratory (ORNL) and Lawrence Berkeley National Laboratory (LBL), led by Paul Langan and Paul Adams. One of our primary goals has been to address the urgent need for new computational tools and methods to deal with the increasing number of neutron macromolecular structures to be determined and their increasing size and complexity. Neutron capabilities have been added to PHENIX, and the phenix.refine program provides a broad variety of efficient and fully automated tools for structure refinement using X-ray data, neutron data, or both. These include restrained refinement at low resolution, simulated annealing and TLS modelling using maximum-likelihood targets, automatic detection and use of NCS, automatic detection and use of twinning information, sophisticated bulk-solvent correction and anisotropic scaling protocols, efficient handling of H atoms, and proper refinement at ultra-high resolution.

Automated Diffraction Data Analysis

Knowledge of the crystal symmetry is of fundamental importance to macromolecular crystallography. Experience has shown that even when it is possible to solve a structure under the wrong symmetry, the resulting atomic model can have subtle errors leading to incorrect biological conclusions. Correct initial characterization of raw images (prior to data reduction) is of paramount importance, since high-throughput crystal screening is relied upon increasingly to identify the best crystalline samples prior to the collection of full data sets. Crystal screening, a standard option at many synchrotron beamlines, is now routinely used to examine numerous samples sequentially under robotic control. Nicholas Sauter in the CCI group has developed new analysis methods, incorporated into the program LABELIT, which can be performed early during data acquisition and are fast enough that it is feasible to pause to optimize the data collection strategy. Current methods address the automatic determination of the crystal lattice and symmetry, analysis of data with overlapping lattices, algorithms for autoindexing in the presence of translational pseudosymmetry, and detection of underassigned rotational symmetry in solved structures.
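
One small ingredient of lattice determination, checking which crystal systems a measured unit cell is metrically consistent with, can be sketched as follows. This is a simplified illustration and not the LABELIT algorithm: it tests the cell exactly as given, without cell reduction or axis permutation, assumes the unique-b setting for monoclinic cells, and omits the rhombohedral setting. The function name and the tolerance defaults are hypothetical.

```python
def compatible_crystal_systems(cell, length_tol=0.01, angle_tol=0.2):
    """List crystal systems whose metric constraints the cell satisfies.

    cell is (a, b, c, alpha, beta, gamma) in angstroms and degrees.
    length_tol is a relative tolerance; angle_tol is in degrees.
    """
    a, b, c, alpha, beta, gamma = cell

    def same_length(x, y):
        return abs(x - y) <= length_tol * max(x, y)

    def is_angle(value, target):
        return abs(value - target) <= angle_tol

    all_right = all(is_angle(v, 90.0) for v in (alpha, beta, gamma))

    systems = ["triclinic"]  # no metric constraints at all
    if is_angle(alpha, 90.0) and is_angle(gamma, 90.0):
        systems.append("monoclinic")
    if all_right:
        systems.append("orthorhombic")
    if all_right and same_length(a, b):
        systems.append("tetragonal")
    if (same_length(a, b) and is_angle(alpha, 90.0)
            and is_angle(beta, 90.0) and is_angle(gamma, 120.0)):
        systems.append("hexagonal")
    if all_right and same_length(a, b) and same_length(b, c):
        systems.append("cubic")
    return systems


print(compatible_crystal_systems((78.0, 78.1, 37.2, 90.0, 90.05, 89.95)))
```

A cell that is metrically consistent with a higher-symmetry lattice does not prove that the crystal has that symmetry. The candidate symmetry still has to be confirmed against the measured intensities, which is one reason why symmetry can end up under- or over-assigned.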

Structural Biology Knowledgebase Technology Portal

An important aspect of encouraging advances in protein structure determination is to communicate new discoveries and strategies to a wide audience of scientists. One of the most effective ways to store a large amount of information and simultaneously make it available to the scientific community is through a dedicated website. Under the auspices of the Protein Structure Initiative Structural Biology Knowledgebase, we established the PSI SBKB Technology Portal. The Technology Portal is a web resource that presents over 230 methods and technologies catalyzed by the NIH Protein Structure Initiative (PSI) high-throughput structure determination efforts. In addition, this web resource is a repository for tools developed by the wider scientific community for structural biologists to take and use in their research. The Technology Portal serves as an important conduit of communication not only among the PSI:Biology Centers and Partnerships, but also between this consortium and the wider scientific and structural biology community. Many of the technologies described on the website benefit several areas and levels of scientific research. This versatility allows the website to serve as a resource for attaining specific research goals and for encouraging scientific collaboration.
