cryo-EM Re-Refine

PDB code	4-character identifier of the atomic model. Can be accessed via the PDB website.
EMD code	Identifier of the map file(s), at least 4 digits long. Can be accessed via the EMDB website.
resolution	Consensus resolution according to map file, CIF file and PDB file.
Date from header	Date when the model was published, obtained from the PDB file, in yyyy-mm-dd format.
Date Processed	Date when the re-refinement was performed, yyyy-mm-dd format.
Logfile	If re-refinement was successful, the logfile is from phenix.real_space_refine, showing the events and steps during the run. If there was an error during any of the steps performed in the re-refinement script (refinement, validation, ligand analysis, etc.), the program output or the assertion error message is shown.

# atoms	Number of atoms in the model.
# residues	Number of residues in the model.
# chains	Number of chains in the model.
# water	Number of water molecules in the model.
# H	Number of H atoms in the model. The density of H atoms can be observed only at very high resolution, typically better than 1 Å. H atoms in cryo-EM models (which are determined at much lower resolutions) are therefore placed using prior knowledge, such as known bond lengths and angles from high-resolution X-ray structures.
fraction H atoms (%)	Fraction of H atoms among the total number of atoms. Around half of the atoms in a protein (~50%) are H atoms. A smaller fraction indicates that H atoms were built only in parts of the model, such as for ligands or selected residues.
B_min	Smallest isotropic B-factor found in the model.
B_max	Largest isotropic B-factor found in the model.
B_mean	Average isotropic B-factor.
occ = 1	Number of atoms with occupancy equal to 1.
0 < occ < 1	Number of atoms with occupancies strictly between 0 and 1.
occ = 0	Number of atoms with occupancy equal to zero. These atoms are present in the model and displayed by graphics programs but they do not contribute to map calculations. Generally, atoms should not be modelled with zero occupancy.
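
The composition, B-factor and occupancy counts above can be illustrated with a minimal sketch. The `Atom` record and the `composition_summary` helper are hypothetical names introduced here for illustration; they are not part of Phenix or of the re-refinement script, which read these values from the model file.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical minimal atom record (not a Phenix data structure)."""
    element: str      # chemical element symbol, e.g. "C", "N", "H"
    occupancy: float  # occupancy as stored in the model file
    b_iso: float      # isotropic B-factor in Å^2


def composition_summary(atoms):
    """Return the atom counts, H fraction, B-factor and occupancy statistics."""
    if not atoms:
        raise ValueError("the model contains no atoms")
    n_atoms = len(atoms)
    n_h = sum(1 for a in atoms if a.element.upper() == "H")
    b_values = [a.b_iso for a in atoms]
    return {
        "n_atoms": n_atoms,
        "n_H": n_h,
        "fraction_H_percent": 100.0 * n_h / n_atoms,
        "B_min": min(b_values),
        "B_max": max(b_values),
        "B_mean": sum(b_values) / n_atoms,
        "occ_1": sum(1 for a in atoms if a.occupancy == 1.0),
        "occ_partial": sum(1 for a in atoms if 0.0 < a.occupancy < 1.0),
        "occ_0": sum(1 for a in atoms if a.occupancy == 0.0),
    }


# Made-up toy model with four atoms, for illustration only.
toy_model = [
    Atom("C", 1.0, 20.0),
    Atom("H", 1.0, 24.0),
    Atom("N", 0.5, 30.0),
    Atom("H", 0.0, 40.0),
]
summary = composition_summary(toy_model)
print(summary)
```

In this toy model half of the atoms are H atoms, and one atom falls into each of the partial-occupancy and zero-occupancy categories.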

Ramachandran favored, allowed, outliers	The Ramachandran plot is a two-dimensional graph of the phi and psi backbone angles. Due to steric constraints, only certain combinations are possible (Ramachandran et al. (1963) J Mol Biol 7:95-99), and the plot is therefore divided into "favored", "allowed", and "outlier" regions. Validation reports indicate the percentage of phi/psi angle pairs in these areas. A well-refined structure should have 98% of residues favored and less than 0.2% outliers, although it may be difficult to obtain these statistics at lower resolutions. Note that the expected distribution varies depending on residue type and environment. The results page shows the Ramachandran plot for all non-Pro/Gly residues.
Rama-Z score	The Ramachandran Z-score characterizes the shape of the backbone angle distribution in the Ramachandran plot (Hooft et al., 1997). Even if a refined model has satisfactory Ramachandran statistics in terms of the fractions of residues in the favored/allowed/outlier regions, the distribution of backbone dihedrals can be improbable (Sobolev et al., 2020). A normal protein backbone geometry results in Rama-Z values between -2 and 2. A less likely yet possible distribution has absolute Rama-Z values between 2 and 3. A Rama-Z score with an absolute value above 3 corresponds to an improbable Ramachandran distribution.
Clashscore	Validation statistic used in MolProbity and Phenix validation tools. Two atoms have a severe clash if they overlap by more than 0.4 Å. The clashscore is the number of severe atomic clashes per 1000 atoms. A well-refined structure should have a clashscore below 20; clashes often require manual rebuilding to fix. Clashes can be visualized in KiNG or Coot as "Probe dots"; these are generated by MolProbity or the Comprehensive Validation in Phenix.
rmsd values	Protein structures are refined using prior knowledge about the molecule. Geometrical parameters of amino acids, such as bond lengths, bond angles, or dihedral angles, are known from high-resolution X-ray structures of polypeptides. These parameters are expected to be similar in proteins and are therefore restrained to stereochemical standards. The root mean square deviation (rmsd) of the refined geometry from these dictionary values indicates whether the model conforms to the standards.
Bond rmsd	Root mean square deviation of covalent bond lengths from dictionary values.
Angle rmsd	Root mean square deviation of bond angles from dictionary values.
Chiral rmsd	Root mean square deviation of chiral volumes.
Planar rmsd	Root mean square deviation of atoms from a plane, such as for the peptide plane or aromatic side chains.
Dihedral rmsd	Root mean square deviation of torsion (dihedral) angles from dictionary values.
Cβ deviations	Number of instances where the distance between the observed location of the Cβ atom and its ideal position exceeds 0.25 Å. This distance reflects distortions around the Cα atom: for example, if the backbone and/or side chain are misfit, the Cβ atom may be moved far from the ideal position to compensate. A deviation of more than 0.25 Å is considered a validation outlier and should be inspected during the validation process.
Rotamer outliers	In the context of model validation (Phenix, MolProbity), rotamers refer to the set of dihedral angles in amino acid side chains. Some combinations are preferred, while others are not possible due to steric and other atomic interactions. Phenix uses MolProbity’s Ultimate Rotamer-Library (Hintze, B.J. et al., Proteins: Struc Func Bioinf 84, 1177-1189 (2016)) to categorize rotamers as "favored", "allowed" or "outlier" (similar to Ramachandran angles). The vast majority of side chains in a finished structure should be favored unless the density very clearly supports an outlier conformation.
Minimum non-bonded distance	The shortest distance between two atoms that are neither covalently bonded nor involved in other bonded interactions.
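
Several of the geometry statistics above reduce to short formulas. The following is a minimal sketch, not the Phenix or MolProbity implementation: the function names are hypothetical, the example bond lengths are made-up illustrative numbers, and the handling of values exactly at the Rama-Z boundaries (2 and 3) is an assumption, since the definitions above do not specify it.

```python
import math


def rmsd(observed, ideal):
    """Root mean square deviation of observed values from their ideal values."""
    if len(observed) != len(ideal) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - i) ** 2 for o, i in zip(observed, ideal))
    return math.sqrt(squared / len(observed))


def clashscore(n_severe_clashes, n_atoms):
    """Number of severe clashes (overlap > 0.4 Å) per 1000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_severe_clashes / n_atoms


def rama_z_category(z):
    """Interpret a Rama-Z score using the ranges given in the text.

    Assumption: the boundary values |z| == 2 and |z| == 3 are assigned
    to the more favorable category.
    """
    magnitude = abs(z)
    if magnitude <= 2.0:
        return "normal"
    if magnitude <= 3.0:
        return "unlikely but possible"
    return "improbable"


# Illustrative, made-up bond lengths in Å (observed vs. dictionary values).
observed_bonds = [1.53, 1.33, 1.24]
ideal_bonds = [1.52, 1.33, 1.23]
bond_rmsd = rmsd(observed_bonds, ideal_bonds)

print(f"Bond rmsd: {bond_rmsd:.4f} Å")
print(f"Clashscore: {clashscore(12, 4000):.1f}")
print(f"Rama-Z -2.6: {rama_z_category(-2.6)}")
```

The same `rmsd` function applies to angles, chiral volumes, planarity and dihedrals; only the observed and ideal values change.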

d₉₉	Resolution estimate related to map details. The value is obtained by gradually removing the highest-resolution Fourier map coefficients and identifying the resolution at which the map calculated with this reduced set of coefficients starts to differ from the original map.
d_FSC	Resolution estimate of the experimental map determined with the FSC plot of two half maps (see also FSC plot).
d_Model	Resolution of the model-calculated map that maximizes its similarity to the experimental map.
FSC plot	Fourier shell correlation (FSC) plot. This plot shows the correlation of Fourier map coefficients between two maps, binned in resolution shells. The FSC can be calculated between two half maps (reconstructed independently from each other); this plot is typically used to determine the resolution limit d_FSC. Another application is to evaluate the model-to-map fit in Fourier space by calculating the FSC between the experimental map and the map calculated from the model. The results page for individual structures shows the FSC Model-vs-Map plot and, if half maps are available, the FSC Half-Map plot.
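
Reading a resolution limit off an FSC curve can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the curve values are made up, and the cutoffs used in the example (0.143 for half maps, 0.5 for model-vs-map) are commonly used conventions that the definitions above do not state explicitly.

```python
def resolution_at_threshold(spatial_freq, fsc, threshold):
    """Resolution (Å) at which the FSC curve first drops below `threshold`.

    spatial_freq: shell centers in 1/Å, in ascending order.
    fsc:          FSC value of each shell.
    Uses linear interpolation between the two shells that bracket the
    crossing. Returns None if the curve never crosses the threshold
    from above.
    """
    if len(spatial_freq) != len(fsc):
        raise ValueError("spatial_freq and fsc must have the same length")
    for k in range(1, len(fsc)):
        if fsc[k - 1] >= threshold > fsc[k]:
            # Fraction of the way from shell k-1 to shell k at the crossing.
            t = (fsc[k - 1] - threshold) / (fsc[k - 1] - fsc[k])
            s = spatial_freq[k - 1] + t * (spatial_freq[k] - spatial_freq[k - 1])
            return 1.0 / s
    return None


# Made-up FSC curve for illustration only.
freq = [0.1, 0.2, 0.3, 0.4]       # 1/Å
curve = [0.99, 0.80, 0.30, 0.05]  # FSC per shell

d_half_map = resolution_at_threshold(freq, curve, 0.143)   # assumed half-map cutoff
d_model_map = resolution_at_threshold(freq, curve, 0.5)    # assumed model-vs-map cutoff
print(d_half_map, d_model_map)
```

A lower threshold is crossed at a higher spatial frequency, so the half-map cutoff in this example yields a numerically smaller (better) resolution value than the 0.5 cutoff.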

CC_box, CC_mask	The model-map correlation coefficient (CC) reflects how well a model fits a map. Note that there is no universal approach to calculating a CC. The procedure applied by Phenix cryo-EM tools consists of a formula to calculate the CC, a method to calculate a map from the model (Afonine et al. (2018)), and a choice of map region. The two variants differ only in the map region: CC_box uses the entire map, whereas CC_mask uses only the map values inside a mask (envelope) around the molecule.
EMRinger score	Score reflecting the side-chain information content of EM maps (Barad, B.A. et al., Nature Methods 12, 943-946 (2015)). For each side chain, the Cγ atom is rotated around the χ1 angle and the map density values are interpolated. An EMRinger score is then obtained as described in the publication cited above.
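
The difference between CC_box and CC_mask can be illustrated with a small sketch. It assumes a mean-subtracted, normalized (Pearson-type) correlation over voxel values; the exact formula used on the results page is not reproduced in the text above, so this choice is an assumption. The function name, the flat one-dimensional "maps" and the mask are all made up for illustration; real tools operate on 3D grids and compute the model map as described by Afonine et al. (2018).

```python
import math


def map_correlation(map_a, map_b, mask=None):
    """Normalized correlation between two maps given as flat voxel lists.

    If `mask` is given (list of booleans), only voxels where the mask is
    True contribute; otherwise every voxel is used.
    """
    if len(map_a) != len(map_b):
        raise ValueError("maps must have the same number of voxels")
    if mask is None:
        mask = [True] * len(map_a)
    a = [x for x, keep in zip(map_a, mask) if keep]
    b = [y for y, keep in zip(map_b, mask) if keep]
    n = len(a)
    if n == 0:
        raise ValueError("the mask selects no voxels")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return numerator / math.sqrt(var_a * var_b)


# Made-up voxel values: experimental map, model-calculated map, molecular mask.
experimental = [0.0, 0.1, 0.9, 1.0, 0.2, 0.0]
calculated = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
molecule_mask = [False, True, True, True, True, False]

cc_box = map_correlation(experimental, calculated)                  # entire map
cc_mask = map_correlation(experimental, calculated, molecule_mask)  # masked region
print(f"CC_box = {cc_box:.3f}, CC_mask = {cc_mask:.3f}")
```

Because the two values are computed over different sets of voxels, they are not directly comparable with each other, only with the same statistic from other models or maps.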

Definition of metrics