SCALA (CCP4: Supported Program)

NAME

scala - scale together multiple observations of reflections

SYNOPSIS

scala HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]

Keyworded input summary
References
Input and Output files
Examples
Release Notes

DESCRIPTION

Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
TAILS correction
Data from Denzo
Datasets
Data harvesting

This program scales together multiple observations of reflections, and merges multiple observations into an average intensity.

Various scaling models can be used. The scale factor is a function of the primary beam direction, either as a smooth function of Phi (the rotation angle ROT), or expressed as BATCH (image) number. In addition, the scale may be a function of the secondary beam direction, acting principally as an absorption correction, either expanded as spherical harmonics, or as an interpolated three-dimensional function of Phi and the spatial coordinates of the measured spot on the detector. Such three-dimensional scaling is typically ill-determined, but it is generally useful if suitably restrained (see below for discussion of this). The secondary beam correction is related to the absorption anisotropy correction described by Blessing (1995); the interpolated three-dimensional correction is similar to that described by Kabsch (1988).

The merging algorithm analyses the data for outliers, and gives detailed analyses. It generates a weighted mean of the observations of the same reflection, after rejecting the outliers.

The program does three passes through the data:

  1. a scaling pass: firstly, there is an initial estimate of the scales, then the scale parameters are refined
  2. an analysis pass to analyse discrepancies and adjust the standard deviation estimates
  3. a final pass to apply scales, analyse agreement & write the output file, usually with merged intensities, but alternatively as a copy of the input file with evaluated scales appended to each observation.

Normally anomalous scattering is ignored during the scale determination (I+ & I- observations are treated together), but the merged file always contains I+ & I-, even if the ANOMALOUS OFF command is used. Switching ANOMALOUS ON does affect the statistics and the outlier rejection (qv).

Scaling options

The optimum form of the scaling will depend a great deal on how the data were collected. It is not possible to lay down definitive rules, but some of the following hints may help.

  1. If successive images are collected with the same detector (on-line detector) or equivalent detectors, and the beam intensity is steady or smoothly varying, then use a smoothed scaling option. Only use the SCALES BATCH option if every image is different from every other one, i.e. off-line detectors (including film), or rapidly or discontinuously changing incident beam flux. This may often be the case for synchrotron data if "dose" mode is not used. It is possible to "mix and match" options. For instance, the best option for data from an unstable synchrotron beam may be e.g. SCALES BATCH BFACTOR ON BROTATION SPACING 10, which will make the Bfactor variation smooth, but the scales discontinuous by batch.
  2. If there is a discontinuity between one set of images and another (e.g. change of exposure time), then flag them as different RUNs. This will be done automatically if no runs are specified.
  3. The SECONDARY correction is recommended: this provides a correction for absorption and is better than the DETECTOR option. It should always be restrained with a TIE SURFACE command (this is the default): with this restraint it is reasonably stable under most conditions, even in the absence of a reference dataset. It should only be combined with a B-factor if there is noticeable radiation damage and resolution beyond say 2.5A. The ABSORPTION (crystal frame) correction is similar to SECONDARY (camera frame) in most cases, but may be preferable if data have been collected from multiple alignments of the same crystal.
  4. If the SECONDARY (or ABSORPTION) correction is not used, use a B-factor correction, unless the data are only low-resolution. Traditionally, the relative B-factor is a correction for radiation damage (hence it is a function of time), but it also includes some other corrections eg absorption. For frozen crystals with little or no radiation damage it is better to turn off the B-factor and use the SECONDARY correction.
  5. The TAILS correction should be tried if the fractional bias is significant (this statistic is not calculated if there are no fully-recorded reflections). The refinement of the TAILS parameters is not very robust, and it may be necessary to FIX A1 (this should be improved).
  6. When trying out more complex scaling options (eg SECONDARY, TAILS), it is a good idea to try a simple scaling first, to check that the more elaborate model gives a real improvement.
  7. For isomorphous replacement, it may be useful to provide a native dataset as a reference, to make the systematic errors in the derivative similar to those in the native (ie "local" scaling, using the SECONDARY option). When scaling multiple MAD datasets, they should all be scaled together in one pass, outliers rejected across all datasets, then each wavelength merged separately. This is now the default if multiple datasets are present in the input file.
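
Putting these hints together, an illustrative command set for data from a frozen crystal on a stable beam might be (a sketch only; spacings, harmonic orders and other values must be chosen for your own data):

  RUN 1 ALL
  SCALES ROTATION SPACING 5 SECONDARY 6 BFACTOR OFF
  TIE SURFACE 0.001
  ANOMALOUS ON
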
Other options are described in greater detail under the KEYWORDS.

Control of flow through the program

Each of the stages can be individually activated or suppressed. Particularly useful options are:
  • Restarting scaling after a crash or failure to converge: the RESTORE option enables a restart from where you left off. Scales are dumped by default to a file SCALES after each cycle in case of crashes (see DUMP/NODUMP options).
  • Rerunning the merge step without repeating the scaling, using the ONLYMERGE and RESTORE commands, eg to adjust the SDCORRECTION parameters
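
For example, to re-run only the merge step with adjusted sd corrections (a sketch; the parameter values are illustrative):

  RESTORE
  ONLYMERGE
  SDCORRECTION FULL 1.3 0.04 PARTIAL 1.3 0.06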

Partially recorded reflections

See appendix 1

Partially recorded reflections are by default included in the scaling pass, as well as in the final analysis and merging. They may optionally be excluded from the scaling (controlled by the command INTENSITIES), and excluded from the final analysis (controlled by the command FINAL). Note that this default has changed from earlier versions.

The different options for the treatment of partials are set by either the PARTIALS command, effective for both scaling & merging stages; or separately for the scaling stage only (INTENSITIES command) or for the merging stage only (FINAL command).

Partials may either be summed or scaled: in the latter case, each part is treated independently of the others.

For datasets with few partials and low mosaicity compared to the image width, very few partials run over more than two images, and partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags to accept or reject partial sets according to their reliability.

Summed partials:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

Scaled partials:
In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5].

Scaling algorithm

See appendix 2

TAILS correction

The TAILS (SCALES .. TAILS) correction may be used to improve poor partial bias: this is an attempt to allow for the difference in scan width between fulls and partials. A partial is measured across twice (or 3 times etc) the rotation width of a full, so more of the diffuse scattering tails are included in the intensity, leading to an under-estimation of the fulls relative to partials. This correction is not very robust (though more so than in earlier versions of Scala), and the parameters may be unstable: you should always try first without this correction, and check that it really does improve the data statistics, without applying ridiculously large corrections. See appendix 3 for more details.

Data from Denzo

Data integrated with Denzo may be scaled and merged with Scala as an alternative to Scalepack, or unmerged output from Scalepack may be used. Both routes have some limitations. See appendix 4 for more details.

Datasets

Data in MTZ files are assigned to "datasets", within a hierarchy of Crystal/Dataset [crystal names are not yet implemented]. A crystal also has a "project name" which is not part of the hierarchy but is used to group data for harvesting. Each of these levels of hierarchy has "properties": a crystal has a unit cell, and a dataset has a wavelength. Unmerged data files as used in Scala typically contain a single dataset, but may contain multiple datasets if for instance multiple wavelength datasets are being scaled together, or if a reference set is present. Each BATCH in the file is assigned to a specific dataset.

Assigning a dataset:-
  1. Preferably, a project name, crystal name and dataset name should be assigned when the file is created, eg in Mosflm
  2. Utility programs eg REBATCH may be used to (re)assign dataset names and add or correct dataset properties (wavelength and cell)
  3. Names may be (re)assigned within Scala using the NAME command. This may be useful if names have not been assigned before, or if data from different crystals are merged into a single dataset.
Using datasets in Scala:
  1. A RUN may not contain batches from different datasets, but a dataset may contain multiple runs. By default if runs are not explicitly defined, each dataset is assigned to a different run. Datasets may be explicitly assigned to runs (see the RUN command).
  2. By default, each dataset is written out to a different output file, (see OUTPUT options).
  3. By default, outliers are rejected across all datasets (unless REJECT SEPARATE). This is normally a sensible thing to do for MAD data, since the expected differences are small, but carries with it the danger of rejecting real differences. Suitable options might be:-
             ANOMALOUS ON
             REJECT 6 ALL 8  # to check between I+ and I-
    
    but larger numbers might be more appropriate for strong signals and good data
  4. Various analyses are done between datasets, comparing the anomalous differences and the dispersive (isomorphous) differences from a defined "base" set (ie correlation between ((I(i) - I(base)) and (I(j) - I(base)) (i .ne. j .ne. base)). Typically the base dataset would be a high-energy remote (this is the default), but it may be set with the BASE command.
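
The "base" dataset may be set explicitly, for example (dataset name illustrative):

             BASE DATASET remote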

Data Harvesting

Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to

$HARVESTHOME/DepositFiles/<projectname>/<datasetname>.scala

The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory.
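
For example, in a Bourne-type shell (the directory name is illustrative):

  HARVESTHOME=/data/myproject; export HARVESTHOME
  # deposit file: /data/myproject/DepositFiles/<projectname>/<datasetname>.scala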

See also Data Harvesting.

KEYWORDED INPUT - SUMMARY

Summary classification of keywords

  • The most commonly used keywords (almost essential)

    RUN
    define subsets of data as "runs"
    SCALES
    define scaling method (scaling model)
    SDCORRECTION
    set SDcorrection parameters (particularly after a first run). Note that the multiplier Sdfac is adjusted automatically, but the intensity fraction Sdadd is not; it defaults to 0.02.
    REJECT
    set outlier rejection limits
    ANOMALOUS on
    anomalous scattering is present
    RESOLUTION
    resolution limits
    TITLE
    set a title

  • Control of program flow

    ONLYMERGE
    Skip the scaling, go straight to merge step: this requires RESTORE as well if the original input HKLIN file is used, but not if a file from a previous OUTPUT SEPARATE run is re-input.
    RESTORE
    restore previously-determined scales, eg after convergence failure or instead of re-running the scaling

  • General keywords:

    PARTIALS
    controls acceptance of partials
    PRINT
    how much printing in logfile

  • Principal keywords affecting scaling

    CYCLES
    number of cycles and convergence etc
    EXCLUDE
    select reliable reflections for scaling
    TIE
    restrain scaling parameters, particularly useful for the SECONDARY (ABSORPTION) scaling option
    LINK
    use same scaling parameters for different runs (for surface parameters (SECONDARY, ABSORPTION) or TAILS)
    INTENSITIES full
    use only fulls in scaling

  • Principal keywords affecting merging

    OUTPUT
    what to put in the output file
    FINAL
    treatment of partials

  • Dataset and Data Harvesting keywords

    NAME
    assign project/crystal/dataset name
    BASE
    define "base" dataset for dispersive differences
    PRIVATE
    directory permissions for user only
    USECWD
    write deposit file to current directory
    RSIZE
    width of a row in deposit file
    NOHARVEST
    do not write deposit file

  • Rarely used keywords: ANALYSE, BINS, DAMP, DUMP, FILTER, HISTORY, INITIAL, INSCALE, NODUMP, NOSCALE, OVERLAPMAP, SKIP, SMOOTHING, [UN]FIX, UNLINK, WIDTH, XYBINS

KEYWORDED INPUT - DESCRIPTION

In the definitions below "[]" encloses optional items, "|" delineates alternatives. All keywords are case-insensitive, but are listed below in upper-case. Anything after "!" or "#" is treated as comment. The available keywords are:

ANALYSE, ANOMALOUS, BASE, BINS, CYCLES, DAMP, DUMP, EXCLUDE, FILTER, FINAL, HISTORY, INITIAL, INSCALE, INTENSITIES, LINK, NAME, NODUMP, NOHARVEST, NORMALISE, NOSCALE, ONLYMERGE, OUTPUT, OVERLAPMAP, PARTIALS, PRINT, PRIVATE, REJECT, RESOLUTION, RESTORE, RSIZE, RUN, SCALES, SDCORRECTION, SKIP, SMOOTHING, TIE, TITLE, [UN]FIX, UNLINK, USECWD, WIDTH, XYBINS

RUN <Nrun> [<subkeys>]

Define a "run" : Nrun is the Run number, with an arbitrary integer label (i.e. not necessarily 1,2,3 etc). A "run" defines a set of reflections which share a set of scale factors. Typically a run will be a continuous rotation around a single axis. The subkeys allow definition of a run in a flexible way. The definition of a run may use several RUN commands. If no RUN command is given, or if the ALL keyword is used, then run assignment will be done automatically, with run breaks at discontinuities in dataset, batch number or Phi. Batches or batch ranges may still be excluded, either with the EXCLUDE subkey here, or by using the EXCLUDE keyword (qv)

Subkeys:
REFERENCE
This run is a reference set, i.e. it will be given a single scale factor = 1.0 (an input scale factor in the SCALE column will still be applied if present)
BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
Define a list of batches, or a range of batches, to be included in or excluded from the run. If batches are included in more than one run definition, the last definition will take priority.
ALL
Include all batches. In this case automatic run assignment will be done: to override this use eg RUN 1 BATCH 1 TO 99999
CRYSTAL <crystal_name>
Define a crystal name to be included in the run. This would usually be used in conjunction with the DATASET subkey. Crystal names are not defined at present, so this option is not very useful.
DATASET <dataset_name>
Define a dataset name to be included in the run. A crystal name may be combined with the dataset name using the syntax <crystal_name>/<dataset_name>. The dataset names used here are those present in the input file, not those assigned or altered by the NAME command.
INCLUDE | EXCLUDE
Set include/exclude flag for a following RANGE or BATCH keyword. Excluded batches or ranges will be omitted from the output file.
RANGE <r1> TO <r2>
Rotation range to include or exclude

Examples:

  RUN 1 BATCH 1 TO 10000    # unconditionally include all batches
  RUN 1 ALL  EXCLUDE 77 79 132  # automatic run splitting will be done
  RUN 1 INCLUDE BATCH 1 TO 200 EXCLUDE 77 79 132
  RUN 2 CRYSTAL  Native DATASET Lambda1
  RUN 3 DATASET  Native/Lambda2
  RUN 4 INCLUDE RANGE 0 TO 90 EXCLUDE RANGE 45 TO 48

SCALES [<subkeys>]

Define layout of scales, ie the scaling model. Note that a layout may be defined for all runs (no RUN subkeyword), then overridden for particular runs by additional commands.

Subkeys:
RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs
ROTATION <Nscales> | SPACING <delta_rotation>
Define layout of scale factors along rotation axis (i.e. primary beam), either as number of scales or (if SPACING keyword present) as interval on rotation [default SPACING 10]
BATCH
Set "Batch" mode, no interpolation along rotation (primary) axis.This option is compulsory if a ROT column is not present in the input file, but otherwise the ROTATION option is preferred.
SMOOTH <delta_batch>
Set smoothed Batch mode: this treats the batch number as a rotation angle, and interpolates along rotation axis in the same way as the ROTATION option. <delta_batch> sets the interval on batches (ie the number of batches to smooth over). This option is an alternative to ROTATION if you have lost the information in the ROT column (spindle rotation angle (Phi)), but otherwise the ROTATION option is preferred.
BFACTOR ON | OFF | ANISOTROPIC
Switch Bfactors on or off. The default is ON, but Bfactor refinement will be switched off by default if the scales are allowed to vary across the detector or on secondary beam direction (qv DETECTOR, SECONDARY, ABSORPTION, SURFACE). The ANISOTROPIC keyword activates anisotropic Bfactors (NOT RECOMMENDED): beware that the parameters for this option are likely to be poorly determined. Note that the anisotropic correction is centrosymmetric.
BROTATION|TIME <Ntime> | SPACING <delta_time>
Define number of B-factors or (if SPACING keyword present) the interval on "time": usually no time is defined in the input file, and the rotation angle is used. SCALES BATCH BROTATION SPACING 5 makes the Bfactor variation smooth, but the scales discontinuous by batch.
SECONDARY [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the camera spindle frame. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6). This correction would typically be combined with the usual primary beam correction (eg ROTATION SPACING 5 SECONDARY 6). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]
ABSORPTION [<Lmax>]
Secondary beam correction expanded in spherical harmonics up to maximum order Lmax in the crystal frame based on POLE (qv). The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6). This correction would typically be combined with the usual primary beam correction (eg ROTATION SPACING 5 ABSORPTION 6). The deviation of the surface from spherical should be restrained eg with TIE SURFACE 0.001 [default]. This is not substantially different from SECONDARY in most cases, but may be preferred if data are collected from multiple settings of the same crystal, and you want to use the same absorption surface. This would only be strictly valid if the beam is larger than the crystal.
SURFACE [<Lmax>]
Local correction expanded on direction of the scattering vector in hkl space (ie crystal frame) in spherical harmonics up to maximum order Lmax. The number of parameters increases as (Lmax + 1)**2, so you should use the minimum order needed (eg 4 - 6). The polar axis may be specified with the POLE keyword (qv). If you want to do 3-dimensional scaling, the SECONDARY or ABSORPTION option is preferable: this option should only be used if the diffraction geometry information required to work out the beam directions is not available.
POLE <h|k|l>
Define the polar axis for ABSORPTION or SURFACE as h, k or l (eg POLE L): the pole will default to either the closest axis to the spindle (if known), or l (k for monoclinic space-groups).
DETECTOR <Nscales_X> [<Nscales_Y>] | SPACING <delta_X> [<delta_Y>]
Define layout of scale factors on detector (i.e. secondary beam), either as number of scales in each direction (along XDET & YDET), or (if SPACING keyword present) as interval on XDET & YDET. The values for Y default to those for X if not specified. This option assumes that the detector positions are recorded in the input file (columns XDET, YDET), in any units (mm or pixels). If you allow the scale to vary across the detector (anything other than DETECTOR 1, the default), then by default Bfactor refinement is switched off, since the combination is likely to be unstable [Default 1 scale, i.e. no variation of scale across detector]. The SECONDARY option is probably better.
CONSTANT
One scale for each run (equivalent to ROTATION 1)
TAILS [<v> [<a0> [<a1>]]]
Apply correction for diffuse scattering (reflection tails) for this run. This can only be used with summed partials (INTENSITIES PARTIALS: this is the default). See introduction for explanation. Initial values for the parameters v, a0 & a1 may be given following the keyword.
v
width of tails in reciprocal space (A**-1) [default = 0.01]
a0
fraction of intensity in diffuse peak at theta = 0 [default = 0.0, fixed]
a1
slope of intensity fraction against (sin theta/lambda)**2 [default = 10]

Parameters may be fixed using the FIX command, or the same set used by different runs as defined by the LINK command. These controls may be required to avoid the parameters going wild.
SLOPE
NOT RECOMMENDED. Set "Slope" mode, like Batch, except that each batch has different scales at the beginning and end of the rotation range. The value used for each reflection is interpolated linearly according to the "Rotation" (phi) value. SLOPE implies BATCH mode. Be careful with this option: does it really improve the data? It is unlikely to work well if the mosaicity is large. TIE ROTATION may be used to restrain the difference in scales.

SDCORRECTION [RUN <RunNumber>] [[NO]ADJUST]
[FULL | PARTIAL | BOTH] <Sdfac> [<SdB>] <Sdadd>

Modifiers for input standard deviations: these are modified to

        sd(I) corrected = Sdfac * sqrt(sd(I)**2 + SdB*Ihl + (Sdadd*Ihl)**2)

where Ihl is the intensity (SdB may be omitted in the input and its use is not recommended). Default values are 1.0, 0.0, 0.02.

RUN <run_number>
Define run to which this command applies: the run must have been previously defined. If no run is defined, it applies to all runs. Different values may be specified for fully recorded reflections (FULL) and for partially recorded reflections (PARTIAL), or the same values may be used for both (BOTH), e.g.

         sdcorrection full 1.4 0.11 part 1.4 0.05

The keyword NOADJUST stops the automatic adjustment of the Sdfac parameters from the normal probability analysis at the beginning of the merge stage [default is ADJUST] (this applies to all runs)

With the output options SEPARATE or POSTREF, the modified Sds are written to the output file in columns SIGIC [& SIGIPRC if IPR is present]. These columns will be used by Postref but ignored on reinput to Scala.

PARTIALS [NO]CHECK [NO]TEST [<lower_limit> <upper_limit>] [CORRECT <minimum_fraction>] [NO]GAP MAXWIDTH <maximum_width> SCALE_PARTIAL <minimum_fraction> USE_PROFILE

Select the way in which partials are treated in both scaling and merging. These settings may be overridden separately for the scaling and merging steps with the INTENSITIES and FINAL commands respectively.

By default, partials are included (summed) in both scaling and in merging.

Subkeys:
[NO]CHECK
do [not] check for consistency of MPART flags (if present, i.e. from Mosflm). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale partials whose predicted total fraction is in the range minimum_fraction -> lower_limit (needs reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap in, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set [default NOGAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction> in the scaling. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
USE_PROFILE
use profile-fitted intensity even for scaled partials
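
For example, to accept summed partials with total fraction between 0.9 and 1.1, scale up those with fractions from 0.7 to 0.9, and reject any spread over more than 3 parts (limits illustrative):

  PARTIALS TEST 0.9 1.1 CORRECT 0.7 MAXWIDTH 3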

INTENSITIES

[INTEGRATED | PROFILE | PR_PART]
[[NO]ANOMALOUS]
[FULLS | ONLYFULLS | SCALE_PARTIAL <minimum_fraction>
| PARTIALS [ [NO]CHECK | [NO]TEST [<lower_limit> <upper_limit>] [CORRECT <minimum_fraction> ] [ [NO]GAP ] [MAXWIDTH <maximum_width>] ] ]

Intensities selection for scaling: which intensities to use, whether to keep Bijvoet pairs separate, and treatment of partials in scaling:

(a) Intensity selection options:

Set which intensity to use, of the integrated intensity (column I) or profile-fitted (column IPR), if both are present. Note this applies to all stages of the program, scaling & averaging.

Subkeys:
INTEGRATED
summation integrated intensity I.
PROFILE
profile-fitted intensity IPR [default if present]. Note that this will not be used for scaled partials unless PARTIALS USE_PROFILE is set.
PR_PART
profile IPR for fulls, integrated for partials

(b) Treatment of Bijvoet-related observations

By default, all observations (I+ & I-) are treated alike in scaling. This is normally the correct thing to do, since the anomalous differences are usually small and randomly positive and negative. In a case with large anomalous differences and high redundancy, it may be better to keep the I+ & I- observations separate in the scaling. Note that typically this will severely reduce the scaling overlaps between different parts of the data, and is not recommended except in special cases.

Subkeys:
ANOMALOUS
keep I+ and I- observations separate in scaling
NOANOMALOUS
use I+ and I- together in scaling [default]

(c) Options for treatment of partials in scaling (overrides options given under PARTIALS):

Set whether partially recorded reflections should be used in scaling, & if so, whether to use summed or scaled partials. By default summed partials are used in scaling as well as fulls. See introduction above for a description of the use of partially recorded reflections. Treatment of partials in the final averaging stage is defined with the FINAL command

Subkeys:
FULLS
use fully recorded observations only, & previously summed partials (from MOSFLM ADDPART)
ONLYFULLS
use fulls only: exclude previously summed partials (from MOSFLM)
SCALE_PARTIALS
use scaled partials greater than <Minimum_fraction> in the scaling. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
PARTIALS
use summed partials in scaling (if present) [this is the default]. The following flags are qualifiers of PARTIALS and will override those given on a previous PARTIALS command, for the scaling step only (not merging):
[NO]CHECK
do [not] check for consistency of MPART flags (if present). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
[NO]TEST [<lower_limit> <upper_limit>]
do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
CORRECT [<minimum_fraction>]
Scale partials whose predicted total fraction is in the range minimum_fraction -> lower_limit (needs reliable FRACTIONCALC) [default minimum = <lower_limit>]
[NO]GAP
do [not] accept partials with a gap in, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set [default NOGAP]
MAXWIDTH <maximum_width>
maximum number of parts for an acceptable summed partial
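
For example, to scale using profile-fitted intensities and summed partials with a tightened total-fraction test (a sketch; limits illustrative):

  INTENSITIES PROFILE PARTIALS TEST 0.95 1.05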

REJECT
[SCALE | MERGE] [COMBINE] [SEPARATE]
<Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]

Define rejection criteria for outliers: different criteria may be set for the scaling and for the merging (FINAL) passes. If neither SCALE nor MERGE are specified, the same values are used for both stages. The default values are REJECT 6 ALL 8

If there are multiple datasets, by default, deviation calculations include data from all datasets [COMBINE]. The SEPARATE flag means that outlier rejections are done only between observations from the same dataset. The usual case of multiple datasets is MAD data.

If ANOMALOUS ON is set, then the main outlier test is done in the merging step only within the I+ & I- sets for that reflection, ie Bijvoet-related reflections are treated as independent. The ALL keyword here enables an additional test on all observations including I+ & I- observations. Observations rejected on this second check are flagged "@" in the ROGUES file. In the scaling step, the outlier check includes all observations, unless anomalous observations are kept separate in scaling (INTENSITIES ANOMALOUS: this is an unusual option for special cases only).

Subkeys:
SEPARATE
rejection & deviation calculations only between observations from the same dataset
COMBINE
rejection & deviation calculations are done with all datasets [default]
SCALE
use these values for the scaling pass
MERGE
use these values for the merging (FINAL) pass
sdrej
sd multiplier for maximum deviation from weighted mean I [default 6.0]
[sdrej2]
special value for reflections measured twice [default = sdrej]
ALL
check outliers in merging step between as well as within I+ & I- sets (not relevant if ANOMALOUS OFF)
sdrej+-
sd multiplier for maximum deviation from weighted mean I including all I+ & I- observations (not relevant if ANOMALOUS OFF)[default check within I+ & I- sets only]
[sdrej2+-]
special value for reflections measured twice [default = sdrej+-]
KEEP
in merging, if two observations disagree, keep both of them [default]
REJECT
in merging, if two observations disagree, reject both of them
LARGER
in merging, if two observations disagree, reject the larger
SMALLER
in merging, if two observations disagree, reject the smaller

The test for outliers is described in Appendix 5
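
For example, to reject outliers in the merging step at 4 sd within the I+ or I- sets and at 6 sd across all observations, discarding the larger of two disagreeing observations (limits illustrative):

  REJECT MERGE 4 ALL 6 LARGER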

ANOMALOUS [OFF] [ON | ALL]

[RUN <Nrun>]
[MATCH [SPINDLE | INVERT | <hkl symmetry>]]
[PHIDIF <maximum Phi difference>]

Controls the treatment of anomalous scattering information in the merging step. Note that the option of selecting matching anomalous pairs is not recommended for normal use: it is likely to lead to seriously incomplete data in many cases, and the results should be compared carefully with those with the MATCH option switched off.

Subkeys:
OFF [default]
no anomalous used, I+ & I- observations averaged together in merging
ON | ALL
separate anomalous observations in the final output pass, for statistics & merging: this is also selected by the keyword ANOMALOUS on its own
RUN <run number>
set run for this MATCH option to apply to, otherwise it applies to all runs [default]
MATCH
use only matching I+ & I- pairs in merging
Matching pairs are :-
  • (a) in same run
  • (b) related by defined symmetry (if given as SPINDLE | INVERT | <hkl symmetry>)
  • (c) not more than DeltaPhi apart (if given by PHIDIF)
  • Definition of symmetry:-

    SPINDLE
    related by negation of reciprocal index closest to spindle: this option requires full orientation data to be present in the file
    INVERT
    related by inversion of indices, i.e. -h, -k, -l
    <hkl symmetry>
    specified hkl symmetry (e.g. h, -k, l)
    PHIDIF <DeltaPhi>
    maximum difference in Phi (ROT) between matching pairs
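
    For example, to merge only I+/I- pairs related by inversion of indices and measured within 30 degrees of rotation of each other (a special-case sketch; the PHIDIF value is illustrative):

        ANOMALOUS ON MATCH INVERT PHIDIF 30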

    RESOLUTION [RUN <Nrun>] [[LOW] <Resmin>] [[HIGH] <Resmax>]

    Set resolution limits in Angstrom, in either order, optionally for individual runs (in which case this command MUST come after the definition of the run). The keywords LOW or HIGH, followed by a number, may be used to set the low or high resolution limits explicitly: an unset limit will be set as in the input HKLIN file. If the RUN subkeyword is omitted, the limit applies to all runs. [Default use all data]
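
    Examples (limits illustrative):

        RESOLUTION 20 1.8
        RESOLUTION RUN 2 HIGH 2.5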

    TITLE <new title>

    Set new title to replace the one taken from the input file. By default, the title is copied from hklin to hklout

    ONLYMERGE

    Only do the merge step, no initial analysis, no scaling (== INITIAL NONE; NOSCALE). Note that this will usually need to be combined with a RESTORE command.

    RESTORE [<Scale_file_name>]

    Read initial scales from a SCALES file from a previous run of Scala (scales are normally dumped on every cycle, see DUMP). The number of scales defined for each run this time should typically be the same as in the dump, although a set of scale factors along ROTATION or DETECTOR may be extrapolated to additional batches which were not present in the initial scaling. The file may contain scales for runs which are not used this time, but new runs may not be added. RESTORing from a scale file which does not properly correspond to the run which generated the file is liable to give silly results. No initial analysis pass will be done unless the command INITIAL ANALYSE is given.

    INITIAL MEAN | UNITY | RUN <RunNumber> <InitialScale> | NONE | ANALYSE

    Define method of setting initial scales

    Subkeys:
    MEAN
    from mean intensities by rotation range [default]
    UNITY
    set all scales = 1.0
    RUN <RunNumber> <InitialScale>
    set initial scale factor for this run. If this option is used, any runs whose scales are not set explicitly will have their scales set = 1.0
    NONE
    no initial analysis pass, set all scales to unity
    ANALYSE
    force initial analysis pass even if RESTORE option is used

    PRINT [<subkey>]

    Define amount of printing

    Subkeys:
    NONE
    almost none
    BRIEF
    some more [default]
    CYCLES
    more information about each minimization cycle
    FULL
    quite a lot
    DEBUG [<reflection_interval>]
    far too much: also define reflection interval for printing
    ALLOVERLAP
    print all numbers in overlap matrix after initial pass, rather than the default condensed table
    OVERLAP
    print condensed table of overlap matrix after initial pass
    NOOVERLAP
    no printing of overlap matrix after initial pass [default]

    CYCLES [[NUMBER] <Ncycle>] [CONVERGE <Conv_limit>] [REJECT <Rej_cycle>] [WEIGHT VARIANCE | UNIT ]

    Define number of refinement cycles, convergence limit, and weighting scheme for scale refinement

    Subkeys:
    [NUMBER]
    maximum number of cycles [default 10]
    CONVERGE
    convergence limit (multiple of sd(param)) [default 0.3]
    REJECT
    1st cycle number for rejection of outliers [default 2]. The default is not to reject outliers on the first cycle, when the scales may be a long way off, but if the initial scales are reasonable (particularly if they come from a previous run) it is probably better to exclude outliers from the first cycle as well
    WEIGHT VARIANCE | UNIT
    Weighting scheme for scale refinement: VARIANCE weighting is default and usual; UNIT weights may help if the scale-factors vary over a large range (unit weights have not been much tested)
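
    For example, to allow more cycles with a tighter convergence limit, rejecting outliers from the first cycle (values illustrative):

        CYCLES 20 CONVERGE 0.1 REJECT 1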

    EXCLUDE [RUN <Nrun>]
    [[NO]EMAX <maximum_E> | EPROB <minimum_probability>]
    [SDMIN <value>] [SDMAX <value>] [ABSMAX <value>]
    [ARC INSIDE|OUTSIDE <X1> <Y1> <X2> <Y2> <X3> <Y3> ... <Xn> <Yn>]
    [RECTANGLE <Xmin> <Xmax> <Ymin> <Ymax>] [BATCH <batch range>|<batch list>] [CRYSTAL <crystal_name>] [DATASET <dataset_name>]

    Set intensity limits or positional limits for excluding observations.

    Limits for scaling and merging passes:-
    EMAX or EPROB, ARC, RECTANGLE, BATCH, CRYSTAL and DATASET limits apply to all stages of the program

    Limits for scaling pass only:-
    If an observation is considered too weak (I .lt. sd(I) * SDMIN), or if an observation is too strong (I .gt. sd(I) * SDMAX .or. I .gt. ABSMAX), then all observations of that reflection are omitted from the scaling. Exclusions are not applied to a Reference run. [Default EXCLUDE SDMIN 3.0]
    These exclusions do not apply to the initial scale calculation (INITIAL MEAN), nor to the output statistics, only to the scaling. The test is only done on fully recorded observations, and against the input standard deviations (i.e. unmodified by SDCORRECTION parameters)
    Subkeys:
    RUN <Nrun>
    defines a run number (previously defined) for these exclusion parameters to apply to: else applies to all runs (this applies to SD, arc and rectangle limits only: the EMAX|EPROB limit applies to all runs)
    EMAX <maximum_E> | EPROB <minimum_probability>
    Define maximum normalized amplitude E allowed: this may be given either as the maximum E-value EMAX for an acentric reflection, eg 8 - 10, or as the minimum allowed probability EPROB, eg 1e-8, where Eprob = exp(-Emax**2). Excluded reflections are listed in the log file, and in the ROGUES file. See R.Read, CCP4 Study Weekend, Sheffield 1999. [Default EMAX 10]. NOEMAX switches this test off
    SDMIN
    minimum sd multiple for inclusion
    SDMAX
    maximum sd multiple for inclusion
    ABSMAX
    maximum absolute value i.e. observations are excluded if:-
                    I  .lt. sd(I) * SDMIN
            .or.    I  .gt. sd(I) * SDMAX
            .or.    I  .gt. ABSMAX
    
    ARC
    defines an area of detector coordinates (XDET, YDET) to be excluded from all calculations, both scaling and merging, as a circular arc. Data are excluded either INSIDE (lower radius) or OUTSIDE (higher radius) the arc. The arc is defined by fitting a circle to the coordinates of 3 or more points: points 1 (X1,Y1) and 2 (X2,Y2) define the ends of the arc (in either order). If X1,Y1 = X2,Y2 a complete circle is excluded. A series of arcs may be defined. This option allows for the exclusion of shadows on the detector from eg backstop or cryocooler etc
    RECTANGLE
    defines a rectangular area of detector coordinates (XDET, YDET) to be excluded from all calculations, both scaling and merging. A series of rectangles may be defined.
    BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
    Define a list of batches, or a range of batches, to be excluded altogether.
    CRYSTAL <crystal_name>
    Define a crystal name to be excluded altogether. This would usually be used in conjunction with the DATASET subkey. Crystal names are not defined at present, so this option is not very useful.
    DATASET <dataset_name>
    Define a dataset name to be excluded altogether. A crystal name may be combined with the dataset name using the syntax <crystal_name>/<dataset_name>. The dataset names used here are those present in the input file, not those assigned or altered by the NAME command.
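
    Examples (values and coordinates illustrative):

        EXCLUDE SDMIN 6
        EXCLUDE RECTANGLE 100 150 0 200   ! e.g. a backstop shadow
        EXCLUDE BATCH 151 TO 160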

    [UN]TIE [SURFACE [<Sd_srf>]] [BFACTOR [<Sd_bfac>]] [A1 [<Sd_a1>]] [ROTATION [<Sd_z>]] [DETECTOR [<Sd_xy>]]

    Apply or remove restraints to parameters. Restraints may tie pairs of neighbouring scale factors along the rotation axis (ROTATION = primary beam) or in the detector plane (DETECTOR = secondary beam) to the same value, neighbouring Bfactors to each other, or the surface spherical harmonic parameters to zero (for SECONDARY or SURFACE corrections, to keep the correction approximately spherical), each with a standard deviation as given. This may be used if scales are varying too wildly, particularly in the detector plane. The default is no restraints on scales. A tie is recommended (a) if scales are varied across the detector, eg TIE DETECTOR 0.1, or (b) for SECONDARY or SURFACE corrections, eg TIE SURFACE 0.001

    UNTIE may be used to remove the default restraints on SURFACE and A1 (not recommended)

    SURFACE: tie surface parameters to spherical surface [default is TIE SURFACE 0.001]
    BFACTOR: tie Bfactors along rotation
    A1: tie TAILS parameter A1 to starting value, ie that given on the SCALES command [default is TIE A1 4]
    ROTATION: tie parameters along rotation axis (mainly useful with BATCH mode)
    DETECTOR: tie parameters on detector
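
    Examples (standard deviations illustrative):

        TIE SURFACE 0.001
        TIE DETECTOR 0.1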

    NORMALISE [SCALES|BFACTOR] [BEST|FIRST|RUN <run_number>]

    Controls which scale factors and Bfactors are "normalised", ie set to 1.0 or 0.0. The overall scale of the data is indeterminate, so one scale factor needs to be set = 1.0: similarly, one relative B-factor needs to be set = 0.0. The default options are to normalise scales on the first part of the first run, and Bfactors on the best part (ie to make all the Bfactors negative: because of the smoothing they may still go slightly positive). The normalisation of the scales is not important, but the normalisation of Bfactors is, because negative Bs will sharpen data, while positive Bs will blur it.

    SCALES
    Following keywords apply to scales
    BFACTORS
    Following keywords apply to Bfactors [default]
    BEST
    Normalise B-factors on the best bit (not applicable to scales) [default for Bfactors]
    FIRST
    Normalise on the beginning of the first run [default for scales]
    RUN <run_number>
    Normalise on the beginning of the defined run

    OUTPUT <subkeywords>

    Control what goes in the output file. Three types of output MTZ file may be produced: (a) AVERAGE, average intensity for each hkl (I+ & I-). (b) SEPARATE, observations from input file with scale calculated, for re-input to Scala (or Postref, see POSTREF option) (c) UNMERGED, unaveraged observations, but with scales applied, partials summed or scaled, and outliers rejected.

    A reference batch is always excluded from the final statistics, even if it is included in the output file (only possible with the SEPARATE option).

    File format options:
    NONE
    no output file written
    AVERAGE
    [default] output averaged intensities, <I+> & <I-> for each hkl
    SEPARATE
    output observations as input, but with added columns for SCALE etc. This file may be reinput to Scala for further scaling (e.g. with a different scaling model)
    POSTREF
    append columns for Postref. This option implies SEPARATE. The added columns are IMEAN SIGIMEAN ISUM SIGISUM:

            IMEAN    mean of fully-recorded reflections
            ISUM     summed partials (partials only)
    UNMERGED
    apply scales, sum or scale partials, reject outliers, but do not average observations
    POLISH
    Write reflections also to a formatted file as well as the MTZ file (logical name SCALEPACK) in some obscure format as written by "scalepack" (or my best approximation to it). Why would anyone want to do this? If the UNMERGED option is also selected, then the output matches the scalepack "output nomerge original index", otherwise it is the "normal" scalepack output, with either I, sigI or I+ sigI+, I-, sigI-, depending on the "anomalous" flag.
    Dataset options (only relevant for multiple datasets):
    SPLIT
    If there are multiple datasets defined, split them into separate output files [this is the default]. The base filename is taken from the HKLOUT, with the datasetname added for each dataset.
    TOGETHER
    NOT YET IMPLEMENTED. Write out multiple datasets into the same file, but labelled as different datasets

    Other options:

    (a) UNMERGED options:
    ORIGINAL
    write original indices hkl: M/ISYM = 1 for all reflections
    REDUCED
    [default] hkl indices are reduced to asymmetric unit, as in input file
    BEAMS
    output direction cosines of incident (s0) and diffracted (s2) beams in output file (columns S0X, S0Y, S0Z, S2X, S2Y, S2Z). These vectors are in the orthogonalised crystal frame with x,y,z axes along a*, c x a*, c
    (b) SEPARATE (POSTREF) options
    the following apply only to the SEPARATE (POSTREF) option, and must not precede that switch:-
    REFERENCE
    write reference batch (if present) to output file
    NOREFERENCE
    [default] omit reference batch (if present) from output file
    KEEP
    [default unless average] keep reflections outside resolution limits. The SCALE column will be set = 0.0
    KEEP SCALE
    keep reflections outside resolution limits, and calculate scales for them. This is dangerous unless the proportion of reflections omitted from scaling is small
    EXCLUDE
    [default if AVERAGE] exclude reflections outside resolution limits
    OMIT OUTLIERS
    omit rejected outliers from output file (SEPARATE & POSTREF options only). In this case a ROGUES file is written (see below) [default keep them in, but flagged in the FLAG column]
    OMIT PARTIALS [RUN <Nrun>]
    omit partially recorded reflections from output file. If no run number is given, then it applies to all runs. Multiple runs may be specified on successive OUTPUT OMIT PARTIALS RUN commands
    ROGUES
    write a list of rejected reflections to the file ROGUES. This may be assigned on the command line. A ROGUES file is always written for the AVERAGE & UNMERGED options. [for SEPARATE, default no ROGUES file written unless OMIT OUTLIERS option used]
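
    Examples (a sketch):

        OUTPUT SEPARATE KEEP
        OUTPUT UNMERGED ORIGINAL BEAMS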

    FINAL [ NONE | FULLS | ONLYFULLS

    | SCALE_PARTIAL <Minimum_fraction>
    | PARTIALS [[NO]CHECK] | [NO]TEST [<lower_limit> <upper_limit>] [CORRECT <minimum_fraction>] [[NO]GAP] [MAXWIDTH <maximum_width>] ]

    Select whether or not to use summed or scaled partials in the final analysis after scale determination. If this command is missing, summed partials will be included if the input file contains a FRACTIONCALC column.

    Subkeys:
    NONE
    no final analysis/output pass
    FULLS
    use fulls only (& previously summed partials, eg from MOSFLM ADDPART or Scalepack) [default if no FRACTIONCALC column]
    ONLYFULLS
    use fulls only: exclude previously summed partials (from MOSFLM)
    SCALE_PARTIALS
    use scaled partials greater than <Minimum_fraction> in the merging. Only use this if the FRACTIONCALC column contains a good estimate of the partiality.
    PARTIALS
    use summed partials in final analysis (if present). See introduction above for a description of the use of partially recorded reflections. [this is the default if FRACTIONCALC column is present] The following flags are qualifiers of PARTIALS and will override those given on a previous PARTIALS command, for the merging step only (not scaling):
    [NO]CHECK
    do [not] check for consistency of MPART flags (if present). Reflections failing this test are tested for total fraction (see TEST option) [default do if MPART is present]
    [NO]TEST [<lower_limit> <upper_limit>]
    do [not] accept partials only if total fraction (from FRACTIONCALC column) is in range lower_limit -> upper_limit [default if no MPART flag, limits 0.95, 1.05]
    CORRECT [<minimum_fraction>]
    Scale partials whose predicted total fraction is in the range minimum_fraction -> lower_limit (needs reliable FRACTIONCALC) [default minimum = <lower_limit>]
    [NO]GAP
    do [not] accept partials with a gap in, e.g. a partial over 3 parts with the middle one missing. GAP implies NOCHECK and TEST: CORRECT may also be set. [default GAP]
    MAXWIDTH <maximum_width>
    maximum number of parts for an acceptable summed partial

    [UN]FIX [V] [A0] [A1]

    Option to fix or free TAILS parameters: by default V & A1 are free, A0 is fixed [default A0 = 0.0]. Fixing A1 may help for low resolution data particularly.

    LINK [SURFACE|TAILS] ALL | <run_2> TO <run_1>

    run_2 will use the same SURFACE (or SECONDARY) or TAILS parameters as run_1. This can be useful when different runs come from the same crystal, and may stabilize the parameters. LINK TAILS ALL will use the same tails parameters for all runs for which TAILS parameters are refined. The keyword ALL will be assumed if omitted.

    • For TAILS parameters, the default is LINK TAILS ALL, but any LINK or UNLINK command will override this.
    • For SECONDARY or SURFACE parameters, the default is to link runs which come from the same dataset. They should be UNLINKed if they are different.
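
    For example, to make run 2 share its absorption surface with run 1 (run numbers illustrative; both runs must already be defined):

        LINK SURFACE 2 TO 1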

    UNLINK [SURFACE|TAILS] ALL | <run_2> TO <run_1>

    Remove links set by LINK command (or by default). The keyword ALL will be assumed if omitted, e.g. UNLINK TAILS [ALL] will use separate tails parameters for each run.

    SKIP <N_skip> [[FOR] <N_skip_cycles>]

    Allow a subset of reflections to be used during the initial cycles of scaling, to speed up the program. For the first N_skip_cycles, only every N_skip'th unique reflection will be used. N_skip_cycles defaults to Ncycle-2, and the program will force 2 more cycles with all data if convergence is reached while reflections are still being skipped. You should check that convergence has been reached with all observations, particularly if the number of observations used in the early cycles is small.

    FILTER <Filter> [<Damp>]

    Define filter level, & damp level. In the minimization, shifts corresponding to eigenvalues .lt. <Filter> are removed, <Damp> is added to all eigenvalues. [Default 1.0e-6, 0.0]

    DAMP [NONE] | <Damp> <NcycDamp>

    Set damping level for shifts. <Damp> is added to all eigenvalues for the first <NcycDamp> cycles. This may be useful if the scales vary over a wide range, particularly if the scale refinement diverges at first, but is not normally recommended, as it seems to slow convergence. Default is DAMP NONE. If <NcycDamp> is omitted, the damping applies to all cycles

    BINS <Nsrange>

    Define number of resolution bins for analysis [default 10]

    XYBINS <Nx> [<Ny>]

    Define number of bins across detector, x (=XDET) and y (YDET). Only used if XDET, YDET columns are present in the input file. <Ny> defaults to <Nx>. XYBINS 0 turns off analysis [default Nx = Ny = 20]

    SMOOTHING <subkeyword> <value>

    Set smoothing factors ("variances" of weights). A larger "variance" leads to greater smoothing

    Subkeys:
    TIME <Vt>
    smoothing of B-factors [default 0.5]
    ROTATION <Vz>
    smoothing of scale along rotation [default 1.0]
    DETECTOR <Vxy>
    smoothing of scale on detector [default 1.0]
    PROB_LIMIT <DelMax_t> <DelMax_z> <DelMax_xy>
    maximum values of normalized squared deviation (del**2/V) to include a scale [default set automatically, typically 3.0]

    INSCALE OFF | ON

    Switch OFF or ON application of an input SCALE column. By default, if the input file contains a column called SCALE (e.g. from a previous run of Scala), it will be applied.

    NOSCALE

    Don't do any scaling, just the final analysis (equivalent to CYCLES 0)

    DUMP [<Scale_file_name>]

    Dump all scale factors to a file after each cycle. These can be used to restart scaling using the RESTORE option, or for rerunning the merge step. If no filename is given, the scales will be written to logical file SCALES, which may be assigned on the command line. DUMP is set by default, but may be turned off with the NODUMP command.

    NODUMP

    No dump of scales to file. Default is DUMP.

    ANALYSE [[NO]NORMAL] [[NO]PLOT] [MAXDENSITY <maximum point density>]

    This command controls the normal probability analyses

    Subkeys:
    [NO]NORMAL
    do [not] do normal probability analyses [default do them]
    [NO]PLOT
    do [not] write normal probability plot to output file with logical name DELTA [default do write file]. This file contains pairs of delta(expected), delta(observed) for fulls, then summed partials, then scaled partials
    MAXDENSITY
    maximum point density for normal probability plot. This plot includes a point for every observation, so in large datasets it can get very big. This parameter allows the sampling of the plot, so that in the central crowded part only some of the points are included in the plotfile [default 25]

    HISTORY <history line>

    Define optional line to be added to the history records in the file. This is in addition to a line giving the date and time of the run, which is always added. Only one optional history line may be added.

    OVERLAPMAP

    Write the overlap matrix from the initial analysis to a map file assigned to MAPOUT. Note that the initial analysis is not done if the RESTORE option is used or INITIAL NONE is set.

    WIDTH WILSON | LINEAR | SQUARE [<mid-point>]

    Select binning mode on intensity

    Subkeys:
    WILSON
    [default] exponential bins
    LINEAR
    linear bins
    SQUARE
    quadratic bins

    In each case, <mid-point> is the limit for the middle bin.

    NAME [RUN <RunNumber(s)>] PROJECT <project_name> CRYSTAL <crystal_name> DATASET <dataset_name>

    Assign or reassign project/crystal/dataset names, for output file. The names given here supersede those in the input file.

    If the RUN subkey is present, different runs (or groups of runs) may be assigned to different datasets. If the RUN subkey is omitted, the names apply to all data. RunNumber may be a list or a range of run numbers (see examples below). DATASET must be present: if PROJECT or CRYSTAL are omitted, they take the value last given for these parameters. DATASET may optionally be given in the syntax crystal_name/dataset_name

    Examples:

    name run 1      project  Lysozyme crystal  Native dataset L1
    name run 2 3         dataset  L2  #  takes project & crystal from previous line
    name run 4 to 6    crystal Native  dataset L3
    

    BASE [CRYSTAL <crystal_name>] DATASET <base_dataset_name>

    If there are multiple datasets in the input file, define the "base" dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, ie for the i'th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if the wavelength is unknown). Typically, the CRYSTAL keyword may be omitted.

    PNAME <project_name>

    OBSOLETE keyword: use NAME instead.
    Project Name. In most cases, this will be inherited from the MTZ file.
    A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. An entry in the PNAME keyword should therefore be accompanied by a corresponding entry in the DNAME keyword.

    DNAME <dataset_name>

    OBSOLETE keyword: use NAME instead.
    Dataset Name. In most cases, this will be inherited from the MTZ file.
    A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. An entry in the DNAME keyword should therefore be accompanied by a corresponding entry in the PNAME keyword.

    PRIVATE

    Set the directory permissions to '700', i.e. read/write/execute for the user only (default '755').

    USECWD

    Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME. This can be used to send deposit files from speculative runs to the local directory rather than the official project directory, or can be used when the program is being run on a machine without access to the directory $HARVESTHOME.

    RSIZE <row_length>

    Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.

    NOHARVEST

    Do not write out a deposit file; default is to do so provided Project and Dataset names are available.

    INPUT AND OUTPUT FILES

    Input

    HKLIN
    The input file must be sorted on H K L M/ISYM BATCH

    Compulsory columns:

            H K L           indices
            M/ISYM          partial flag, symmetry number
            BATCH           batch number
            I               intensity  (integrated intensity)
            SIGI            sd(intensity)   (integrated intensity)
    

    Optional columns:

            XDET YDET       position on detector of this reflection: these
                            may be in any units (e.g. mm or pixels), but the
                            range of values must be specified in the
                            orientation data block for each batch. If
                            these columns are absent, the scale may not be
                            varied across the detector (i.e. only SCALES
                            DETECTOR 1 is valid)
            ROT             rotation angle of this reflection ("Phi"). If
                            this column is absent, only SCALES BATCH is valid.
            IPR             intensity  (profile-fitted intensity)     
            SIGIPR          sd(intensity)   (profile-fitted intensity)
            SCALE           previously calculated scale factor (e.g. from
                            previous run of Scala). This will be applied
                            on input
            SIGSCALE        sd(SCALE)
            TIME            time for B-factor variation (if this is
                            missing, ROT is used instead)
            MPART           partial flag from Mosflm
            FRACTIONCALC    calculated fraction, required for checking summed
                            partials and for scaling partials
            LP              Lorentz/polarization correction (already applied)
            FLAG            error flag (packed bits)
                            for now, pending proper implementation, if this column
                            is present, observations with a  non-zero FLAG will be
                            unconditionally omitted
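
    A quick way to verify that HKLIN has the compulsory columns (and to see which optional ones are present) is to list the column labels, e.g. with the CCP4 program mtzdump, or programmatically. The sketch below is illustrative only: it assumes the third-party Python library gemmi (not part of Scala), and "hklin.mtz" is a placeholder filename.

      # Minimal sketch: list the columns of an unmerged MTZ file and check
      # that the compulsory Scala input columns are present.
      # Assumes the third-party 'gemmi' library; 'hklin.mtz' is a placeholder.
      import gemmi

      mtz = gemmi.read_mtz_file("hklin.mtz")
      labels = [col.label for col in mtz.columns]
      print("Columns:", labels)

      required = ["H", "K", "L", "M/ISYM", "BATCH", "I", "SIGI"]
      missing = [lab for lab in required if lab not in labels]
      if missing:
          print("Missing compulsory columns:", missing)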
    

    Output

    HKLOUT
    (a) Option AVERAGE
    The output file contains columns
    H K L  IMEAN SIGIMEAN  I(+) SIGI(+)  I(-) SIGI(-)
    
    

    Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively and are always present even for the option ANOMALOUS OFF.

    (b) Option SEPARATE
    The output file contains the same columns as the input, with some columns added if not previously present:-

    SCALE & SIGSCALE - the calculated scale factor & its sd (this may be applied in another run of Scala). SCALE will be = 0.0 for reflections outside the resolution cutoff, if they are included in the output file (option OUTPUT KEEP) (see example)

    SIGIC [, SIGIPRC] - the corrected standard deviations of I [and IPR], as altered by SDCORR commands. These columns are only written if a SDCORRECTION command is given to Scala.

    If the OUTPUT POSTREF option is given, the columns IMEAN SIGIMEAN ISUM SIGISUM are also added

            IMEAN    mean of fully-recorded reflections
            ISUM     summed partials (partials only)
    
    
    (c) Option UNMERGED
    As for SEPARATE, but with scales applied, with no partials (i.e. partials have been summed or scaled, unmatched partials removed), & outliers rejected. If a separate profile-fitted intensity column IPR, SIGIPR is present in the input file as well as columns I, SIGI, only one set will be chosen, as specified. Columns defining the diffraction geometry (e.g. XDET YDET ROT TIME LP FRACTIONCALC) will be preserved in the output file.

    Output columns:

            H,K,L     REDUCED or ORIGINAL indices (see OUTPUT options)
            M/ISYM    Symmetry number (REDUCED), = 1 for ORIGINAL indices
            BATCH     batch number as for input
            I, SIGI   scaled intensity & sd(I)
            SCALE     scale factor applied
            SIGSCALE  sd(SCALE)
            NPART     number of parts, = 1 for fulls, negated for scaled
                       partials, i.e. = -1 for scaled single part partial
            TIME      copied from input if present
            XDET,YDET copied from input if present
            ROT       copied from input if present (averaged for
                        multi-part partials)
            FRACTIONCALC total fraction (if present in input file)
            LP        copied from input if present
       If BEAM option is used:-
            S0X, S0Y, S0Z  direction cosines of incident beam in
                      orthogonalised crystal frame ( x,y,z axes along
                      a*, c x a*, c)
            S2X, S2Y, S2Z  direction cosines of diffracted beam in
                      orthogonalised crystal frame
    
    SCALES
    scale factors from DUMP, used by RESTORE option
    ROGUES
    list of bad agreements
    PLOT
    If SCALES SECONDARY or SURFACE options are used, graph of correction surface (Plot84 format)
    NORMPLOT
    normal probability plot from merge stage
    *** this is at present written in a format for the plotting program xmgr ***
    ANOMPLOT
    normal probability plot of anomalous differences
                (I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
    

    *** this is at present written in a format for the plotting program xmgr ***

    SCALEPACK
    Formatted output selected by the command OUTPUT POLISH

    EXAMPLES

    1. Simple smoothed scaling, with some alternatives flagged as #*#

      set crystal = "tfn2"
      set run = 1             # used in the output filenames below
      scala hklin     ${crystal}_srs  \
            hklout    ${crystal}_merge \
            scales    ${crystal}_${run}.scales \
            rogues    ${crystal}_${run}.rogues \
            normplot  ${crystal}_${run}.norm \
                  << eof 
      
      run  1 all
      
      intensities partial     # we have few fulls: this is the default
      
      cycles 20
      
      anomalous off           # this is a native set
      #*# anomalous on        #   or a derivative
      
      sdcorrection 1.3 0.02   # from a previous run
      
      # try it with and without the tails correction: this is with tails
      scales   rotation spacing 10  bfactor on    tails
      #*#
      #*#  Some alternatives
      #*# >> Recommended usual case
      #*# scales rotation spacing 5 secondary 6 bfactor off tails
      #*#
      #*# >> If you have radiation damage, you need a Bfactor, 
      #*# >>  but a Bfactor at coarser intervals is more stable
      #*# scales  rotation spacing 5 secondary 6  tails \
      #*#    bfactor on brotation spacing 20
      #*# tie bfactor 0.5     ##  restraining the Bfactor also helps
      #*#
      
      
      reject 4              # reject outliers more than 4sd from mean
      #*# reject 6 all 8  is default
      
      exclude emax 8        # reject very large observations
                            #    default is Emax 10
      
      eof
      
    2. Simple batch scaling

      #!/bin/csh -f
      #
      # Scale data from Mosflm, merge with Scala
      #
      scala hklin jpa_example hklout jpa_example_sc \
            scales   jpa.scales \
            rogues   jpa.rogues \
            normplot jpa.norm \
            anomplot jpa.anom \
      << eof-1
      run 1 batch 2001 to 2049
      run 2 batch 2051 to 2100
      cycles 8
      sdcorr  1.5  0.03
      scales batch  bfactor on    # batch scaling is generally poorer than smoothed 
      reject merge 4
      anomalous on
      eof-1
      
    3. A more complicated example: smooth scaling of native, then scaling of derivative to native

      #!/bin/csh -f
      #
      #scala
      #
      cd /scr0/fm1/Temp
      #
      ##
      #==== Sort native output from Mosflm together
      ##
      sort:
      sortmtz hklout m6c8_sort.mtz  << end_sort
      H K L M/ISYM BATCH I SIGI
      m6c8a1.mtz
      m6c8a2.mtz
      end_sort
      #
      ##
      #==== scale native data together, no Bfactor, smooth scale on rotation
      #==== merge native
      ##
      scala hklin m6c8_sort.mtz hklout m6c8_scala <<EOF
      run 1 batch 1 to 90000
      title frozen native monoclinic m6c8 
      scales bfactor off  rotation spacing 5
      resolution 25 6.1
      anomalous off
      reject merge  4
      sdcorr  1.3  0.04
      EOF
      #
      # Convert native data into form suitable for reinput to Scala
      combat  hklin m6c8_scala hklout m6c8_r << eof-r
      input mtzi
      labin I=IMEAN SIGI=SIGIMEAN
      batch 1
      eof-r
      #
      ##
      #==== Sort derivative data together
      ##
      sort:
      sortmtz hklout m6cb3_sort.mtz  << end_sort
      H K L M/ISYM BATCH I SIGI
      m6cb3b.mtz
      m6cb3c.mtz
      end_sort
      #
      ##
      #==== Combine together merged native & sorted derivative data, by
      #     interleaving reflection records
      #     Must resort data after this step
      ##
      mtzutils:
      mtzutils hklin2 m6cb3_sort.mtz \
               hklin1 m6c8_r \
               hklout temp_m6cb3_resort << eof-m
      merge
      eof-m
      #
      sortmtz hklin temp_m6cb3_resort hklout m6cb3_resort << eof-m
      H K L M/ISYM BATCH
      eof-m
      #
      ##
      #==== Scale and merge derivative data, using native data as reference (run 1)
      #     Use secondary beam absorption correction for derivative,
      #       but with some restraints (tie)
      #     The reference data (native) is omitted from the output file
      ##
      scala hklin m6cb3_resort.mtz hklout m6cb3_scala \
        scales    m6cb3.scales \
        rogues    m6cb3.rogues \
        normplot  m6cb3.norm   \
        anomplot  m6cb3.anom   \
        plot      m6cb3.plt    \
       <<EOF
      run 1 batch 1 reference
      run 2 batches 10 to 23156 exclude 23152          #  reject one duff batch
      run 3 batches 23157 to 90000
      title frozen native monoclinic m6cb3 
      scales bfactor off  rotation spacing 5 secondary 6
      tie surface 0.001  # this is the default value anyway
      resolution 25 2.5
      reject merge  4
      anomalous on
      sdcorr  1.1  0.005
      EOF
      #
      #
      #
      #exit
      trunc:
      truncate  hklin  m6cb3_scala \
                hklout /ss3/fm1/Mutase/Derivs_FzM/m6cb3_F <<end-trunc
      anomalous yes
      resolution  25 2.5
      nresidue   1400
      labout  F=FM623 SIGF=SIGFM623 DANO=DANOM623 SIGDANO=SIGDANOM623
      end-trunc
      
    4. Scaling of several MAD datasets together, no reference dataset

      #!/bin/csh -f
      
      # Define a base name for files created in this script
      set name = dfxe_3d
      set project = dfxe
      set crystal = crys1
      
      # Input filenames for the 4 datasets at different wavelengths
      set l1 = dfxe_1   # peak
      set l2 = dfxe_2   # inflection
      set l3 = dfxe_3   # hard remote
      set l4 = dfxe_4   # 1A wavelength
      
      set nl1 = peak
      set nl2 = inflect
      set nl3 = highE
      set nl4 = lowE
      
      # Angular spacing for smoothed scales
      set spacing = 5
      
      # Sort together the initial data files
      sortmtz hklout ${name}_all << eof-s
      H K L M/ISYM BATCH
      ${l1}.mtz
      ${l2}.mtz
      ${l3}.mtz
      ${l4}.mtz
      eof-s
      
      
      ###=== Step 1 ==========================================================
      ###===    Scale all datasets together
      ###===    This will write out 4 output files, with filenames constructed
      ###===    by appending the dataset name on to the hklout name
      scale_1:
      set run = all
      scala hklin ${name}_all  hklout ${name} \
            scales   ${run}.scales \
            normplot ${run}.norm \
            anomplot ${run}.anom \
            rogues   ${run}.rogues \
                                            << eof-r1
      title Scale all datasets together, smooth, secondary
      #  Define runs
      run 1 batch 1000 to 1999
      run 2 batch 2000 to 2999
      run 3 batch 3000 to 3999
      run 4 batch 4000 to 4999
      
      #  Define datasets: this should have been done in Mosflm previously
      name  run 1   project ${project} crystal ${crystal} dataset ${nl1} # peak
      name  run 2   project ${project} crystal ${crystal} dataset ${nl2} # inflection
      name  run 3   project ${project} crystal ${crystal} dataset ${nl3} # highE
      name  run 4   project ${project} crystal ${crystal} dataset ${nl4} # lowE
      
      # Dispersive differences for analysis are relative to the "base" dataset
      base dataset highE
      
      
      # If using secondary beam correction, usually turn Bfactor off
      # unless you have high resolution and radiation damage 
      
      scales rotation spacing ${spacing}  bfactor off secondary 6
      tie surface 0.001    # this is the default restraint to keep the
                           # absorption surface spherical
      
      anomalous on
      
      # reject on 5 sigma within the I+ or I- sets, 8 sigma between I+ & I-
      reject 5 all 8
      
      eof-r1
      
      
      ###===   Convert I to F, do Wilson plot, for each dataset
      ###===   A future change to Truncate may allow processing of multiple
      ###===   datasets together 
      l1:
      set ln = ${nl1}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}
      
      l2:
      set ln = ${nl2}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}
      
      l3:
      set ln = ${nl3}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}
      
      l4:
      set ln = ${nl4}
      truncate hklin ${name}_${ln} hklout ${name}_${ln}_f << eof_t${ln}
      nresidues 117
      ranges 30
      labout F=F${ln} SIGF=SIGF${ln} \
         DANO=DANO${ln}  SIGDANO=SIGDANO${ln}  ISYM=ISYM${ln} \
         F(+)=F${ln}(+) SIGF(+)=SIGF${ln}(+)  F(-)=F${ln}(-) SIGF(-)=SIGF${ln}(-)
      eof_t${ln}
      
      ###===   Sort together merged data for all wavelengths, outputting a 
      ###===   single record for each hkl
      ###===   For each wavelength, store amplitude F & sigF, 
      ###===   anomalous difference DANO (= F+ - F-) & sigDANO,
      ###===   and ISYM flag which shows if both F+ & F- were measured
      cad  hklout ${name}_fcad  \
           hklin1 ${name}_${nl1}_f \
           hklin2 ${name}_${nl2}_f \
           hklin3 ${name}_${nl3}_f \
           hklin4 ${name}_${nl4}_f       << eof-c
      labin  file_number 1  \
        E1=F${nl1} E2=SIGF${nl1} E3=DANO${nl1} E4=SIGDANO${nl1} E5=ISYM${nl1} \
        E6=F${nl1}(+) E7=SIGF${nl1}(+) E8=F${nl1}(-) E9=SIGF${nl1}(-)
      labin  file_number 2  \
        E1=F${nl2} E2=SIGF${nl2} E3=DANO${nl2} E4=SIGDANO${nl2} E5=ISYM${nl2} \
        E6=F${nl2}(+) E7=SIGF${nl2}(+) E8=F${nl2}(-) E9=SIGF${nl2}(-)
      labin  file_number 3  \
        E1=F${nl3} E2=SIGF${nl3} E3=DANO${nl3} E4=SIGDANO${nl3} E5=ISYM${nl3} \
        E6=F${nl3}(+) E7=SIGF${nl3}(+) E8=F${nl3}(-) E9=SIGF${nl3}(-)
      labin  file_number 4  \
        E1=F${nl4} E2=SIGF${nl4} E3=DANO${nl4} E4=SIGDANO${nl4} E5=ISYM${nl4} \
        E6=F${nl4}(+) E7=SIGF${nl4}(+) E8=F${nl4}(-) E9=SIGF${nl4}(-)
      eof-c
      

    REFERENCES

    1. W. Kabsch, J. Appl. Cryst. 21, 916-924 (1988)
    2. P.R. Evans, "Data reduction", Proceedings of the CCP4 Study Weekend on Data Collection & Processing, 1993, pages 114-122
    3. P.R. Evans, "Scaling of MAD Data", Proceedings of the CCP4 Study Weekend on Recent Advances in Phasing, 1997
    4. R. Read, "Outlier rejection", Proceedings of the CCP4 Study Weekend on Data Collection & Processing, 1999
    5. W.C. Hamilton, J.S. Rollett & R.A. Sparks, Acta Cryst. 18, 129-130 (1965)
    6. R.H. Blessing, Acta Cryst. A51, 33-38 (1995)
    7. K. Diederichs & P.A. Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology 4, 269-275 (1997)
    8. M.S. Weiss & R. Hilgenfeld, "On the use of the merging R factor as a quality indicator for X-ray data", J. Appl. Cryst. 30, 203-205 (1997)
    9. M.S. Weiss, "Global indicators of X-ray data quality", J. Appl. Cryst. 34, 130-135 (2001)

    Appendix 1: Partially recorded reflections

    Partially recorded reflections may optionally be used in scaling (controlled by the command INTENSITIES), and in the final analysis (controlled by the command FINAL). The default is to include summed partials in both scaling and the final analysis and merging.

    Different options for the treatment of partials are set for both scaling & merging stages by the PARTIALS command, or separately for the scaling stage (INTENSITIES command) and the merging stage (FINAL command). Partials may either be summed (subkeyword PARTIALS, with various options), or scaled (subkeyword SCALE_PARTIALS): in the latter case, each part is treated independently of the others. If summed partials are used in scaling with the SCALES BATCH option, the FRACTIONCALC is used to partition the effects of the different scales for the two halves. In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 31, 32, 33), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.

    For datasets with few partials, with low mosaicity compared to the image widths, very few partials run over more than two images, & partial summation is not usually a problem. If you have many partials running over 3 or more images, you may need to tune the partial selection flags below to accept or reject partial sets according to their reliability.

    Summed partials:
    All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The options to use partials as well as fulls are defined separately for the scaling and merging steps on the INTENSITIES and FINAL commands. The parameters for the checks are set by the PARTIALS command for both stages, or separately on the INTENSITIES and FINAL commands. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.

    (a)
    At least two parts must be present (unless the CORRECT option is set, see (e) below)
    (b)
    not more than MAXWIDTH <maximum_width> parts must be present [default maximum_width = 5]
    (c)
    if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked as in (d) below, unless NOTEST is specified, in which case the inconsistent reflections are rejected. NOCHECK switches off this check.
    (d)
    if the TEST option is set (default if no MPART column), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.2]
    (e)
    if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. As for the scaled partial option, this correction relies on accurate FRACTIONCALC values, so beware.
    (f)
    if the GAP option is set, partials with a gap in are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.

    By setting the TEST & CORRECT limits, you can control the summation & scaling of partials, e.g.

          TEST 1.2 1.2 CORRECT 0.5 
    

    will scale up all partials with a total fraction between 0.5 & 1.2

          TEST 0.95 1.05           
    

    will accept summed partials 0.95->1.05, no scaling

          TEST 0.95 1.05 CORRECT 0.4  
    

    will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95

    Note that a profile-fitted intensity, if present in the file as a separate IPR column, will not be used for a scaled partial, unless the PARTIALS USE_PROFILE flag is set.

    Scaled partials:
    In this option, each individual partial observation is scaled up by the inverse FRACTIONCALC, provided that the fraction is greater than <minimum_fraction> [default = 0.5].
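
    As an illustration of how the TEST and CORRECT limits interact (as in the worked examples above), here is a minimal Python sketch; the function name and defaults are illustrative, not Scala's internals.

      # Hedged sketch of the summed-partial TEST/CORRECT decision.
      # total_fraction = sum of FRACTIONCALC over the parts of one reflection.
      def treat_summed_partial(i_summed, total_fraction,
                               lower=0.95, upper=1.2, correct_min=None):
          """Return (accepted, intensity) for one summed partial."""
          if lower <= total_fraction <= upper:
              return True, i_summed                   # accept as summed (TEST)
          if correct_min is not None and correct_min <= total_fraction < lower:
              return True, i_summed / total_fraction  # CORRECT: scale up
          return False, None                          # reject

      # e.g. TEST 0.95 1.05 CORRECT 0.4: a 0.7 fraction is scaled up by 1/0.7
      print(treat_summed_partial(1000.0, 0.7, lower=0.95, upper=1.05,
                                 correct_min=0.4))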


    Appendix 2: Scaling algorithm

    For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing

     
            Sum( whl * ( Ihl - ghl * Ih )**2 )   Ref Hamilton, Rollett & Sparks
    
    

    where Ih is the current best estimate of the "true" intensity

            Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2)
    
    

    Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.
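
    The minimization alternates between updating the current best estimates Ih and refining the scale parameters. The toy sketch below shows the structure of that iteration for the simplest case of one inverse scale per batch (Scala's real parameterizations are smooth functions, as described below); the weights whl = 1/shl**2 are an assumption of the sketch.

      import numpy as np

      # Toy sketch of the alternating scale refinement (per-batch scales only).
      # obs: list of (refl_index, batch_index, I, sig).
      def refine_scales(obs, n_refl, n_batch, ncycles=10):
          g = np.ones(n_batch)
          for _ in range(ncycles):
              # Ih = Sum(w*g*I) / Sum(w*g**2) for each reflection
              num = np.zeros(n_refl); den = np.zeros(n_refl)
              for h, b, I, s in obs:
                  w = 1.0 / s**2
                  num[h] += w * g[b] * I
                  den[h] += w * g[b] ** 2
              Ih = num / np.maximum(den, 1e-30)
              # each g minimizes Sum(w*(I - g*Ih)**2) over its batch
              gnum = np.zeros(n_batch); gden = np.zeros(n_batch)
              for h, b, I, s in obs:
                  w = 1.0 / s**2
                  gnum[b] += w * Ih[h] * I
                  gden[b] += w * Ih[h] ** 2
              g = gnum / np.maximum(gden, 1e-30)
              g /= g[0]  # fix the overall scale (otherwise indeterminate)
          return g, Ih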

    The inverse scale factor ghl is derived as follows:

            ghl = Thl * Chl * Shl
    
    

    where Thl is an optional relative B-factor contribution, Chl is a scale factor (one-dimensional, or three-dimensional with the DETECTOR option), and Shl is an anisotropic correction expressed as spherical harmonics (i.e. the SECONDARY or SURFACE options).

    a) B-factor (optional)

    For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, . . tn). Then for an observation measured at time tl

            B = Sum[i=1,n] ( p(delt) Bi ) / Sum (p(delt))
    
            where   Bi  are the B-factors at time ti
                    delt    = tl - ti
                    p(delt) = exp ( - (delt)**2 / Vt )
                    Vt  is "variance" of weight, & controls the smoothness
                            of interpolation
    
            Thl = exp ( + 2 s B )
                    s = (sin theta / lambda)**2
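
    A minimal Python sketch of this Gaussian-weighted interpolation (illustrative only; ti, Bi and Vt are as defined above). The scale factors Chl of section (b) below use the same weighting scheme, extended to three dimensions (detector position and rotation angle).

      import numpy as np

      # Smoothed B-factor: Bi are the values at "time" nodes ti; Vt controls
      # the smoothness of the interpolation.
      def smoothed_b(tl, ti, Bi, Vt):
          p = np.exp(-(tl - np.asarray(ti)) ** 2 / Vt)      # p(delt)
          return np.sum(p * np.asarray(Bi)) / np.sum(p)

      # Thl = exp(+2sB) with s = (sin theta / lambda)**2
      def t_factor(tl, ti, Bi, Vt, sin_theta_over_lambda):
          s = sin_theta_over_lambda ** 2
          return np.exp(2.0 * s * smoothed_b(tl, ti, Bi, Vt))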
    
    

    An alternative anisotropic B-factor may be used to correct for anisotropic fall-off of scattering: THIS OPTION IS NOT RECOMMENDED. It is parameterized on the components of the scattering vector (divided by 2 for compatibility with the normal definition of B) in two directions perpendicular to the X-ray beam (y & z in the "Cambridge" coordinate frame, with x along the beam).

            Thl = exp ( + 2[uy**2 Byy + 2 uy uz Byz + uz**2 Bzz])
    
            where  uy, uz are the components of d*/2
    
    

    Byy, Byz, Bzz are functions of time ti or batch, as for the isotropic B-factor. The principal components of B (Bfac_min, Bfac_max) are also printed.

    b) Scale factors

    For each run, scale factors Cxyz are determined at positions (x,y) on the detector, at intervals on rotation angle z. Then for an observation at position (x0, y0, z0),

            Chl(x0, y0, z0) =
       Sum(z)[p(delz)*{Sum(xy)[q(delxy)*Cxyz]/Sum(xy)[q(delxy)]}] / Sum(z)[p(delz)]
    
    where   delz    = z - z0
            p(delz) = exp(-delz**2/Vz)
            q(delxy)= exp(-((x-x0)**2 + (y-y0)**2)/Vxy)
            Vz, Vxy are the "variances" of the weight & control the smoothness
                    of interpolation
    
    

    For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor (or set of scale factors across the detector) for each batch. The SLOPE (not recommended) option has two scale factors per batch, with the scale interpolated linearly between the beginning and end according to the rotation angle of the reflection.

    c) Anisotropy factor

    The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of
    (1) the secondary beam (SECONDARY correction) in the camera spindle frame,
    (2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis
    or
    (3) the scattering vector in the crystal frame (SURFACE option).

    1. SECONDARY beam direction (camera frame)
               s  =  [Phi] [UB] h
               s2 = s - s0       
               s2' = [-Phi] s2
      Polar coordinates:
               s2' = (x y z)
               PolarTheta = arctan(sqrt(x**2 + y**2)/z)
               PolarPhi   = arctan(y/x)
      
                                   where [Phi] is the spindle rotation matrix
                                         [-Phi] is its inverse
                                         [UB]  is the setting matrix
                                         h = (h k l)
      
    2. ABSORPTION: Secondary beam direction (permuted crystal frame)
               s    = [Phi] [UB] h
               s2   = s - s0       
               s2c' = [-Q] [-U] [-Phi] s2
      Polar coordinates:
               s2' = (x y z)
               PolarTheta = arctan(sqrt(x**2 + y**2)/z)
               PolarPhi   = arctan(y/x)
      
                                   where [Phi] is the spindle rotation matrix
                                         [-Phi] is its inverse
                                         [Q] is a permutation matrix to put
                                             h, k, or l along z (see POLE option)
                                         [U]  is the orientation matrix
                                         [B]  is the orthogonalization matrix
                                         h = (h k l)
      
    3. Scattering vector in crystal frame
      	(x y z) = [Q][B] h
      Polar coordinates:
               PolarTheta = arctan(sqrt(x**2 + y**2)/z)
               PolarPhi   = arctan(y/x)
      
                                   where [Q] is a permutation matrix to put
                                             h, k, or l along z (see POLE option)
                                         [B]  is the orthogonalization matrix
                                         h = (h k l)
      
    then
     Shl = 1  +  Sum[l=1,lmax] Sum[m=-l,+l] Clm  Ylm(PolarTheta,PolarPhi)
    
                                 where Ylm is the spherical harmonic function for
                                           the direction given by the polar angles
                                       Clm are the coefficients determined by
                                           the program
    
    
    Notes:
    • The initial term "1" is essentially the l = 0 term, but with a fixed coefficient.
    • The number of terms = (lmax + 1)**2 - 1
    • Even terms (i.e. l even) are centrosymmetric, odd terms antisymmetric
    • Restraining all terms to zero (with the TIE SURFACE command) keeps the anisotropic correction small. This should always be done.
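
    For illustration, the sketch below evaluates Shl from a set of coefficients, building real spherical harmonics from scipy's complex Ylm. The coefficient ordering and function names are illustrative assumptions, not Scala's internal conventions.

      import numpy as np
      from scipy.special import sph_harm

      # Real spherical harmonic from scipy's complex Y_l^m
      # (scipy convention: sph_harm(m, l, azimuthal_angle, polar_angle)).
      def real_ylm(l, m, polar_theta, polar_phi):
          if m > 0:
              return np.sqrt(2) * (-1) ** m * sph_harm(m, l, polar_phi, polar_theta).real
          if m < 0:
              return np.sqrt(2) * (-1) ** m * sph_harm(-m, l, polar_phi, polar_theta).imag
          return sph_harm(0, l, polar_phi, polar_theta).real

      # Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm*Ylm; the number of
      # coefficients is (lmax+1)**2 - 1, e.g. 48 for SECONDARY 6.
      def s_factor(clm, lmax, polar_theta, polar_phi):
          s, k = 1.0, 0
          for l in range(1, lmax + 1):
              for m in range(-l, l + 1):
                  s += clm[k] * real_ylm(l, m, polar_theta, polar_phi)
                  k += 1
          return s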

    Appendix 3: TAILS correction

    For many crystals, the reflection profile on rotation ("phi") is not a simple closed curve, but has long tails due at least in part to thermal diffuse scattering (TDS): the amount of this depends on the crystal, and is larger at high resolution than at low resolution. If all reflections were scanned through the same angle, then equal amounts of this diffuse scattering would be included in each reflection. However, in typical "coarse sliced" data collection schemes, where the image rotation width is larger than the reflection width, reflections are recorded on a variable number of images, 1, 2, 3 etc, and different amounts of the tails are included in the integrated intensity. This generally leads to a negative "partial bias", increasing with resolution, i.e. the apparent intensities of partially recorded reflections are higher than equivalent fulls.

    The TAILS correction attempts to correct for the different truncation of the tails, using a simple (crude) model of thermal diffuse scattering; it corrects only for the differential truncation, and does not attempt to correct for the diffuse scattering itself.

    Some of the ideas used are based on suggestions by R.H.Blessing, Cryst. Reviews, 1, 3-58 (1987), but he should not be blamed for this.

    This is a brief account of the method (see the code & comments in subroutine dffscn for more details):-

    1. I = J ( 1 + alpha)
      where J is the Bragg intensity (true intensity) & I is the measured intensity, i.e. the TDS intensity is proportional to the Bragg intensity
    2. alpha = alpha0 + alpha1 * (sin theta / lambda)**2
      where alpha0 & alpha1 are refinable parameters. This is a simple linear isotropic model of the amount of TDS. alpha0 should be 0.0, and may be fixed as such, but allowing it to vary sometimes seems to help. Both alpha0 & alpha1 are reset if they go negative in the refinement. An extension of the model would be to make alpha anisotropic.
    3. each reflection is scanned over an angle DPhi, which is an integral multiple of the image width (Dphi = Nimages * DelPhi). A rotation by DPhi moves the reflection a distance in reciprocal space
    4.         Dq = Dphi * xsi,    
      

      where xsi is the radius from the rotation axis

      If the half width of the reflection (including tails) is v (another refinable parameter), and 2v > Dq, then part of the tails will be truncated.

      Taking a simple model of the shape of the tails as a triangle of base width 2v, height in the middle h (h = J * alpha / v), then the area in the tails (= tail intensity) and the intensity truncated by the restricted scan range can be calculated. Then the corrected ("true") intensity J can be calculated

      For full scan:

              J = I / (1 + alpha)
      

      For truncated scan (missing parts of tails C1 & C2)

              J = I / (1 + alpha*(1 - C1 - C2))
      
    5. because this model is very crude, it seems insufficiently trustworthy to use as a proper correction for TDS. It does however seem reasonable to correct for the different amounts of tails truncation, C1 & C2 ( >= 0.0)
    6. The correction applied is thus

              I' = I * (1 + alpha) / (1 + alpha*(1 - C1 - C2))
      
    7. the parameters refined are v, alpha0 (A0) and alpha1 (A1). By default, the same parameters are used for all runs (see LINK, UNLINK). Refinement of the parameters often seems to be unstable. If they are being reset from negative values, try setting A0 = 0.0 (e.g. SCALES . . TAILS 0.005 0.0 30.0) and fixing A0 (FIX A0, this is the default). The arithmetic is sketched below.
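
    A Python sketch of the arithmetic in steps 1-6 (illustrative only): note that the expression used here for the truncated fractions C1 & C2 is one plausible reading of the triangular tail model, not necessarily the one in subroutine dffscn.

      # Hedged sketch of the TAILS correction of steps 1-6 above.
      # Assumption: for a triangular tail of half-width v truncated at a
      # scanned half-range q from the peak, the lost fraction of the total
      # tail intensity is (1 - q/v)**2 / 2 on that side.
      def tails_correction(i_meas, s2, alpha0, alpha1, v, q1, q2):
          """i_meas: measured intensity; s2 = (sin theta / lambda)**2;
          q1, q2: scanned half-ranges (as Dq) either side of the peak."""
          alpha = alpha0 + alpha1 * s2
          c1 = 0.5 * max(0.0, 1.0 - q1 / v) ** 2   # truncated fraction, side 1
          c2 = 0.5 * max(0.0, 1.0 - q2 / v) ** 2   # truncated fraction, side 2
          return i_meas * (1.0 + alpha) / (1.0 + alpha * (1.0 - c1 - c2))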

    Appendix 4: Data from Denzo

    DENZO is often run refining the cell and orientation angles for each image independently, with postrefinement then done in Scalepack. It is essential that you do this postrefinement. Either reintegrate the images with the cell parameters fixed, or use unmerged output from Scalepack as input to Scala. The DENZO or SCALEPACK output will need to be converted to a multi-record MTZ file using COMBAT (see the COMBAT documentation).

    Both of these options have some problems:

    • If you take the output from Denzo into Scala, there may be problems with partially recorded reflections: it is difficult for Scala to determine reliably that it has all parts of a partial to sum together.
    • If you take unmerged output from Scalepack into Scala, most of the geometrical information about how the observations were collected is lost, so many of the scaling options in Scala are not available. Only batch scaling can be used, but simultaneous scaling of several wavelengths or derivatives may still be useful.

    Appendix 5: Outlier algorithm

    The test for outliers is as follows:

    (1)
    if there are 2 observations (left), then
    (a)
    for each observation Ihl, test deviation
            Delta(hl) = |Chi| = |Ihl - ghl*Iother| / sqrt[sigIhl**2 + (ghl*sdIother)**2]
    

    against sdrej2, where Iother = the other observation

    (b)
    if either Delta(hl) > sdrej2, then
    1. in scaling, reject the reflection, or
    2. in merging,
      1. keep both (default or if KEEP subkey given) or
      2. reject both (subkey REJECT) or
      3. reject larger (subkey LARGER) or
      4. reject smaller (subkey SMALLER).
    (2)
    if there are 3 or more observations left, then
    (a)
    for each observation Ihl,
    1. calculate the weighted mean of all the other observations <I>n-1 & its sd(<I>n-1)
    2. calculate the deviation
             Delta(hl) = (Ihl - ghl*<I>n-1) / sqrt[sigIhl**2 + (ghl*sd(<I>n-1))**2]
    3. find the largest absolute deviation max|Delta(hl)|
    4. count the number of observations for which Delta(hl) > 0 (ngt), & for which Delta(hl) < 0 (nlt)
    (b)
    if max|Delta(hl)| > sdrej, then reject one observation, but which one?
    1. if ngt = 1 or nlt = 1, then one observation is a long way from the others, and this one is rejected
    2. otherwise reject the one with the largest absolute deviation max|Delta(hl)|
    (3)
    iterate from beginning
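
    A compact sketch of this loop for the case of 3 or more observations (Python, illustrative only; the weighted mean and its sd are computed on the common scale Ihl/ghl, and sdrej corresponds to the REJECT limit):

      import math

      # obs: list of (I, sig, g) for one reflection; sdrej: rejection limit.
      def reject_outliers(obs, sdrej):
          obs = list(obs)
          while len(obs) >= 3:
              deltas = []
              for i, (I, sig, g) in enumerate(obs):
                  rest = [o for j, o in enumerate(obs) if j != i]
                  # weighted mean of the other observations, on scale I/g
                  w = [(og / osig) ** 2 for (_, osig, og) in rest]
                  mean = sum(wi * oI / og
                             for wi, (oI, _, og) in zip(w, rest)) / sum(w)
                  sd_mean = 1.0 / math.sqrt(sum(w))
                  deltas.append((I - g * mean)
                                / math.sqrt(sig ** 2 + (g * sd_mean) ** 2))
              worst = max(range(len(obs)), key=lambda k: abs(deltas[k]))
              if abs(deltas[worst]) <= sdrej:
                  break                          # no outlier left
              ngt = sum(d > 0 for d in deltas)
              nlt = sum(d < 0 for d in deltas)
              if ngt == 1:                       # a single high outlier
                  worst = deltas.index(max(deltas))
              elif nlt == 1:                     # a single low outlier
                  worst = deltas.index(min(deltas))
              del obs[worst]                     # else drop the worst deviation
          return obs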

    RELEASE NOTES

    Version 3.1.20

    • Fixed bug for files lacking XDET, YDET

    Version 3.1.19

    • Fixed bug for BATCH BFACTOR mode in working out "best" batch

    Version 3.1.18

    • Corrected totals in harvest file

    Version 3.1.15

    • Fixed long-standing bug in resolution limits for analysis, when resolution is cut back from maximum in the MTZ file

    Version 3.1.12

    • Fixed bug in BATCH mode when runs are automatically allocated

    Version 3.1.6-11

    • Default maximum width of partials to 5 degrees
    • Minor syntax changes to keep compilers happy
    • Extend partial bias analysis to case when there are no fulls, correct small bug in previous analysis
    • Correct (again) case of no datasets defined in file

    Version 3.1.5

    • In Project & dataset names, only accept alphanumeric and "-._" characters, change others to "_"
    • fixed bug in dtsstore routine (failed with 5 or more datasets)
    • defaults BASE dataset = 1 if no wavelength information in file

    Version 3.0.N, 3.1.2-4

    • Many changes to handle multiple datasets properly
    • REJECT BYRUN option removed, replaced by COMBINE to do the opposite
    • Analysis of correlation and differences between datasets, for MAD
    • TIE BFACTOR
    • NORMALISE to allow normalisation of Bfactor on best bit
    • Wrap-around of Phi at 360 degrees
    • Fixed bug in anomalous normal probability plot
    • Added SCALES SMOOTH option
    • Automatic assignment of runs
    • TIE A1 option, and auto-fixing of tails parameters
    • OUTPUT BEAMS option
    • EXCLUDE BATCH, DATASET, CRYSTAL options
    • SCALES ABSORPTION option
    • default Bfactor smoothing made less smooth (Vwt = 0.5 instead of 1.0). This seems to improve behaviour (reduces oscillations)
    • Absorption options now use the datum setting in the orientation block if set (relevant for 3-circle goniostats with the ABSORPTION option)

    Version 2.7.6

    • Corrected B-factor in analysis table against batch when it is a function of TIME.

    Version 2.7.5

    • Added EXCLUDE ARC option

    Version 2.7.4

    • Added OUTPUT POLISH option

    Version 2.7.3

    • Bug fix: "output separate" now works again

    Version 2.7.2

    • Correct calculation of completeness, by counting reflections, instead of approximate calculation by volume.

    Version 2.7.1

    • Harvest stuff added by Martyn Winn & Kim Henrick
    • Corrected bug in counting rejections by batch

    Version 2.6.4

    • Set SDADD parameter = 0.02 by default
    • New algorithm to determine initial scales from mean intensities: this should work much better when different runs or batches have very different resolution ranges.

    Version 2.6.3

    • Default to include summed partials in scaling
    • Classify partials for analysis in the batch to which they contribute most, rather than to the first batch they occur in.

    Version 2.6.2

    • Allow extrapolation of RESTORE file to new batches
    • Buffer input echo so that there is a plain text & html form

    Version 2.6.1

    • Fixed pointer bug in FIX

    Version 2.6.0

    • Spherical harmonic expansion of scale factors: SCALES SECONDARY & SCALES SURFACE options

    Version 2.5.5

    • Added EXCLUDE EMAX|EPROB option, to reject zingers & ice spots (Read's method)
    • Added unit weight option CYCLES WEIGHT UNIT
    • Added calls to libhtml to write html stuff into log file

    Version 2.5.4

    • New optional outlier check comparing I+ with I- observations in ANOMALOUS ALL case (REJECT ... ALL)
    • Removed all (or most) html-reserved characters from logfile

    Version 2.5.3

    • Checks on PhiRange if present

    Version 2.5.2

    • Proper default for PARTIAL TEST (s/r chkkpf)
    • Count failures with inconsistent MPART flags

    Version 2.5.1

    • Fixed uninitialized variables in s/r setscl, particularly affecting eg "scales batch detector 3 3"
    • Updated version of ea06cd

    Version 2.5.0

    • Includes all Kim Henrick's harvesting calls, and calls to new MTZ library things for project & dataset names, but currently commented out or inactivated

    Version 2.4.3

    • Fixed a couple of uninitialized variable bugs (in s/r anlini, nrmprb)

    Version 2.4.2

    • New REJECT options for two observations (KEEP, REJECT, LARGER, SMALLER)

    Version 2.4.1

    • Allow for MPART > 200, for Mosflm 5.51
    • Corrected partial check, to allow for errors in MPART

    Version 2.3.2

    • Out of Phi range is a warning, not fatal
    • Check for M>0 (flag set in Postref) for partials: previously didn't work with data from Postref
    • Correct labels for UNMERGED output option
    • DAMP keyword added
    • Bug fix to avoid a normal probability analysis problem if there are no fulls

    Version 2.3.1

    • Output labels for SEPARATE option changed to conform with CCP4 3.3 convention, i.e. I(+) and I(-) etc

    Version 2.3.0

    • added "anomalous match" options for selecting matched I+ & I-
    • EXCLUDE does not check reference batch

    Version 2.2.3

    1. fixed bug in summed partials in case of "scales batch": this combination is still dubious, but awaits proper analysis
    2. added PARTIALS keyword
    3. fixed bug in calculation of Rfull: this was completely wrong if anomalous data was present
    4. added INTENSITIES ANOMALOUS option to keep I+ & I- separate in scaling (not normally recommended)
    5. allow incomplete orientation data in certain cases

    Version 2.2.2, November 1996

    1. defaults on partial summation improved (and again 18/12/96)
    2. analysis on fulls only even when partials are used
    3. bug fix in random number routine (thanks to Adam)
    4. ONLYMERGE option
    5. If scaling across detector (e.g. "scales detector 3 3"), checks on valid Xdet, Ydet (within limits in file header)
    6. Rogues file lists Xdet, Ydet, Phi
    7. default in scaling is "exclude sdmin 6" (omitting weak observations speeds scaling)
    8. default FIX A0
    9. reject outliers on every cycle if scales "restored" (else previous scaling gets messed up)
    10. analysis by position on detector
    11. fixed bug affecting "reject byrun" & deviations with anomalous on

    Version 2.2.1, November 1996

    Many changes from version 1.x.x

    1. this version by default merges multiple measurements and thus replaces Agrovata. See the keyword OUTPUT for further description of the output options:
       AVERAGE [default] merged I (as from Agrovata)
       SEPARATE separate scaled measurements (as from older Scala versions), for reinput into Scala, or input into Agrovata [not recommended]
       POSTREF scaled file for input to POSTREF
       UNMERGED scaled, partials summed (or scaled), but not merged
    2. by default, the SDCORRECTION parameter SdFac (multiplier) will be automatically adjusted, from the normal probability analysis of deviations. This is done in a separate pass through the data before the final merging pass. The command SDCORRECTION NOADJUST disables this adjustment.
    3. The scaling option TAILS has been introduced. This makes some attempt to correct for the different truncation of the tails of diffuse scattering between fulls & partials. This option comes with a health warning: it should be treated with caution. Try with & without. (see commands SCALES . . TAILS, FIX, [UN]LINK)
    4. the way of putting data (e.g. native) back into the scaling as a reference set has changed. See example.
    5. treatment of summed partials has been elaborated (see FINAL & INTENSITIES keywords above). In 2.2.1, the defaults are not set optimally (whatever that means!): this is improved in 2.2.2
    6. Recommended usage:

      FINAL PARTIALS CHECK TEST 0.95 1.05     # for Mosflm
      
      FINAL PARTIALS TEST 0.95 1.05         # for Denzo (but FractionCalc 
                                            #  is rather unreliable)
      
    7. Scales are dumped to the file SCALES by default (see DUMP & RESTORE)
    8. Normal probability analyses are done, with plots output to the files NORMPLOT and ANOMPLOT in a format suitable for xmgr (from your favourite ftp server)
    9. by default scaling now excludes weak data (EXCLUDE SDMIN 3.0)

    AUTHOR

    Phil Evans, MRC Laboratory of Molecular Biology, Cambridge (pre@mrc-lmb.cam.ac.uk) See above for Release Notes.

    SEE ALSO

    truncate, postref, Data Harvesting