Anatomy of a complex recipe#
In this section, we’ll take a look at a fairly elaborate real-life recipe and explain its moving parts. This is the PARROT processing recipe used to process the data for the RATT PARROT paper.
The full recipe and all supporting files are available at https://github.com/ratt-ru/parrot-stew-recipes/tree/parrot1. The top-level recipe is called image-parrot.yml.
Preliminaries#
First, we have a shebang:
#!/usr/bin/env -S stimela run
This is not necessary, but it is handy: it allows the recipe file to be executed directly from the shell (if you chmod the recipe file to be executable), implicitly invoking stimela run. Next, we have an include block:
_include:
- parrot-cabs.yml
This pulls in a separate file containing other standard includes and cab definitions:
_include:
  (cultcargo):
    - wsclean.yml
    - casa-flag.yml
    - breizorro.yml
    ...
  omstimelation:
    - oms-cabs-cc.yml
    - oms-ddf-cabs.yml
cabs:
  wget:
    command: wget
    ...
Option settings#
Next, we set some Stimela configuration options. This is done by tweaking the opts section (see Configuration namespace for details):
## this augments the standard 'opts' config section to tweak logging settings
opts:
  log:
    dir: logs/log-{config.run.datetime}
    name: log-{info.fqname}
    nest: 2
    symlink: log
  backend:
    select: singularity
    singularity:
      auto_update: true
    rlimits:
      NOFILE: 100000 # set high limit on number of open files
The first of these has to do with logfile handling. Here, we’re telling Stimela that:
We want the logs from every run placed into a subdirectory called ./logs/log-DATETIME. This way, logs from each run are kept separate.
We want logfiles to be split up by step, and named in a specific way (log-recipe.step.txt)…
…but only to a nesting level of 2. That is, if a top-level step happens to be a sub-recipe with steps of its own, these nested steps will not have their own logfiles; rather, all their output will be logged into the logfile of the outer step. If we wanted nested recipe steps to be logged separately, we could increase the nesting level.
We want a symlink named logs/log to be updated to point to the latest log subdirectory. This allows for quick examination of logs from the latest run.
The backend options select the Singularity backend, and set some process resource limits (see Backend settings).
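To make the effect of the nesting level concrete, here is a small Python sketch. It is an illustration only, not Stimela's implementation: it assumes that logfile names follow the log-recipe.step.txt pattern described above and that fully-qualified step names deeper than the nesting level are simply truncated. The nested step name threshold is made up.

```python
def logfile_name(fqname: str, nest: int) -> str:
    """Return the logfile name for a step with the given fully-qualified name.

    Assumption: names deeper than `nest` components are truncated, so nested
    steps share the logfile of their enclosing step.
    """
    components = fqname.split(".")
    return "log-" + ".".join(components[:nest]) + ".txt"


# a top-level step gets its own logfile
print(logfile_name("recipe.image-1", nest=2))
# a step nested inside mask-1 (hypothetical name) logs into the mask-1 logfile
print(logfile_name("recipe.mask-1.threshold", nest=2))
# with a higher nesting level it would get a logfile of its own
print(logfile_name("recipe.mask-1.threshold", nest=3))
```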
Body of the recipe#
Next, we have the body of the recipe proper:
rrat:
  name: ratt-rrat
  info: imaging of RRAT follow-up
This starts with a section name. Because the section name is not one of the standard ones (i.e. cabs, opts), stimela will treat it as a recipe definition. Note that it is also possible to define recipes directly in the library, like this:
lib:
  recipes:
    ratt-parrot:
      name: ratt-parrot
      info: "imaging of RRAT PARROT follow-up observations"
…and, in fact, any recipe section defined at the top level will be implicitly inserted into lib.recipes. Defining things at top level is simply a shortcut that saves on indentation.
Then, we have an optional name attribute (if missing, the section name will be used as the recipe name), and an info string describing the recipe.
Recipe variables#
Next, we define some variable assignments. Note that all these variables are completely free-form and user-defined, with no particular meaning to Stimela itself. The point is to set up some consistent naming conventions for output files and directories, which we then apply throughout the body of the recipe:
assign:
  dir-out: '{recipe.dirs.base}/{recipe.dirs.sub}{recipe.output-suffix}' # output products go here
  image-prefix: '{recipe.dir-out}/im{info.suffix}{recipe.variant}/im{info.suffix}{recipe.variant}' # prefix for image names at each step
  log.dir: '{recipe.dir-out}/logs/log-{config.run.datetime}' # put logs into output dir
  # some more directory assignments
  dirs:
    ms: ../msdir # MSs live here
    temp: "{config.run.env.HOME}/tmp" # temp files go here
    base: . # base project directory -- directory of recipe by default, but see below
Note a few crucial details:
The above makes heavy use of {}-substitutions to define a set of naming conventions. For example, the recipe.dir-out variable (which this recipe uses consistently throughout to construct paths for output products) is formed up from a base directory (here set to “.”), a subdirectory (defined via assign_based_on below), and an optional output suffix (defined as a recipe input below).
Assignments are re-evaluated (and thus resubstituted) at each recipe step. {info.suffix}, for example, refers to the suffix of the current step’s label. The recipe contains steps labeled image-1, image-2, etc.; at each step, the recipe.image-prefix variable will be updated accordingly. Note also how this includes recipe.dir-out.
{config.run.datetime} fetches the timestamp of the Stimela run from the configuration namespace. The assignment to log.dir results in logfiles being placed into a custom subdirectory (which is unique for each run, by virtue of having the timestamp included in its name). We’re also telling Stimela that we want to keep logfiles inside recipe.dir-out.
config.run.env contains all the shell environment variables; here we use HOME to get at the user’s home directory.
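The per-step re-evaluation can be mimicked with a few lines of Python. This is only a sketch of the idea, not how Stimela evaluates substitutions internally. The substitute() helper, the obs1 subdirectory and the rule that info.suffix is whatever follows the last dash of the step label are all assumptions made for the sake of illustration.

```python
import re


def substitute(template, namespace):
    """Replace every {dotted.name} in `template` with its value from nested dicts."""
    def lookup(match):
        value = namespace
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{([^{}]+)\}", lookup, template)


def image_prefix_for(step_label):
    """Re-evaluate the dir-out and image-prefix assignments for one step."""
    recipe = {
        "dirs": {"base": ".", "sub": "obs1"},  # "obs1" is a made-up subdirectory
        "output-suffix": "-qc2",
        "variant": "",
    }
    # assumption: the suffix is whatever follows the last dash in the label
    info = {"suffix": step_label.rsplit("-", 1)[-1]}
    namespace = {"recipe": recipe, "info": info}
    # dir-out is evaluated first, since image-prefix refers to it
    recipe["dir-out"] = substitute(
        "{recipe.dirs.base}/{recipe.dirs.sub}{recipe.output-suffix}", namespace)
    return substitute(
        "{recipe.dir-out}/im{info.suffix}{recipe.variant}/im{info.suffix}{recipe.variant}",
        namespace)


for label in ("image-1", "image-2"):
    print(label, "->", image_prefix_for(label))
```

The same template yields a different image prefix at each step, because the info part of the namespace changes from step to step.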
Then, we include a few more variable assignments via the assign-based-on trick:
assign_based_on:
  _include: parrot-observation-sets.yml
Inside that file (https://github.com/ratt-ru/parrot-stew-recipes/blob/parrot1/parrot-observation-sets.yml) you’ll see a bunch of variable assignments based on the value of obs (which has the meaning of “observation ID”, see below), including an assignment to band (“L” or “UHF”), which itself triggers a bunch of other assignments. This is a useful pattern for grouping observation-specific settings together and managing them in one place.
Recipe inputs#
Next, it’s time to define the recipe’s inputs:
inputs:
  obs:
    choices: [L1, L2, L3, L4, U0, U1, U2, U3, U3b, U3c]
    info: "Selects observation, see parrot-observation-sets.yml for list of observations"
    default: L1
  output-suffix:
    dtype: str
    default: '-qc2'
  variant:
    dtype: str
    default: ''
  dir-out:
    dtype: str
  ...
The only required input is obs, which selects the observation to be processed. The assign_based_on section above relies on this input to set up a slew of other variables (including the MS name). The output-suffix and variant inputs are used to form up filename paths, as we saw above. Then, we have an input called dir-out. This may look familiar: you may have seen it being assigned to above. What is going on, and why are we assigning to a recipe’s inputs? Recall that inputs can be assigned to; this is effectively a roundabout way of setting up a default value for them. Here, the intended routine usage of the recipe is to specify obs and have dir-out set up automatically via the assign_based_on section. However, dir-out remains a legitimate input, so the user may also specify it explicitly from the command line.
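The resulting precedence can be summed up in a short Python sketch. It only restates the behaviour described above (an explicit command-line value wins, otherwise the assigned value acts as the default); the resolve_input() helper and the example paths are hypothetical and are not part of Stimela.

```python
def resolve_input(name, command_line, assigned):
    """Pick the value of a recipe input.

    An explicit command-line value wins; otherwise the value set through an
    assignment acts as the default. Returns None if neither is available.
    """
    if name in command_line:
        return command_line[name]
    return assigned.get(name)


assigned = {"dir-out": "./obs1-qc2"}  # as set up by the assignments (made-up path)

# routine usage: only obs is given, dir-out comes from the assignment
print(resolve_input("dir-out", {"obs": "L1"}, assigned))
# explicit override from the command line
print(resolve_input("dir-out", {"obs": "L1", "dir-out": "/scratch/test"}, assigned))
```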
Aliases#
The aliases section links certain recipe inputs to inputs of particular steps:
aliases:
  ms:
    - (wsclean).ms
    - (quartical).input_ms.path
  weight:
    - (wsclean).weight
  minuv-l:
    - (wsclean).minuv-l
  taper-inner-tukey:
    - (wsclean).taper-inner-tukey
Since WSClean and QuartiCal steps recur throughout the recipe, this is a clean way to link some of their parameters to recipe inputs up front. Note how the (wsclean) syntax refers to “all steps using the wsclean cab”. Throughout the rest of the recipe, you will often see parameter assignments such as ms: =recipe.ms for other cabs. This achieves the same effect as an alias, with the difference that aliases allow for a bit more up-front prevalidation. (The PARROT recipe could be overhauled to use more aliases in these cases, modulo better being the enemy of good.)
Note also that as a result of this aliases declaration, ms, weight, etc. become recipe-level inputs (see Aliased inputs/outputs). Recall that ms was assigned to based on the value of obs, similar to dir-out. The user can still specify it manually from the command line.
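As a rough mental model of what a (cab).parameter alias target stands for, consider the following Python sketch. It is not Stimela code: the expand_alias() helper and the step labels selfcal-1 and mask-1 are invented for illustration, and the real implementation certainly handles more cases.

```python
def expand_alias(target, steps):
    """Expand an alias target into (step_label, parameter) pairs.

    `steps` maps step labels to cab names. A "(cab).param" target selects
    every step using that cab; otherwise the first component is taken to be
    the label of a single step.
    """
    step_part, _, param = target.partition(".")
    if step_part.startswith("(") and step_part.endswith(")"):
        cab = step_part[1:-1]
        labels = [label for label, step_cab in steps.items() if step_cab == cab]
    else:
        labels = [step_part] if step_part in steps else []
    return [(label, param) for label in labels]


# hypothetical excerpt of a recipe: step label -> cab
steps = {
    "image-1": "wsclean",
    "selfcal-1": "quartical",
    "mask-1": "breizorro",
    "image-2": "wsclean",
}

print(expand_alias("(wsclean).ms", steps))
print(expand_alias("(quartical).input_ms.path", steps))
```

Setting the recipe-level ms input then amounts to setting every parameter in the expanded list.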
Recipe steps#
We now get to the business end of the recipe. We won’t go through all of its many steps here, but rather highlight some of the more interesting ones that illustrate various Stimela features.
The flag-save step is marked as always skipped (skip: true). Why so? This step captures the initial state of flags in the MS, and should only ever be run once per MS. The next step, flag-reset (not skipped!), resets the flags to the saved initial state. The idea here is, the very first time a user processes a particular MS, they should do an explicit stimela run -s flag-save to save the initial state. Subsequent re-runs of the same workflow can then start from a known set of flags.
The image-1 step reuses a “template” step definition defined elsewhere, and augments it with some specific parameter settings:
image-1:
  info: "auto-masked deep I clean"
  _use: lib.steps.wsclean.rrat
  params:
    column: DATA
    niter: 150000
    fits-mask: =IF(recipe.automask, UNSET, recipe.deep-mask-1)
    auto-threshold: 2
Given many repeated steps with lengthy yet similar parameter settings, this “template” pattern can reduce the recipe’s complexity. You will see it recur in many of the subsequent steps.
The mask-1 step invokes a sub-recipe:
mask-1:
  recipe: make_masks
  params:
    restored-image: "{previous.restored.mfs}"
    prefix: "{previous.prefix}"
The sub-recipe is defined here.
A few steps down, we come to predict-copycol-3. This illustrates a conditional skip based on a recipe input.
Another few steps down, we see conditional skips based on the state of a step’s outputs:
download-power-beam:
  cab: wget
  params:
    url: =recipe.mdv-beams-url
    dest: =recipe.mdv-beams
  skip_if_outputs: exist
compute-power-beam:
  cab: mdv-beams-to-power-beam
  params:
    mdv_beams: =recipe.mdv-beams
    power_beam: =recipe.power-beam
  skip_if_outputs: fresh
derive-obs-specific-power-beam:
  cab: derive-power-beam
  params:
    cube: =steps.cube-3.cube
    images: =steps.image-3.restored.per-band
    outcube: =STRIPEXT(current.cube) + ".pbcorr.fits"
    power_beam: =recipe.power-beam
    beaminfo: "{steps.image-3.prefix}-powerbeam.p"
    nband: 128
  skip_if_outputs: fresh
This pattern comes in handy for relatively expensive steps that should only be re-executed if some of their inputs change. The skip_if_outputs: fresh directive makes Stimela behave in a way that is reminiscent of Unix Makefiles.
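For readers unfamiliar with the make analogy, the following Python sketch shows the kind of check involved. It is a simplified model under stated assumptions, not Stimela's actual logic: it assumes that "exist" means all output files are present, and that "fresh" additionally requires every output to be at least as new as the newest input, judged by file modification times.

```python
import os
import tempfile


def should_skip(inputs, outputs, mode):
    """Decide whether a step can be skipped, in the spirit of `make`.

    mode "exist": skip if every output file exists.
    mode "fresh": additionally require every output to be at least as new as
    the newest existing input (assumed semantics, based on modification times).
    """
    if not outputs or not all(os.path.exists(path) for path in outputs):
        return False
    if mode == "exist":
        return True
    if mode == "fresh":
        newest_input = max(
            (os.path.getmtime(p) for p in inputs if os.path.exists(p)), default=0.0)
        oldest_output = min(os.path.getmtime(p) for p in outputs)
        return oldest_output >= newest_input
    raise ValueError(f"unknown mode: {mode}")


# small demonstration with throwaway files (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    beams = os.path.join(tmp, "mdv-beams.npz")
    power = os.path.join(tmp, "power-beam.fits")
    for path in (beams, power):
        open(path, "w").close()
    os.utime(beams, (1000, 1000))   # input is older...
    os.utime(power, (2000, 2000))   # ...than the output
    print(should_skip([beams], [power], "fresh"))
    os.utime(beams, (3000, 3000))   # input modified after the output was made
    print(should_skip([beams], [power], "fresh"))
```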
In passing, note the =STRIPEXT(current.cube) + ".pbcorr.fits" pattern. This takes the value of the cube input (a filename), removes the extension, and appends another extension to form up a value for the outcube output.
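In plain Python, the equivalent filename manipulation might look like the sketch below. The strip_ext() helper is a stand-in written for this example, and it assumes that only the final extension is stripped; the cube filename is made up.

```python
import os.path


def strip_ext(path):
    """Drop the final extension of a filename, e.g. "cube.fits" -> "cube"."""
    return os.path.splitext(path)[0]


cube = "im3-cube.fits"  # hypothetical value of the step's cube input
outcube = strip_ext(cube) + ".pbcorr.fits"
print(outcube)
```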
A few more steps down, we come onto an example of the use of tags:
make-master-catalog:
  tags: [master-catalog, never]
  ...
augment-master-catalog:
  tags: [master-catalog, never]
  ...
Tags are related to skips. They can be used to group related steps together, and invoke or skip them as a whole. In this case, we see the special never tag. This tells Stimela that these two steps are to be skipped unless explicitly invoked from the command line. The invocation can be done by specifying their other tag with -t:
stimela run image-parrot.yml -t master-catalog
or, more cumbersomely, using -s with the step labels:
stimela run image-parrot.yml -s make-master-catalog -s augment-master-catalog
A bit further down we see another example of step tags:
tags: [lightcurves]
Running:
stimela run image-parrot.yml -t lightcurves
will skip the bulk of the recipe, only invoking the steps tagged with lightcurves.
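The selection rules described in this section can be summarised with a small Python model. This is a sketch of the behaviour as described here, not Stimela's own selection code; the select_steps() helper and the extract-lightcurves label are invented for the example.

```python
def select_steps(steps, tags=(), step_names=()):
    """Return the labels of the steps that would run.

    `steps` maps each step label to its list of tags. With no selection, every
    step runs except those tagged "never". With -t/-s style selections, only
    the steps matching a requested tag or label run.
    """
    selected = []
    for label, step_tags in steps.items():
        if tags or step_names:
            run = label in step_names or any(tag in step_tags for tag in tags)
        else:
            run = "never" not in step_tags
        if run:
            selected.append(label)
    return selected


steps = {
    "image-1": [],
    "make-master-catalog": ["master-catalog", "never"],
    "augment-master-catalog": ["master-catalog", "never"],
    "extract-lightcurves": ["lightcurves"],  # hypothetical step label
}

print(select_steps(steps))                           # plain run
print(select_steps(steps, tags=["master-catalog"]))  # like -t master-catalog
print(select_steps(steps, tags=["lightcurves"]))     # like -t lightcurves
```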
Looping recipes#
On a different subject, let’s leave the PARROT and examine a Jupiter imaging recipe in the same repository, jove-pol.yml. The structure of this recipe is broadly similar to the PARROT recipe above. It takes one MS and, after multiple rounds of selfcal, produces images. Note how its ms input is defined:
inputs:
  ms:
    dtype: MS
    default: '{recipe.ms-base}-scan{recipe.scan:02d}.ms'
    aliases: ['*.ms']
This means that the user can specify an ms explicitly, or, alternatively, via two other inputs (a base name and a scan number) from which the corresponding MS name is constructed by default. (In passing, note also the use of aliases to link this input to all steps with an ms input.)
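The :02d part of the default is an ordinary Python format specification, so the resulting names are easy to predict. A minimal sketch follows; the default_ms() helper and the base name jove are made up for illustration.

```python
def default_ms(ms_base, scan):
    """Form the default MS name from a base name and a zero-padded scan number."""
    return f"{ms_base}-scan{scan:02d}.ms"


print(default_ms("jove", 4))   # single-digit scans are padded to two digits
print(default_ms("jove", 11))
```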
The interesting trick comes when we want to apply this recipe to a series of MSs. This is done by jove-pol-loop.yml:
jove-pol-loop:
  name: "Jove IQUV scan loop"
  info: "makes images with 1GC/DDCal for a series of scans, in full Stokes"
  for_loop:
    var: scan
    over: scan-list
    display_status: "{var}={value} {index1}/{total}"
  inputs:
    _include: jove-defaults.yml
    scan-list:
      dtype: List[int]
      default: [ 4, 6, 8, 11, 13, 15, 18, 20, 22, 24, 27, 29, 31, 34,
                 36, 38, 41, 43, 45, 48, 50, 52, 55, 57, 59, 61, 64, 65 ]
This tells Stimela that the recipe is a loop: the scan variable is to be iterated over values in scan-list, which is itself an input with a default. (The display_status attribute tells Stimela how to format information for its status bar.) For each scan in the list, it invokes two steps, passing the scan number (and base filenames) as inputs to the sub-recipe:
  steps:
    jove-prepare:
      ...
    jove-pol:
      recipe: jove-pol4
      params:
        ms-base: =recipe.ms-base
        dir-out-base: =recipe.dir-out-base
        scan: =recipe.scan
If the Slurm backend is enabled, one could also add scatter: -1 to the for_loop section so as to process all the iterations in parallel.
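Conceptually, a looping recipe behaves like the Python loop sketched below. This is a simplified serial model, not Stimela's implementation: the loop_statuses() helper is hypothetical, and it assumes that index1 in the display_status template is the 1-based iteration counter and total is the length of the list.

```python
def loop_statuses(scan_list, display_status="{var}={value} {index1}/{total}"):
    """Iterate over the scan list, returning the status string of each iteration."""
    statuses = []
    for index, scan in enumerate(scan_list):
        # at this point the recipe variable `scan` would be set to the current
        # value, and the steps of the recipe body would be run with it
        statuses.append(display_status.format(
            var="scan", value=scan, index1=index + 1, total=len(scan_list)))
    return statuses


for status in loop_statuses([4, 6, 8]):
    print(status)
```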
The above pattern represents a common scenario where the same workflow needs to be applied to a series of observations. Note how this structure allows for a straightforward invocation of the whole workflow or its individual parts:
$ stimela run jove-pol.yml scan=11 # process one scan (MS name formed up automatically)
$ stimela run jove-pol.yml ms=my.ms # process one particular MS
$ stimela run jove-pol-loop.yml # process all scans
$ stimela run jove-pol-loop.yml scan-list=[4,6,8] # process three particular scans
This would be an appropriate place to mention that Stimela also supports step selection within sub-recipes. If you want to run a specific sequence of steps within jove-pol, but over multiple scans, it could be done as e.g.:
$ stimela run jove-pol-loop.yml scan-list=[4,6,8] -s jove-pol.mask-1:image-2
The STEP.FROM:TO syntax here selects a sequence of substeps (mask-1 through image-2) from the nested recipe given by the jove-pol step of the outer recipe.
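To spell out how such a selector decomposes, here is a Python sketch. It illustrates the syntax only and is not Stimela's parser: both helper functions are hypothetical, only this one form of selector is handled, and the list of substep labels is invented.

```python
def parse_selector(selector):
    """Split a "STEP.FROM:TO" selector into (outer_step, first, last)."""
    outer, _, span = selector.partition(".")
    first, _, last = span.partition(":")
    return outer, first, last


def select_range(labels, first, last):
    """Return the labels from `first` through `last`, inclusive, in recipe order."""
    start = labels.index(first)
    stop = labels.index(last)
    return labels[start:stop + 1]


# made-up list of substeps of the nested recipe, in execution order
substeps = ["image-1", "mask-1", "selfcal-2", "image-2", "image-3"]

outer, first, last = parse_selector("jove-pol.mask-1:image-2")
print(outer, select_range(substeps, first, last))
```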
In conclusion#
This concludes our brief tour of some real-life recipes. Hopefully, this has illustrated some good practices for recipe construction, as well as some advanced Stimela trickery and the right ways of employing it. We hope this material has been stimelating!