Pipeline and Stages
We have already seen that we want to divide our pipeline into independent stages. To simplify stitching many stages together, the module pipeline.manager exists. It allows you to build a complete pipeline out of many independent stages and keep the input parameters for all of the stages in a single file. Dropping stages in and out of the pipeline is easy. This comes with a catch though: all stages must conform to a particular format to be executed by the pipeline manager (I actually think this is an advantage, since everyone will be writing similar looking code). The particulars should be in the module documentation for pipeline.manager (once I write it), but I will summarize them here.
- A pipeline stage must be a class.
- A pipeline stage must read all of its input parameters (including the list of files to process, the output directory, any stage-dependent configuration, etc.) using the kiyopy.parse_ini module. No parameters should be passed on the command line, and obviously avoid hard-coding any settings that will frequently change.
- A stage's input parameter names must all start with 2 letters unique to that stage followed by an underscore. Use parse_ini's prefix functionality for this.
- The stage's __init__ method should accept three parameters: a file name or dictionary holding the input parameters for the stage (which is then passed along to parse_ini), the number of processors to use, nprocessors (default 1; accept this parameter even if you have not yet threaded your stage), and a feedback level parameter telling the module how much noise to make (default 2, range 0-10).
- The stage must have a method named execute() that causes the entire analysis for that stage to be performed. A minimal sketch of a conforming stage follows this list.
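To make the format concrete, here is a minimal sketch of a stage that follows these rules. The stage name, its parameters, and the exact name and signature of the parse_ini reading function are assumptions for illustration, not taken from the actual code.

```python
from kiyopy import parse_ini

# Parameters this stage reads from its parameter file, with defaults.
params_init = {
    'input_root': './',
    'file_middles': ('a_file',),
    'output_root': './',
}
prefix = 'ex_'  # two letters unique to this stage, plus an underscore

class Example(object):
    """Hypothetical do-nothing stage showing the required interface."""

    def __init__(self, parameter_file_or_dict=None, nprocessors=1,
                 feedback=2):
        # parse_ini accepts either a file name or a dictionary; the exact
        # function name `parse` is assumed here for illustration.
        self.params = parse_ini.parse(parameter_file_or_dict, params_init,
                                      prefix=prefix, feedback=feedback)
        self.nprocessors = nprocessors
        self.feedback = feedback

    def execute(self):
        # The entire analysis for this stage happens here.
        for middle in self.params['file_middles']:
            file_name = self.params['input_root'] + middle + '.fits'
            # ... read file_name, process it, write to output_root ...
```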
For example, look at the noise.noise_power module. The first thing after the import statements is the params_init assignment. This is the dictionary that lists the parameters that this module will read from file. The values in the dictionary give the default values for these parameters. The first few parameters are just IO, and the later ones are configuration for the stage.
Next up is the prefix: 'np_'. Thus, in the parameter file for this stage, the assignment np_input_root = 'path to the input data' sets the input root; if it is omitted, the default value "./" is used.
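For concreteness, the pairing between module defaults and the parameter file looks something like this (the parameter names here are illustrative; see noise.noise_power for the real list):

```python
# In the module (illustrative parameter names):
params_init = {
    'input_root': './',   # where to find the input data
    'output_root': './',  # where to write the results
}
prefix = 'np_'

# In the parameter file, each name carries the stage's prefix:
np_input_root = 'path/to/the/input/data/'
# np_output_root is omitted, so the default './' is used.
```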
Next is the class definition: the stage itself. This is the thing that you invoke to perform the analysis associated with this stage, and the thing to give to the pipeline manager (described below). The first method is the __init__ method, which is called when you create a new instance of the stage. This init method can basically be cut and pasted into any other stage, but you can also add anything else you want for initial set-up. Essentially it uses the kiyopy.parse_ini module to read in input parameters based on the params_init dictionary and prefix string described above. Notice that parameters can be read from either a file or a dictionary. The dictionary functionality is used by the pipeline manager below.
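Either source works when creating a stage. A quick sketch, reusing the hypothetical Example stage from earlier:

```python
# From a parameter file on disk:
stage = Example('params.ini')
stage.execute()

# From a dictionary, which is how the pipeline manager drives stages:
stage = Example({'ex_input_root': 'path/to/data/'})
stage.execute()
```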
The execute method initiates the rest of the analysis. There isn't a whole lot to be said here; do whatever you would like.
I have provided a base class to help build pipeline modules in time_stream.BaseSingle. All the looping over files, IFs and scans is already done; all you have to do is replace the action method, which performs an action on a single DataBlock object. This greatly simplifies writing modules where both the input and output are in the time stream fits file format. It is also threaded.
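Based only on the description above, a subclass might look roughly like this. The import path, the way the subclass supplies its prefix and parameters, and the DataBlock interface are all assumptions:

```python
from time_stream import base_single  # assumed import path

class Scale(base_single.BaseSingle):
    """Hypothetical stage that rescales each DataBlock."""

    # Assumed: the subclass supplies its own prefix and extra parameters.
    prefix = 'sc_'
    params_init = {'factor': 1.0}

    def action(self, Data):
        # Data is a single DataBlock; modify it and hand it back.
        Data.data *= self.params['factor']
        return Data
```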
The pipeline manager (pipeline.manager) provides a way to string many analysis stages together.
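The idea is a single master parameter file that lists the stages to run and holds every stage's prefixed parameters side by side. The parameter names the manager itself reads are not specified in this primer, so the following is purely a hypothetical sketch:

```python
# Hypothetical master parameter file for the pipeline manager,
# reusing the illustrative Example and Scale stages from above.

# Stages to run, in order (assumes the stage classes are importable here):
pipe_modules = [Example, Scale]

# Parameters for every stage live in the same file, told apart by prefix:
ex_input_root = 'path/to/raw/data/'
ex_output_root = 'path/to/stage1/output/'
sc_factor = 2.0
```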