Running the Map Maker
Prior to running the map maker, we need to get the time stream data ready for map making.
Typical time stream prep modules are listed below (a sketch of how these stages might be chained in a .pipe file follows the list):
- Rebin the data in time. This rebinning is done on the data prior to transferring it off the NRAO computers, to make the data volume a bit more manageable.
- Flag for RFI (and apply the cal-on minus cal-off calibration). This is done using the flag_data module.
- Rebin in frequency using the rebin module. Typically 16 frequency bins are combined, for a total of 256 frequencies.
- Apply flux calibration using the flux_diff_gain_cal module. The data is now in (XX, XY, YX, YY) polarization, with both the cal-on and cal-off data present.
- Rotate to intensity (I) or full Stokes (I, Q, U, V) polarization using the rotate_pol module. During this step, we also average the cal-on and cal-off data.
- Optional: if you are making a special type of map (e.g. higher or lower frequency resolution), you can also run additional rebinning in time or frequency, or run the band_split/band_stop modules (which change the frequencies used in the map making).
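As a rough sketch of how these prep stages fit into a .pipe file, they are imported and appended to pipe_modules in the same way as the map-making modules shown later on this page. The import paths and class names below are assumptions based on that pattern; check the analysis_IM source tree for the actual names.
pipe_modules = []
# Order follows the list above; import paths and class names are assumed, not verified.
from time_stream import flag_data                    # RFI flagging and cal-on minus cal-off calibration
pipe_modules.append(flag_data.FlagData)              # assumed class name
from time_stream import rebin_freq                   # frequency rebinning (assumed module name)
pipe_modules.append(rebin_freq.RebinFreq)            # assumed class name
from cal import flux_diff_gain_cal                   # flux calibration (assumed package location)
pipe_modules.append(flux_diff_gain_cal.Calibrate)    # assumed class name
from time_stream import rotate_pol                   # rotation to I or I,Q,U,V, averaging cal-on and cal-off
pipe_modules.append(rotate_pol.RotatePol)            # assumed class name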
After running the first round of map making on the full set of time stream data, but before making the second round of sub-maps (in time), there are a couple of time stream stages that have to be run.
If you want to subtract the mean map signal from the time stream data during map making, you have to run the correlate_map and subtract_map_data modules, plus the reflag module to flag anything that the comparison makes stand out.
Regardless of whether or not you run the subtraction, you also need to run the measure_noise module prior to making sub-maps.
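In the .pipe file these second-round stages are appended in the same way as the other modules. As before, the import paths and class names here are assumptions; check the analysis_IM source for the actual names.
# Only needed if you are subtracting the mean map signal (paths and class names assumed):
from map import correlate_map
pipe_modules.append(correlate_map.CorrelateMap)
from map import subtract_map_data
pipe_modules.append(subtract_map_data.Subtract)
from time_stream import reflag
pipe_modules.append(reflag.ReFlag)
# Needed in either case before making sub-maps (path and class name assumed):
from noise import measure_noise
pipe_modules.append(measure_noise.Measure)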
There are two parts to the map maker: dirty_map and clean_map.
dirty_map is always run first, and it also calculates the noise inverse for the map. The dirty_map parameters are listed below, along with some typical values for them.
dm_input_root = '' #Input path: the output directory of the last pre-map-making stage, usually the rotation to I,Q,U,V polarization.
dm_file_middles = file_middles #List of the file names for all the time stream data files to use in the map maker.
dm_input_end = '.fits'
dm_output_root = '' #Output path for your maps
dm_scans = () #time stream data files typically only have 1 scan
dm_IFs = () #This is not empty only if you have used the band_split module to split up the array into frequency bands.
dm_polarizations = ('I','Q','U','V') # You can also set it to just one polarization eg ('I',) if you only want to make that map.
dm_field_centre = map_centre # Position in RA and DEC for the center of the map
dm_pixel_spacing = map0_spacing # Width of each pixel in degrees
dm_map_shape = map0_shape #Number of RA, DEC pixels to include
dm_time_block = 'scan' #I've always just used 'scan', not sure what else this could be.
dm_n_files_group = 10 #This is the number of files to process at a time. Keep the number low to not run into memory issues.
dm_frequency_correlations = 'None' #This has other settings if you want to look at correlations between frequencies.
dm_number_frequency_modes = 0 #This is the number of modes subtracted from the map
dm_noise_parameter_file = '' #This is a measured noise file calculated from a map. Usually left blank for the first stage map, but becomes the .shelve file made in the measure_noise step for the second stage maps.
dm_deweight_time_mean = True #Have always kept as True
dm_deweight_time_slope = True #Have always kept as True
dm_interpolation = 'cubic' #Can be adjusted if you want a different interpolation mode, but have always just used cubic.
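The dm_field_centre, dm_pixel_spacing, and dm_map_shape parameters above refer to variables defined earlier in the .pipe file. A minimal sketch, with purely illustrative numbers (choose values to match your field and desired resolution):
map_centre = (325.0, 0.0)  # (RA, Dec) of the map centre in degrees -- illustrative values only
map0_spacing = 0.07        # pixel width in degrees -- illustrative value only
map0_shape = (64, 32)      # number of (RA, Dec) pixels -- illustrative values only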
After running the dirty_map, you move on to run the clean_map. There aren't as many parameters to worry about for the clean_map module, and they are defined below.
cm_input_root = dm_output_root #Need to know where to find the dirty_map and noise_inv files
cm_output_root = cm_input_root #Usually write the clean map in the same directory as the dirty map
cm_polarizations = ('I','Q','U','V') #Works the same as with the dirty map polarizations
cm_bands = (800,) #This is the center frequency of the dirty map band (it appears in the dirty map file name and is set by dm_IFs, so it can change if you limit your frequency band).
cm_save_noise_diag = True #This saves the noise_diagonal in the same directory as the clean map.
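For the second-stage (sub-map) run, the dirty map parameters above are reused, with the noise file and output location changed; presumably dm_file_middles is also pointed at the subset of time stream files for each sub-map. A sketch, with illustrative paths (use the .shelve file actually written by the measure_noise step):
dm_noise_parameter_file = '/path/to/first/stage/maps/measured_noise.shelve'  # illustrative name; the .shelve from measure_noise
dm_output_root = '/path/to/second/stage/maps/'  # illustrative path; keep the sub-maps separate from the first-stage maps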
To run the map maker, you first have to choose the machine on which you are going to run. The map maker has fairly high computing needs when run without parallelizing; I've only successfully run it in that mode on prawn or the tpb high-memory nodes, and even then it takes up over 50% of the system memory.
I only recommend using those machines if you are running severely truncated maps (in either spatial or frequency resolution).
For the regular map-maker, you will need to set up a .pipe file. There are numerous examples to be found in the individual input directories in analysis_IM. The .pipe file sets all the parameters listed above and then tells the system to run the map maker.
The .pipe file will look something like the example below:
import os
from core import dir_data
import scipy as sp
#Code using dir_data to get the file middles:
file_middles = tuple(dir_data.get_data_files(range(1,19), field='11hr', project='GBT12A_418', type='ralongmap'))
#code defining directories if needed for the parameters.
#Set maximum number of processes to use
pipe_processes = 1 # can push up to 6 for prawn or 10 for the tpb nodes
pipe_modules = []
from map import dirty_map
pipe_modules.append(dirty_map.DirtyMapMaker)
from map import clean_map
pipe_modules.append(clean_map.CleanMapMaker)
#Set map parameters as described above
To run the map maker interactively (rather than through a queue), type the following from the analysis_IM directory:
python pipeline/manager.py input/user/filename.pipe
If you are using the CITA computers, it is generally best to nice your python command.
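For example, the same command run at low priority looks like this:
nice -n 19 python pipeline/manager.py input/user/filename.pipe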
Running the parallel map maker is similar to running the regular map maker, but now you have to submit the job to a queue.
For the parallel dirty map maker you cannot use a .pipe file, but have to use a .ini file. The file will look similar to the .pipe file but doesn't include the lines about the pipe modules. You can just take your .pipe file and remove the following lines:
#Set maximum number of processes to use
pipe_processes = 1 # can push up to 6 for prawn or 10 for the tpb nodes
pipe_modules = []
from map import dirty_map
pipe_modules.append(dirty_map.DirtyMapMaker)
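The top of the resulting .ini file would then look something like this, assuming the dm_ parameter names carry over from the .pipe file unchanged (paths here are illustrative):
import os
from core import dir_data
import scipy as sp
file_middles = tuple(dir_data.get_data_files(range(1,19), field='11hr', project='GBT12A_418', type='ralongmap'))
# Dirty map parameters, exactly as in the .pipe file (illustrative values; see the parameter list above).
dm_input_root = '/path/to/rotated/data/'
dm_file_middles = file_middles
dm_input_end = '.fits'
dm_output_root = '/path/to/maps/'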
For the parallel clean map maker you still use the .pipe file but you have to change the module that runs, which becomes:
from map import parallel_clean_map
pipe_modules.append(parallel_clean_map.CleanMapMaker)
Then you write a .qsub file to run the command.
For sunnyvale, the .qsub file that I use for the dirty map looks like this:
#!/bin/csh
##PBS -l nodes=1:ppn=8
#PBS -l nodes=128:ppn=8
##PBS -q greenq
#PBS -q workq
#PBS -r n
#PBS -l walltime=48:00:00
#PBS -N dirty_map_parallel
cd /mnt/raid-project/gmrt/tcv/analysis_IM/
mpirun -np 128 -npernode 1 python map/parallel_dirty_map.py input/tcv/mm_test.ini 1
and the .qsub file that I use for the clean map looks like this:
#!/bin/csh
##PBS -l nodes=1:ppn=8
#PBS -l nodes=16:ppn=8
#PBS -q workq
#PBS -r n
#PBS -l walltime=40:00:00
#PBS -N clean_map_parallel
cd /mnt/raid-project/gmrt/tcv/analysis_IM/
mpirun -np 16 -npernode 1 python pipeline/manager.py input/tcv/pointcorr_mapmaker.pipe 1
Note that running the parallel dirty mapper on sunnyvale takes up ~ 60% of the nodes because we're limiting the number of processes per node to 1.
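To submit either script on sunnyvale, run qsub from the directory containing it (the file name here is illustrative):
qsub dirty_map_parallel.qsub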
For general information on SciNet, see their wiki page, https://support.scinet.utoronto.ca/wiki/index.php/SciNet_User_Support_Library . The SciNet User Tutorial PDF available there is very helpful. Note that large data transfers, which will probably be your first step before you can make maps, should be done on the datamover1 or datamover2 server.
Scinet login server: login.scinet.utoronto.ca
From there, I log onto one of the gpc servers (gpc01-gpc04) to run code, or onto datamover1 or datamover2 to copy data from a remote server.
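A typical transfer of time stream data onto SciNet might look something like the following (user names, the remote host, and the paths are illustrative):
ssh username@login.scinet.utoronto.ca
ssh datamover1
scp -r username@remote.host:/path/to/timestream/data/ /scratch/p/pen/username/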
Work on SciNet should be done on /scratch, since we don't have project storage yet. I do my work and store my maps in /scratch/p/pen/andersoc/. The data on scratch is deleted every 3 months unless it is touched (they send warning emails before this happens). To keep your files, you can touch everything in your directory beforehand using the command: find . -exec touch {} \;
I have always worked on the GPC, which has several thousand nodes with 8 cores each. There are also 84 32GB nodes and 72 64GB nodes.
Submitting jobs is similar to sunnyvale. Log onto one of the gpc servers, make your job script (call it EXAMPLE_SCRIPT), and from that directory submit it with the command: qsub EXAMPLE_SCRIPT
Here is an example of a working job script I made (which uses the 32GB nodes). The modules I load here are ones that I found work for our code.
#!/bin/bash
#PBS -l nodes=64:m32g:ppn=8
#PBS -l walltime=1:05:00
cd $PBS_O_WORKDIR
cd parkes_analysis_IM
export PYTHONPATH=$PYTHONPATH:/scratch/p/pen/andersoc/workcopy/parkes_analysis_IM/
module load intel/13.1.1
module load gcc/4.8.1
module load python/2.7.5
module load openmpi/gcc/1.6.4
module load hdf5
mpirun -np 64 -npernode 1 python map/parallel_dirty_map.py input/cja/32gb_beam_ini/dirty1YYra33.ini 16
Check the status of your job using: showq -u username .