dataselect¶
The tool dataselect
will help with the bookkeeping and with creating lists
of input files to feed to the Recipe System. The tool has a command line and
an API. This tool finds files that match certain criteria defined through
AstroData Tags and expressions involving AstroData Descriptors.
You can access the basic documentation from the command line by typing:
$ dataselect --help
usage: dataselect [-h] [--tags TAGS] [--xtags XTAGS] [--expr EXPRESSION]
[--strict] [--output OUTPUT] [--verbose] [--debug]
inputs [inputs ...]
Find files that matches certain criteria defined by tags and expression
involving descriptors.
positional arguments:
inputs Input FITS file
optional arguments:
-h, --help show this help message and exit
--tags TAGS, -t TAGS Comma-separated list of required tags.
--xtags XTAGS Comma-separated list of tags to exclude
--expr EXPRESSION Expression to apply to descriptors (and tags)
--strict Toggle on strict expression matching for exposure_time
(not just close) and for filter_name (match component
number).
--output OUTPUT, -o OUTPUT
Name of the output file
--verbose, -v Toggle verbose mode when using -o
--debug Toggle debug mode
dataselect
Command Line Tool¶
dataselect
accepts list of input files separated by space, and wildcards.
Below are some usage examples.
This command selects all the FITS files inside the
raw
directory with a tag that matchesDARK
.$ dataselect raw/*.fits --tags DARK
To select darks of a specific exposure time:
$ dataselect raw/*.fits --tags DARK --expr='exposure_time==20'
To send that list to a file that can be used later:
$ dataselect raw/*.fits --tags DARK --expr='exposure_time==20' -o dark20s.lis
This commands prints all the files in the current directory that do not have the
CAL
tag (calibration files).$ dataselect raw/*.fits --xtags CAL
The
xtags
can be used withtags
. To select images that are not flats:$ dataselect raw/*.fits --tags IMAGE --xtags FLAT
This command selects all the files with a specific target name:
$ dataselect --expr 'object=="FS 17"' raw/*.fits
This command selects all the files with an “observation_class” descriptor that matches the “science” value and a specific exposure time:
$ dataselect --expr '(observation_class=="science" and exposure_time==60.)' raw/*.fits
dataselect
API¶
The same selections presented in the command line section above can be done
from the dataselect
API. Here is the API versions of the examples
presented in the previous sections.
The list of files on disk must first be obtained with Python’s glob
module.
>>> import glob
>>> all_files = glob.glob('raw/*.fits')
The dataselect
module is located in gempy.adlibrary
and must first be
imported:
>>> from gempy.adlibrary import dataselect
This command selects all the FITS files inside the
raw
directory with a tag that matchesDARK
.>>> all_darks = dataselect.select_data(all_files, ['DARK'])
To select darks of a specific exposure time:
>>> expression = 'exposure_time==20' >>> parsed_expr = dataselect.expr_parser(expression) >>> darks20 = dataselect.select_data(all_files, ['DARK'], [], parsed_expr)
To send that list to a file that can be used later:
>>> expression = 'exposure_time==20' >>> parsed_expr = dataselect.expr_parser(expression) >>> darks20 = dataselect.select_data(all_files, ['DARK'], [], parsed_expr) >>> with open('dark20s.lis', 'w') as f: ... for filename in dark20: ... f.write(filename + '\n') ... >>>
Note that the need to send a list of a file on disk will probably not be very common when using the API as
Reduce
will take the Python list directly.This commands prints all the files in the current directory that do not have the
CAL
tag (calibration files).>>> non_cals = dataselect.select_data(all_files, [], ['CAL'])
The
xtags
can be used withtags
. To select images that are not flats:>>> has_tags = ['IMAGE'] >>> has_not_tags = ['FLAT'] >>> non_flat_images = dataselect.select_data(all_files, has_tags, has_not_tags)
This command selects all the files with a specific target name:
>>> expression = 'object="FS 17"' >>> parsed_expr = dataselect.expr_parser(expression) >>> stds = dataselect.select_data(all_files, expression=parsed_expr)
This command selects all the files with an “observation_class” descriptor that matches the “science” value and a specific exposure time:
>>> expression = '(observation_class=="science" and exposure_time==60.)' >>> parsed_expr = dataselect.expr_parser(expression) >>> sci60 = dataselect.select_data(all_files, expression=parsed_expr)
The strict
Flag¶
The strict
flag applies to the descriptors exposure_time()
and
filter_name()
. To keep the user interface more friendly, in the
expressions, the exposure time is matched on a “close enough” principle and
the filter name is matched on a “general bandpass name” principle.
For example, if the exposure time in the header is 10.001 second, from a user’s
perspective, asking to match “10” seconds is a lot nicer, exposure_time==10
.
Similarly, asking for the “H”-band filter is more natural than asking for the
“H_G0203” filter.
However, there might be cases where the exposure time or the filter name must
be matched exactly. In such case, the strict
flag should be activated.
For example:
$ dataselect raw/*.fits --strict --expr='exposure_time==0.95'
And:
>>> expression = 'exposure_time==0.95'
>>> parsed_expr = dataselect.expr_parser(expression, strict=True)
>>> filelist = dataselect.select_data(all_files, expression=parsed_expr)