(written for NIAC2014_Meeting)
In developing the canSAS standard for storing multidimensional reduced small-angle scattering data, a new method for identifying the related axes and uncertainties was devised. It is proposed that this will be a good addition to NeXus, adding flexibility to data file writers while preserving an obvious path from data to axes and uncertainties.
(see this from canSAS: http://www.cansas.org/formats/canSAS2012/1.0/doc/framework.html)
NeXus needs a robust method to describe and associate the axes of data with arbitrary dimensions in files. Particularly, the existing methods rely on the concept of HDF dimension scales to describe each of the axes. There is no capability in NeXus to describe multi-dimensional axes associated with plottable data.
The current method of identifying the default data in NeXus files by adding attributes to datasets does not work in certain important situations. For example, with the current specifications, a dataset with @signal=1 attribute can not be linked into a group which already has such a dataset. Yet, the notion of NXdata to describe the default plot implies the specifications are a property of the NXdata group, not the component datasets.
An additional concern is that the method should allow great flexibility in the naming of fields.
The new method should be backwards compatible with and not spoil existing methods in NeXus.
We must also consider and allow for the common situation when no axes are specified.
Define the specifications of the default data and any associated axes using attributes attached to the NXdata group.
Preserve the simplicity of the signal attribute but provide it at the group level.
Define how to describe slices of multi-dimensional axes associated with the default data.
All attributes potentially containing multiple values (axes and _indices) are to be written as integer or string arrays, to avoid string parsing in reading applications.
signal:
Defines the name of the default dataset.
A field of this name *must* exist.
(either dataset or link to dataset)
axes:
String array that defines the independent data fields used in default plot for all of the
dimensions of the signal field. One entry is provided for every dimension in the signal field.
The field(s) named as values (known as
“axes
”) of this attribute must exist.
An axis slice is specified using the
{axisname}_indices
below.
When no default axis is available for a particular dimension of the plottable data,
use a
“.
” in that position. An example below demonstrates this.
If there are no axes at all (such as with a stack of images), the
axes
attribute can be omitted.
{axisname}_indices:
Integer array that defines the indices of the signal field array which need to be used in the
{axisname}
dataset in order to reference the corresponding axis value.
This attribute is to be provided in all situations.
However, if the indices attributes are missing, file readers are encouraged
to make their best efforts to plot the data. Thus the implementation
of the
{axis
name}_indices
attribute is based on the model of
“strict
writer,
liberal
reader
”.
NXroot
NXentry
NXdata
@signal=
“data
” --> *names* the default data to be visualized
@axes=
“x
” --> *names* the default independent data
@x_indices=0
data: float[100] --> the default dependent data
x: float[100] --> the default independent data
NXroot
NXentry
NXdata
@signal=
“data
”
@axes=
“time
”,
“pressure
”
@pressure_indices=1
@temperature_indices=1
@time_indices=0
data: float[1000,20]
pressure: float[20]
temperature: float[20]
time: float[1000]
NXroot
NXentry
NXdata
@signal=
“det
”
@axes=
“pressure
”,
“tof
”
@pressure_indices=0
@tof_indices=1
det: float[100,100000]
pressure: float[100]
tof: float[100000]
NXroot
NXentry
NXdata
@signal=
“det
”
@axes=
“x
”,
“y
”,
“tof
”
@tof_indices=2
@x_indices=0,1
@y_indices=0,1
det: float[100,512,100000]
tof: float[100000]
x: float[100,512]
y: float[100,512]
NXroot
NXentry
NXdata
@signal=
“det1
”
@axes=
“polar_angle_demand
”,
“frame_number
”,
“.
”
@frame_number_indices=1
@polar_angle_rbv_indices=0,1
@time_indices=0,1
@polar_angle_demand_indices=0
polar_angle_rbv: [50,5]
det1: [50,5,1024]
polar_angle_demand: [50]
frame_number: [5]
time: [50,5]
NOTE: At NIAC2014, this proposal on uncertainties was not accepted. NIAC will see a proposal when experience has been gained with all variations.
(see this from canSAS: http://www.cansas.org/formats/canSAS2012/1.0/doc/examples.html#representing-uncertainty-components)
In a scientific data file, it is assumed that uncertainty about the value of the data is expressed in an array of the same shape as the data. The uncertainty of a datum may be expressed in a single value or as derived from several components.
The way that data uncertainties are described in NeXus data files is inconsistent across the base classes and can be improved while also being generalized. Currently, the name of these uncertainties is most often called “errors” which is incorrect. At best, these are error estimates but actually describe an estimate of one’s uncertainty in the data.
While uncertainties are properties of the dataset rather than the containing group, it is difficult in some cases such as externally-linked datasets, to attach the attributes directly to the dataset.
The attribute-based scheme used to describe the axes (see above) can be extended to describe the uncertainties. A subgroup can be created to deposit these constituents. It seems that a new base class would be needed for this subgroup.
It is a question for debate whether to attach the attribute to the dataset or the NXdata group. We leave that open for now.
We must consider the description of uncertainty as an attribute of either a dataset or the containing NXdata group. The value of the attribute should be the same in either case.
Name of the uncertainty attribute depends on the context:
Value:
Defines the name of the dataset with the uncertainty to be used.
This dataset must exist and have the same shape as the signal dataset.
Examples
NXroot
NXentry
NXdata
@signal=
“data
”
@data_axes=
“xy
”
@data_uncertainty=
“esd
”
data: float[300, 300]
xy: float[300, 300]
esd: float[300, 300]
NXroot
NXentry
NXdata
@signal=
“data
”
@data_axes=
“xy
”
data: float[300, 300]
@uncertainty=
“esd
”
xy: float[300, 300]
esd: float[300, 300]
Name of the uncertainty components subgroup attribute depends on the context:
Value:
Defines the name of the NXuncertainty subgroup with the uncertainty components.
This subgroup must exist.
Examples
NXroot
NXentry
NXdata
@signal=
“data
”
@data_axes=
“xy
”
@data_uncertainty=
“esd
”
@esd_uncertainty_components=
“esd_uncertainties
”
data: float[300, 300]
xy: float[300, 300]
esd: float[300, 300]
esd_uncertainties:NXuncertainty
electronic : float[300, 300]
@basis=
“Johnson
noise
”
counting_statistics: float[300, 300]
@basis=
“shot
noise
”
secondary_standard: float[300, 300]
@basis=
“esd
”
NXroot
NXentry
NXdata
@signal=
“data
”
@data_axes=
“xy
”
data: float[300, 300]
@uncertainty=
“esd
”
xy: float[300, 300]
esd: float[300, 300]
@uncertainty_components=
“esd_uncertainties
”
esd_uncertainties:NXuncertainty
electronic : float[300, 300]
@basis=
“Johnson
noise
”
counting_statistics: float[300, 300]
@basis=
“shot
noise
”
secondary_standard: float[300, 300]
@basis=
“esd
”
These are some of the topics to be considered when evaluating this proposal.
TODO: need to write this section