Telco 20220426
Date
Tuesday, 26th April, 14:00 UTC
Connect
Agenda
- Welcome guests and new members
- Review previous meeting minutes and actions: February Telco, Spring virtual NIAC
- Review of the recent NFDI workshop
- File format alternatives to HDF5 and XML such as ADIOS and Zarr.
- Discussion on mailing list showed some concerns about reading archives if the container format is not restricted. (Is this part of the NeXus mission?)
- What should be our core definition of how NeXus is expressed in each container file format? Options include:
- English description in manual (i.e. current implementation).
- Functional code (e.g. validator, reader, writer).
- XML declarations.
- Example files (maybe make an
NXexample
application definition for this purpose).
- Format of Code Camp during week of May 9-13th?
- 2022 HDF European user group meeting
- Brian Maranville has made on online version of cnxvalidate. We may wish to endorse this and provide hosting.
- Recent issues on Github (especially those labelled with “telco”)
Present
BW, WDN, HB, FP, MK, RB, CZ, NS, AB, GG, HG, LG, PJ, SB, DMR, JK, PC, AD
Minutes
- Introductions
- NFDI workshop 160-ish registrants (110 online at once)
- First day of presentations from people already working with NeXus and some newcomers
- Second day more about getting to know each other with short presentations and break-out discussions. Broad variety of projects and communities, plus some vendors.
- NS ALBA want to be able to write simultaneously to a file (perhaps with Zarr).
- FP from OpenPMD needs many different use cases and needs to optimise for different use cases.
- BW says that NeXus…
- HB says that opening to new formats raises responsibility to provide tools for such formats. Organisations are producing/collecting millions of datasets. Additional resources are going to have to be part of the discussion.
- BW asks if this is within the NeXus mission
- PC need an algortim to write a tree - file format just needs to have sufficient expressibility.
- MK - history: HDF4 came first, then HDF5, then the community demanded XML. Later the community only wanted HDF5. Maintenance of the NeXus API is lots of work. The HDF group is working on performance. We should not stop anyone using NeXus heirarchies, but I would be cautious about adding extra formats.
- HG asks if some documentation can be provided to support arguements that HDF5 is not sufficient. Providing minimum performance requirements would be helpful
- SB is using the virtual dataset features that allows the tree to be split into multiple files so that multiple readers can be implemented
- DB is open to different approaches.
- RB proposes a “contributed” status for the file formats similar to the application definition in order to provide guidance from NeXus while the community provides implementation for resources.
- BW wants to rethink how NeXus declares the implementation of the data hierarchy in a file format. The current method of describing it in the manual is informal and quickly becomes unmanageable when adding further file formats. We define everything else with NXDL, but it is unclear to me how this can be done for file formats, unless it is to create an application definition describing all of the required features to express NeXus, which would be accompanied by reference files and perhaps reference code demonstrating how the application definition is implemented in the file format.
- AB says NEXUS provides a set of services to the community: ontology, ref examples, and validation. We expect community to write further tools.
- HB suggest people recall what happened to file formats around the switch to UNIX. Solved by do what you want so long as you fit into the “streams interface”. Need simple way to describe mapping inot file formats. these are huge investments worldwide when adding further formats.
- DMR points to other examples of validation (eg browser test suit).
- HG group runs a data repository. We would only want formats with long term support. Archiving should be important to NeXus. We should also consider conversion (eg write what you want, but convert to eg HDF5 for archiving).
- NS asks about a tools between the data and file formats - this is what the NAPI is
- AD openPMD has lots of similar things to consider. HDF5 was an early feature, but it stopped being able to handle the size of the simulations (TB). We changed the backend to allow different formats and this solved many problems. openPMD would like to bring the NeXus ontology in to the different backends.
- FP says implementing NeXus onto multiple backends is not straightforward. We need to agree on specific details. Data conversion is “free” with a multi-backend API.
-
AB the NIAC provides ontology, validation, and example data. Other stuff is more a volunteer effort (NeXpy, NeXus API, Russ’s example file writer from NXDL). So if a file format wants a ‘badge of support’, i think it needs a validator and example data.
- BW advertised the HDF user meeting and asks if any NIAC members plan to attend. BW says that he will try to attend himself.
- The code camp clashes with the EPICS codeathon. Multiple people want to reschedule our code camp, so BW will send out a new poll to decide another set of dates.
- BW encourages people to try out the online cnxvalidate implementation so that we can talk about in more detail at another meeting.
- WDN ESRF already has multiple APIs just for HDF5. This happens for performance reasons.
-
FP perspective is that runtime configuration is important in order to meet different performace criteria.
- WDN wants errors on aux signals https://github.com/nexusformat/definitions/issues/1044
- consensus that this just extends the convention and so should be easily accepted.
- BW says that the release milestone should be a priority for NeXus, especially at the upcoming code camp
- BW will make a poll for the next telco aiming for mid- to late May.