EleFits
4.0.0
A modern C++ API on top of CFitsIO
|
How to make a program efficient?
The implementation of EleFits focuses firstly on memory management, secondly on user-friendlyness and finally on performance. Much of its complexity is imputable to performance choices, like the heavy use of tuples which are not that fun to manipulates. On user side, few operations would be very detrimental to performance. Yet, a few would be, and there are thus a few things to know if one wants to get the most out of EleFits. Although it's not possible to statically sort the following items w.r.t. performance gain, we've tried to sort by orders of magnitude for classical use cases.
This one is not specific to the Fits format or EleFits at all, but is probably worth remembering when using a code devoted to dataset manipulation.
In the frame of the library usage, avoiding such operations boils down to:
Move semantics is a C++ feature you can look for on the Internet; Let us focus on the two other items.
When casting is explicit, let's assume that you know what to do. The case we'll describe in more details is implicit casting made internally by CFitsIO.
The Fits format is obviously specified in terms of actual bytes, and therefore fixed-size types (e.g. 16- or 32-bit integers). CFitsIO mixes such fixed-size types with C types (e.g. TLONG
maps to long
while LONG_IMG
maps to 32-bit integer), without providing to the user any straightforwad way of avoiding casting.
EleFits abstracts from this and provides a templated API, which internally handles fixed-size types and C types transparently to enforce consistency and limit the number of casts. Yet, only fixed-size types guarantee that there is no time spent in casting operation. It is therefore recommended to use fixed-size integers as the template parameters (e.g. std::int16_t
).
Scaling is an affine transform specified in the Fits files as a pair of offset and scale parameters. The transform is applied at read-time by CFitsIO (and therefore EleFits), and at write-time when the type is not natively supported by Fits. For example, Fits does not support unsigned 16-bit integer values, but CFitsIO (and EleFits) does. To this end, it offsets the value to be written, writes it, and writes the offset parameter. This obviously has a non-negligible cost when applied to many values, like for table columns. In order to avoid scaling, users should stick to native Fits types (e.g. std::int16_t
but not std::uint16_t
).
Fits files are made of consecutive HDUs, themeselves made of an ASCII header unit followed by a binary data unit. Reading or writing data to or from an HDU consists in:
In order to limit as much as possible the time spent in the first two operations, it is recommended to exploit each HDU as much as possible before moving to another one, to follow the HDU ordering, and to group operations in the header and then in the data unit. For example, read all the records you need to read from an HDU at once, or write all the columns you need to write in a table at once.
Ultimate optimization when creating an HDU is to complete the header unit first before writing the data unit:
For very large files, following the in-file HDU ordering can also make a difference.
EleFits is providing HDU handlers as light objects designed to be passed by reference. The other options are to specify the HDU index or name instead of passing the handler itself. This is subobtimal because:
EXTNAME
keyword is searched for, until its value matches the name.EleFits provides services to read and write bunches of data through one single call. The reason is to skip steps (like bunny hopping in the file) and lower runtime and memory footprint.
This is especially true for binary table writing. Without diving into the implementation details, let's give a brief overview of what's happening in CFitsIO. Fits tables are stored row-wise. When reading a single column, CFitsIO has to load each of the rows. When reaching the last value of the target column, CFitsIO has finally loaded the whole set of columns!
EleFits's multi-column reading and writing functions internally take advantage of a CFitsIO buffer to avoid passing through the data unit several times.
The strategy holds for record IOs too, yet with smaller improvements to be expected.
In order to lower the impact of reading table data colomn-wise, CFitsIO documentation recommends writing vector columns even for scalar concepts. That is, if scalar columns of 10,000 rows should be written, rather write 10,000 value-wide vector columns over one row. This indeed allows storing the column values contiguously, and therefore speeds up the accesses.
We think software optimization should not obfuscate the data, and the trick should be avoided as much as possible. Aforementioned multi-column reading and writing services of EleFits have no counterpart in CFitsIO, which makes the trick more relevant to CFitsIO users than to EleFits users.
To go further, CFitsIO "iterator" mechanism is the way to go for performance; it is a clean solution which keeps the data in the logical format. It has no counterpart in EleFits. For now...