EleFits  5.3.1
A modern C++ API on top of CFITSIO
Image compression

Detailed Description

Compress and decompress image HDUs.

TL;DR

To enable the compression of image HDUs, simply activate a compression strategy, such as CompressAuto, as follows:

MefFile f(filename, FileMode::Create);
f.strategy(CompressAuto(CompressionType::LosslessInts)); // Activate compression
f.append_image("", {}, raster); // Automatically compressed

At read-time, nothing special has to be done.

External vs. Internal Compressions

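FITS supports two compression approaches: external compression, where the whole file is compressed (e.g. with gzip), and internal compression, where image HDUs are compressed tile by tile inside the FITS file.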

Both compression approaches can be combined.

External Compression

External compression is transparent: if the filename ends with .gz, then external compression is enabled. From the API point of view, no change is required with respect to uncompressed files.

Internal Compression

Principle

For a start, here is how the FITS standard definition document (v4.0) introduces the internal compression:

‍The general principle is to first divide the N-dimensional image into a rectangular grid of subimages or "tiles." Each tile is then compressed as a block of data, and the resulting compressed byte stream is stored in a row of a variable-length column in a FITS binary table. By dividing the image into tiles it is possible to extract and decompress subsections of the image without having to decompress the whole image. The default tiling pattern treats each row of a two-dimensional image (or higher-dimensional cube) as a tile, such that each tile contains NAXIS1 pixels. This default may not be optimal for some applications or compression algorithms, so any other rectangular tiling pattern may be defined [...]. In the case of relatively small images it may suffice to compress the entire image as a single tile, resulting in an output binary table containing a single row. In the case of three-dimensional data cubes, it may be advantageous to treat each plane of the cube as a separate tile if application software typically needs to access the cube on a plane-by-plane basis.

Standard FITS files are made of HDUs which can be either images or tables. Binary tables can contain variable-length columns (which are not publicly supported by EleFits): such columns can hold vector values of varying length.

Internally compressed image HDUs are partitioned into regular non-overlapping boxes called tiles (typically, rows). Each of the tiles is compressed with a given algorithm, and the compressed tile values are stored in a cell of a binary table. The compressed tiles have different sizes since the compression rate depends on the actual pixel values, which is why variable-length columns are used.

In practice, this means that a compressed image HDU is effectively stored as a binary table HDU, although with specific keywords stating that it is not an ordinary table. Most viewers and libraries (including EleFits) provide an image HDU interface for them, i.e. at read-time, it is not necessary to know whether an image is compressed or not. For writing, compression must be explicitly enabled before creating the HDU, but then the usual write functions are used as if the HDU was not compressed.

Algorithms and Parameters

EleFits, like CFITSIO, supports several compression algorithms. They are implemented as independent data classes responsible for storing the parameters. Let us first introduce briefly the various algorithms and parameters, before discussing the interfaces.

All internal compression algorithms act on data regions named tiles, which are compressed and decompressed independently. Tiles are represented by their common shape (edge tiles may be cropped). Generally, the default tile is a row or a number of consecutive rows. When the images to be compressed have a small width (say, less than a thousand pixels), such a tile might be too small. In this case, using larger tiles is recommended. For the smallest images (a few thousand pixels at most), the whole image can even be used as a single tile. When image processing is planned to be performed tile-wise (e.g. row-by-row), the compression tile should relate to the processing tile such that compression and processing tile borders match as often as possible.
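To get a feel for the effect of the tile shape, the number of tiles (and therefore of rows in the compressed binary table) can be estimated with a ceiling division along each axis. The following is a minimal sketch; `tile_count` is a hypothetical helper written for this illustration and is not part of EleFits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of EleFits): number of tiles needed to cover
// an image of the given shape with tiles of the given shape.
// Edge tiles may be cropped, hence the ceiling division along each axis.
std::size_t tile_count(const std::vector<long>& image_shape, const std::vector<long>& tile_shape) {
  assert(image_shape.size() == tile_shape.size());
  std::size_t count = 1;
  for (std::size_t i = 0; i < image_shape.size(); ++i) {
    assert(image_shape[i] > 0 && tile_shape[i] > 0);
    count *= static_cast<std::size_t>((image_shape[i] + tile_shape[i] - 1) / tile_shape[i]);
  }
  return count;
}
```

For example, a 2048 x 1024 image with row tiles of shape {2048, 1} is covered by 1024 tiles, while a 1000 x 1000 image with {256, 256} tiles needs 16 tiles, the ones on the right and top edges being cropped.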

As opposed to CFITSIO, by default, compression is lossless whatever the pixel type. If one tries to use a lossless algorithm with an incompatible data type (e.g. H-compress with double), an error is thrown. Lossy compression is available for floating point data, by enabling quantization. Quantization is a conversion from floating point to integer values, performed before the actual compression algorithm is run. This is a first level of compression, where the least significant digits of a floating point number are dropped.
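The idea behind quantization can be sketched as a uniform rounding to multiples of a step. This is a deliberately simplified illustration: the function names are hypothetical, and features of production implementations such as dithering or per-tile offsets are assumed away.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Simplified uniform quantization: map each value to the index of the nearest multiple of `step`.
// Assumption: no dithering and no per-tile offset, unlike production implementations.
std::vector<std::int32_t> quantize(const std::vector<float>& values, float step) {
  assert(step > 0);
  std::vector<std::int32_t> levels;
  levels.reserve(values.size());
  for (float v : values) {
    levels.push_back(static_cast<std::int32_t>(std::lround(v / step)));
  }
  return levels;
}

// Inverse mapping: each restored value differs from the original
// by at most step / 2, up to floating point rounding.
std::vector<float> dequantize(const std::vector<std::int32_t>& levels, float step) {
  std::vector<float> values;
  values.reserve(levels.size());
  for (std::int32_t q : levels) {
    values.push_back(static_cast<float>(q) * step);
  }
  return values;
}
```

The integer levels are what a subsequent lossless algorithm would compress. The smaller the step, the smaller the loss, but also the larger the range of integer values to encode.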

Additionally, H-compress is able to compress integer data with loss. This is achieved by enabling so-called scaling.

Here is a brief description of the supported algorithms. More details can be found in the associated classes documentation.

Class         Integral data                     Floating point data
Gzip          Always lossless                   Lossless iff quantization is disabled
ShuffledGzip  Always lossless                   Lossless iff quantization is disabled
Rice          Always lossless                   Always lossy
HCompress     Lossless iff scaling is disabled  Always lossy
Plio          Always lossless                   Unsupported

See also: R. L. White, P. Greenfield, W. Pence, D. Tody, R. Seaman. Tiled Image Convention for Storing Compressed Images in FITS Binary Tables.

Warning: 64-bit integers cannot be compressed due to a CFITSIO limitation.
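
The table above can be read as a small decision function. The sketch below encodes it as is; the `Algo` enum and both functions are hypothetical names introduced for this illustration and do not belong to the EleFits API.

```cpp
#include <cassert>

// Hypothetical enum mirroring the algorithm classes listed in the table.
enum class Algo { Gzip, ShuffledGzip, Rice, HCompress, Plio };

// True unless the combination is marked "Unsupported" in the table.
bool is_supported(Algo algo, bool floating_point) {
  return !(algo == Algo::Plio && floating_point);
}

// True if the combination is lossless according to the table.
// `quantized` only matters for floating point data, `scaled` only for H-compress.
bool is_lossless(Algo algo, bool floating_point, bool quantized, bool scaled) {
  assert(is_supported(algo, floating_point));
  switch (algo) {
    case Algo::Gzip:
    case Algo::ShuffledGzip:
      return !floating_point || !quantized;
    case Algo::Rice:
      return !floating_point;
    case Algo::HCompress:
      return !floating_point && !scaled;
    case Algo::Plio:
      return true; // Integral data only
  }
  return false; // Not reached for valid enum values
}
```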

Interface

As mentioned in the previous section, compression algorithms are represented by dedicated classes. In addition, NoCompression can be used to disable internal compression. All those classes implement the Compression interface. Generic compression parameters are the tiling shape (as a Position<-1> object) and quantization (as a Quantization object). More details can be found in the classes and methods documentation (see list below).

Once the Compression instance is created, it can be passed to a MefFile through a so-called compression strategy. The strategy is a very powerful object which adapts the compression algorithm to the HDU being created (see File-level strategies). It is composed of a list (possibly a singleton) of compression actions, which are tried one after the other. As soon as an action succeeds, the following ones are skipped. The simplest compression action, Compress, is made of a single compression algorithm and consists in applying this algorithm to all HDUs unless impossible. A list of such actions builds up a compression strategy which selects the first algorithm that works, in order of preference. For example, registering Compress<Plio> and then Compress<Rice> will activate PLIO if possible, and Rice otherwise, or not compress at all if the HDU is very small.

More elaborate actions can be user-defined by extending the base class CompressionAction (e.g. to use PLIO only for HDUs whose name ends with "MASK"). The generic action CompressAuto is the recommended default. It draws on the various papers referenced by the FITS support office and on internal benchmarks, and has variants for lossless and lossy compression.
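
The "first applicable action wins" logic can be illustrated independently of the library. The sketch below does not use or reproduce the EleFits classes: `HduInfo`, `Action`, `select_algorithm` and `example_strategy` are hypothetical stand-ins, and the 1000-pixel threshold for "very small" HDUs is an arbitrary assumption.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the EleFits types.
struct HduInfo {
  std::string name;
  bool integral;    // True for integer pixels
  long pixel_count; // Total number of pixels
};

// An action pairs an algorithm name with the condition under which it applies.
struct Action {
  std::string algorithm;
  std::function<bool(const HduInfo&)> applies;
};

// Try the actions in registration order and stop at the first one which applies.
// Returns "None" (no compression) if no action applies.
std::string select_algorithm(const std::vector<Action>& actions, const HduInfo& hdu) {
  assert(hdu.pixel_count >= 0);
  for (const auto& action : actions) {
    if (action.applies(hdu)) {
      return action.algorithm;
    }
  }
  return "None";
}

bool ends_with(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() && s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// PLIO for integer HDUs whose name ends with "MASK", Rice otherwise,
// and nothing for very small HDUs (assumed threshold: 1000 pixels).
std::vector<Action> example_strategy() {
  return {
      {"Plio", [](const HduInfo& hdu) { return hdu.integral && ends_with(hdu.name, "MASK"); }},
      {"Rice", [](const HduInfo& hdu) { return hdu.pixel_count >= 1000; }}};
}
```

Registering the more specific action first matters: since trying stops at the first success, a generic action placed at the head of the list would shadow the specialized ones.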

As soon as the strategy is enabled (with MefFile::strategy()), newly created image HDUs will automatically be compressed. Note that it is not possible to convert an uncompressed image HDU into a compressed one in place, and compression must be activated before creating HDUs. The aforementioned strategy can be enabled as follows:

MefFile f(filename, FileMode::Create);
f.strategy(Compress<Plio>(), Compress<Rice>());
f.append_image("", {}, raster);

Guidelines

Don't Compress the Primary

Compressed images are stored as binary tables, whereas the Primary HDU is necessarily an image HDU: the Primary itself therefore cannot be compressed. Generally, if the Primary data has to be internally compressed (e.g. for SIF files), it is written as a compressed extension following an empty Primary. As a consequence, even lossless compression may not round-trip: decompressing such a file results in a MEF file with an empty Primary and a decompressed extension corresponding to the input Primary, instead of a single decompressed Primary. It is therefore recommended to avoid writing data to the Primary HDU, or at least to leave it uncompressed and copy it verbatim (which is the default behavior of EleFitsCompress). This way, compressing and then decompressing the file returns the original file.

# Default: copy the Primary
EleFitsCompress original.fits compressed.fits
EleFitsDecompress compressed.fits decompressed.fits
# decompressed.fits = original.fits
# Compress the Primary, too
EleFitsCompress original.fits compressed.fits --primary
EleFitsDecompress compressed.fits decompressed.fits
# decompressed.fits = empty Primary + original.fits

Use Lossless Compression for Integers

Lossy compression of integral data (available with H-compress by enabling scaling) brings little improvement over lossless compression.

Give a Chance to Lossy Compression for Floating Points

Although it may sound scary, lossy compression of floating point images generally preserves the signal very well, while producing much higher compression ratios than lossless compression, even with very conservative quantization. As an order of magnitude, expect a compression ratio of around 10 with quantization = RMS / 16, instead of just above 1 with quantization turned off.
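
To make the "quantization = RMS / 16" setting concrete, the sketch below derives a quantization step from the pixel values. It is an illustration under stated assumptions: both functions are hypothetical helpers, and the RMS is computed as the plain standard deviation of all values, whereas noise estimators used in practice are typically more robust to sources in the image.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Root mean square deviation from the mean (population standard deviation).
// Assumption: a plain estimator is sufficient for this illustration.
double rms(const std::vector<double>& values) {
  assert(!values.empty());
  double mean = 0;
  for (double v : values) {
    mean += v;
  }
  mean /= static_cast<double>(values.size());
  double sum_sq = 0;
  for (double v : values) {
    sum_sq += (v - mean) * (v - mean);
  }
  return std::sqrt(sum_sq / static_cast<double>(values.size()));
}

// Quantization step for a given divisor, i.e. step = RMS / divisor.
double quantization_step(const std::vector<double>& values, double divisor = 16.0) {
  assert(divisor > 0);
  return rms(values) / divisor;
}
```

With a divisor of 16, the step is a small fraction of the noise level, which is why the signal is generally well preserved.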

Try CompressAuto!

This is the default recommendation when you have no specific knowledge of the data stationarity. It was designed with care based on already published studies, and tuned with internal benchmarks. More specifically, following the previous guideline, try CompressAuto(CompressionType::LosslessInts)!

Classes

class AlgoMixin< TDerived >
 Intermediate class for internal dispatching.

class Compress< TAlgo >
 A compression action made of a single algorithm.

class CompressAuto
 A basic adaptive compression strategy.

class CompressFloats< TAlgo >
 A restriction of Compress to floating point values.

class CompressInts< TAlgo >
 A restriction of Compress to integral values.

class Compression
 Interface for compression algorithms.

class CompressionAction
 The interface for implementing compression actions.

class CompressionActionMixin< TDerived >
 A mixin to simplify CompressionAction implementation.

class Gzip
 The GZIP algorithm.

class HCompress
 The H-compress algorithm.

class NoCompression
 No compression.

class Plio
 The PLIO algorithm.

class Quantization
 Quantization parameters.

class Rice
 The Rice algorithm.

class ShuffledGzip
 The GZIP algorithm applied to "shuffled" pixel values.

struct Tile
 Helper class for tile-related parameters.