fcollections.implementations

Contents

fcollections.implementations#

Module Attributes

AVISO_L2_LR_SSH_LAYOUT

Layout on Aviso FTP, Aviso TDS for the L2_LR_SSH product

AVISO_L3_LR_SSH_LAYOUT_V3

Layout on Aviso FTP, Aviso TDS for the L3_LR_SSH product

AVISO_L3_LR_SSH_LAYOUT_V2

Layout on Aviso FTP, Aviso TDS for the L3_LR_SSH product

AVISO_L3_LR_WINDWAVE_LAYOUT

Layout on Aviso FTP, Aviso TDS for the L3_LR_WindWave product

AVISO_L4_SWOT_LAYOUT

Layout on Aviso FTP, Aviso TDS for the L4 Sea Level Anomaly experimental product including KaRIn measurements

CMEMS_SSHA_L3_LAYOUT

Layout on CMEMS for the Level 3 SSHA nadir products

CMEMS_L4_SSHA_LAYOUT

Layout on CMEMS for the Level 4 SSHA gridded products

CMEMS_OC_LAYOUT

Layout on CMEMS for the Level 3 and 4 ocean colour products

CMEMS_SWH_LAYOUT

Layout on CMEMS for the WAVE_GLO_PHY_SWH_L3_NRT_014_001 product

CMEMS_SST_LAYOUT

Layout on CMEMS for the SST_GLO_SST_L3S_NRT_OBSERVATIONS_010_010 product

Functions

build_version_parser()

Build file name convention to parse CRID versions.

Classes

BasicNetcdfFilesDatabaseDAC(path[, fs])

Database mapping to select and read Dynamic atmospheric correction Netcdf files in a local file system.

BasicNetcdfFilesDatabaseGriddedSLA(path[, ...])

Database mapping to select and read gridded SLA Netcdf files in a local file system.

BasicNetcdfFilesDatabaseSwotLRL2(path[, fs, ...])

Database mapping to select and read Swot LR L2 Netcdf files in a local file system.

BasicNetcdfFilesDatabaseL2Nadir(path[, fs, ...])

Database mapping to select and read L2 nadir Netcdf files in a local file system.

BasicNetcdfFilesDatabaseSwotLRL3(path[, fs, ...])

Database mapping to select and read Swot LR L3 Netcdf files in a local file system.

BasicNetcdfFilesDatabaseSwotLRWW(path[, fs, ...])

Database mapping to explore and read the L3_LR_WIND_WAVE product.

BasicNetcdfFilesDatabaseL3Nadir(path[, fs, ...])

Database mapping to select and read L3 nadir Netcdf files in a local file system.

BasicNetcdfFilesDatabaseMUR(path[, fs, ...])

Database mapping to select and read GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis product Netcdf files in a local file system.

BasicNetcdfFilesDatabaseOC(path[, fs, ...])

Database mapping to select and read ocean color Netcdf files in a local file system.

BasicNetcdfFilesDatabaseOHC(path[, fs, ...])

Database mapping to select and read ocean heat content Netcdf files in a local file system.

BasicNetcdfFilesDatabaseSST(path[, fs, ...])

Database mapping to select and read sea surface temperature Netcdf files in a local file system.

NetcdfFilesDatabaseSwotLRL2(path[, fs, ...])

NetcdfFilesDatabaseSwotLRL3(path[, fs, ...])

NetcdfFilesDatabaseGriddedSLA(path[, fs, ...])

NetcdfFilesDatabaseSST(path[, fs, ...])

NetcdfFilesDatabaseDAC(path[, fs])

NetcdfFilesDatabaseOC(path[, fs, ...])

NetcdfFilesDatabaseSWH(path[, fs, ...])

BasicNetcdfFilesDatabaseSWH(path[, fs, ...])

Database mapping to select and read significant wave height Netcdf files in a local file system.

NetcdfFilesDatabaseOHC(path[, fs, ...])

NetcdfFilesDatabaseS1AOWI(path[, fs, ...])

Database mapping to select and read S1A Ocean surface wind product Netcdf files in a local file system.

NetcdfFilesDatabaseMUR(path[, fs, ...])

NetcdfFilesDatabaseERA5(path[, fs, ...])

Database mapping to select and read ERA5 reanalysis product Netcdf files in a local file system.

NetcdfFilesDatabaseL2Nadir(path[, fs, ...])

NetcdfFilesDatabaseL3Nadir(path[, fs, ...])

NetcdfFilesDatabaseSwotLRWW(path[, fs, ...])

FileNameConventionERA5()

FileNameConventionOC()

Ocean Color datafiles parser.

FileNameConventionGriddedSLA()

Gridded SLA datafiles parser.

FileNameConventionGriddedSLAInternal()

FileNameConventionSST()

Sea Surface Temperature datafiles parser.

FileNameConventionDAC()

FileNameConventionSwotL2()

Swot LR L2 datafiles parser.

FileNameConventionSwotL3()

Swot LR L3 datafiles parser.

FileNameConventionSwotL3WW()

Swot L3_LR_WIND_WAVE product file names convention.

FileNameConventionOHC()

FileNameConventionS1AOWI()

FileNameConventionMUR()

FileNameConventionSWH()

FileNameConventionL2Nadir()

L2 Nadir datafiles parser.

FileNameConventionL3Nadir()

L3 Nadir datafiles parser.

SwotReaderL2LRSSH([xarray_options])

Reader for SWOT KaRIn L2_LR_SSH products.

SwotReaderL3LRSSH([xarray_options])

Reader for SWOT KaRIn L3_LR_SSH products.

SwotReaderL3WW([xarray_options])

Reader for the SWOT L3_LR_WIND_WAVE product.

Delay(*values)

Delay definition for L3 and L4 sea level products.

ProductLevel(*values)

Product level.

Origin(*values)

Dataset origin.

Group(*values)

Dataset group.

ProductClass(*values)

Dataset product class.

DataType(*values)

Dataset type.

Thematic(*values)

Dataset thematic.

Area(*values)

Dataset area of interest.

Variable(*values)

Dataset variable group.

Typology(*values)

Dataset typology.

Sensors(*values)

Aggregation of sensors for multiple CMEMS products.

Temporality(*values)

Temporality of the L3_LR_SSH product.

ProductSubset(*values)

Swot product subset enum.

SwotPhases(*values)

Swot mission phases definitions.

StackLevel(*values)

Stack level for swath half orbits on reference grid.

Timeliness(*values)

Timeliness of the SWOT L2_LR_SSH products.

L2Version([temporality, baseline, ...])

Represents a L2 Version of half orbits and enables version comparison.

L2VersionField(name[, ignore_product_counter])

AcquisitionMode(*values)

S1AOWIProductType(*values)

S1AOWISlicePostProcessing(*values)

fcollections.implementations.AVISO_L2_LR_SSH_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on Aviso FTP, Aviso TDS for the L2_LR_SSH product

fcollections.implementations.AVISO_L3_LR_SSH_LAYOUT_V2: Layout = <fcollections.core._listing.Layout object>#

Layout on Aviso FTP, Aviso TDS for the L3_LR_SSH product

fcollections.implementations.AVISO_L3_LR_SSH_LAYOUT_V3: Layout = <fcollections.core._listing.Layout object>#

Layout on Aviso FTP, Aviso TDS for the L3_LR_SSH product

fcollections.implementations.AVISO_L3_LR_WINDWAVE_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on Aviso FTP, Aviso TDS for the L3_LR_WindWave product

fcollections.implementations.AVISO_L4_SWOT_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on Aviso FTP, Aviso TDS for the L4 Sea Level Anomaly experimental product including KaRIn measurements

class fcollections.implementations.AcquisitionMode(*values)[source]#

Bases: Enum

EW = 2#
IW = 1#
SM = 4#
WV = 3#
class fcollections.implementations.Area(*values)[source]#

Bases: Enum

Dataset area of interest.

ANT = 3#

Antarctic.

ARC = 2#

Arctic.

ATL = 1#

Atlantic.

BAL = 4#

Baltic.

BLK = 5#

Black sea.

EUR = 6#

Europe.

GLO = 7#

Global.

IBI = 8#

Iberian sea.

MED = 9#

Mediterranean.

NWS = 10#

North west shelf.

class fcollections.implementations.BasicNetcdfFilesDatabaseDAC(path: Path, fs: fsspec.AbstractFileSystem = fs_loc.LocalFileSystem())[source]#

Bases: FilesDatabase, DiscreteTimesMixin

Database mapping to select and read Dynamic atmospheric correction Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and enable efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised.

  • predicates – Additional complex filters to run on the record parsed by the filename. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
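The DateTime filtering rule described for time can be sketched in plain Python. This is a hypothetical helper illustrating the documented behavior, not the library's implementation: a parsed value is kept when it equals a reference datetime, or is included in a reference period.

```python
from datetime import datetime

def matches_time(file_time, reference):
    """Keep file_time if it equals a reference datetime, or is included
    in a reference (start, end) period -- illustrative stand-in for the
    DateTime-field rule described above."""
    if isinstance(reference, tuple):  # period: inclusion test
        start, end = reference
        return start <= file_time <= end
    return file_time == reference     # single datetime: equality test

times = [datetime(2023, 1, d) for d in (1, 2, 3)]
kept = [t for t in times
        if matches_time(t, (datetime(2023, 1, 1), datetime(2023, 1, 2)))]
# keeps the first two datetimes; January 3rd falls outside the period
```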

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
metadata_injection: dict[str, tuple[str, ...]] | None = {'time': ('time',)}#

Configures how metadata from the files listing can be injected in a dataset returned from the read.

The keys are the columns of the file metadata table; the values are tuples of dimensions for insertion.

query(*, selected_variables: list[str] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  • A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
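Because query returns None when nothing matches, callers should guard before using the result. A minimal usage pattern with a stand-in database object (the stub class and names here are hypothetical; only the query signature mirrors the documentation):

```python
def run_query(db, **filters):
    """Query and guard against an empty result: query returns None
    when nothing matches the given filters."""
    ds = db.query(selected_variables=None, **filters)
    if ds is None:
        return "no files matched"
    return ds

class StubDatabase:
    """Stand-in mimicking the documented query signature."""
    def query(self, *, selected_variables=None, **filters):
        return None  # pretend nothing matched the filters

result = run_query(StubDatabase(), time="2023-01-01")
```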

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = ['time']#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseGriddedSLA(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read gridded SLA Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and enable efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, delay: Delay, time: Period, production_date: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised.

  • predicates – Additional complex filters to run on the record parsed by the filename. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
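The predicates argument above receives callables that see the record parsed from each file name; the example predicate in the documentation composes with others as a simple conjunction. A pure-Python sketch with a made-up record layout:

```python
# Each record is a tuple parsed from a file name; in this hypothetical
# layout, index 1 holds some integer field of the name convention.
records = [("file_a.nc", 1), ("file_b.nc", 2), ("file_c.nc", 4)]

# The documented example predicate: keep records whose field is in a set.
predicates = [lambda record: record[1] in [1, 4, 5]]

# A record is listed only if every predicate accepts it.
kept = [r for r in records if all(p(r) for p in predicates)]
```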

listing_parameters = {'delay': <Parameter "delay: fcollections.implementations._definitions._constants.Delay">, 'production_date': <Parameter "production_date: numpy.datetime64">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, delay: Delay, time: Period, production_date: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
query(*, selected_variables: list[str] | None = None, delay: Delay, time: Period, production_date: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  • A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
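The Enum-field rule used by delay accepts either the enum member itself or its string name. A sketch of that matching logic with a hypothetical Delay enum mirroring the documented values ['NRT', 'DT'] (illustrative only, not the library's class):

```python
from enum import Enum

class Delay(Enum):
    NRT = "NRT"  # near-real-time
    DT = "DT"    # delayed-time

def matches_enum(value, reference):
    """Keep value if it equals the reference enum member, which may be
    given as the member itself or as its string name."""
    if isinstance(reference, str):
        reference = Delay[reference]  # resolve the name to a member
    return value is reference

kept = [d for d in Delay if matches_enum(d, "NRT")]
```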

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseL2Nadir(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read L2 nadir Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and enable efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised.

  • predicates – Additional complex filters to run on the record parsed by the filename. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
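The Integer-field rule used by cycle_number and pass_number accepts a list, a slice, or a single int as reference. A sketch of the matching logic, assuming Python's usual half-open slice semantics (a hypothetical helper, not the library's implementation):

```python
def matches_int(value, reference):
    """Keep value if it is in a reference list, inside a reference
    slice, or equal to a reference int."""
    if isinstance(reference, list):
        return value in reference
    if isinstance(reference, slice):
        lo_ok = reference.start is None or value >= reference.start
        hi_ok = reference.stop is None or value < reference.stop
        return lo_ok and hi_ok
    return value == reference

pass_numbers = [1, 2, 3, 10]
kept = [p for p in pass_numbers if matches_int(p, slice(1, 4))]
```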

listing_parameters = {'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

Raises:
query(*, selected_variables: list[str] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

Returns:

  • A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseL3Nadir(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read L3 nadir Netcdf files in a local file system.

deduplicator: Deduplicator | None = Deduplicator(unique=('time',), auto_pick_last=('production_date',))#

Deduplicate the file metadata table of a unique subset (after unmixing).

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and enable efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, delay: Delay, time: Period, production_date: datetime64, sensor: Sensors, product_level: ProductLevel, resolution: list[int] | slice | int)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised.

  • predicates – Additional complex filters to run on the record parsed by the filename. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • product_level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'delay': <Parameter "delay: fcollections.implementations._definitions._constants.Delay">, 'product_level': <Parameter "product_level: fcollections.implementations._definitions._constants.ProductLevel">, 'production_date': <Parameter "production_date: numpy.datetime64">, 'resolution': <Parameter "resolution: list[int] | slice | int">, 'sensor': <Parameter "sensor: fcollections.implementations._definitions._cmems.Sensors">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, delay: Delay, time: Period, production_date: datetime64, sensor: Sensors, product_level: ProductLevel, resolution: list[int] | slice | int)#

Map a function over the datasets extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] format

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections, hence there can be multiple files for the same period but with different production dates. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date format [%Y-%m-%dT%H:%M:%S]

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • product_level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.
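
The map contract above can be sketched with plain-Python stand-ins (dicts instead of xarray datasets; all names here are hypothetical, not the library's implementation):

```python
# Stand-ins for the (dataset, metadata) pairs read from selected files;
# real code would receive xarray.Dataset objects.
datasets = [
    ({"ssha": [0.1, 0.2]}, {"sensor": "S3A"}),
    ({"ssha": [0.3]}, {"sensor": "J3"}),
]

def map_over(func, pairs):
    """Apply ``func(dataset, metadata)`` to each pair, collecting results."""
    return [func(ds, meta) for ds, meta in pairs]

# Example callable: count samples per file.
counts = map_over(lambda ds, meta: (meta["sensor"], len(ds["ssha"])), datasets)
print(counts)  # [('S3A', 2), ('J3', 1)]
```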

query(*, selected_variables: list[str] | None = None, delay: Delay, time: Period, production_date: datetime64, sensor: Sensors, product_level: ProductLevel, resolution: list[int] | slice | int)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] format

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections, hence there can be multiple files for the same period but with different production dates. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date format [%Y-%m-%dT%H:%M:%S]

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • product_level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

Returns:

  A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
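
The return contract above (a combined dataset, or None when nothing matches) can be sketched in plain Python; this is an illustrative stand-in, not the library's reader:

```python
def query(records, sensor):
    """Select matching records and concatenate their data.

    Returns the combined data, or None when nothing matches,
    mirroring the documented return contract.
    """
    matching = [r for r in records if r["sensor"] == sensor]
    if not matching:
        return None
    combined = []
    for r in matching:
        combined.extend(r["data"])
    return combined

records = [{"sensor": "S3A", "data": [1, 2]}, {"sensor": "S3A", "data": [3]}]
print(query(records, "S3A"))  # [1, 2, 3]
print(query(records, "J3"))   # None
```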

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

unmixer: SubsetsUnmixer | None = SubsetsUnmixer(partition_keys=['sensor', 'resolution'], auto_pick_last=())#

Specify how to interpret the file metadata table to unmix subsets.

variables_info(*, sensor: Sensors, resolution: list[int] | slice | int)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, a ValueError is raised.

Parameters:
  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
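
The unmixing behaviour described above (partition the records by the subset keys and raise ValueError unless exactly one subset remains) can be sketched as follows. This is a plain-Python stand-in for the SubsetsUnmixer logic, with hypothetical record fields:

```python
def unique_subset(records, partition_keys=("sensor", "resolution")):
    """Group records by the partition keys and require a single subset,
    raising ValueError otherwise (mirroring the documented behaviour)."""
    subsets = {tuple(r[k] for k in partition_keys) for r in records}
    if len(subsets) != 1:
        raise ValueError(
            f"cannot extract one unique subset, found {len(subsets)}: "
            f"{sorted(subsets)}"
        )
    return subsets.pop()

records = [
    {"sensor": "S3A", "resolution": 1, "path": "a.nc"},
    {"sensor": "S3A", "resolution": 1, "path": "b.nc"},
]
print(unique_subset(records))  # ('S3A', 1)

# A mixed table (two sensors) cannot be reduced to one subset.
mixed = records + [{"sensor": "J3", "resolution": 20, "path": "c.nc"}]
try:
    unique_subset(mixed)
except ValueError as err:
    print(err)
```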

class fcollections.implementations.BasicNetcdfFilesDatabaseMUR(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis product Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – If the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. When duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick is also performed according to the SubsetsUnmixer instance of this class. If the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date format [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, time: datetime64)#

Map a function over the datasets extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date format [%Y-%m-%dT%H:%M:%S]

query(*, selected_variables: list[str] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date format [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, a ValueError is raised.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseOC(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read ocean color Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: Period, origin: Origin, group: Group, pc: ProductClass, area: Area, thematic: Thematic, variable: Variable, type: DataType, level: str, sensor: Sensors, spatial_resolution: str, temporal_resolution: ISODuration, typology: Typology, version: str)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – If the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. When duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick is also performed according to the SubsetsUnmixer instance of this class. If the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] format

  • origin – As an Enum field, it can be filtered using a reference <enum ‘Origin’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CMEMS’, ‘C3S’, ‘CCI’, ‘OSISAF’]

  • group – As an Enum field, it can be filtered using a reference <enum ‘Group’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘OBS’, ‘MOD’]

  • pc – As an Enum field, it can be filtered using a reference <enum ‘ProductClass’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SST’, ‘SL’, ‘OC’, ‘SI’, ‘WIND’, ‘WAVE’, ‘MOB’, ‘INS’]

  • area – As an Enum field, it can be filtered using a reference <enum ‘Area’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘ATL’, ‘ARC’, ‘ANT’, ‘BAL’, ‘BLK’, ‘EUR’, ‘GLO’, ‘IBI’, ‘MED’, ‘NWS’]

  • thematic – As an Enum field, it can be filtered using a reference <enum ‘Thematic’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘PHY’, ‘BGC’, ‘WAV’, ‘PHYBGC’, ‘PHYBGCWAV’]

  • variable – As an Enum field, it can be filtered using a reference <enum ‘Variable’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘TEMP’, ‘CUR’, ‘CHL’, ‘CAR’, ‘NUT’, ‘GEOPHY’, ‘PLANKTON’, ‘TRANSP’, ‘OPTICS’, ‘PP’, ‘MFLUX’, ‘WFLUX’, ‘HFLUX’, ‘SWH’, ‘SSH’, ‘REFLECTANCE’]

  • type – As an Enum field, it can be filtered using a reference <enum ‘DataType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘MY’, ‘MYINT’, ‘NRT’, ‘ANFC’, ‘HCST’, ‘MYNRT’]

  • level – Product level of the data. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • spatial_resolution – Spatial resolution, such as 4km, 1km, 300M. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • temporal_resolution – As an ISO8601 duration field, it can be tested against an ISODuration object or its string representation (PT1S, …)

  • typology – As an Enum field, it can be filtered using a reference <enum ‘Typology’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘I’, ‘M’]

  • version – As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
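The temporal_resolution filter above compares an ISO8601 duration token from the file name with the reference value. A minimal plain-Python sketch of that string-level matching follows; it assumes a small subset of ISO8601 and is not the library's ISODuration class:

```python
import re

# Minimal ISO8601 duration check (subset: days, hours, minutes, seconds),
# enough to validate tokens such as 'P1D' or 'PT1S' before comparing them.
ISO_DURATION = re.compile(r"^P(?:\d+D)?(?:T(?:\d+H)?(?:\d+M)?(?:\d+S)?)?$")

def matches_duration(token, reference):
    """True when a file-name duration token equals the reference string."""
    if not (ISO_DURATION.match(token) and ISO_DURATION.match(reference)):
        raise ValueError("not an ISO8601 duration")
    return token == reference

print(matches_duration("P1D", "P1D"))   # True
print(matches_duration("PT1S", "P1D"))  # False
```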

listing_parameters = {'area': <Parameter "area: fcollections.implementations._definitions._cmems.Area">, 'group': <Parameter "group: fcollections.implementations._definitions._cmems.Group">, 'level': <Parameter "level: str">, 'origin': <Parameter "origin: fcollections.implementations._definitions._cmems.Origin">, 'pc': <Parameter "pc: fcollections.implementations._definitions._cmems.ProductClass">, 'sensor': <Parameter "sensor: fcollections.implementations._definitions._cmems.Sensors">, 'spatial_resolution': <Parameter "spatial_resolution: str">, 'temporal_resolution': <Parameter "temporal_resolution: fcollections.time.ISODuration">, 'thematic': <Parameter "thematic: fcollections.implementations._definitions._cmems.Thematic">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'type': <Parameter "type: fcollections.implementations._definitions._cmems.DataType">, 'typology': <Parameter "typology: fcollections.implementations._definitions._cmems.Typology">, 'variable': <Parameter "variable: fcollections.implementations._definitions._cmems.Variable">, 'version': <Parameter "version: str">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, time: Period, origin: Origin, group: Group, pc: ProductClass, area: Area, thematic: Thematic, variable: Variable, type: DataType, level: str, sensor: Sensors, spatial_resolution: str, temporal_resolution: ISODuration, typology: Typology, version: str)#

Map a function over the datasets extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] format

  • origin – As an Enum field, it can be filtered using a reference <enum ‘Origin’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CMEMS’, ‘C3S’, ‘CCI’, ‘OSISAF’]

  • group – As an Enum field, it can be filtered using a reference <enum ‘Group’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘OBS’, ‘MOD’]

  • pc – As an Enum field, it can be filtered using a reference <enum ‘ProductClass’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SST’, ‘SL’, ‘OC’, ‘SI’, ‘WIND’, ‘WAVE’, ‘MOB’, ‘INS’]

  • area – As an Enum field, it can be filtered using a reference <enum ‘Area’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘ATL’, ‘ARC’, ‘ANT’, ‘BAL’, ‘BLK’, ‘EUR’, ‘GLO’, ‘IBI’, ‘MED’, ‘NWS’]

  • thematic – As an Enum field, it can be filtered using a reference <enum ‘Thematic’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘PHY’, ‘BGC’, ‘WAV’, ‘PHYBGC’, ‘PHYBGCWAV’]

  • variable – As an Enum field, it can be filtered using a reference <enum ‘Variable’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘TEMP’, ‘CUR’, ‘CHL’, ‘CAR’, ‘NUT’, ‘GEOPHY’, ‘PLANKTON’, ‘TRANSP’, ‘OPTICS’, ‘PP’, ‘MFLUX’, ‘WFLUX’, ‘HFLUX’, ‘SWH’, ‘SSH’, ‘REFLECTANCE’]

  • type – As an Enum field, it can be filtered using a reference <enum ‘DataType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘MY’, ‘MYINT’, ‘NRT’, ‘ANFC’, ‘HCST’, ‘MYNRT’]

  • level – Product level of the data. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • spatial_resolution – Spatial resolution, such as 4km, 1km, 300M. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • temporal_resolution – As an ISO8601 duration field, it can be tested against an ISODuration object or its string representation (PT1S, …)

  • typology – As an Enum field, it can be filtered using a reference <enum ‘Typology’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘I’, ‘M’]

  • version – As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

query(*, selected_variables: list[str] | None = None, time: Period, origin: Origin, group: Group, pc: ProductClass, area: Area, thematic: Thematic, variable: Variable, type: DataType, level: str, sensor: Sensors, spatial_resolution: str, temporal_resolution: ISODuration, typology: Typology, version: str)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] format

  • origin – As an Enum field, it can be filtered using a reference <enum ‘Origin’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CMEMS’, ‘C3S’, ‘CCI’, ‘OSISAF’]

  • group – As an Enum field, it can be filtered using a reference <enum ‘Group’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘OBS’, ‘MOD’]

  • pc – As an Enum field, it can be filtered using a reference <enum ‘ProductClass’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SST’, ‘SL’, ‘OC’, ‘SI’, ‘WIND’, ‘WAVE’, ‘MOB’, ‘INS’]

  • area – As an Enum field, it can be filtered using a reference <enum ‘Area’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘ATL’, ‘ARC’, ‘ANT’, ‘BAL’, ‘BLK’, ‘EUR’, ‘GLO’, ‘IBI’, ‘MED’, ‘NWS’]

  • thematic – As an Enum field, it can be filtered using a reference <enum ‘Thematic’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘PHY’, ‘BGC’, ‘WAV’, ‘PHYBGC’, ‘PHYBGCWAV’]

  • variable – As an Enum field, it can be filtered using a reference <enum ‘Variable’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘TEMP’, ‘CUR’, ‘CHL’, ‘CAR’, ‘NUT’, ‘GEOPHY’, ‘PLANKTON’, ‘TRANSP’, ‘OPTICS’, ‘PP’, ‘MFLUX’, ‘WFLUX’, ‘HFLUX’, ‘SWH’, ‘SSH’, ‘REFLECTANCE’]

  • type – As an Enum field, it can be filtered using a reference <enum ‘DataType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘MY’, ‘MYINT’, ‘NRT’, ‘ANFC’, ‘HCST’, ‘MYNRT’]

  • level – Product level of the data. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • spatial_resolution – Spatial resolution, such as 4km, 1km, 300M. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • temporal_resolution – ISO8601 duration field can be tested against an ISODuration object or its string representation (PT1S, …)

  • typology – As an Enum field, it can be filtered using a reference <enum ‘Typology’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘I’, ‘M’]

  • version – As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
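The enum/string filtering rule described for these parameters can be sketched in plain Python. The `Area` stand-in and record layout below are illustrative assumptions, not the library's actual definitions:

```python
from enum import Enum

# Illustrative stand-in for one of the library's enums (e.g. Area);
# not the actual fcollections definition.
class Area(Enum):
    GLO = "GLO"
    MED = "MED"

def keep(parsed_value: str, reference) -> bool:
    # A value parsed from the file name is kept only when it equals the
    # reference, given either as an enum member or its equivalent string.
    ref = reference.value if isinstance(reference, Enum) else reference
    return parsed_value == ref

records = [{"area": "GLO"}, {"area": "MED"}, {"area": "GLO"}]
kept = [r for r in records if keep(r["area"], Area.GLO)]
# The string form selects the same records as the enum member.
assert kept == [r for r in records if keep(r["area"], "GLO")]
```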

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseOHC(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read ocean heat content Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
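The predicate and time filters can be sketched in plain Python. The record tuples below are hypothetical, but the predicate follows the lambda example from the parameter description:

```python
from datetime import datetime

# Hypothetical records parsed from file names: (name, field_1, time).
records = [
    ("file_a.nc", 1, datetime(2023, 1, 1)),
    ("file_b.nc", 2, datetime(2023, 1, 2)),
    ("file_c.nc", 4, datetime(2023, 1, 2)),
]

# A predicate receives the whole parsed record, as in the documented
# example: lambda record: record[1] in [1, 4, 5].
predicate = lambda record: record[1] in [1, 4, 5]

# A reference datetime keeps only records whose time equals it.
reference = datetime(2023, 1, 2)
kept = [r for r in records if predicate(r) and r[2] == reference]
```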

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

query(*, selected_variables: list[str] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
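The homogeneity check behind variables_info() can be sketched as follows; the key name and records are illustrative, not the library's internals:

```python
# Group parsed records by a partitioning key and require exactly one
# homogeneous subset, raising ValueError otherwise.
def unique_subset(records: list, key: str) -> str:
    groups = {r[key] for r in records}
    if len(groups) != 1:
        raise ValueError(f"{len(groups)} subsets found for key {key!r}")
    return groups.pop()

pure = [{"subset": "Basic"}, {"subset": "Basic"}]
mixed = [{"subset": "Basic"}, {"subset": "Expert"}]

assert unique_subset(pure, "subset") == "Basic"
try:
    unique_subset(mixed, "subset")
    raised = False
except ValueError:
    # Mixed Basic/Expert records cannot form one homogeneous subset.
    raised = True
```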

class fcollections.implementations.BasicNetcdfFilesDatabaseSST(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read sea surface temperature Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

query(*, selected_variables: list[str] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseSWH(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read significant wave height Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, sensorf: Sensors, time: Period, production_date: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • sensorf – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of string following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'production_date': <Parameter "production_date: numpy.datetime64">, 'sensorf': <Parameter "sensorf: fcollections.implementations._definitions._cmems.Sensors">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, sensorf: Sensors, time: Period, production_date: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • sensorf – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of string following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

query(*, selected_variables: list[str] | None = None, sensorf: Sensors, time: Period, production_date: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • sensorf – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of string following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in or not equal to the reference Period or datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
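The Period filtering rule used by time above (keep a file when its period intersects the reference Period, or contains a reference datetime) can be sketched with plain datetimes. The tuple representation is an assumption, not the library's Period class:

```python
from datetime import datetime

def intersects(a: tuple, b: tuple) -> bool:
    # Two closed periods intersect when each starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]

file_period = (datetime(2023, 6, 1), datetime(2023, 6, 30))
reference = (datetime(2023, 6, 15), datetime(2023, 7, 15))

assert intersects(file_period, reference)  # overlapping periods: kept
assert not intersects(file_period, (datetime(2023, 8, 1), datetime(2023, 8, 31)))
# A single reference datetime keeps files whose period contains it.
assert file_period[0] <= datetime(2023, 6, 20) <= file_period[1]
```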

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseSwotLRL2(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read Swot LR L2 Netcdf files in a local file system.

deduplicator: Deduplicator | None = Deduplicator(unique=('cycle_number', 'pass_number'), auto_pick_last=('version',))#

Deduplicate the file metadata table of a unique subset (after unmixing).
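A minimal sketch of this deduplication rule, with unique keys (cycle_number, pass_number) and auto-picking the last version. The dictionaries and the lexicographic version ordering are illustrative assumptions:

```python
# Among files sharing the unique keys, keep only the last version.
files = [
    {"cycle_number": 1, "pass_number": 10, "version": "PIB0_01"},
    {"cycle_number": 1, "pass_number": 10, "version": "PIC0_01"},
    {"cycle_number": 1, "pass_number": 11, "version": "PIB0_01"},
]

latest = {}
for f in files:
    key = (f["cycle_number"], f["pass_number"])
    # Lexicographic comparison stands in for the real version ordering.
    if key not in latest or f["version"] > latest[key]["version"]:
        latest[key] = f
```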

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, subset: ProductSubset, version: L2Version)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of string following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L2_LR_SSH product, composed of a CRID and a product counter. The CRID can be further decomposed into the timeliness (I/G), the baseline (A/B/C…) and the minor version (a number) (ex. PIC0). The product counter is a number that is incremented when a half orbit is regenerated for the same CRID. This can happen if an anomaly is detected or if there is a change in the upstream data. As a L2Version field, this field can be tested by providing another L2Version instance. This instance can be partially set, with some missing attributes set to None. In this case, the check will be performed on these attributes only.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
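The CRID decomposition described for version can be illustrated with a small regular expression. The pattern is an assumption inferred from the ex. PIC0 shape, not the output of build_version_parser():

```python
import re

# Hypothetical CRID pattern: a leading P, the timeliness letter (I/G),
# a baseline letter and the minor version digits (e.g. PIC0).
CRID = re.compile(r"P(?P<timeliness>[IG])(?P<baseline>[A-Z])(?P<minor>\d+)$")

match = CRID.match("PIC0")
assert match is not None
assert match.group("timeliness") == "I"
assert match.group("baseline") == "C"
assert match.group("minor") == "0"
```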

listing_parameters = {'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'level': <Parameter "level: fcollections.implementations._definitions._constants.ProductLevel">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'subset': <Parameter "subset: fcollections.implementations._definitions._swot.ProductSubset">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'version': <Parameter "version: fcollections.implementations._l2_lr_ssh.L2Version">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, subset: ProductSubset, selected_variables: list[str] | None = None, stack: StackLevel | str = StackLevel.NOSTACK, left_swath: bool = True, right_swath: bool = False, preprocessor: tp.Callable[[xr.Dataset], xr.Dataset] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: L2Version)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – List of variables to select in the dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the circularity, it will be split in two subboxes to ensure a proper selection (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • left_swath – Whether to load the left side of the swath for Unsmoothed datasets. Setting this to False in conjunction with right_swath disables swath reading for Expert and Basic datasets

  • right_swath – Whether to load the right side of the swath for Unsmoothed datasets. Setting this to False in conjunction with left_swath disables swath reading for Expert and Basic datasets

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and WindWave datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of string following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L2_LR_SSH product, composed of a CRID and a product counter. The CRID can be further decomposed into the timeliness (I/G), the baseline (A/B/C…) and the minor version (a number) (e.g. PIC0). The product counter is a number that is incremented when a half orbit has been regenerated for the same CRID. This can happen if an anomaly is detected or if there is a change in the upstream data. As an L2Version field, this field can be tested by providing another L2Version instance. This instance can be partially set, with some missing attributes set to None. In this case, the check will be performed on these attributes only.
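
The CRID decomposition described above can be sketched with a tiny parser. This is a hypothetical helper, not part of fcollections: it assumes the CRID always starts with a fixed "P" prefix followed by the timeliness, baseline and minor version, as in "PIC0".

```python
import re

# Hypothetical sketch of a CRID such as "PIC0": a fixed "P" prefix (assumption),
# the timeliness (I/G), the baseline (A/B/C...) and the minor version (a number).
CRID_RE = re.compile(r"^P(?P<timeliness>[IG])(?P<baseline>[A-Z])(?P<minor>\d+)$")

def parse_crid(crid: str) -> dict[str, str]:
    """Decompose a CRID into its timeliness, baseline and minor version."""
    match = CRID_RE.match(crid)
    if match is None:
        raise ValueError(f"not a recognized CRID: {crid!r}")
    return match.groupdict()

print(parse_crid("PIC0"))  # {'timeliness': 'I', 'baseline': 'C', 'minor': '0'}
```

A partially set L2Version (e.g. only the baseline) would then match any CRID sharing that attribute, per the partial-matching rule above.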

Raises:
query(*, subset: ProductSubset, selected_variables: list[str] | None = None, stack: StackLevel | str = StackLevel.NOSTACK, left_swath: bool = True, right_swath: bool = False, preprocessor: tp.Callable[[xr.Dataset], xr.Dataset] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: L2Version)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval wraps around the discontinuity, it will be split into two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • left_swath – Whether to load the left side of the swath for Unsmoothed datasets. Setting both this and right_swath to False disables swath reading for Expert and Basic datasets

  • right_swath – Whether to load the right side of the swath for Unsmoothed datasets. Setting both this and left_swath to False disables swath reading for Expert and Basic datasets

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and WindWave datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or a tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L2_LR_SSH product, composed of a CRID and a product counter. The CRID can be further decomposed into the timeliness (I/G), the baseline (A/B/C…) and the minor version (a number) (e.g. PIC0). The product counter is a number that is incremented when a half orbit has been regenerated for the same CRID. This can happen if an anomaly is detected or if there is a change in the upstream data. As an L2Version field, this field can be tested by providing another L2Version instance. This instance can be partially set, with some missing attributes set to None. In this case, the check will be performed on these attributes only.

Returns:

  A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
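
The time filtering semantics above (a file is kept if its period intersects the reference Period, or contains the reference datetime) can be illustrated with plain (start, end) tuples. This is a sketch, not the fcollections Period class:

```python
from datetime import datetime

def intersects(period, other):
    """True if two (start, end) periods overlap."""
    return period[0] <= other[1] and other[0] <= period[1]

def contains(period, instant):
    """True if a datetime falls within a (start, end) period."""
    return period[0] <= instant <= period[1]

# A file covering April 1st intersects a query period straddling midnight...
file_period = (datetime(2023, 4, 1), datetime(2023, 4, 2))
query_period = (datetime(2023, 3, 31, 12), datetime(2023, 4, 1, 12))
print(intersects(file_period, query_period))  # True

# ...but does not contain a datetime outside its bounds.
print(contains(file_period, datetime(2023, 4, 5)))  # False
```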

reader: IFilesReader | None = <fcollections.implementations._readers.SwotReaderL2LRSSH object>#

Files reader.

reading_parameters = {'left_swath': <Parameter "left_swath: 'bool' = True">, 'preprocessor': <Parameter "preprocessor: 'tp.Callable[[xr.Dataset], xr.Dataset] | None' = None">, 'right_swath': <Parameter "right_swath: 'bool' = False">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">, 'stack': <Parameter "stack: 'StackLevel | str' = <StackLevel.NOSTACK: 1>">, 'subset': <Parameter "subset: 'ProductSubset'">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

unmixer: SubsetsUnmixer | None = SubsetsUnmixer(partition_keys=['level', 'subset'], auto_pick_last=())#

Specify how to interpret the file metadata table to unmix subsets.

variables_info(*, level: ProductLevel, subset: ProductSubset)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Parameters:
  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table

class fcollections.implementations.BasicNetcdfFilesDatabaseSwotLRL3(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read Swot LR L3 Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the actual file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, subset: ProductSubset, version: str)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot yield a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or a tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
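
The Integer-field rule used by cycle_number and pass_number (a reference value given as a list, a slice or an integer) can be sketched as follows. This is a stand-in assuming Python's usual exclusive slice stop, not the library's actual matcher:

```python
def keep_integer(value: int, reference) -> bool:
    """Mimic the Integer-field rule: keep `value` if it matches the
    reference list, slice, or integer (sketch; assumes slices have
    explicit start and stop, with an exclusive stop)."""
    if isinstance(reference, slice):
        step = reference.step or 1
        return value in range(reference.start, reference.stop, step)
    if isinstance(reference, list):
        return value in reference
    return value == reference

print(keep_integer(5, [1, 4, 5]))    # True: in the list
print(keep_integer(7, slice(1, 7)))  # False: stop is exclusive in this sketch
print(keep_integer(3, 3))            # True: equal to the integer
```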

listing_parameters = {'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'level': <Parameter "level: fcollections.implementations._definitions._constants.ProductLevel">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'subset': <Parameter "subset: fcollections.implementations._definitions._swot.ProductSubset">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'version': <Parameter "version: str">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, subset: ProductSubset, selected_variables: list[str] | None = None, stack: str | StackLevel = StackLevel.NOSTACK, swath: bool = True, nadir: bool = False, preprocessor: tp.Callable[[xr.Dataset], xr.Dataset] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: str)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval wraps around the discontinuity, it will be split into two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and Technical datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • nadir – Whether to read the nadir data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to False

  • swath – Whether to read the swath data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to True

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or a tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
query(*, subset: ProductSubset, selected_variables: list[str] | None = None, stack: str | StackLevel = StackLevel.NOSTACK, swath: bool = True, nadir: bool = False, preprocessor: tp.Callable[[xr.Dataset], xr.Dataset] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: str)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval wraps around the discontinuity, it will be split into two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and Technical datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • nadir – Whether to read the nadir data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to False

  • swath – Whether to read the swath data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to True

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or a tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Returns:

  A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
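
The bbox splitting behaviour documented above (an interval such as [170, -170] is split into two sub-boxes at +/-180) can be sketched as follows; this is a minimal illustration, not the library code:

```python
def split_bbox(lon_min, lat_min, lon_max, lat_max):
    """Split a bounding box whose longitude interval wraps around the
    discontinuity into two sub-boxes (sketch of the documented behaviour)."""
    if lon_min <= lon_max:
        # No wrap: the box is usable as-is.
        return [(lon_min, lat_min, lon_max, lat_max)]
    # Wrapping interval, e.g. [170, -170]: split at +/-180.
    return [
        (lon_min, lat_min, 180.0, lat_max),
        (-180.0, lat_min, lon_max, lat_max),
    ]

print(split_bbox(170, -10, -170, 10))
# [(170, -10, 180.0, 10), (-180.0, -10, -170, 10)]
```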

reader: IFilesReader | None = <fcollections.implementations._readers.SwotReaderL3LRSSH object>#

Files reader.

reading_parameters = {'nadir': <Parameter "nadir: 'bool' = False">, 'preprocessor': <Parameter "preprocessor: 'tp.Callable[[xr.Dataset], xr.Dataset] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">, 'stack': <Parameter "stack: 'str | StackLevel' = <StackLevel.NOSTACK: 1>">, 'subset': <Parameter "subset: 'ProductSubset'">, 'swath': <Parameter "swath: 'bool' = True">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

unmixer: SubsetsUnmixer | None = SubsetsUnmixer(partition_keys=['version', 'subset'], auto_pick_last=('version',))#

Specify how to interpret the file metadata table to unmix subsets.

variables_info(*, subset: ProductSubset, version: str)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Parameters:
  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset cannot be extracted from the files metadata table
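
The unmixing described here (partition the metadata by the partition keys, auto-pick the last value of some keys, and fail if more than one subset remains) can be sketched over plain dictionaries. This is a naive stand-in: it picks the "last" version lexicographically, which the real SubsetsUnmixer may not do.

```python
def unmix(records, partition_keys, auto_pick_last=()):
    """Partition records by keys; for keys in auto_pick_last, keep only the
    greatest value (naive lexicographic pick). Fail unless one subset remains."""
    for key in auto_pick_last:
        last = max(record[key] for record in records)
        records = [record for record in records if record[key] == last]
    subsets = {tuple(record[key] for key in partition_keys) for record in records}
    if len(subsets) != 1:
        raise ValueError(f"could not unmix subsets: {sorted(subsets)}")
    return records

records = [
    {"version": "1.0.2", "subset": "Basic"},
    {"version": "2.0.1", "subset": "Basic"},
]
# auto_pick_last=('version',) keeps only the latest version, as for SwotLRL3.
print(unmix(records, ["version", "subset"], auto_pick_last=("version",)))
# [{'version': '2.0.1', 'subset': 'Basic'}]
```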

class fcollections.implementations.BasicNetcdfFilesDatabaseSwotLRWW(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to explore and read the L3_LR_WIND_WAVE product.

See also

fcollections.implementations.AVISO_L3_LR_WINDWAVE_LAYOUT

Recommended layout for the database

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>, <fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful to extract information and have an efficient file system scanning. The pre-configured layouts may not match the actual file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, subset: ProductSubset, version: str)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot yield a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the filename, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or a tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
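
The predicates mechanism (callables that know the record layout, as in the lambda example above) can be illustrated with plain tuples; the real IPredicate interface may differ:

```python
# Records as parsed from file names, here sketched as (cycle_number, pass_number).
records = [(1, 12), (1, 300), (2, 12), (3, 7)]

# Each predicate is knowledgeable about the record layout, as in the docs' example.
predicates = [
    lambda record: record[0] in [1, 4, 5],  # keep cycles 1, 4 and 5
    lambda record: record[1] < 100,         # keep low pass numbers
]

# A record is kept only if every predicate accepts it.
kept = [record for record in records if all(p(record) for p in predicates)]
print(kept)  # [(1, 12)]
```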

listing_parameters = {'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'subset': <Parameter "subset: fcollections.implementations._definitions._swot.ProductSubset">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'version': <Parameter "version: str">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, subset: ProductSubset, selected_variables: list[str] | None = None, tile: int | None = None, box: int | None = None, preprocessor: tp.Callable[[xr.Dataset], xr.Dataset] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, version: str)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subset are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval wraps around the discontinuity, it will be split into two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • tile – Tile size of the spectrum computation. Mandatory for the Extended subset

  • box – Box size of the spectrum computation. Mandatory for the Extended subset if one of the requested variables is defined along the n_box dimension

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As a Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As a Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

query(*, subset: ProductSubset, selected_variables: list[str] | None = None, tile: int | None = None, box: int | None = None, preprocessor: tp.Callable[[xr.Dataset], xr.Dataset] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, version: str)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the antimeridian, it will be split into two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • tile – Tile size of the spectrum computation. Mandatory for the Extended subset

  • box – Box size of the spectrum computation. Mandatory for the Extended subset if one of the requested variables is defined along the n_box dimension

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Returns:

A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
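The list/slice/integer filtering described for cycle_number and pass_number can be sketched in plain Python. The helper `matches_integer_field` is hypothetical and is not part of fcollections; it only illustrates the documented semantics:

```python
def matches_integer_field(reference, tested: int) -> bool:
    """Return True if the tested value passes the reference filter.

    The reference may be a list of accepted values, a slice interpreted
    as a half-open [start, stop) range, or a single integer.
    """
    if isinstance(reference, list):
        return tested in reference
    if isinstance(reference, slice):
        start = reference.start if reference.start is not None else float("-inf")
        stop = reference.stop if reference.stop is not None else float("inf")
        return start <= tested < stop
    return tested == reference

# A value is kept only if it is inside the list/slice or equal to the integer
assert matches_integer_field([1, 4, 5], 5)
assert matches_integer_field(slice(1, 100), 42)
assert not matches_integer_field(7, 5)
```

A query such as `query(cycle_number=slice(1, 100), pass_number=[1, 4, 5], ...)` applies this kind of test to each value parsed from the file names.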

reader: IFilesReader | None = <fcollections.implementations._readers.SwotReaderL3WW object>#

Files reader.

reading_parameters = {'box': <Parameter "box: 'int | None' = None">, 'preprocessor': <Parameter "preprocessor: 'tp.Callable[[xr.Dataset], xr.Dataset] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">, 'subset': <Parameter "subset: 'ProductSubset'">, 'tile': <Parameter "tile: 'int | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

unmixer: SubsetsUnmixer | None = SubsetsUnmixer(partition_keys=['version', 'subset'], auto_pick_last=('version',))#

Specify how to interpret the file metadata table to unmix subsets.

variables_info(*, subset: ProductSubset, version: str)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Parameters:
  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

fcollections.implementations.CMEMS_L4_SSHA_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on CMEMS for the Level 4 SSHA gridded products

fcollections.implementations.CMEMS_OC_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on CMEMS for the Level 3 and 4 ocean colour products

fcollections.implementations.CMEMS_SSHA_L3_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on CMEMS for the Level 3 SSHA nadir products

fcollections.implementations.CMEMS_SST_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on CMEMS for the SST_GLO_SST_L3S_NRT_OBSERVATIONS_010_010 product

fcollections.implementations.CMEMS_SWH_LAYOUT: Layout = <fcollections.core._listing.Layout object>#

Layout on CMEMS for the WAVE_GLO_PHY_SWH_L3_NRT_014_001 product

class fcollections.implementations.DataType(*values)[source]#

Bases: Enum

Dataset type.

ANFC = 4#

Analysis forecast.

HCST = 5#

Hindcast.

MY = 1#

Multi-year consistent time series.

MYINT = 2#

Interim data (about 1 month after the acquisition date).

MYNRT = 6#

Multi-year near real time.

NRT = 3#

Near real time products.

class fcollections.implementations.Delay(*values)[source]#

Bases: Enum

Delay definition for L3 and L4 sea level products.

DT = 2#

Delayed time.

NRT = 1#

Near real time.
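Enum fields such as Delay accept either the enum member itself or its equivalent string as a reference. A minimal sketch of that equivalence, assuming a string reference is matched against the member name (the `matches_enum_field` helper is hypothetical, not part of fcollections):

```python
from enum import Enum


class Delay(Enum):
    """Delay definition for L3 and L4 sea level products."""
    NRT = 1  # Near real time
    DT = 2   # Delayed time


def matches_enum_field(reference, tested: Delay) -> bool:
    """Accept the reference either as an enum member or as its name string."""
    if isinstance(reference, str):
        return tested.name == reference
    return tested is reference


# Both spellings of the reference select the same files
assert matches_enum_field("NRT", Delay.NRT)
assert matches_enum_field(Delay.DT, Delay.DT)
assert not matches_enum_field("DT", Delay.NRT)
```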

class fcollections.implementations.FileNameConventionDAC[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionERA5[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionGriddedSLA[source]#

Bases: FileNameConvention

Gridded SLA datafiles parser.

class fcollections.implementations.FileNameConventionGriddedSLAInternal[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionL2Nadir[source]#

Bases: FileNameConvention

L2 Nadir datafiles parser.

class fcollections.implementations.FileNameConventionL3Nadir[source]#

Bases: FileNameConvention

L3 Nadir datafiles parser.

class fcollections.implementations.FileNameConventionMUR[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionOC[source]#

Bases: FileNameConvention

Ocean Color datafiles parser.

class fcollections.implementations.FileNameConventionOHC[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionS1AOWI[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionSST[source]#

Bases: FileNameConvention

Sea Surface Temperature datafiles parser.

class fcollections.implementations.FileNameConventionSWH[source]#

Bases: FileNameConvention

class fcollections.implementations.FileNameConventionSwotL2[source]#

Bases: FileNameConvention

Swot LR L2 datafiles parser.

class fcollections.implementations.FileNameConventionSwotL3[source]#

Bases: FileNameConvention

Swot LR L3 datafiles parser.

class fcollections.implementations.FileNameConventionSwotL3WW[source]#

Bases: FileNameConvention

Swot L3_LR_WIND_WAVE product file names convention.

class fcollections.implementations.Group(*values)[source]#

Bases: Enum

Dataset group.

MOD = 2#

Model.

OBS = 1#

Observations.

class fcollections.implementations.L2Version(temporality: Timeliness | None = None, baseline: str | None = None, minor_version: int | None = None, product_counter: int | None = None, ignore_product_counter_in_eq_check: bool = False)[source]#

Bases: object

Represents an L2 version of half orbits and enables version comparison.

An L2 Version is parsed from a string in the format <CRID_version>_<product counter>.
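The split between the CRID part and the product counter can be sketched with a regular expression. Both the regex and the sample string "PIC0_01" are illustrative assumptions; the real CRID grammar handled by L2Version may differ:

```python
import re

# Hypothetical pattern for <CRID_version>_<product counter>
VERSION_RE = re.compile(r"^(?P<crid>[A-Za-z0-9]+)_(?P<counter>\d+)$")


def split_version(version: str):
    """Split a version string into its CRID part and integer product counter.

    Returns None when parsing fails, mirroring the library's choice of
    building a "null" version rather than raising.
    """
    match = VERSION_RE.match(version)
    if match is None:
        return None
    return match.group("crid"), int(match.group("counter"))


assert split_version("PIC0_01") == ("PIC0", 1)
assert split_version("not a version") is None
```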

baseline: str | None = None#
static from_bytes(version: bytes, ignore_product_counter_in_eq_check: bool = False) L2Version | None[source]#

Build a L2Version from bytes.

Parameters:
  • version – The CRID version from which we build the L2Version object.

  • ignore_product_counter_in_eq_check – Set L2Version.product_counter to None, as we do not want to check it in the comparison operations.

Returns:

The L2Version object.

Note

Even when an AttributeError occurs or the input value is None, an L2Version is built so that comparisons between np.arrays of L2Version do not fail with errors like: TypeError: ‘>’ not supported between instances of ‘NoneType’ and ‘NoneType’.

static from_bytes_array(versions: np_t.NDArray[bytes], ignore_product_counter_in_eq_check: bool = False) np_t.NDArray[object][source]#

Build a np.array of L2Version from an array of CRID versions as bytes.

Parameters:
  • versions – The array of CRID version from which we build the L2Version object.

  • ignore_product_counter_in_eq_check – Set each L2Version.product_counter to None, as we do not want to check it in the comparison operations.

Returns:

The array of L2Version objects.

static from_string(version: str, ignore_product_counter_in_eq_check: bool = False) L2Version | None[source]#

Build a L2Version from str.

Parameters:
  • version – The CRID version from which we build the L2Version object.

  • ignore_product_counter_in_eq_check – Set L2Version.product_counter to None, as we do not want to check it in the comparison operations.

Returns:

The L2Version object.

Note

Even when an AttributeError occurs or the input value is None, an L2Version is built so that comparisons between np.arrays of L2Version do not fail with errors like: TypeError: ‘>’ not supported between instances of ‘NoneType’ and ‘NoneType’.

static from_string_array(versions: np_t.NDArray[str], ignore_product_counter_in_eq_check: bool = False) np_t.NDArray[object][source]#

Build a np.array of L2Version from an array of CRID versions as str.

Parameters:
  • versions – The array of CRID version from which we build the L2Version object.

  • ignore_product_counter_in_eq_check – Set each L2Version.product_counter to None, as we do not want to check it in the comparison operations.

Returns:

The array of L2Version objects.

ignore_product_counter_in_eq_check: bool = False#
property is_null#

True if all attrs but ‘ignore_product_counter_in_eq_check’ are None.

minor_version: int | None = None#
product_counter: int | None = None#
temporality: Timeliness | None = None#
class fcollections.implementations.L2VersionField(name: str, ignore_product_counter: bool = False)[source]#

Bases: FileNameField

decode(input_string: str) L2Version[source]#

Decode an input string and generate a Generic[T] object.

Parameters:

input_string – The input string

Returns:

The decoded Generic[T] object

Raises:

DecodingError – If the input string decoding fails

encode(data: L2Version) str[source]#

Encode a Generic[T] object into a string.

Parameters:

data – The input Generic[T] object

Returns:

The encoded string

sanitize(reference: str | L2Version) L2Version[source]#

Cast to one of the types handled by this tester.

Parameters:

reference – The reference object to cast

Returns:

The input cast to the proper type

test(reference: L2Version, tested: L2Version) bool[source]#

Compare two objects of similar types.

Parameters:
  • reference – The reference object

  • tested – The tested object

Returns:

True if the test is successful, False otherwise

property test_description: str#

User-friendly description of the possible types for the reference.

property type: type[L2Version]#

Type of the tested field.

class fcollections.implementations.NetcdfFilesDatabaseDAC(path: Path, fs: fsspec.AbstractFileSystem = fs_loc.LocalFileSystem())[source]#

Bases: BasicNetcdfFilesDatabaseDAC

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is done, and if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
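The predicate mechanism above can be illustrated with plain tuples standing in for parsed filename records. The record layout used here (name, cycle, pass) is invented for the example:

```python
# Each record is a tuple parsed from a file name, e.g. (name, cycle, pass)
records = [
    ("file_a.nc", 1, 10),
    ("file_b.nc", 4, 20),
    ("file_c.nc", 7, 30),
]

# Predicates know the record layout: index 1 holds the cycle number here,
# matching the documented example lambda
predicates = [lambda record: record[1] in [1, 4, 5]]

# Keep only the records that satisfy every predicate
selected = [r for r in records if all(p(r) for p in predicates)]
assert [r[0] for r in selected] == ["file_a.nc", "file_b.nc"]
```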

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude crosses the -180/180 longitude boundary, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude crosses the -180/180 longitude boundary, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
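The antimeridian handling described for the bbox parameter can be sketched as a standalone function. `split_bbox` is a hypothetical helper illustrating the documented behaviour, not fcollections' internal code:

```python
def split_bbox(bbox):
    """Split (lon_min, lat_min, lon_max, lat_max) at the antimeridian.

    Returns a single bbox when no crossing occurs, and two sub-boxes
    when lon_min > lon_max, i.e. the interval wraps around -180/180.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    if lon_min <= lon_max:
        return [bbox]
    return [
        (lon_min, lat_min, 180.0, lat_max),
        (-180.0, lat_min, lon_max, lat_max),
    ]


# The documented example: [170, -170] selects [170, 180[ and [-180, -170]
assert split_bbox((170, -10, -170, 10)) == [
    (170, -10, 180.0, 10),
    (-180.0, -10, -170, 10),
]
assert split_bbox((-20, -10, 20, 10)) == [(-20, -10, 20, 10)]
```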

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseERA5(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read ERA5 reanalysis product Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>]#

Semantic describing how the files are organized.

Useful for extracting information and enabling efficient file system scanning. The pre-configured layouts can mismatch the current files organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is done, and if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

query(*, selected_variables: list[str] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

A dataset containing the result of the query, or None if there is nothing matching the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.
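Sorting by sort_keys before reading can be sketched with dicts standing in for parsed filename records. The file names and the single 'time' key are illustrative; ISO 8601 timestamp strings sort correctly as plain strings:

```python
# Records as dicts of fields parsed from file names; sort_keys = ['time']
records = [
    {"path": "era5_2023.nc", "time": "2023-01-01T00:00:00"},
    {"path": "era5_2021.nc", "time": "2021-01-01T00:00:00"},
    {"path": "era5_2022.nc", "time": "2022-01-01T00:00:00"},
]

sort_keys = ["time"]
# ISO timestamps are lexicographically ordered, so string sort suffices
records.sort(key=lambda r: tuple(r[k] for k in sort_keys))

assert [r["path"] for r in records] == [
    "era5_2021.nc", "era5_2022.nc", "era5_2023.nc",
]
```

Ordering the records this way guarantees that a reader concatenating the files produces a monotonically increasing time axis.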

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseGriddedSLA(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseGriddedSLA

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, delay: Delay, time: Period, production_date: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is done, and if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'delay': <Parameter "delay: fcollections.implementations._definitions._constants.Delay">, 'production_date': <Parameter "production_date: numpy.datetime64">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, delay: Delay, time: Period, production_date: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude crosses the -180/180 longitude boundary, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, delay: Delay, time: Period, production_date: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude crosses the -180/180 longitude boundary, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period, datetime. The tested value from the file name will be filtered out if it is not included or not equal to the reference Period or datetime respectively. The reference value can be given as a string or tuple of string following with the numpy date formatting [%Y-%m-%dT%H:%M:%S])

Returns:

A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
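The antimeridian behaviour described for bbox above can be sketched in plain Python. This is a minimal illustration of the selection rule only; the lon_in_bbox helper is hypothetical and not part of fcollections:

```python
def lon_in_bbox(lon: float, lon_min: float, lon_max: float) -> bool:
    """Return True if a longitude (in [-180, 180[) falls inside the bbox
    longitude interval, handling a bbox that crosses the -180/180 meridian."""
    if lon_min <= lon_max:
        # Ordinary interval, e.g. [10, 20]
        return lon_min <= lon <= lon_max
    # Crossing interval, e.g. [170, -170] -> [170, 180[ union [-180, -170]
    return lon >= lon_min or lon <= lon_max
```

For instance, with the interval [170, -170], longitudes 175 and -175 are selected while 0 is not.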

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
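The Enum-field rule used by parameters such as delay (accepting either the enum member or its equivalent string) can be sketched as follows. The Delay enum here is a stand-in defined locally with the two values documented above, not an import from fcollections:

```python
from enum import Enum


class Delay(Enum):
    """Stand-in for the library's Delay enum with its documented values."""
    NRT = "NRT"
    DT = "DT"


def matches_enum(tested: Delay, reference) -> bool:
    """Keep a file whose tested enum value equals the reference,
    given either as an enum member or as its equivalent string."""
    if isinstance(reference, str):
        reference = Delay(reference)
    return tested == reference
```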

class fcollections.implementations.NetcdfFilesDatabaseL2Nadir(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseL2Nadir

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick is also performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is performed and, if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
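The Integer-field rule shared by cycle_number and pass_number (a reference given as a list, a slice or a single integer) can be sketched with the hypothetical helper below. It assumes Python's usual exclusive-stop slice convention, which may differ from the library's actual behaviour:

```python
def matches_integer(tested: int, reference) -> bool:
    """Keep a tested value that is inside the reference list/slice
    or equal to the reference integer (assumed exclusive slice stop)."""
    if isinstance(reference, slice):
        step = reference.step or 1
        return tested in range(reference.start, reference.stop, step)
    if isinstance(reference, list):
        return tested in reference
    return tested == reference
```

For example, cycle_number=slice(10, 20) would keep cycles 10 through 19, while cycle_number=[1, 4, 5] keeps only those three values.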
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox's longitude interval crosses the -180/180 meridian, data around the crossing and matching the bbox will be selected (e.g. the longitude interval [170, -170] retrieves data in [170, 180[ and [-180, -170])

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

Raises:
query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox's longitude interval crosses the -180/180 meridian, data around the crossing and matching the bbox will be selected (e.g. the longitude interval [170, -170] retrieves data in [170, 180[ and [-180, -170])

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

Returns:

A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseL3Nadir(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseL3Nadir

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, delay: Delay, time: Period, production_date: datetime64, sensor: Sensors, product_level: ProductLevel, resolution: list[int] | slice | int)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick is also performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is performed and, if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections, so there can be multiple files for the same period, each with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it is not included in the reference Period or not equal to the reference datetime. The reference value can be given as a string or tuple of strings following the numpy date format %Y-%m-%dT%H:%M:%S

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • product_level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'delay': <Parameter "delay: fcollections.implementations._definitions._constants.Delay">, 'product_level': <Parameter "product_level: fcollections.implementations._definitions._constants.ProductLevel">, 'production_date': <Parameter "production_date: numpy.datetime64">, 'resolution': <Parameter "resolution: list[int] | slice | int">, 'sensor': <Parameter "sensor: fcollections.implementations._definitions._cmems.Sensors">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
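The Period-field rule for time (keep files whose period intersects a reference Period or contains a reference datetime) can be illustrated with a minimal stand-in Period class. This is a local sketch, not fcollections' own Period type:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Period:
    """Minimal stand-in: a closed time interval [start, end]."""
    start: datetime
    end: datetime

    def intersects(self, other: "Period") -> bool:
        return self.start <= other.end and other.start <= self.end

    def contains(self, instant: datetime) -> bool:
        return self.start <= instant <= self.end


def keep_file(file_period: Period, reference) -> bool:
    """Keep the file if its period intersects a reference Period
    or contains a reference datetime."""
    if isinstance(reference, Period):
        return file_period.intersects(reference)
    return file_period.contains(reference)
```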
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, delay: Delay, time: Period, production_date: datetime64, sensor: Sensors, product_level: ProductLevel, resolution: list[int] | slice | int)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox's longitude interval crosses the -180/180 meridian, data around the crossing and matching the bbox will be selected (e.g. the longitude interval [170, -170] retrieves data in [170, 180[ and [-180, -170])

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections, so there can be multiple files for the same period, each with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it is not included in the reference Period or not equal to the reference datetime. The reference value can be given as a string or tuple of strings following the numpy date format %Y-%m-%dT%H:%M:%S

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • product_level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

Raises:
query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, delay: Delay, time: Period, production_date: datetime64, sensor: Sensors, product_level: ProductLevel, resolution: list[int] | slice | int)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox's longitude interval crosses the -180/180 meridian, data around the crossing and matching the bbox will be selected (e.g. the longitude interval [170, -170] retrieves data in [170, 180[ and [-180, -170])

  • delay – Delay. As an Enum field, it can be filtered using a reference <enum ‘Delay’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘NRT’, ‘DT’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections, so there can be multiple files for the same period, each with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it is not included in the reference Period or not equal to the reference datetime. The reference value can be given as a string or tuple of strings following the numpy date format %Y-%m-%dT%H:%M:%S

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • product_level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

Returns:

A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info(*, sensor: Sensors, resolution: list[int] | slice | int)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Parameters:
  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • resolution – Data resolution. Nadir products may be sampled at 1Hz, 5Hz or 20Hz depending on the level and dataset considered. As an Integer field, it can be filtered using a reference value, which can be a list, a slice or an integer. The tested value from the file name is filtered out if it is outside the given list/slice or not equal to the integer value.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseMUR(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseMUR

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick is also performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is performed and, if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it is not included in the reference Period or not equal to the reference datetime. The reference value can be given as a string or tuple of strings following the numpy date format %Y-%m-%dT%H:%M:%S

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
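The DateTime-field rule (used by time here and by production_date above: keep a tested value included in a reference Period or equal to a reference datetime) can be sketched as follows. The (start, end) tuple is a local stand-in for fcollections' Period type:

```python
from datetime import datetime


def keep_datetime(tested: datetime, reference) -> bool:
    """Keep a tested datetime that lies inside a reference (start, end)
    period, or that equals a reference datetime exactly."""
    if isinstance(reference, tuple):
        # A (start, end) pair standing in for the library's Period
        start, end = reference
        return start <= tested <= end
    return tested == reference
```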
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox's longitude interval crosses the -180/180 meridian, data around the crossing and matching the bbox will be selected (e.g. the longitude interval [170, -170] retrieves data in [170, 180[ and [-180, -170])

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it is not included in the reference Period or not equal to the reference datetime. The reference value can be given as a string or tuple of strings following the numpy date format %Y-%m-%dT%H:%M:%S

Raises:
query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in the [-180, 180[ or [0, 360[ convention. If the bbox's longitude interval crosses the -180/180 meridian, data around the crossing and matching the bbox will be selected (e.g. the longitude interval [170, -170] retrieves data in [170, 180[ and [-180, -170])

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it is not included in the reference Period or not equal to the reference datetime. The reference value can be given as a string or tuple of strings following the numpy date format %Y-%m-%dT%H:%M:%S

Returns:

A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseOC(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseOC

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: Period, origin: Origin, group: Group, pc: ProductClass, area: Area, thematic: Thematic, variable: Variable, type: DataType, level: str, sensor: Sensors, spatial_resolution: str, temporal_resolution: ISODuration, typology: Typology, version: str)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick is also performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, a deduplication operation is performed and, if there are still duplicates, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the filenames, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name is filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the %Y-%m-%dT%H:%M:%S format

  • origin – As an Enum field, it can be filtered using a reference <enum ‘Origin’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CMEMS’, ‘C3S’, ‘CCI’, ‘OSISAF’]

  • group – As an Enum field, it can be filtered using a reference <enum ‘Group’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘OBS’, ‘MOD’]

  • pc – As an Enum field, it can be filtered using a reference <enum ‘ProductClass’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SST’, ‘SL’, ‘OC’, ‘SI’, ‘WIND’, ‘WAVE’, ‘MOB’, ‘INS’]

  • area – As an Enum field, it can be filtered using a reference <enum ‘Area’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘ATL’, ‘ARC’, ‘ANT’, ‘BAL’, ‘BLK’, ‘EUR’, ‘GLO’, ‘IBI’, ‘MED’, ‘NWS’]

  • thematic – As an Enum field, it can be filtered using a reference <enum ‘Thematic’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘PHY’, ‘BGC’, ‘WAV’, ‘PHYBGC’, ‘PHYBGCWAV’]

  • variable – As an Enum field, it can be filtered using a reference <enum ‘Variable’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘TEMP’, ‘CUR’, ‘CHL’, ‘CAR’, ‘NUT’, ‘GEOPHY’, ‘PLANKTON’, ‘TRANSP’, ‘OPTICS’, ‘PP’, ‘MFLUX’, ‘WFLUX’, ‘HFLUX’, ‘SWH’, ‘SSH’, ‘REFLECTANCE’]

  • type – As an Enum field, it can be filtered using a reference <enum ‘DataType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘MY’, ‘MYINT’, ‘NRT’, ‘ANFC’, ‘HCST’, ‘MYNRT’]

  • level – Product level of the data. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • spatial_resolution – Spatial resolution, such as 4km, 1km, 300M. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • temporal_resolution – ISO8601 duration field can be tested against an ISODuration object or its string representation (PT1S, …)

  • typology – As an Enum field, it can be filtered using a reference <enum ‘Typology’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘I’, ‘M’]

  • version – As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
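The `predicates` mechanism described above can be illustrated with a minimal, self-contained sketch. The record layout here is hypothetical (real records depend on the file name convention of the database class); only the filtering pattern mirrors the documented example `lambda record: record[1] in [1, 4, 5]`:

```python
# Hypothetical parsed records: (file name, cycle number, pass number).
# Real records are produced by the database's file name convention.
records = [
    ("f1.nc", 1, 10),
    ("f2.nc", 2, 20),
    ("f3.nc", 4, 30),
]

# A predicate sees the whole parsed record and returns True to keep it.
keep_cycles = lambda record: record[1] in [1, 4, 5]

selected = [r for r in records if keep_cycles(r)]
print([r[0] for r in selected])  # → ['f1.nc', 'f3.nc']
```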

listing_parameters = {'area': <Parameter "area: fcollections.implementations._definitions._cmems.Area">, 'group': <Parameter "group: fcollections.implementations._definitions._cmems.Group">, 'level': <Parameter "level: str">, 'origin': <Parameter "origin: fcollections.implementations._definitions._cmems.Origin">, 'pc': <Parameter "pc: fcollections.implementations._definitions._cmems.ProductClass">, 'sensor': <Parameter "sensor: fcollections.implementations._definitions._cmems.Sensors">, 'spatial_resolution': <Parameter "spatial_resolution: str">, 'temporal_resolution': <Parameter "temporal_resolution: fcollections.time.ISODuration">, 'thematic': <Parameter "thematic: fcollections.implementations._definitions._cmems.Thematic">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'type': <Parameter "type: fcollections.implementations._definitions._cmems.DataType">, 'typology': <Parameter "typology: fcollections.implementations._definitions._cmems.Typology">, 'variable': <Parameter "variable: fcollections.implementations._definitions._cmems.Variable">, 'version': <Parameter "version: str">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: Period, origin: Origin, group: Group, pc: ProductClass, area: Area, thematic: Thematic, variable: Variable, type: DataType, level: str, sensor: Sensors, spatial_resolution: str, temporal_resolution: ISODuration, typology: Typology, version: str)#

Map a function over datasets extracted from the files.

Parameters:
  • func – Callable that works on an xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the -180/180 discontinuity, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • origin – As an Enum field, it can be filtered using a reference <enum ‘Origin’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CMEMS’, ‘C3S’, ‘CCI’, ‘OSISAF’]

  • group – As an Enum field, it can be filtered using a reference <enum ‘Group’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘OBS’, ‘MOD’]

  • pc – As an Enum field, it can be filtered using a reference <enum ‘ProductClass’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SST’, ‘SL’, ‘OC’, ‘SI’, ‘WIND’, ‘WAVE’, ‘MOB’, ‘INS’]

  • area – As an Enum field, it can be filtered using a reference <enum ‘Area’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘ATL’, ‘ARC’, ‘ANT’, ‘BAL’, ‘BLK’, ‘EUR’, ‘GLO’, ‘IBI’, ‘MED’, ‘NWS’]

  • thematic – As an Enum field, it can be filtered using a reference <enum ‘Thematic’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘PHY’, ‘BGC’, ‘WAV’, ‘PHYBGC’, ‘PHYBGCWAV’]

  • variable – As an Enum field, it can be filtered using a reference <enum ‘Variable’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘TEMP’, ‘CUR’, ‘CHL’, ‘CAR’, ‘NUT’, ‘GEOPHY’, ‘PLANKTON’, ‘TRANSP’, ‘OPTICS’, ‘PP’, ‘MFLUX’, ‘WFLUX’, ‘HFLUX’, ‘SWH’, ‘SSH’, ‘REFLECTANCE’]

  • type – As an Enum field, it can be filtered using a reference <enum ‘DataType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘MY’, ‘MYINT’, ‘NRT’, ‘ANFC’, ‘HCST’, ‘MYNRT’]

  • level – Product level of the data. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • spatial_resolution – Spatial resolution, such as 4km, 1km, 300M. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • temporal_resolution – ISO8601 duration field can be tested against an ISODuration object or its string representation (PT1S, …)

  • typology – As an Enum field, it can be filtered using a reference <enum ‘Typology’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘I’, ‘M’]

  • version – As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: Period, origin: Origin, group: Group, pc: ProductClass, area: Area, thematic: Thematic, variable: Variable, type: DataType, level: str, sensor: Sensors, spatial_resolution: str, temporal_resolution: ISODuration, typology: Typology, version: str)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the -180/180 discontinuity, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • origin – As an Enum field, it can be filtered using a reference <enum ‘Origin’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CMEMS’, ‘C3S’, ‘CCI’, ‘OSISAF’]

  • group – As an Enum field, it can be filtered using a reference <enum ‘Group’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘OBS’, ‘MOD’]

  • pc – As an Enum field, it can be filtered using a reference <enum ‘ProductClass’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SST’, ‘SL’, ‘OC’, ‘SI’, ‘WIND’, ‘WAVE’, ‘MOB’, ‘INS’]

  • area – As an Enum field, it can be filtered using a reference <enum ‘Area’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘ATL’, ‘ARC’, ‘ANT’, ‘BAL’, ‘BLK’, ‘EUR’, ‘GLO’, ‘IBI’, ‘MED’, ‘NWS’]

  • thematic – As an Enum field, it can be filtered using a reference <enum ‘Thematic’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘PHY’, ‘BGC’, ‘WAV’, ‘PHYBGC’, ‘PHYBGCWAV’]

  • variable – As an Enum field, it can be filtered using a reference <enum ‘Variable’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘TEMP’, ‘CUR’, ‘CHL’, ‘CAR’, ‘NUT’, ‘GEOPHY’, ‘PLANKTON’, ‘TRANSP’, ‘OPTICS’, ‘PP’, ‘MFLUX’, ‘WFLUX’, ‘HFLUX’, ‘SWH’, ‘SSH’, ‘REFLECTANCE’]

  • type – As an Enum field, it can be filtered using a reference <enum ‘DataType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘MY’, ‘MYINT’, ‘NRT’, ‘ANFC’, ‘HCST’, ‘MYNRT’]

  • level – Product level of the data. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • sensor – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • spatial_resolution – Spatial resolution, such as 4km, 1km, 300M. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

  • temporal_resolution – ISO8601 duration field can be tested against an ISODuration object or its string representation (PT1S, …)

  • typology – As an Enum field, it can be filtered using a reference <enum ‘Typology’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘I’, ‘M’]

  • version – As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
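The antimeridian handling of the bbox argument can be sketched with a small stand-in function. This is not the library’s actual subsetting code, only an illustration of the documented behaviour for a longitude interval such as [170, -170]:

```python
def lon_in_bbox(lon: float, lon_min: float, lon_max: float) -> bool:
    """Check whether a longitude falls inside a bbox whose longitude
    interval may cross the -180/180 discontinuity (e.g. [170, -170])."""
    # Normalize all longitudes to [-180, 180).
    norm = lambda x: ((x + 180.0) % 360.0) - 180.0
    lon, lon_min, lon_max = norm(lon), norm(lon_min), norm(lon_max)
    if lon_min <= lon_max:
        return lon_min <= lon <= lon_max
    # Crossing case: keep [lon_min, 180) and [-180, lon_max].
    return lon >= lon_min or lon <= lon_max

print(lon_in_bbox(175.0, 170, -170))   # → True
print(lon_in_bbox(-175.0, 170, -170))  # → True
print(lon_in_bbox(0.0, 170, -170))     # → False
```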

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseOHC(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseOHC

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed by the filename. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
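The time filtering described above can be sketched with the standard library. The file names and parsing below are hypothetical (the real convention is handled by the database class); only the inclusion test against a reference period mirrors the documented semantics:

```python
from datetime import datetime

# Hypothetical mapping of file names to the timestamp parsed from them.
files = {
    "ohc_20230101.nc": datetime(2023, 1, 1),
    "ohc_20230215.nc": datetime(2023, 2, 15),
    "ohc_20230401.nc": datetime(2023, 4, 1),
}

def select(files, start: str, end: str):
    """Keep files whose parsed timestamp is included in [start, end],
    with bounds given in %Y-%m-%dT%H:%M:%S format."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    lo, hi = datetime.strptime(start, fmt), datetime.strptime(end, fmt)
    return sorted(name for name, t in files.items() if lo <= t <= hi)

print(select(files, "2023-01-01T00:00:00", "2023-03-01T00:00:00"))
# → ['ohc_20230101.nc', 'ohc_20230215.nc']
```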

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Map a function over datasets extracted from the files.

Parameters:
  • func – Callable that works on an xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the -180/180 discontinuity, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the -180/180 discontinuity, data around the crossing and matching the bbox will be selected (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseS1AOWI(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: FilesDatabase, PeriodMixin

Database mapping to select and read S1A Ocean surface wind product Netcdf files in a local file system.

layouts: list[Layout] | None = [<fcollections.core._listing.Layout object>]#

Semantics describing how the files are organized.

Useful to extract information and enable efficient file system scanning. The pre-configured layouts may not match the current file organization, in which case the user can build their own or set enable_layouts to False.

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, acquisition_mode: AcquisitionMode, slice_post_processing: S1AOWISlicePostProcessing, time: Period, resolution: list[int] | slice | int, orbit: list[int] | slice | int, product_type: S1AOWIProductType)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed by the filename. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • acquisition_mode – Acquisition mode. As an Enum field, it can be filtered using a reference <enum ‘AcquisitionMode’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘IW’, ‘EW’, ‘WV’, ‘SM’]

  • slice_post_processing – Slices post-processing. As an Enum field, it can be filtered using a reference <enum ‘S1AOWISlicePostProcessing’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CC’, ‘CM’, ‘OCN’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • resolution – SAR Ocean surface wind Level-2 product resolution. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • orbit – Orbit number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • product_type – Product type. As an Enum field, it can be filtered using a reference <enum ‘S1AOWIProductType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SW’, ‘GS’]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
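The Integer-field semantics used by `resolution` and `orbit` (a list, a slice, or a single integer as reference value) can be sketched as follows. This is an illustrative stand-in, not the library’s implementation; in particular, treating slice bounds as half-open is an assumption:

```python
def match_int(value: int, ref) -> bool:
    """Return True when `value` satisfies an Integer-field reference:
    membership for a list, range for a slice (assumed half-open here),
    equality for a plain int."""
    if isinstance(ref, list):
        return value in ref
    if isinstance(ref, slice):
        lo = ref.start if ref.start is not None else float("-inf")
        hi = ref.stop if ref.stop is not None else float("inf")
        return lo <= value < hi
    return value == ref

orbits = [10, 25, 42, 77]
print([o for o in orbits if match_int(o, [10, 77])])       # → [10, 77]
print([o for o in orbits if match_int(o, slice(20, 50))])  # → [25, 42]
print([o for o in orbits if match_int(o, 42)])             # → [42]
```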

listing_parameters = {'acquisition_mode': <Parameter "acquisition_mode: fcollections.implementations._s1aowi.AcquisitionMode">, 'orbit': <Parameter "orbit: list[int] | slice | int">, 'product_type': <Parameter "product_type: fcollections.implementations._s1aowi.S1AOWIProductType">, 'resolution': <Parameter "resolution: list[int] | slice | int">, 'slice_post_processing': <Parameter "slice_post_processing: fcollections.implementations._s1aowi.S1AOWISlicePostProcessing">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, acquisition_mode: AcquisitionMode, slice_post_processing: S1AOWISlicePostProcessing, time: Period, resolution: list[int] | slice | int, orbit: list[int] | slice | int, product_type: S1AOWIProductType)#

Map a function over datasets extracted from the files.

Parameters:
  • func – Callable that works on an xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • acquisition_mode – Acquisition mode. As an Enum field, it can be filtered using a reference <enum ‘AcquisitionMode’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘IW’, ‘EW’, ‘WV’, ‘SM’]

  • slice_post_processing – Slices post-processing. As an Enum field, it can be filtered using a reference <enum ‘S1AOWISlicePostProcessing’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CC’, ‘CM’, ‘OCN’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • resolution – SAR Ocean surface wind Level-2 product resolution. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • orbit – Orbit number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • product_type – Product type. As an Enum field, it can be filtered using a reference <enum ‘S1AOWIProductType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SW’, ‘GS’]

query(*, selected_variables: list[str] | None = None, acquisition_mode: AcquisitionMode, slice_post_processing: S1AOWISlicePostProcessing, time: Period, resolution: list[int] | slice | int, orbit: list[int] | slice | int, product_type: S1AOWIProductType)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • acquisition_mode – Acquisition mode. As an Enum field, it can be filtered using a reference <enum ‘AcquisitionMode’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘IW’, ‘EW’, ‘WV’, ‘SM’]

  • slice_post_processing – Slices post-processing. As an Enum field, it can be filtered using a reference <enum ‘S1AOWISlicePostProcessing’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘CC’, ‘CM’, ‘OCN’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • resolution – SAR Ocean surface wind Level-2 product resolution. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • orbit – Orbit number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • product_type – Product type. As an Enum field, it can be filtered using a reference <enum ‘S1AOWIProductType’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘SW’, ‘GS’]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table
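
The Period filtering semantics described for the time parameter can be sketched as follows. This is an illustrative snippet, not fcollections code; `matches_time` and its arguments are hypothetical names:

```python
from datetime import datetime

def matches_time(file_start, file_end, reference):
    """Return True when a file's period passes the `time` filter.

    A single reference datetime must fall inside the file period;
    a reference period (start, end) must intersect it.
    """
    if isinstance(reference, datetime):
        return file_start <= reference <= file_end
    ref_start, ref_end = reference
    # Two periods intersect when neither ends before the other starts.
    return file_start <= ref_end and ref_start <= file_end

# A file covering 2023-04-01 to 2023-04-02:
start = datetime(2023, 4, 1)
end = datetime(2023, 4, 2)
```

The same intersection test explains why a reference period only partially overlapping a file's span still selects that file.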

reader: IFilesReader | None = <fcollections.core._readers.OpenMfDataset object>#

Files reader.

reading_parameters = {'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
sort_keys: list[str] | str | None = 'time'#

Keys that specify the fields used to sort the records extracted from the filenames.

Useful to order the files prior to reading them.

variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseSST(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseSST

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, time: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the file names. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

listing_parameters = {'time': <Parameter "time: numpy.datetime64">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the -180/180 longitude discontinuity, data around the crossing and matching the bbox will be selected. (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:

query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, time: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the -180/180 longitude discontinuity, data around the crossing and matching the bbox will be selected. (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • time – Period covered by the file. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table
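
The bbox handling described above, where a longitude interval crossing the -180/180 discontinuity selects data on both sides of the crossing, can be sketched as a split into two sub-boxes. This is illustrative code only; `split_bbox` is a hypothetical helper, not part of fcollections:

```python
def split_bbox(bbox):
    """Split (lon_min, lat_min, lon_max, lat_max) at the antimeridian.

    Returns one box when the interval does not cross -180/180,
    otherwise two boxes covering each side of the crossing.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    if lon_min <= lon_max:
        return [bbox]  # no crossing: keep a single box
    # e.g. longitude interval [170, -170] -> [170, 180] and [-180, -170]
    return [
        (lon_min, lat_min, 180.0, lat_max),
        (-180.0, lat_min, lon_max, lat_max),
    ]
```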

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseSWH(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseSWH

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, sensorf: Sensors, time: Period, production_date: datetime64)#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the file names. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • sensorf – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
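
The interplay of the deduplicate flag and the production_date field, where the same granule may exist with several production dates, can be sketched as keeping the most recent production per granule. Illustrative code; `deduplicate_latest` and the record shape are hypothetical:

```python
def deduplicate_latest(records):
    """Keep the latest production date per granule.

    records: iterable of (granule_key, production_date) tuples, where
    production dates compare chronologically (e.g. ISO-like strings).
    """
    latest = {}
    for key, production_date in records:
        if key not in latest or production_date > latest[key]:
            latest[key] = production_date  # newer regeneration wins
    return latest
```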

listing_parameters = {'production_date': <Parameter "production_date: numpy.datetime64">, 'sensorf': <Parameter "sensorf: fcollections.implementations._definitions._cmems.Sensors">, 'time': <Parameter "time: fcollections.time._periods.Period">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, sensorf: Sensors, time: Period, production_date: datetime64)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the -180/180 longitude discontinuity, data around the crossing and matching the bbox will be selected. (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • sensorf – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Raises:

query(*, selected_variables: list[str] | None = None, bbox: tuple[float, float, float, float] | None = None, sensorf: Sensors, time: Period, production_date: datetime64)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – Variables that need to be read. Set to None to read everything

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to subset data. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the -180/180 longitude discontinuity, data around the crossing and matching the bbox will be selected. (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • sensorf – As an Enum field, it can be filtered using a reference <enum ‘Sensors’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘C2’, ‘C2N’, ‘EN’, ‘ENN’, ‘E1’, ‘E1G’, ‘E2’, ‘G2’, ‘H2A’, ‘H2AG’, ‘H2B’, ‘J1’, ‘J1G’, ‘J1N’, ‘J2’, ‘J2N’, ‘J2G’, ‘J3’, ‘J3N’, ‘J3G’, ‘AL’, ‘ALG’, ‘S3A’, ‘S3B’, ‘S6A’, ‘S6A_LR’, ‘S6A_HR’, ‘SWON’, ‘SWONC’, ‘TP’, ‘TPN’, ‘ALLSAT’, ‘DEMO_ALLSAT_SWOTS’, ‘ALLSAT_SWOS’, ‘CFO’, ‘H2C’, ‘SWOT’, ‘GIR’, ‘PIR’, ‘PMW’, ‘OLCI’, ‘MULTI’]

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • production_date – Production date of a given file. The same granule is regenerated multiple times with updated corrections. Hence there can be multiple files for the same period, but with a different production date. As a DateTime field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it is not included in the reference Period or not equal to the reference datetime, respectively. The reference value can be given as a string or tuple of strings following the numpy date formatting [%Y-%m-%dT%H:%M:%S]

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoOpenMfDataset object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">}#
variables_info()#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseSwotLRL2(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseSwotLRL2

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, subset: ProductSubset, version: L2Version, bbox: tuple[float, float, float, float])#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. In case the auto pick cannot get a unique subset, an error is raised

  • predicates – Additional complex filters to run on the records parsed from the file names. ex. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L2_LR_SSH product, composed of a CRID and a product counter. The CRID can be further decomposed with the timeliness (I/G), the baseline (A/B/C…) and the minor version (a number) (ex. PIC0). The product counter is a number that is increased when a half orbit has been regenerated for the same CRID. This can happen if an anomaly is detected or if there is a change in the upstream data. As a L2Version field, this field can be tested by providing another L2Version instance. This instance can be partially set, with some attributes left as None. In this case, the check will be performed on the non-None attributes only.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
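
The Integer-field filtering described for cycle_number and pass_number, where the reference is a list, a slice or a single integer, can be sketched as follows. Illustrative code only; `matches_int` is a hypothetical helper, and the slice is assumed to behave like a Python half-open range:

```python
def matches_int(value, reference):
    """Return True when `value` passes an Integer-field filter."""
    if isinstance(reference, slice):
        # Assumed semantics: [start, stop) like a Python range.
        start = reference.start if reference.start is not None else value
        stop = reference.stop if reference.stop is not None else value + 1
        return start <= value < stop
    if isinstance(reference, list):
        return value in reference
    # Otherwise the reference is a single integer.
    return value == reference
```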

listing_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float]'">, 'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'level': <Parameter "level: fcollections.implementations._definitions._constants.ProductLevel">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'subset': <Parameter "subset: fcollections.implementations._definitions._swot.ProductSubset">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'version': <Parameter "version: fcollections.implementations._l2_lr_ssh.L2Version">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, subset: ProductSubset, selected_variables: list[str] | None = None, stack: StackLevel | str = StackLevel.NOSTACK, left_swath: bool = True, right_swath: bool = False, bbox: tuple[float, float, float, float], cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: L2Version)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude interval crosses the -180/180 discontinuity, it will be split in two sub-boxes to ensure a proper selection (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • left_swath – Whether to load the left side of the swath for Unsmoothed datasets. Setting this to False together with right_swath=False disables swath reading for Expert and Basic datasets

  • right_swath – Whether to load the right side of the swath for Unsmoothed datasets. Setting this to False together with left_swath=False disables swath reading for Expert and Basic datasets

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and WindWave datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L2_LR_SSH product, composed of a CRID and a product counter. The CRID can be further decomposed with the timeliness (I/G), the baseline (A/B/C…) and the minor version (a number) (ex. PIC0). The product counter is a number that is increased when a half orbit has been regenerated for the same CRID. This can happen if an anomaly is detected or if there is a change in the upstream data. As a L2Version field, this field can be tested by providing another L2Version instance. This instance can be partially set, with some attributes left as None. In this case, the check will be performed on the non-None attributes only.

Raises:

predicate_classes: list[type[IPredicate]] | None = [<class 'fcollections.implementations.optional._predicates.SwotGeometryPredicate'>]#

List of predicates that are built at each query.

The predicates intercept the input parameters to build a custom record predicate. Usually, it is a complex test involving auxiliary data, such as ground track footprints or half_orbit/periods tables.
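
A record predicate, as in the predicates parameter example above, is a plain callable applied to each record parsed from a file name. A minimal illustrative sketch with hypothetical records:

```python
# The predicate receives a parsed record and returns whether to keep it.
# Here record[1] plays the role of some integer field (e.g. a pass number).
predicate = lambda record: record[1] in [1, 4, 5]

# Hypothetical records as (file_name, field_value) tuples:
records = [("file_a.nc", 1), ("file_b.nc", 2), ("file_c.nc", 4)]
kept = [r for r in records if predicate(r)]
```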

query(*, subset: ProductSubset, selected_variables: list[str] | None = None, stack: StackLevel | str = StackLevel.NOSTACK, left_swath: bool = True, right_swath: bool = False, bbox: tuple[float, float, float, float], cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: L2Version)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude interval crosses the -180/180 discontinuity, it will be split in two sub-boxes to ensure a proper selection (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • left_swath – Whether to load the left side of the swath for Unsmoothed datasets. Setting this to False together with right_swath=False disables swath reading for Expert and Basic datasets

  • right_swath – Whether to load the right side of the swath for Unsmoothed datasets. Setting this to False together with left_swath=False disables swath reading for Expert and Basic datasets

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and WindWave datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L2_LR_SSH product, composed of a CRID and a product counter. The CRID can be further decomposed with the timeliness (I/G), the baseline (A/B/C…) and the minor version (a number) (ex. PIC0). The product counter is a number that is increased when a half orbit has been regenerated for the same CRID. This can happen if an anomaly is detected or if there is a change in the upstream data. As a L2Version field, this field can be tested by providing another L2Version instance. This instance can be partially set, with some attributes left as None. In this case, the check will be performed on the non-None attributes only.

Returns:

  A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – If one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoSwotReaderL2LRSSH object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'left_swath': <Parameter "left_swath: 'bool' = True">, 'right_swath': <Parameter "right_swath: 'bool' = False">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">, 'stack': <Parameter "stack: 'StackLevel | str' = <StackLevel.NOSTACK: 1>">, 'subset': <Parameter "subset: 'ProductSubset'">}#
variables_info(*, level: ProductLevel, subset: ProductSubset)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Parameters:
  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseSwotLRL3(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseSwotLRL3

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, subset: ProductSubset, version: str, bbox: tuple[float, float, float, float])#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. If the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the file name, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
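The list/slice/integer filtering rule described above for cycle_number and pass_number can be illustrated with a small stand-alone predicate. `keep_integer` is a hypothetical helper mirroring the documented semantics (assuming the slice has explicit start and stop), not the library's internal implementation:

```python
# Illustrative sketch of the Integer-field filtering rule: a tested value
# is kept if it is inside the reference list/slice, or equal to the
# reference integer.
def keep_integer(value: int, reference) -> bool:
    if isinstance(reference, slice):
        # Assumes start and stop are given; a missing step defaults to 1.
        step = reference.step or 1
        return value in range(reference.start, reference.stop, step)
    if isinstance(reference, list):
        return value in reference
    return value == reference
```

For example, `keep_integer(5, slice(1, 10))` keeps the value, while `keep_integer(3, [1, 4, 5])` filters it out.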

listing_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float]'">, 'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'level': <Parameter "level: fcollections.implementations._definitions._constants.ProductLevel">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'subset': <Parameter "subset: fcollections.implementations._definitions._swot.ProductSubset">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'version': <Parameter "version: str">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, subset: ProductSubset, selected_variables: list[str] | None = None, stack: str | StackLevel = StackLevel.NOSTACK, swath: bool = True, nadir: bool = False, bbox: tuple[float, float, float, float], cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: str)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the anti-meridian, it will be split in two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and Technical datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • nadir – Whether to read the nadir data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to False

  • swath – Whether to read the swath data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to True

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

predicate_classes: list[type[IPredicate]] | None = [<class 'fcollections.implementations.optional._predicates.SwotGeometryPredicate'>]#

List of predicates that are built at each query.

The predicates intercept the input parameters to build a custom record predicate. Usually, this is a complex test involving auxiliary data, such as ground track footprints or half_orbit/periods tables.

query(*, subset: ProductSubset, selected_variables: list[str] | None = None, stack: str | StackLevel = StackLevel.NOSTACK, swath: bool = True, nadir: bool = False, bbox: tuple[float, float, float, float], cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, level: ProductLevel, version: str)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the anti-meridian, it will be split in two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and Technical datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • nadir – Whether to read the nadir data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to False

  • swath – Whether to read the swath data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped in the swath. Defaults to True

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • level – Product level of the data. As an Enum field, it can be filtered using a reference <enum ‘ProductLevel’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘L2’, ‘L3’, ‘L4’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Returns:

  • A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
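The bbox splitting behaviour described above (a longitude interval that crosses the anti-meridian is split in two sub-boxes) can be sketched as a stand-alone helper. `split_bbox` is hypothetical and only mirrors the documented semantics:

```python
# Sketch of the anti-meridian handling for ``bbox``: when lon_min > lon_max
# the box wraps around the dateline and is split into two sub-boxes,
# e.g. longitudes (170, -170) -> [170, 180] and [-180, -170].
def split_bbox(bbox):
    lon_min, lat_min, lon_max, lat_max = bbox
    if lon_min <= lon_max:
        # No wrap-around: the box can be used as-is.
        return [bbox]
    return [
        (lon_min, lat_min, 180.0, lat_max),
        (-180.0, lat_min, lon_max, lat_max),
    ]
```

Selecting data then amounts to querying each sub-box and concatenating the results.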

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoSwotReaderL3LRSSH object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'nadir': <Parameter "nadir: 'bool' = False">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">, 'stack': <Parameter "stack: 'str | StackLevel' = <StackLevel.NOSTACK: 1>">, 'subset': <Parameter "subset: 'ProductSubset'">, 'swath': <Parameter "swath: 'bool' = True">}#
variables_info(*, subset: ProductSubset, version: str)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Parameters:
  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

class fcollections.implementations.NetcdfFilesDatabaseSwotLRWW(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#

Bases: BasicNetcdfFilesDatabaseSwotLRWW

list_files(sort: bool = False, deduplicate: bool = False, unmix: bool = False, predicates: tp.Iterable[IPredicate] = (), stat_fields: tuple[str] = (), *, cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, subset: ProductSubset, version: str, bbox: tuple[float, float, float, float])#

List the files matching the given criteria.

Parameters:
  • sort – Sort the results using the sort_keys attribute of this class

  • deduplicate – In case the class deduplicator is defined, the results are analyzed to search for duplicates according to a set of unique keys. In case duplicates are found, deduplication is run along a set of defined columns where duplicates are expected to occur

  • unmix – Multiple subsets may be mixed in the files metadata table. Use this argument to separate the subsets. An auto pick will also be performed according to the SubsetsUnmixer instance of this class. If the auto pick cannot isolate a unique subset, an error is raised

  • predicates – Additional complex filters to run on the record parsed from the file name, e.g. lambda record: record[1] in [1, 4, 5]. Predicates are knowledgeable about the record contents and the file name convention

  • stat_fields – File system information that can be retrieved from the fsspec underlying implementation. For example, ‘size’ or ‘created’ are valid for a local file system

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • ValueError – In case unmix is True, an error is raised if one unique and homogeneous subset cannot be extracted from the files metadata table

  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
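The Period filtering rule quoted above (intersection with a reference Period, containment of a reference datetime) can be illustrated with plain (start, end) tuples. `keep_period` is a hypothetical sketch; the real Period type in fcollections.time is richer:

```python
from datetime import datetime

# Sketch of the Period-field filtering rule: a file's period is kept if it
# intersects a reference period, or contains a reference datetime. Periods
# are modelled here as simple (start, end) tuples of datetimes.
def keep_period(tested, reference):
    start, end = tested
    if isinstance(reference, datetime):
        # A single datetime must fall inside the tested period.
        return start <= reference <= end
    # Two closed intervals intersect if each one starts before the other ends.
    ref_start, ref_end = reference
    return start <= ref_end and ref_start <= end
```

The string and tuple-of-strings forms accepted by the library would be parsed with the [%Y-%m-%dT%H:%M:%S] format before applying this test.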

listing_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float]'">, 'cycle_number': <Parameter "cycle_number: list[int] | slice | int">, 'pass_number': <Parameter "pass_number: list[int] | slice | int">, 'subset': <Parameter "subset: fcollections.implementations._definitions._swot.ProductSubset">, 'time': <Parameter "time: fcollections.time._periods.Period">, 'version': <Parameter "version: str">}#
map(func: tp.Callable[[xr_t.Dataset, dict[str, tp.Any]], tp.Any], *, subset: ProductSubset, selected_variables: list[str] | None = None, tile: int | None = None, box: int | None = None, bbox: tuple[float, float, float, float], cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, version: str)#

Map a function over dataset extracted from the files.

Parameters:
  • func – Callable that works on a xarray dataset.

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the anti-meridian, it will be split in two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • tile – Tile size of the spectrum computation. Mandatory for the Extended subset

  • box – Box size of the spectrum computation. Mandatory for the Extended subset if one of the requested variables is defined along the n_box dimension

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

predicate_classes: list[type[IPredicate]] | None = [<class 'fcollections.implementations.optional._predicates.SwotGeometryPredicate'>]#

List of predicates that are built at each query.

The predicates intercept the input parameters to build a custom record predicate. Usually, this is a complex test involving auxiliary data, such as ground track footprints or half_orbit/periods tables.

query(*, subset: ProductSubset, selected_variables: list[str] | None = None, tile: int | None = None, box: int | None = None, bbox: tuple[float, float, float, float], cycle_number: list[int] | slice | int, pass_number: list[int] | slice | int, time: Period, version: str)#

Query a dataset by reading selected files in file system.

Parameters:
  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • bbox – The bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If the bbox’s longitude interval crosses the anti-meridian, it will be split in two sub-boxes to ensure a proper selection (e.g. longitude interval [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • tile – Tile size of the spectrum computation. Mandatory for the Extended subset

  • box – Box size of the spectrum computation. Mandatory for the Extended subset if one of the requested variables is defined along the n_box dimension

  • cycle_number – Cycle number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • pass_number – Pass number of the half orbit. A half orbit is identified using a cycle number and a pass number. As an Integer field, it can be filtered by using a reference value. The reference value can either be a list, a slice or an integer. The tested value from the file name will be filtered out if it is outside the given list/slice or not equal to the integer value.

  • time – Period covered by the file. As a Period field, it can be filtered by giving a reference Period or datetime. The tested value from the file name will be filtered out if it does not intersect the reference Period or does not contain the reference datetime. The reference value can be given as a string or tuple of strings following the [%Y-%m-%dT%H:%M:%S] formatting

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Returns:

  • A dataset containing the result of the query, or None if nothing matches the query

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table

reader: IFilesReader | None = <fcollections.implementations.optional._reader.GeoSwotReaderL3WW object>#

Files reader.

reading_parameters = {'bbox': <Parameter "bbox: 'tuple[float, float, float, float] | None' = None">, 'box': <Parameter "box: 'int | None' = None">, 'selected_variables': <Parameter "selected_variables: 'list[str] | None' = None">, 'subset': <Parameter "subset: 'ProductSubset'">, 'tile': <Parameter "tile: 'int | None' = None">}#
variables_info(*, subset: ProductSubset, version: str)#

Returns the variables metadata.

Because the files collection may mix multiple subsets, we want to ensure that we return the variables of one subset only. The parameters of this method are the subset partitioning keys and can be given by the user to ensure a consistent set of variables. If the input parameters are not sufficient to unmix the subsets, the user will be notified with a ValueError

Parameters:
  • subset – Subset of the LR Karin products. The Basic, Expert and Technical subsets are defined on a reference grid, opening the possibility of stacking the files, whereas the Unsmoothed subset is defined on a different grid for each cycle. The Light and Extended subsets are specific to the L3_LR_WIND_WAVE product. As an Enum field, it can be filtered using a reference <enum ‘ProductSubset’> or its equivalent string. The tested value found in the file name will be filtered out if it is not equal to the given enum field. Possible values are: [‘Basic’, ‘Expert’, ‘WindWave’, ‘Unsmoothed’, ‘Technical’, ‘Light’, ‘Extended’]

  • version – Version of the L3_LR_WIND_WAVE and L3_LR_SSH Swot products (they share their versioning). This is a tri-number version x.y.z, where “x” denotes a major change in the product, “y” a minor change and “z” a fix. As a String field, it can be filtered by giving a reference string. The tested value from the file name will be filtered out if it is not equal to the reference value.

Raises:
  • LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected

  • ValueError – In case one unique and homogeneous subset could not be extracted from the files metadata table
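The tri-number x.y.z versioning described above orders naturally when parsed into an integer tuple, which is how version strings can be compared without lexicographic pitfalls ("1.2.10" > "1.2.9"). `parse_version` is an illustrative helper, not part of the public API:

```python
# Sketch of the x.y.z versioning scheme, where "x" is a major change,
# "y" a minor change and "z" a fix. Tuples of ints compare element-wise,
# giving the expected ordering.
def parse_version(version: str) -> tuple:
    major, minor, fix = (int(part) for part in version.split("."))
    return (major, minor, fix)
```

A plain string comparison would rank "1.2.9" above "1.2.10"; the tuple form does not.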

class fcollections.implementations.Origin(*values)[source]#

Bases: Enum

Dataset origin.

C3S = 2#

C3S.

CCI = 3#
CMEMS = 1#

Copernicus Marine.

OSISAF = 4#

OSISAF.

class fcollections.implementations.ProductClass(*values)[source]#

Bases: Enum

Dataset product class.

INS = 8#

In-situ.

MOB = 7#

Multi observations.

OC = 3#

Ocean Colour Thematic Assembly Center.

SI = 4#

Sea Ice.

SL = 2#

Sea Level Thematic Assembly Center.

SST = 1#

Sea Surface Temperature Thematic Assembly Center.

WAVE = 6#

Wave.

WIND = 5#

Wind.

class fcollections.implementations.ProductLevel(*values)[source]#

Bases: Enum

Product level.

L2 = 1#

Level-2 products.

L3 = 2#

Level-3 products.

L4 = 3#

Level-4 products.

class fcollections.implementations.ProductSubset(*values)[source]#

Bases: Enum

Swot product subset enum.

Basic = 1#

Basic subset for L2_LR_SSH and L3_LR_SSH products.

Expert = 2#

Expert subset for L2_LR_SSH and L3_LR_SSH products.

Expert subset contains all of the Basic subset fields.

Extended = 7#

Extended subset for L3_LR_WIND_WAVE product.

Extended subset contains all of the Light subset data.

Light = 6#

Light subset for L3_LR_WIND_WAVE product.

Technical = 5#

Technical subset for L3_LR_SSH product.

Contains additional fields such as alternative corrections to be used by experts.

Unsmoothed = 4#

Unsmoothed subset for L2_LR_SSH and L3_LR_SSH products.

WindWave = 3#

WindWave subset for L2_LR_SSH product.

class fcollections.implementations.S1AOWIProductType(*values)[source]#

Bases: Enum

GS = 2#
SW = 1#
class fcollections.implementations.S1AOWISlicePostProcessing(*values)[source]#

Bases: Enum

CC = 1#
CM = 2#
OCN = 3#
class fcollections.implementations.Sensors(*values)[source]#

Bases: Enum

Aggregation of sensors for multiple CMEMS products.

  • SEALEVEL_GLO_PHY_L3_MY_008_062

  • SEALEVEL_GLO_PHY_L3_NRT_008_044

  • SEALEVEL_GLO_PHY_L4_NRT_008_046

  • SEALEVEL_GLO_PHY_L4_MY_008_047

  • WAVE_GLO_PHY_SWH_L3_NRT_014_001

  • SST_GLO_SST_L3S_NRT_OBSERVATIONS_010_010

  • OCEANCOLOUR_GLO_BGC_L3_MY_009_103

AL = 21#
ALG = 22#
ALLSAT = 32#
ALLSAT_SWOS = 34#
C2 = 1#
C2N = 2#
CFO = 35#
DEMO_ALLSAT_SWOTS = 33#
E1 = 5#
E1G = 6#
E2 = 7#
EN = 3#
ENN = 4#
G2 = 8#
GIR = 38#
H2A = 9#
H2AG = 10#
H2B = 11#
H2C = 36#
J1 = 12#
J1G = 13#
J1N = 14#
J2 = 15#
J2G = 17#
J2N = 16#
J3 = 18#
J3G = 20#
J3N = 19#
MULTI = 42#
OLCI = 41#
PIR = 39#
PMW = 40#
S3A = 23#
S3B = 24#
S6A = 25#
S6A_HR = 27#
S6A_LR = 26#
SWON = 28#
SWONC = 29#
SWOT = 37#
TP = 30#
TPN = 31#
class fcollections.implementations.StackLevel(*values)[source]#

Bases: Enum

Stack level for swath half orbits on reference grid.

Swath half orbits on a reference grid are by definition sampled at the same locations for each cycle. This means we can split the temporal dimension num_lines into one or two other dimensions: cycle_number and pass_number.

CYCLES = 2#
CYCLES_PASSES = 3#
NOSTACK = 1#

No stack: the dataset is returned as (num_lines, num_pixels).
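The num_lines split behind these stack levels can be sketched with numpy. The shapes below are illustrative only (69 pixels per line is assumed for the example); the readers handle real grids and missing half orbits:

```python
import numpy as np

# Hypothetical along-track variable on a reference grid:
# 2 cycles x 3 passes x 50 lines per pass, flattened along num_lines.
num_cycles, num_passes, lines_per_pass, num_pixels = 2, 3, 50, 69
flat = np.zeros((num_cycles * num_passes * lines_per_pass, num_pixels))

# NOSTACK: data stays as (num_lines, num_pixels).
assert flat.shape == (300, 69)

# CYCLES: num_lines becomes (cycle_number, num_lines_per_cycle).
cycles = flat.reshape(num_cycles, num_passes * lines_per_pass, num_pixels)

# CYCLES_PASSES: num_lines becomes (cycle_number, pass_number, num_lines).
cycles_passes = flat.reshape(num_cycles, num_passes, lines_per_pass, num_pixels)
```

The reshape is only valid because the reference grid guarantees identical sampling across cycles, which is exactly why the Unsmoothed subset cannot be stacked.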

class fcollections.implementations.SwotPhases(*values)[source]#

Bases: Enum

Swot mission phases definitions.

CALVAL = 1#

1-day repeat orbit, sparse geographical coverage.

SCIENCE = 2#

21-day repeat orbit, quasi full geographical coverage.

class fcollections.implementations.SwotReaderL2LRSSH(xarray_options: dict[str, str] | None = None)[source]#

Bases: OpenMfDataset

Reader for SWOT KaRIn L2_LR_SSH products.

expected_coords: set[str] = {'latitude', 'longitude', 'time'}#

Variables we want to set as coordinates in the output dataset

read(subset: ProductSubset, files: list[str], selected_variables: list[str] | None = None, fs: AbstractFileSystem = fs_loc.LocalFileSystem(), stack: StackLevel | str = StackLevel.NOSTACK, left_swath: bool = True, right_swath: bool = False, preprocessor: Callable[[Dataset], Dataset] | None = None) Dataset[source]#

Read a dataset from L2_LR_SSH products.

Parameters:
  • files – list of files to open. At least one file should be given

  • fs – File systems hosting the files

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Product dataset (Basic, Expert, WindWave or Unsmoothed)

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the circularity, it will be split in two subboxes to ensure a proper selection (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • left_swath – Whether to load the left side of the swath for Unsmoothed datasets. Setting this to False together with right_swath=False disables swath reading for Expert and Basic datasets

  • right_swath – Whether to load the right side of the swath for Unsmoothed datasets. Setting this to False together with left_swath=False disables swath reading for Expert and Basic datasets

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and WindWave datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

Raises:
  • ValueError – If the input list of files is empty

  • ValueError – If the input stack parameter does not match a valid StackLevel

Returns:

An xarray dataset containing the data from the input files
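The bbox circularity handling described in the parameters above can be sketched as follows. `split_bbox` is a hypothetical helper, not the reader's actual implementation:

```python
def split_bbox(bbox):
    """Split a (lon_min, lat_min, lon_max, lat_max) bounding box that crosses
    the longitude circularity into two sub-boxes; otherwise return it as-is."""
    lon_min, lat_min, lon_max, lat_max = bbox
    if lon_min <= lon_max:
        # No crossing: a single box is enough.
        return [bbox]
    # Crossing case, e.g. [170, -170] -> [170, 180[ and [-180, -170].
    return [
        (lon_min, lat_min, 180.0, lat_max),
        (-180.0, lat_min, lon_max, lat_max),
    ]
```

Selecting data then amounts to retrieving and concatenating the points falling in each sub-box.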

class fcollections.implementations.SwotReaderL3LRSSH(xarray_options: dict[str, str] | None = None)[source]#

Bases: OpenMfDataset

Reader for SWOT KaRIn L3_LR_SSH products.

clipped_ssha: set[str] = {'ssha_filtered', 'ssha_noiseless', 'ssha_unedited', 'ssha_unfiltered'}#

SSHA Variables with nadir data clipped in it. Should cover both present and older versions of the product

expected_coords: set[str] = {'latitude', 'longitude', 'time'}#

Variables we want to set as coordinates in the output dataset

read(subset: ProductSubset, files: list[str], selected_variables: list[str] | None = None, fs: AbstractFileSystem = fs_loc.LocalFileSystem(), stack: str | StackLevel = StackLevel.NOSTACK, swath: bool = True, nadir: bool = False, preprocessor: Callable[[Dataset], Dataset] | None = None) Dataset[source]#

Read a dataset from L3_LR_SSH products.

Parameters:
  • files – list of files to open. At least one file should be given

  • fs – File systems hosting the files

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Product dataset (Basic, Expert, Technical or Unsmoothed)

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the circularity, it will be split in two subboxes to ensure a proper selection (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • stack – Whether to stack the cycles and passes of the dataset. This option is only available for Basic, Expert and Technical datasets which are defined on a reference grid (fixed grid between cycles). Set to CYCLES_PASSES to stack both cycles and passes. Set to CYCLES to stack only the cycles, in which case cycles with missing passes will be left over. Defaults to NOSTACK

  • nadir – Whether to read the nadir data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped into the swath. Defaults to False

  • swath – Whether to read the swath data from the product. Only relevant for the Basic and Expert subsets, where the nadir data is clipped into the swath. Defaults to True

Raises:
  • ValueError – If stack=CYCLES_PASSES or stack=CYCLES, swath=False and nadir=True. In this case, we are trying to stack nadir data, which is not guaranteed to have the same number of points per half orbit. This case is not supported

  • ValueError – If swath=False and nadir=False. In this case, the user is asking for an empty return

  • ValueError – If the input list of files is empty

  • ValueError – If the input stack parameter does not match a valid StackLevel

Returns:

An xarray dataset containing the data from the input files
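The "cycles with missing passes will be left over" behaviour of stack=CYCLES can be sketched like this. `complete_cycles` is hypothetical bookkeeping for illustration, not the reader's code:

```python
def complete_cycles(half_orbits, expected_passes):
    """Keep only the cycles for which every expected pass is present.

    half_orbits: iterable of (cycle_number, pass_number) pairs, one per file.
    expected_passes: set of pass numbers a complete cycle must contain.
    """
    per_cycle = {}
    for cycle, pass_number in half_orbits:
        per_cycle.setdefault(cycle, set()).add(pass_number)
    # A cycle is kept only when its passes cover the expected set.
    return sorted(
        cycle for cycle, passes in per_cycle.items()
        if passes >= expected_passes
    )
```

With files for cycle 1 (passes 1 and 2) and cycle 2 (pass 1 only), stacking over `{1, 2}` keeps cycle 1 and leaves cycle 2 over.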

class fcollections.implementations.SwotReaderL3WW(xarray_options: dict[str, str] | None = None)[source]#

Bases: OpenMfDataset

Reader for the SWOT L3_LR_WIND_WAVE product.

This reader handles both the Light and Extended subsets. The Light subset is simpler and has spectral content along the n_box dimension with default tile and box sizes. The Extended subset has multiple tile and box sizes stored in matching netcdf groups. Thus, tile and box sizes should generally be given for the Extended subset (see the read method for more details).

The L3_LR_WIND_WAVE product is built from the L3_LR_SSH product, and references ‘num_lines’ indices.

See also

SwotReaderL3LRSSH

the L3_LR_SSH product reader

read(subset: ProductSubset, files: list[str], selected_variables: list[str] | None = None, fs: AbstractFileSystem = fs_loc.LocalFileSystem(), tile: int | None = None, box: int | None = None, preprocessor: Callable[[Dataset], Dataset] | None = None) Dataset[source]#

Read a SWOT dataset from L3_LR_WIND_WAVE products.

Parameters:
  • files – list of files to open. At least one file should be given. If multiple files are given, variables following the n_box dimension will be concatenated. The other variables are constant and will not be repeated

  • fs – File systems hosting the files

  • selected_variables – list of variables to select in dataset. Set to None (default) to disable the selection

  • subset – Product dataset (Light, Extended)

  • bbox – the bounding box (lon_min, lat_min, lon_max, lat_max) used to select the data in a given area. Longitude coordinates can be provided in [-180, 180[ or [0, 360[ convention. If bbox’s longitude crosses the circularity, it will be split in two subboxes to ensure a proper selection (e.g. longitude interval: [170, -170] -> data in [170, 180[ and [-180, -170] will be retrieved)

  • tile – Tile size of the spectrum computation. Is mandatory for the Extended subset

  • box – Box size of the spectrum computation. Is mandatory for the Extended subset if one of the requested variables is defined along the n_box dimension

Raises:
  • ValueError – If the tile or box argument is given when reading a Light subset

  • ValueError – If the list of files is empty

  • ValueError – If the tile or box argument is missing for the Extended subset

  • ValueError – If the input subset matches neither Light nor Extended

  • ValueError – If the input tile or box size is not found in the files

Returns:

An xarray dataset containing the data from the input files

class fcollections.implementations.Temporality(*values)[source]#

Bases: Enum

Temporality of the L3_LR_SSH product.

The L3_LR_SSH product is calibrated on nadir data. Nadir data has two temporalities in Copernicus Marine: reprocessed data (labelled as Multi-Year MY) and near-real time data (NRT).

This temporality in upstream data is reflected as reproc/forward to adopt the SWOT mission denomination. It is not the same definition as for the L2_LR_SSH product, where reprocessed data covers the PGC, PGD, … datasets and forward data covers the PIC, PID, … datasets.

See also

fcollections.implementations.DataType

Copernicus Marine data type definition (in our case for Nadir data)

fcollections.implementations.Timeliness

L2_LR_SSH product temporality definition

FORWARD = 2#

Forward data calibrated on the NRT nadir dataset.

REPROC = 1#

Reprocessed data calibrated on the MY nadir dataset.

class fcollections.implementations.Thematic(*values)[source]#

Bases: Enum

Dataset thematic.

BGC = 2#

Biogeochemical.

PHY = 1#

Physical.

PHYBGC = 4#

Phy BGC.

PHYBGCWAV = 5#

Wav Phy BGC.

WAV = 3#

Wav.

class fcollections.implementations.Timeliness(*values)[source]#

Bases: Enum

Timeliness of the SWOT L2_LR_SSH products.

G = 2#

Reprocessed data.

I = 1#

Forward data.

class fcollections.implementations.Typology(*values)[source]#

Bases: Enum

Dataset typology.

I = 1#

Instantaneous.

M = 2#

Mean.

class fcollections.implementations.Variable(*values)[source]#

Bases: Enum

Dataset variable group.

CAR = 4#

Carbon.

CHL = 3#

Chlorophyll.

CUR = 2#

Currents.

GEOPHY = 6#

Geophy.

HFLUX = 13#

Heat flux.

MFLUX = 11#

Momentum flux.

NUT = 5#

Nutrient.

OPTICS = 9#

Optics.

PLANKTON = 7#

Plankton.

PP = 10#

Primary production.

REFLECTANCE = 16#

Reflectance.

SSH = 15#

Sea surface height.

SWH = 14#

Significant wave height.

TEMP = 1#

Temperature.

TRANSP = 8#

Transparency.

WFLUX = 12#

Water flux.

fcollections.implementations.build_version_parser() FileNameConvention[source]#

Build file name convention to parse CRID versions.
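A file name convention for the tri-number x.y.z versions described earlier can be sketched with a regular expression. This is illustrative only; the actual FileNameConvention object returned by build_version_parser is library-specific, and the file name used below is hypothetical:

```python
import re

# Illustrative pattern for an x.y.z version token embedded in a file name.
VERSION_RE = re.compile(r"v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)")

def parse_version(name):
    """Extract a (major, minor, patch) tuple from a file name, or None."""
    match = VERSION_RE.search(name)
    if match is None:
        return None
    return tuple(int(match.group(g)) for g in ("major", "minor", "patch"))
```

Parsing the version up front is what allows the collections to filter files by a reference version string, as described for the version parameter of variables_info.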