Advanced usage

Remote file systems

A file set can be accessed from a remote location. Fcollections is built on the fsspec abstraction, so a files collection can in principle accept any fsspec file system.
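As a minimal sketch of what the fsspec abstraction provides (independent of fcollections), the snippet below runs the same listing helper against an in-memory file system and the local file system. The helper `list_netcdf`, the folder names and the file names are hypothetical and only serve the illustration.

```python
import os
import tempfile

import fsspec


def list_netcdf(fs, root):
    """Return the sorted base names of the .nc files found under root."""
    return sorted(os.path.basename(p) for p in fs.find(root) if p.endswith(".nc"))


# In-memory file system: create two empty files, only one of them is netCDF
mem = fsspec.filesystem("memory")
mem.pipe("/fsspec_demo/cycle_031/a.nc", b"")
mem.pipe("/fsspec_demo/cycle_031/notes.txt", b"")
mem_names = list_netcdf(mem, "/fsspec_demo")

# Local file system: same helper, different backend
local = fsspec.filesystem("file")
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "cycle_031"))
    open(os.path.join(tmp, "cycle_031", "a.nc"), "wb").close()
    local_names = list_netcdf(local, tmp)

print(mem_names, local_names)
```

The helper never needs to know which backend it is talking to, which is the property that lets a files collection receive an `fs` argument.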

Warning

If the reader does not support a specific file system, an error is raised. In that case, implement your own reader by following Building an implementation.

The following code shows how to access data directly from the AVISO public FTP server, using either the FTP or the SFTP protocol. You will need credentials to authenticate (see the AVISO website).

# Option 1: FTP
from fsspec.implementations.ftp import FTPFileSystem
from fcollections.implementations import NetcdfFilesDatabaseSwotLRL2

fs = FTPFileSystem('ftp-access.aviso.altimetry.fr', 21, username='...', password='...')
db = NetcdfFilesDatabaseSwotLRL2('/swot_products/l2_karin/l2_lr_ssh/PIC2/Expert/cycle_031', fs=fs)
ds = db.list_files(pass_number=1)

# Option 2: SFTP
from fsspec.implementations.sftp import SFTPFileSystem
from fcollections.implementations import NetcdfFilesDatabaseSwotLRL2

fs = SFTPFileSystem(host='ftp-access.aviso.altimetry.fr', port=2221, username='...', password='...')
db = NetcdfFilesDatabaseSwotLRL2('/swot_products/l2_karin/l2_lr_ssh/PIC2/Expert/cycle_031', fs=fs)
ds = db.list_files(pass_number=1)

Note

paramiko must be installed to use the SFTP implementation of fsspec.
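One possible way to check for this dependency before building an SFTP file system is sketched below. It relies only on the standard library; the helper name `sftp_backend_available` is hypothetical.

```python
import importlib.util


def sftp_backend_available():
    """Return True if the paramiko package can be found by the import system."""
    return importlib.util.find_spec("paramiko") is not None


if not sftp_backend_available():
    # Installing with pip is one option; use whichever package manager you prefer
    print("paramiko is missing: install it before using SFTPFileSystem")
```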

Listing a remote file system can be slow. Implementations usually ship with layouts that speed up listing. See the Layout introduction if listing performance becomes an issue.

Disable layouts

Files Collections implementations can define up to two classes of pre-configured layouts:

  • Flat layouts: the files are in a single folder without any nesting

  • Official layouts: the file organization mirrors that of public data providers such as AVISO or Copernicus Marine

There is no easy way to ensure that the pre-configured layouts perfectly match your target. The current strategy is to raise a LayoutMismatchError if a folder is not recognized. This behavior can be changed by setting the enable_layouts parameter:

from fcollections.implementations import NetcdfFilesDatabaseSwotLRL2

db = NetcdfFilesDatabaseSwotLRL2('/mypath/with/custom_nesting', enable_layouts=False)

This disables branch pruning during exploration and slows down file listing. To avoid losing performance, you can instead modify an existing implementation by defining an additional layout:

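from fcollections.core import Layout
from fcollections.implementations import NetcdfFilesDatabaseSwotLRL2

# Describe the custom folder nesting (arguments elided here)
layout = Layout(...)
# Register the layout on the implementation class
NetcdfFilesDatabaseSwotLRL2.layouts.append(layout)

# Branch pruning will be enabled again
db = NetcdfFilesDatabaseSwotLRL2('/mypath/with/custom_nesting', enable_layouts=True)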
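To give an intuition of why branch pruning speeds up listing, here is a generic sketch based on `os.walk` from the standard library. It is not the fcollections implementation: the folder naming (`cycle_XXX`), the file names and the helpers `build_tree` and `list_files` are assumptions made for the illustration. When the folder structure is known, whole branches can be skipped instead of being visited and filtered afterwards.

```python
import os
import tempfile


def build_tree(root, n_cycles=3, n_passes=2):
    """Create root/cycle_XXX/pass_YYY.nc empty files (hypothetical naming)."""
    for cycle in range(1, n_cycles + 1):
        folder = os.path.join(root, f"cycle_{cycle:03d}")
        os.makedirs(folder)
        for pass_number in range(1, n_passes + 1):
            open(os.path.join(folder, f"pass_{pass_number:03d}.nc"), "w").close()


def list_files(root, cycle=None):
    """Return (files, number of visited folders).

    When cycle is given, the folder names are trusted and every other
    branch is pruned at the top level before it is explored.
    """
    files, visited = [], 0
    for dirpath, dirnames, filenames in os.walk(root):
        visited += 1
        if cycle is not None and dirpath == root:
            # In-place edit of dirnames tells os.walk which branches to skip
            dirnames[:] = [d for d in dirnames if d == f"cycle_{cycle:03d}"]
        files.extend(os.path.join(dirpath, name) for name in sorted(filenames))
    return files, visited


with tempfile.TemporaryDirectory() as tmp:
    build_tree(tmp)
    # Without pruning: visit everything, then filter on the parent folder name
    all_files, visited_full = list_files(tmp)
    filtered = [f for f in all_files
                if os.path.basename(os.path.dirname(f)) == "cycle_002"]
    # With pruning: only the matching branch is explored
    pruned, visited_pruned = list_files(tmp, cycle=2)

print(visited_full, visited_pruned)
```

Both strategies return the same files, but the pruned walk visits fewer folders. On a remote file system, each visited folder typically costs a network round trip, so the difference grows with the size of the tree.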