Find and download files from ESGF
esmvalcore.esgf
Find files on the ESGF and download them.
This module uses esgf-pyclient
to search for and download files from the Earth System Grid Federation (ESGF).
esgf-pyclient uses a
deprecated API
that is scheduled to be taken offline and replaced by new APIs based on
STAC (ESGF East) and Globus (ESGF West). An ESGF node mimicking the deprecated
API but built on top of Globus will be kept online for some time at
https://esgf-node.ornl.gov/esgf-1-5-bridge, but users are encouraged
to migrate to the new APIs as soon as possible by using the
esmvalcore.io.intake_esgf module instead.
This module provides the function esmvalcore.esgf.find_files()
for searching for files on ESGF using the ESMValTool vocabulary.
It returns esmvalcore.esgf.ESGFFile objects, which have a convenient
esmvalcore.esgf.ESGFFile.download() method for downloading the file.
An esmvalcore.esgf.download() function for downloading multiple files in
parallel is also available.
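A minimal usage sketch of the two functions described above, assuming ESMValCore is installed and an ESGF index node is reachable. The helper name `search_and_download` is an illustrative choice and not part of the module; the facet values are taken from the CMIP6 example further down this page.

```python
from pathlib import Path


def search_and_download(dest_folder: Path, n_jobs: int = 4) -> list:
    """Search ESGF for one CMIP6 variable and download the matching files.

    Hypothetical helper, not part of esmvalcore. The import is deferred so
    the function can be defined without ESMValCore or network access.
    """
    from esmvalcore.esgf import download, find_files

    files = find_files(
        project="CMIP6",
        mip="Amon",
        short_name="tas",
        dataset="CanESM5",
        exp="historical",
        ensemble="r1i1p1f1",
    )
    # Each element is an ESGFFile; download() fetches them in parallel.
    download(files, dest_folder, n_jobs=n_jobs)
    return files
```

Calling `search_and_download(Path("~/climate_data").expanduser())` would then perform the search and the download in one step.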
It also provides an esmvalcore.esgf.ESGFDataSource that can be
used to find files on ESGF from the Dataset
or the recipe. To use it, create a file with the following
configuration in ~/.config/esmvaltool:
# Use a lower priority than for esmvalcore.local.LocalDataSource
# to avoid searching ESGF with the setting `search_esgf: when_missing`.
projects:
  CMIP6: &esgf-pyclient-data
    data:
      esgf-pyclient:
        type: "esmvalcore.esgf.ESGFDataSource"
        download_dir: ~/climate_data
        priority: 10
  CMIP5:
    <<: *esgf-pyclient-data
  CMIP3:
    <<: *esgf-pyclient-data
  CORDEX:
    <<: *esgf-pyclient-data
  obs4MIPs:
    <<: *esgf-pyclient-data
See ESGF configuration for instructions on additional configuration options of this module.
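For readers unfamiliar with YAML anchors, the sketch below spells out what the `&esgf-pyclient-data` anchor and the `<<` merge keys in the configuration above are expected to expand to, written as a plain Python dictionary. It illustrates standard YAML merge semantics and is not output produced by ESMValCore; the variable names are made up for this example.

```python
# Settings shared by every project via the `&esgf-pyclient-data` anchor.
shared = {
    "data": {
        "esgf-pyclient": {
            "type": "esmvalcore.esgf.ESGFDataSource",
            "download_dir": "~/climate_data",
            "priority": 10,
        },
    },
}

# Each `<<: *esgf-pyclient-data` merges the same mapping into a project.
projects = ("CMIP6", "CMIP5", "CMIP3", "CORDEX", "obs4MIPs")
expanded = {"projects": {project: dict(shared) for project in projects}}
```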
Classes:
ESGFDataSource | Data source that can be used to find files on ESGF.
ESGFFile | File on the ESGF.
Functions:
download() | Download multiple ESGFFiles in parallel.
find_files() | Search for files on ESGF.
- class esmvalcore.esgf.ESGFDataSource(name: str, project: str, priority: int, download_dir: pathlib.Path)
Bases: DataSource
Attributes:
debug_info | A string containing debug information when no data is found.
download_dir | The destination directory where data will be downloaded.
name | A name identifying the data source.
priority | The priority of the data source.
project | The project that the data source provides data for.
Methods:
find_data(**facets) | Find data.
- debug_info: str = ''
A string containing debug information when no data is found.
- download_dir: Path
The destination directory where data will be downloaded.
- name: str
A name identifying the data source.
- priority: int
The priority of the data source. Lower values have priority.
- project: str
The project that the data source provides data for.
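Since lower `priority` values take precedence, a caller consulting several data sources would try them in ascending order of `priority`. The sketch below shows that ordering with a hypothetical stand-in class (`FakeSource`). It is not ESMValCore's actual selection logic, and the priority value 1 for the local source is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class FakeSource:
    """Hypothetical stand-in carrying only some documented attributes."""

    name: str
    project: str
    priority: int


sources = [
    FakeSource(name="esgf-pyclient", project="CMIP6", priority=10),
    FakeSource(name="local", project="CMIP6", priority=1),
]

# Lower values have priority, so sort ascending before searching.
ordered = sorted(sources, key=lambda source: source.priority)
names = [source.name for source in ordered]
```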
- class esmvalcore.esgf.ESGFFile(results: Iterable[FileResult], dest_folder: Path | None = None)
Bases: DataElement
File on the ESGF.
This is the object returned by esmvalcore.esgf.find_files().
- Parameters:
results (Iterable[FileResult])
dest_folder (Path | None)
- dataset
The name of the dataset that the file is part of.
- Type: str
- name
The name of the file.
- Type: str
- size
The size of the file in bytes.
- Type: int
Attributes:
attributes | Attributes are key-value pairs describing the data.
facets | Facets are key-value pairs that can be used for searching the data.
name | A unique name identifying the data.
Methods:
download(dest_folder) | Download the file.
local_file(dest_folder) | Return the path to the local file after download.
prepare() | Prepare the data for access.
to_iris([ignore_warnings]) | Load the data as Iris cubes.
- local_file(dest_folder: Path | None) → LocalFile
Return the path to the local file after download.
- name: str
A unique name identifying the data.
- to_iris(ignore_warnings: list[dict[str, Any]] | None = None) → CubeList
Load the data as Iris cubes.
- Parameters:
ignore_warnings (list[dict[str, Any]] | None) – Keyword arguments passed to warnings.filterwarnings() used to ignore warnings issued by iris.load_raw(). Each list element corresponds to one call to warnings.filterwarnings().
- Returns:
The loaded data.
- Return type:
CubeList
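The relationship between the `ignore_warnings` list and `warnings.filterwarnings()` can be sketched with the standard library alone. The helpers below are hypothetical and not part of ESMValCore. They assume each dictionary is forwarded with the `"ignore"` action, which is an inference from the parameter description above, not a confirmed implementation detail. The warning messages are invented for the demonstration.

```python
import warnings


def apply_ignore_warnings(ignore_warnings=None):
    """Install one "ignore" filter per dictionary in the list."""
    for kwargs in ignore_warnings or []:
        # Each element supplies the keyword arguments for a single call.
        warnings.filterwarnings("ignore", **kwargs)


def count_reported(messages, ignore_warnings=None):
    """Emit the messages as UserWarnings and count those not filtered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        apply_ignore_warnings(ignore_warnings)
        for message in messages:
            warnings.warn(message, UserWarning)
    return len(caught)


reported = count_reported(
    ["Missing CF-netCDF variable", "something else"],
    ignore_warnings=[{"message": "Missing CF-netCDF", "category": UserWarning}],
)
```

Only the warning whose message does not match the filter is counted in `reported`.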
- esmvalcore.esgf.download(files, dest_folder, n_jobs=4)
Download multiple ESGFFiles in parallel.
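One common way to download several files concurrently is a thread pool that calls each file's `download()` method. The sketch below illustrates that pattern with a hypothetical `FakeFile` class standing in for `ESGFFile`. It is not the implementation of `esmvalcore.esgf.download()`, only an assumption about how such a helper could be structured.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class FakeFile:
    """Hypothetical stand-in for ESGFFile; it performs no network access."""

    def __init__(self, name: str) -> None:
        self.name = name

    def download(self, dest_folder: Path) -> Path:
        # A real implementation would fetch the file; here we only
        # compute where it would be stored.
        return Path(dest_folder) / self.name


def download_all(files, dest_folder, n_jobs=4):
    """Call download() on every file using up to n_jobs worker threads."""
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        # executor.map preserves the order of the input files.
        return list(executor.map(lambda f: f.download(dest_folder), files))


paths = download_all([FakeFile("a.nc"), FakeFile("b.nc")], Path("data"))
```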
- esmvalcore.esgf.find_files(*, project, short_name, dataset, **facets)
Search for files on ESGF.
- Parameters:
project (str) – Choose from CMIP3, CMIP5, CMIP6, CORDEX, or obs4MIPs.
short_name (str) – The name of the variable.
dataset (str) – The name of the dataset.
**facets (Union[str, list[str]]) – Any other search facets. An '*' can be used to match any value. By default, only the latest version of a file will be returned. To select all versions, use version='*'; all other omitted facets default to '*'. It is also possible to specify multiple values for a facet, e.g. exp=['historical', 'ssp585'] will match any file that belongs to either the historical or ssp585 experiment. The timerange facet can be specified in ISO 8601 format.
Note
A value of timerange='*' is supported, but combining a '*' with a time or period, as supported in the recipe, is currently not supported and will return all found files.
Examples
Examples of how to use this function for all supported projects.
Search for a CMIP3 dataset:
>>> find_files(
...     project='CMIP3',
...     frequency='mon',
...     short_name='tas',
...     dataset='cccma_cgcm3_1',
...     exp='historical',
...     ensemble='run1',
... )
[ESGFFile:cmip3/CCCma/cccma_cgcm3_1/historical/mon/atmos/run1/tas/v1/tas_a1_20c3m_1_cgcm3.1_t47_1850_2000.nc]
Search for a CMIP5 dataset:
>>> find_files(
...     project='CMIP5',
...     mip='Amon',
...     short_name='tas',
...     dataset='inmcm4',
...     exp='historical',
...     ensemble='r1i1p1',
... )
[ESGFFile:cmip5/output1/INM/inmcm4/historical/mon/atmos/Amon/r1i1p1/v20130207/tas_Amon_inmcm4_historical_r1i1p1_185001-200512.nc]
Search for a CMIP6 dataset:
>>> find_files(
...     project='CMIP6',
...     mip='Amon',
...     short_name='tas',
...     dataset='CanESM5',
...     exp='historical',
...     ensemble='r1i1p1f1',
... )
[ESGFFile:CMIP6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Amon/tas/gn/v20190429/tas_Amon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc]
Search for a CORDEX dataset and limit the search results to files containing data for the years 1990-2000:
>>> find_files(
...     project='CORDEX',
...     frequency='mon',
...     dataset='COSMO-crCLIM-v1-1',
...     short_name='tas',
...     exp='historical',
...     ensemble='r1i1p1',
...     domain='EUR-11',
...     driver='MPI-M-MPI-ESM-LR',
...     timerange='1990/2000',
... )
[ESGFFile:cordex/output/EUR-11/CLMcom-ETH/MPI-M-MPI-ESM-LR/historical/r1i1p1/COSMO-crCLIM-v1-1/v1/mon/tas/v20191219/tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_CLMcom-ETH-COSMO-crCLIM-v1-1_v1_mon_198101-199012.nc,
 ESGFFile:cordex/output/EUR-11/CLMcom-ETH/MPI-M-MPI-ESM-LR/historical/r1i1p1/COSMO-crCLIM-v1-1/v1/mon/tas/v20191219/tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_CLMcom-ETH-COSMO-crCLIM-v1-1_v1_mon_199101-200012.nc]
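The two files returned for the CORDEX search cover 1981-1990 and 1991-2000, and both overlap the requested range 1990/2000. A minimal sketch of such an overlap test on file names is shown below. The helper and the shortened file names are hypothetical, the sketch assumes file names end in `_YYYYMM-YYYYMM.nc`, and it is not necessarily how ESMValCore selects files.

```python
import re

# Matches a trailing date range such as "_198101-199012.nc".
DATE_RANGE = re.compile(r"_(\d{4})\d{2}-(\d{4})\d{2}\.nc$")


def overlaps(filename: str, start_year: int, end_year: int) -> bool:
    """Return True if the years in the file name overlap the given range."""
    match = DATE_RANGE.search(filename)
    if match is None:
        return True  # No date range found: keep the file to be safe.
    file_start, file_end = int(match.group(1)), int(match.group(2))
    return file_start <= end_year and file_end >= start_year


# Shortened, made-up file names that follow the same date suffix pattern.
names = [
    "tas_mon_197101-198012.nc",
    "tas_mon_198101-199012.nc",
    "tas_mon_199101-200012.nc",
]
selected = [name for name in names if overlaps(name, 1990, 2000)]
```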
Search for an obs4MIPs dataset:
>>> find_files(
...     project='obs4MIPs',
...     frequency='mon',
...     dataset='CERES-EBAF',
...     short_name='rsutcs',
... )
[ESGFFile:obs4MIPs/NASA-LaRC/CERES-EBAF/atmos/mon/v20160610/rsutcs_CERES-EBAF_L3B_Ed2-8_200003-201404.nc]
Search for any ensemble member:
>>> find_files(
...     project='CMIP6',
...     mip='Amon',
...     short_name='tas',
...     dataset='BCC-CSM2-MR',
...     exp='historical',
...     ensemble='*',
... )
[ESGFFile:CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r1i1p1f1/Amon/tas/gn/v20181126/tas_Amon_BCC-CSM2-MR_historical_r1i1p1f1_gn_185001-201412.nc,
 ESGFFile:CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r2i1p1f1/Amon/tas/gn/v20181115/tas_Amon_BCC-CSM2-MR_historical_r2i1p1f1_gn_185001-201412.nc,
 ESGFFile:CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r3i1p1f1/Amon/tas/gn/v20181119/tas_Amon_BCC-CSM2-MR_historical_r3i1p1f1_gn_185001-201412.nc]
Search for all available versions of a file:
>>> find_files(
...     project='CMIP5',
...     mip='Amon',
...     short_name='tas',
...     dataset='CCSM4',
...     exp='historical',
...     ensemble='r1i1p1',
...     version='*',
... )
[ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20121031/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc,
 ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20130425/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc,
 ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20160829/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc]
Search for a specific version of a file:
>>> find_files(
...     project='CMIP5',
...     mip='Amon',
...     short_name='tas',
...     dataset='CCSM4',
...     exp='historical',
...     ensemble='r1i1p1',
...     version='v20130425',
... )
[ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20130425/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc]
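The matching rules for facets described for find_files() (`'*'` matches any value, a list matches any of its elements) can be illustrated with a small client-side predicate. This is a hypothetical sketch for explanation only: the actual search is carried out by the ESGF index node, and the function names `facet_matches` and `record_matches` are assumptions.

```python
def facet_matches(requested, actual: str) -> bool:
    """Check one facet value against a requested value, list, or '*'."""
    if requested == "*":
        return True
    if isinstance(requested, list):
        return actual in requested
    return requested == actual


def record_matches(query: dict, record: dict) -> bool:
    """Return True if every requested facet matches the record.

    Facets that are absent from the query are not checked at all,
    which mirrors omitted facets defaulting to '*'.
    """
    return all(
        facet_matches(value, record.get(facet, ""))
        for facet, value in query.items()
    )


record = {"exp": "ssp585", "ensemble": "r2i1p1f1"}
multi = record_matches({"exp": ["historical", "ssp585"], "ensemble": "*"}, record)
single = record_matches({"exp": "historical"}, record)
```

Here `multi` is true because `ssp585` is one of the listed experiments, while `single` is false because the record is not from the historical experiment.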
esmvalcore.esgf.facets
Module containing mappings from our names to ESGF names.
Data:
DATASET_MAP | Cache for the mapping between recipe/filesystem and ESGF dataset names.
FACETS | Mapping between the recipe and ESGF facet names.
Functions:
Create the DATASET_MAP from recipe datasets to ESGF dataset names.
- esmvalcore.esgf.facets.DATASET_MAP = {
    'CMIP3': {},
    'CMIP5': {
        'ACCESS1-0': 'ACCESS1.0',
        'ACCESS1-3': 'ACCESS1.3',
        'CESM1-BGC': 'CESM1(BGC)',
        'CESM1-CAM5': 'CESM1(CAM5)',
        'CESM1-CAM5-1-FV2': 'CESM1(CAM5.1,FV2)',
        'CESM1-FASTCHEM': 'CESM1(FASTCHEM)',
        'CESM1-WACCM': 'CESM1(WACCM)',
        'CSIRO-Mk3-6-0': 'CSIRO-Mk3.6.0',
        'GFDL-CM2p1': 'GFDL-CM2.1',
        'MRI-AGCM3-2H': 'MRI-AGCM3.2H',
        'MRI-AGCM3-2S': 'MRI-AGCM3.2S',
        'bcc-csm1-1': 'BCC-CSM1.1',
        'bcc-csm1-1-m': 'BCC-CSM1.1(m)',
        'fio-esm': 'FIO-ESM',
        'inmcm4': 'INM-CM4',
    },
    'CMIP6': {},
    'CORDEX': {},
    'obs4MIPs': {},
}
Cache for the mapping between recipe/filesystem and ESGF dataset names.
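A lookup in this mapping can fall back to the recipe name when no entry exists, since most projects have an empty mapping. The sketch below copies a few CMIP5 entries from `DATASET_MAP` above. The helper `esgf_dataset_name` is hypothetical and not part of `esmvalcore.esgf.facets`.

```python
# A few entries copied from DATASET_MAP; the real mapping has more.
DATASET_MAP_EXCERPT = {
    "CMIP5": {
        "ACCESS1-0": "ACCESS1.0",
        "bcc-csm1-1": "BCC-CSM1.1",
        "inmcm4": "INM-CM4",
    },
    "CMIP6": {},
}


def esgf_dataset_name(project: str, dataset: str) -> str:
    """Return the ESGF name for a recipe dataset, or the name unchanged."""
    return DATASET_MAP_EXCERPT.get(project, {}).get(dataset, dataset)
```

With this excerpt, `esgf_dataset_name("CMIP5", "inmcm4")` gives `"INM-CM4"`, while a CMIP6 name such as `"CanESM5"` is returned unchanged.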
- esmvalcore.esgf.facets.FACETS = {
    'CMIP3': {
        'dataset': 'model',
        'ensemble': 'ensemble',
        'exp': 'experiment',
        'frequency': 'time_frequency',
        'short_name': 'variable',
    },
    'CMIP5': {
        'dataset': 'model',
        'ensemble': 'ensemble',
        'exp': 'experiment',
        'frequency': 'time_frequency',
        'institute': 'institute',
        'mip': 'cmor_table',
        'product': 'product',
        'short_name': 'variable',
    },
    'CMIP6': {
        'activity': 'activity_drs',
        'dataset': 'source_id',
        'ensemble': 'member_id',
        'exp': 'experiment_id',
        'grid': 'grid_label',
        'institute': 'institution_id',
        'mip': 'table_id',
        'short_name': 'variable',
    },
    'CORDEX': {
        'dataset': 'rcm_name',
        'domain': 'domain',
        'driver': 'driving_model',
        'ensemble': 'ensemble',
        'exp': 'experiment',
        'frequency': 'time_frequency',
        'institute': 'institute',
        'product': 'product',
        'short_name': 'variable',
    },
    'obs4MIPs': {
        'dataset': 'source_id',
        'frequency': 'time_frequency',
        'institute': 'institute',
        'short_name': 'variable',
    },
}
Mapping between the recipe and ESGF facet names.
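To show how such a mapping could be applied, the sketch below renames recipe facets to ESGF facet names using the CMIP6 entry copied from `FACETS` above. The function `to_esgf_facets` is a hypothetical helper written for this illustration. How ESMValCore itself applies the mapping is not shown on this page.

```python
# CMIP6 entry copied from FACETS above.
CMIP6_FACETS = {
    "activity": "activity_drs",
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "grid": "grid_label",
    "institute": "institution_id",
    "mip": "table_id",
    "short_name": "variable",
}


def to_esgf_facets(recipe_facets: dict) -> dict:
    """Rename recipe facets to ESGF facet names, skipping unknown keys."""
    return {
        CMIP6_FACETS[key]: value
        for key, value in recipe_facets.items()
        if key in CMIP6_FACETS
    }


query = to_esgf_facets(
    {"dataset": "CanESM5", "mip": "Amon", "short_name": "tas", "exp": "historical"}
)
```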