
CORDEX Regional Climate Projections

Climate Models and Climatic Indices

Climate models are sophisticated computational tools that simulate Earth's climate system by integrating physical, chemical, and biological processes across the atmosphere, oceans, land, and ice. These models, ranging from global (GCMs) to regional (RCMs) scales, use mathematical equations to represent interactions such as energy transfer, ocean currents, and greenhouse gas effects. By projecting future scenarios under varying emissions pathways (e.g., RCPs or SSPs), they generate data on key variables like temperature, precipitation, and wind. These outputs feed into climatic indices—metrics such as heatwave frequency, drought severity, or extreme rainfall return periods—that quantify specific climate risks.

CORDEX provides high-resolution regional climate projections essential for assessing localized climate impacts across 14 continent-scale domains, including Europe (Diez-Sierra et al. 2022) [1]. The climatic indices provided by such projections are vital for policymakers, guiding adaptation strategies and resilience planning in sectors like agriculture, hydrology, and urban development.

Global Climate Models (GCMs)

Several studies have evaluated and ranked the CMIP5 Global Climate Models (GCMs) used in EURO-CORDEX based on their ability to simulate historical climate and provide reliable boundary conditions for regional downscaling. Following Jury et al. (2015) [2], we select a subset of GCMs as top performers for EURO-CORDEX.

Regional Climate Models (RCMs)

Based on evaluations from the EURO-CORDEX ensemble and peer-reviewed studies, there is no universal "best" regional climate model (RCM), but consensus exists on top-performing models for specific variables and regions (Kotlarski et al. 2015; Coppola et al. 2021) [3, 4].

CORDEX Variables

The table below provides a partial list of variables available in EURO-CORDEX — such as near-surface temperature (tas), precipitation (pr), and wind speed (sfcWind). The complete list is available in the CORDEX Variable Requirements document. These variables serve as foundational inputs for deriving critical climatic indices, ranging from heatwave duration (tx40_cdd) to extreme precipitation return periods (r100yrRP), which quantify climate extremes and trends, enabling researchers and policymakers to evaluate risks like droughts, floods, and temperature anomalies. By leveraging these standardized variables, stakeholders can perform robust, region-specific analyses to inform adaptation strategies and resilience planning under evolving climate scenarios.

| Variable Name (CDS) | Short Name | Units | Description |
| --- | --- | --- | --- |
| 10m u-component of the wind | uas | m s⁻¹ | Eastward wind component at 10m above the surface |
| 10m v-component of the wind | vas | m s⁻¹ | Northward wind component at 10m above the surface |
| 10m wind speed | sfcWind | m s⁻¹ | Magnitude of horizontal wind speed at 10m above the surface |
| 200hPa temperature | ta200 | K | Air temperature at the 200hPa pressure level |
| 200hPa u-component of the wind | ua200 | m s⁻¹ | Eastward wind component at 200hPa |
| 200hPa v-component of the wind | va200 | m s⁻¹ | Northward wind component at 200hPa |
| 2m air temperature | tas | K | Near-surface air temperature at 2m above ground |
| 2m relative humidity | hurs | % | Relative humidity at 2m above ground |
| 2m surface specific humidity | huss | Dimensionless | Specific humidity (mass of water vapor per unit mass of air) at 2m |
| 500hPa geopotential height | zg500 | m | Geopotential height (gravitational potential energy per unit mass divided by standard gravity) at 500hPa |
| 850hPa u-component of the wind | ua850 | m s⁻¹ | Eastward wind component at 850hPa |
| 850hPa v-component of the wind | va850 | m s⁻¹ | Northward wind component at 850hPa |
| Evaporation | evspsbl | kg m⁻² s⁻¹ | Mass of liquid water evaporating from land (includes sublimation) |
| Land area fraction | sftlf | % | Fraction of grid cell occupied by land |
| Maximum 2m temperature in the last 24 hours | tasmax | K | Daily maximum temperature at 2m above the surface |
| Mean precipitation flux | pr | kg m⁻² s⁻¹ | Deposition of water (rain, snow, ice, hail) to the Earth's surface |
| Mean sea level pressure | psl | Pa | Air pressure adjusted to sea level |
| Minimum 2m temperature in the last 24 hours | tasmin | K | Daily minimum temperature at 2m above the surface |
| Orography | orog | m | Surface elevation (0.0 over oceans) |
| Surface pressure | ps | Pa | Air pressure at the lower boundary of the atmosphere |
| Surface solar radiation downwards | rsds | W m⁻² | Downward shortwave radiative flux at the surface |
| Surface thermal radiation downward | rlds | W m⁻² | Downward longwave radiative flux at the surface |
| Surface upwelling shortwave radiation | rsus | W m⁻² | Upward shortwave radiative flux from the surface |
| Total cloud cover | clt | Dimensionless | Fraction of the sky covered by clouds (whole atmospheric column) |
| Total run-off flux | mrro | kg m⁻² s⁻¹ | Combined surface and subsurface liquid water draining from land |

Notes

  • Units: Follow the CORDEX/CDS specifications (e.g., pr in kg m⁻² s⁻¹, tas in K)
  • Temporal resolution: Variables are available at 3-hourly, daily, monthly, or seasonal frequencies, which the CDS API identifies as 3_hours, daily_mean, monthly_mean, and seasonal_mean, respectively (non-European domains only include daily data).
  • Static variables: sftlf (land area fraction) and orog (orography) are time-independent
  • Standard names: Standard names following the CF conventions can be found in the CORDEX Variable Requirements table
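Because pr is delivered as a mass flux and tas in kelvin, most analyses start with a unit conversion. The snippet below is a minimal sketch of the two most common conversions; the function names are illustrative and are not part of clima_data.

```python
SECONDS_PER_DAY = 86_400


def kelvin_to_celsius(temp_k: float) -> float:
    """Convert a temperature from kelvin to degrees Celsius."""
    return temp_k - 273.15


def precip_flux_to_mm_per_day(flux: float) -> float:
    """Convert a precipitation flux in kg m-2 s-1 to mm/day.

    1 kg of liquid water spread over 1 m2 forms a layer 1 mm deep,
    so only the time unit has to be converted.
    """
    return flux * SECONDS_PER_DAY


if __name__ == "__main__":
    print(kelvin_to_celsius(288.15))          # a mild day, in degrees Celsius
    print(precip_flux_to_mm_per_day(5e-5))    # a wet day, in mm/day
```

The same arithmetic applies element-wise to whole arrays, so the functions can be reused unchanged on array-like data that supports subtraction and multiplication.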

For further details, see the CORDEX Documentation.

Downloading CORDEX data

To download all the CORDEX data needed to compute the climatic indices and indicators used in NATURE-DEMO, first set up a Copernicus Climate Data Store (CDS) account as explained on the main page. Then run the following script:

python scripts/download_cordex_data.py

This will take a long time (~24h), since it will download a lot of data (~5TB). The data will be downloaded to the folder ~/data/cordex. That path can be changed by modifying the DATADIR variable in the script.

This download script uses the code in the module clima_data.cordex, which provides a convenient interface to the Copernicus Climate Data Store (CDS) API for downloading CORDEX data and is documented below.

Code documentation clima_data.cordex

TIME_FRAMES = {'his': {'historical': [1976, 2005]}, 'rcp': {'short': [2011, 2040], 'medium': [2041, 2070], 'long': [2071, 2100]}} module-attribute

Time frames for EURO-CORDEX data

Definition of the time frames used in analysing EURO-CORDEX data and computing climate indices and climate indicators.
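As an illustration of how this nested dictionary can be consumed, the sketch below flattens it into one record per period. TIME_FRAMES is copied from the attribute above so the snippet runs on its own; iter_periods and frame_length are hypothetical helpers, not part of clima_data.

```python
TIME_FRAMES = {
    "his": {"historical": [1976, 2005]},
    "rcp": {"short": [2011, 2040], "medium": [2041, 2070], "long": [2071, 2100]},
}


def iter_periods(time_frames):
    """Yield (group, label, first_year, last_year) for every time frame."""
    for group, frames in time_frames.items():
        for label, (first_year, last_year) in frames.items():
            yield group, label, first_year, last_year


def frame_length(first_year, last_year):
    """Number of calendar years in a frame, both end years included."""
    return last_year - first_year + 1


if __name__ == "__main__":
    for group, label, y0, y1 in iter_periods(TIME_FRAMES):
        print(f"{group:>3} {label:<10} {y0}-{y1} ({frame_length(y0, y1)} years)")
```

Every frame spans 30 years with both end years included, which keeps statistics computed over the historical and future periods directly comparable.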

CordexNames

Centralized registry for CORDEX climate model naming formats and conversions.

Handles conversion between different naming conventions used by:

  • Input/keys: Short lowercase keys for easy use
  • ESGF: Used in ESGF table and GCM parts of downloaded filenames
  • CDS: Used in CDS API requests (lowercase with underscores)
  • Filenames: Used in downloaded filenames (hybrid format)
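The lookup pattern behind these conversions can be sketched in a few lines. The registry below is a two-entry excerpt of the GCM mapping from the class source, and lookup is a hypothetical stand-alone helper that mirrors the documented behaviour of returning the input when no match is found.

```python
# Excerpt of the GCM registry; the full mapping lives in CordexNames.GCMS
GCMS = {
    "mpi": {
        "input": "MPI-M-MPI-ESM-LR",
        "esgf": "MPI-M-MPI-ESM-LR",
        "cds": "mpi_m_mpi_esm_lr",
    },
    "hadgem": {
        "input": "MOHC-HadGEM2-ES",
        "esgf": "MOHC-HadGEM2-ES",
        "cds": "mohc_hadgem2_es",
    },
}


def lookup(registry: dict, key: str, fmt: str) -> str:
    """Return registry[key][fmt], or key unchanged if either is unknown."""
    return registry.get(key, {}).get(fmt, key)


if __name__ == "__main__":
    print(lookup(GCMS, "mpi", "cds"))
    print(lookup(GCMS, "not-a-model", "cds"))
```

One consequence of the silent fallback is that a mistyped key is passed through unchanged rather than rejected, so callers that need strict validation may want to check membership in the registry first.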

Source code in clima_data/cordex.py
class CordexNames:
    r"""Centralized registry for CORDEX climate model naming formats and conversions.

    Handles conversion between different naming conventions used by:
    - Input/keys: Short lowercase keys for easy use
    - ESGF: Used in ESGF table and GCM parts of downloaded filenames
    - CDS: Used in CDS API requests (lowercase with underscores)
    - Filenames: Used in downloaded filenames (hybrid format)
    """

    # EURO-CORDEX coordinate system parameters
    CORDEX_CRS = ccrs.RotatedPole(pole_latitude=39.25, pole_longitude=-162)
    WGS84_TO_CORDEX_TRANS = pyproj.Transformer.from_crs("epsg:4326", CORDEX_CRS)

    # Global Climate Models - mapping from short keys to various formats
    GCMS = {
        "ichec": {
            "input": "ICHEC-EC-EARTH",
            "esgf": "ICHEC-EC-EARTH",
            "cds": "ichec_ec_earth",
        },
        "mpi": {
            "input": "MPI-M-MPI-ESM-LR",
            "esgf": "MPI-M-MPI-ESM-LR",
            "cds": "mpi_m_mpi_esm_lr",
        },
        "cnrm": {
            "input": "CNRM-CERFACS-CM5",
            "esgf": "CNRM-CERFACS-CNRM-CM5",
            "cds": "cnrm_cerfacs_cm5",
        },
        "hadgem": {
            "input": "MOHC-HadGEM2-ES",
            "esgf": "MOHC-HadGEM2-ES",
            "cds": "mohc_hadgem2_es",
        },
    }

    # Regional Climate Models - mapping from short keys to various formats
    RCMS = {
        "racmo": {
            "input": "KNMI-RACMO22E",
            "esgf": "RACMO22E",
            "cds": "knmi_racmo22e",
        },
        "rca4": {
            "input": "SMHI-RCA4",
            "esgf": "RCA4",
            "cds": "smhi_rca4",
        },
        "remo": {
            "input": "GERICS-REMO2015",
            "esgf": "REMO2015",
            "cds": "gerics_remo2015",
        },
        "cosmo": {
            "input": "CLMcom-ETH-COSMO-crCLIM",
            "esgf": "COSMO-crCLIM-v1-1",
            "cds": "clmcom_eth_cosmo_crclim",
        },
    }

    # Variables - mapping from input names to various formats
    VARIABLES = {
        "tas": {
            "input": "tas",
            "esgf": "tas",
            "cds": "2m_air_temperature",
        },
        "sfcWind": {
            "input": "sfcWind",
            "esgf": "sfcWind",
            "cds": "10m_wind_speed",
        },
        "tasmax": {
            "input": "tasmax",
            "esgf": "tasmax",
            "cds": "maximum_2m_temperature_in_the_last_24_hours",
        },
        "tasmin": {
            "input": "tasmin",
            "esgf": "tasmin",
            "cds": "minimum_2m_temperature_in_the_last_24_hours",
        },
        "pr": {
            "input": "pr",
            "esgf": "pr",
            "cds": "mean_precipitation_flux",
        },
        "uas": {
            "input": "uas",
            "esgf": "uas",
            "cds": "10m_u_component_of_the_wind",
        },
        "vas": {
            "input": "vas",
            "esgf": "vas",
            "cds": "10m_v_component_of_the_wind",
        },
        "ta200": {
            "input": "ta200",
            "esgf": "ta200",
            "cds": "200hpa_temperature",
        },
        "ua200": {
            "input": "ua200",
            "esgf": "ua200",
            "cds": "200hpa_u_component_of_the_wind",
        },
        "va200": {
            "input": "va200",
            "esgf": "va200",
            "cds": "200hpa_v_component_of_the_wind",
        },
        "hurs": {
            "input": "hurs",
            "esgf": "hurs",
            "cds": "2m_relative_humidity",
        },
        "huss": {
            "input": "huss",
            "esgf": "huss",
            "cds": "2m_surface_specific_humidity",
        },
        "zg500": {
            "input": "zg500",
            "esgf": "zg500",
            "cds": "500hpa_geopotential_height",
        },
        "ua850": {
            "input": "ua850",
            "esgf": "ua850",
            "cds": "850hpa_u_component_of_the_wind",
        },
        "va850": {
            "input": "va850",
            "esgf": "va850",
            "cds": "850hpa_v_component_of_the_wind",
        },
        "evspsbl": {
            "input": "evspsbl",
            "esgf": "evspsbl",
            "cds": "evaporation",
        },
        "sftlf": {
            "input": "sftlf",
            "esgf": "sftlf",
            "cds": "land_area_fraction",
        },
        "psl": {
            "input": "psl",
            "esgf": "psl",
            "cds": "mean_sea_level_pressure",
        },
        "orog": {
            "input": "orog",
            "esgf": "orog",
            "cds": "orography",
        },
        "ps": {
            "input": "ps",
            "esgf": "ps",
            "cds": "surface_pressure",
        },
        "rsds": {
            "input": "rsds",
            "esgf": "rsds",
            "cds": "surface_solar_radiation_downwards",
        },
        "rlds": {
            "input": "rlds",
            "esgf": "rlds",
            "cds": "surface_thermal_radiation_downward",
        },
        "rsus": {
            "input": "rsus",
            "esgf": "rsus",
            "cds": "surface_upwelling_shortwave_radiation",
        },
        "clt": {
            "input": "clt",
            "esgf": "clt",
            "cds": "total_cloud_cover",
        },
        "mrro": {
            "input": "mrro",
            "esgf": "mrro",
            "cds": "total_run_off_flux",
        },
    }

    # Frequencies - mapping from input names to various formats
    FREQUENCIES = {
        "3hr": {
            "input": "3hr",
            "esgf": "3hr",
            "cds": "3_hours",
        },
        "6hr": {
            "input": "6hr",
            "esgf": "6hr",
            "cds": "6_hours",
        },
        "day": {
            "input": "day",
            "esgf": "day",
            "cds": "daily_mean",
        },
        "mon": {
            "input": "mon",
            "esgf": "mon",
            "cds": "monthly_mean",
        },
        "sea": {
            "input": "sea",
            "esgf": "sea",
            "cds": "seasonal_mean",
        },
    }

    # Experiments - mapping from input names to various formats
    EXPERIMENTS = {
        "historical": {
            "input": "historical",
            "esgf": "historical",
            "cds": "historical",
        },
        "rcp26": {
            "input": "rcp26",
            "esgf": "rcp26",
            "cds": "rcp_2_6",
        },
        "rcp45": {
            "input": "rcp45",
            "esgf": "rcp45",
            "cds": "rcp_4_5",
        },
        "rcp85": {
            "input": "rcp85",
            "esgf": "rcp85",
            "cds": "rcp_8_5",
        },
    }

    @classmethod
    def get_gcm_name(cls, key: str, format: str) -> str:
        """Get GCM name in specified format. Returns input if no match found."""
        if key not in cls.GCMS:
            return key
        if format not in cls.GCMS[key]:
            return key
        return cls.GCMS[key][format]

    @classmethod
    def get_rcm_name(cls, key: str, format: str) -> str:
        """Get RCM name in specified format. Returns input if no match found."""
        if key not in cls.RCMS:
            return key
        if format not in cls.RCMS[key]:
            return key
        return cls.RCMS[key][format]

    @classmethod
    def get_variable_name(cls, name: str, format: str) -> str:
        """Get variable name in specified format. Returns input if no match found."""
        if name not in cls.VARIABLES:
            return name
        if format not in cls.VARIABLES[name]:
            return name
        return cls.VARIABLES[name][format]

    @classmethod
    def get_frequency_name(cls, name: str, format: str) -> str:
        """Get frequency name in specified format. Returns input if no match found."""
        if name not in cls.FREQUENCIES:
            return name
        if format not in cls.FREQUENCIES[name]:
            return name
        return cls.FREQUENCIES[name][format]

    @classmethod
    def get_experiment_name(cls, name: str, format: str) -> str:
        """Get experiment name in specified format. Returns input if no match found."""
        if name not in cls.EXPERIMENTS:
            return name
        if format not in cls.EXPERIMENTS[name]:
            return name
        return cls.EXPERIMENTS[name][format]

    @classmethod
    def get_filename_gcm(cls, key: str) -> str:
        """Get GCM name as it appears in downloaded filenames (ESGF format)."""
        return cls.get_gcm_name(key, "esgf")

    @classmethod
    def get_filename_rcm(cls, key: str) -> str:
        """Get RCM name as it appears in downloaded filenames (input format)."""
        return cls.get_rcm_name(key, "input")

    @classmethod
    def list_gcms(cls) -> list[str]:
        """Get list of all supported GCM keys."""
        return list(cls.GCMS.keys())

    @classmethod
    def list_rcms(cls) -> list[str]:
        """Get list of all supported RCM keys."""
        return list(cls.RCMS.keys())

    @classmethod
    def list_variables(cls) -> list[str]:
        """Get list of all supported variable names."""
        return list(cls.VARIABLES.keys())

    @classmethod
    def list_frequencies(cls) -> list[str]:
        """Get list of all supported frequency names."""
        return list(cls.FREQUENCIES.keys())

    @classmethod
    def list_experiments(cls) -> list[str]:
        """Get list of all supported experiment names."""
        return list(cls.EXPERIMENTS.keys())

get_experiment_name(name, format) classmethod

Get experiment name in specified format. Returns input if no match found.

Source code in clima_data/cordex.py
@classmethod
def get_experiment_name(cls, name: str, format: str) -> str:
    """Get experiment name in specified format. Returns input if no match found."""
    if name not in cls.EXPERIMENTS:
        return name
    if format not in cls.EXPERIMENTS[name]:
        return name
    return cls.EXPERIMENTS[name][format]

get_filename_gcm(key) classmethod

Get GCM name as it appears in downloaded filenames (ESGF format).

Source code in clima_data/cordex.py
@classmethod
def get_filename_gcm(cls, key: str) -> str:
    """Get GCM name as it appears in downloaded filenames (ESGF format)."""
    return cls.get_gcm_name(key, "esgf")

get_filename_rcm(key) classmethod

Get RCM name as it appears in downloaded filenames (input format).

Source code in clima_data/cordex.py
@classmethod
def get_filename_rcm(cls, key: str) -> str:
    """Get RCM name as it appears in downloaded filenames (input format)."""
    return cls.get_rcm_name(key, "input")

get_frequency_name(name, format) classmethod

Get frequency name in specified format. Returns input if no match found.

Source code in clima_data/cordex.py
@classmethod
def get_frequency_name(cls, name: str, format: str) -> str:
    """Get frequency name in specified format. Returns input if no match found."""
    if name not in cls.FREQUENCIES:
        return name
    if format not in cls.FREQUENCIES[name]:
        return name
    return cls.FREQUENCIES[name][format]

get_gcm_name(key, format) classmethod

Get GCM name in specified format. Returns input if no match found.

Source code in clima_data/cordex.py
@classmethod
def get_gcm_name(cls, key: str, format: str) -> str:
    """Get GCM name in specified format. Returns input if no match found."""
    if key not in cls.GCMS:
        return key
    if format not in cls.GCMS[key]:
        return key
    return cls.GCMS[key][format]

get_rcm_name(key, format) classmethod

Get RCM name in specified format. Returns input if no match found.

Source code in clima_data/cordex.py
@classmethod
def get_rcm_name(cls, key: str, format: str) -> str:
    """Get RCM name in specified format. Returns input if no match found."""
    if key not in cls.RCMS:
        return key
    if format not in cls.RCMS[key]:
        return key
    return cls.RCMS[key][format]

get_variable_name(name, format) classmethod

Get variable name in specified format. Returns input if no match found.

Source code in clima_data/cordex.py
@classmethod
def get_variable_name(cls, name: str, format: str) -> str:
    """Get variable name in specified format. Returns input if no match found."""
    if name not in cls.VARIABLES:
        return name
    if format not in cls.VARIABLES[name]:
        return name
    return cls.VARIABLES[name][format]

list_experiments() classmethod

Get list of all supported experiment names.

Source code in clima_data/cordex.py
@classmethod
def list_experiments(cls) -> list[str]:
    """Get list of all supported experiment names."""
    return list(cls.EXPERIMENTS.keys())

list_frequencies() classmethod

Get list of all supported frequency names.

Source code in clima_data/cordex.py
@classmethod
def list_frequencies(cls) -> list[str]:
    """Get list of all supported frequency names."""
    return list(cls.FREQUENCIES.keys())

list_gcms() classmethod

Get list of all supported GCM keys.

Source code in clima_data/cordex.py
@classmethod
def list_gcms(cls) -> list[str]:
    """Get list of all supported GCM keys."""
    return list(cls.GCMS.keys())

list_rcms() classmethod

Get list of all supported RCM keys.

Source code in clima_data/cordex.py
@classmethod
def list_rcms(cls) -> list[str]:
    """Get list of all supported RCM keys."""
    return list(cls.RCMS.keys())

list_variables() classmethod

Get list of all supported variable names.

Source code in clima_data/cordex.py
@classmethod
def list_variables(cls) -> list[str]:
    """Get list of all supported variable names."""
    return list(cls.VARIABLES.keys())

cordex_data_check(filepath)

Lightweight quality check for CDS CORDEX data

Source code in clima_data/cordex.py
import xarray as xr
from xclim.core.dataflags import data_flags


def cordex_data_check(filepath: str) -> list[str]:
    """Lightweight quality check for CDS CORDEX data"""
    issues = []
    # Open with a context manager so the file is closed even if a check raises
    with xr.open_dataset(filepath) as ds:
        # Quick data content checks (since metadata is already validated by CDS)
        for var_name, var in ds.data_vars.items():
            # Check for obvious data issues
            if var.isnull().all():
                issues.append(f"{var_name}: All values are NaN")
                continue

            # Use xclim for variable-specific checks
            flags = data_flags(var, ds=ds)
            if flags is not None:
                # flags is a Dataset with boolean scalar values for each check
                failed_checks = [
                    check_name
                    for check_name, check_result in flags.data_vars.items()
                    if check_result.item()  # .item() extracts the boolean value
                ]
                if failed_checks:
                    issues.append(f"{var_name}: Failed checks - {', '.join(failed_checks)}")
    return issues

download_cordex(variable, gcm_model, rcm_model, experiment, freq, year_start, year_end, data_dir, verbose=False)

Worker function handling CDS download and file processing
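The worker reports its outcome as a status string ("Success: ...", "Already exists: ..." or "Failed to download ...") instead of raising, which makes it easy to fan out over a pool of threads. The sketch below shows one way this could be driven; it assumes thread-based parallelism, uses a hypothetical fake_download stand-in so that it runs without CDS access, and is not the code of the actual download script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(variable: str, experiment: str) -> str:
    """Hypothetical stand-in for download_cordex: returns a status string."""
    return f"Success: {variable}_{experiment}"


def run_all(tasks, worker, max_workers=4):
    """Run worker(*task) for every task in a thread pool; collect the statuses."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, *task) for task in tasks]
        for future in as_completed(futures):
            results.append(future.result())
    return results


tasks = [("tas", "historical"), ("tas", "rcp85"), ("pr", "rcp45")]
statuses = run_all(tasks, fake_download)
# Failures are reported in the status text, so they can be filtered afterwards
failed = [status for status in statuses if status.startswith("Failed")]

if __name__ == "__main__":
    print(f"{len(statuses)} tasks finished, {len(failed)} failed")
```

Results arrive in completion order rather than submission order, so anything that depends on ordering should sort or key the statuses explicitly.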

Source code in clima_data/cordex.py
def download_cordex(
    variable: str,
    gcm_model: str,
    rcm_model: str,
    experiment: str,
    freq: str,
    year_start: int,
    year_end: int,
    data_dir: str,
    verbose: bool = False,
) -> str:
    """Worker function handling CDS download and file processing"""

    # Check that all parameters are valid
    assert experiment in CordexNames.EXPERIMENTS, f"Invalid experiment name: {experiment}"

    # Configure paths
    filename = get_filename(variable, gcm_model, rcm_model, experiment, freq, year_start, year_end)
    dir_path = get_dirpath(data_dir, variable, experiment, freq)
    zip_path = os.path.join(dir_path, f"{filename.replace('*', 'X')}.zip")
    os.makedirs(dir_path, exist_ok=True)

    # Check if files already exist before downloading
    existing_files = _check_existing_files(
        dir_path, variable, gcm_model, rcm_model, experiment, freq, year_start, year_end
    )
    if existing_files:
        return f"Already exists: {existing_files[0]}. SKIPPING."

    # Create unique client instance per thread
    client = cdsapi.Client()  # url and key are looked up in ~/.cdsapirc

    # Build CDS API request
    request = _build_cds_request(
        variable, gcm_model, rcm_model, experiment, freq, year_start, year_end
    )
    if verbose:
        print(f"Requesting {filename} with parameters: {request}")

    try:
        client.retrieve(
            "projections-cordex-domains-single-levels",
            request,
            zip_path,
        )

        # Process downloaded files
        with zipfile.ZipFile(zip_path, "r") as z:
            z.extractall(dir_path)
        os.remove(zip_path)
        return f"Success: {filename}.nc"
    except Exception as e:
        return f"Failed to download {filename}: {e!s}"

get_esgf_combinations_for_experiment(experiment, variable=None, frequency=None)

Get available combinations from ESGF for specific experiment with optional filtering

Parameters:

Name Type Description Default
experiment str

Experiment ID (e.g., 'historical', 'rcp45', 'rcp85')

required
variable str | None

Optional variable name to filter by

None
frequency str | None

Optional frequency to filter by (e.g., 'day', 'mon')

None

Returns:

Type Description
set[tuple[str, str, str, str]]

Set of tuples containing (gcm, rcm, variable, frequency) for valid combinations
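A typical next step is to intersect the returned set with the models a project supports. The sketch below assumes the tuples carry ESGF-style names, as in the "esgf" entries of CordexNames; the example tuples are made up for illustration and are not the result of a real ESGF query, and keep_supported is a hypothetical helper.

```python
# ESGF-style names taken from the "esgf" entries of CordexNames
SUPPORTED_GCMS = {
    "ICHEC-EC-EARTH",
    "MPI-M-MPI-ESM-LR",
    "CNRM-CERFACS-CNRM-CM5",
    "MOHC-HadGEM2-ES",
}
SUPPORTED_RCMS = {"RACMO22E", "RCA4", "REMO2015", "COSMO-crCLIM-v1-1"}


def keep_supported(combinations):
    """Keep only (gcm, rcm, variable, frequency) tuples with supported models."""
    return {
        combo
        for combo in combinations
        if combo[0] in SUPPORTED_GCMS and combo[1] in SUPPORTED_RCMS
    }


# Illustrative input only
example = {
    ("MPI-M-MPI-ESM-LR", "REMO2015", "tas", "day"),
    ("MPI-M-MPI-ESM-LR", "SOME-OTHER-RCM", "tas", "day"),
    ("SOME-OTHER-GCM", "RCA4", "pr", "day"),
}

if __name__ == "__main__":
    for combo in sorted(keep_supported(example)):
        print(combo)
```

Working with sets keeps the filtering cheap and removes duplicates automatically, at the cost of losing any ordering from the underlying table.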

Source code in clima_data/cordex.py
def get_esgf_combinations_for_experiment(
    experiment: str, variable: str | None = None, frequency: str | None = None
) -> set[tuple[str, str, str, str]]:
    """Get available combinations from ESGF for specific experiment with optional filtering

    Args:
        experiment: Experiment ID (e.g., 'historical', 'rcp45', 'rcp85')
        variable: Optional variable name to filter by
        frequency: Optional frequency to filter by (e.g., 'day', 'mon')

    Returns:
        Set of tuples containing (gcm, rcm, variable, frequency) for valid combinations
    """

    try:
        print(f"Querying ESGF data for experiment: {experiment}")
        if variable:
            print(f"  Filtering by variable: {variable}")
        if frequency:
            print(f"  Filtering by frequency: {frequency}")

        df = _get_esgf_data()

        # Filter by experiment
        filtered_data = df[df["experiment_id"] == experiment]
        if filtered_data.empty:
            print(f"Warning: No data found for experiment '{experiment}' in ESGF table")
            return set()

        # Apply additional filters if provided
        if variable:
            filtered_data = filtered_data[filtered_data["variable"] == variable]
            if filtered_data.empty:
                print(
                    f"Warning: No data found for variable '{variable}' in experiment '{experiment}'"
                )
                return set()

        if frequency:
            filtered_data = filtered_data[filtered_data["frequency"] == frequency]
            if filtered_data.empty:
                print(
                    f"Warning: No data found for frequency '{frequency}' in experiment '{experiment}'"
                )
                return set()

        # Extract unique combinations with all relevant parameters
        combinations = set()
        unique_combinations = filtered_data[
            ["driving_model_id", "model_id", "variable", "frequency"]
        ].drop_duplicates()

        for _, row in unique_combinations.iterrows():
            gcm = row["driving_model_id"]
            rcm = row["model_id"]
            var = row["variable"]
            freq = row["frequency"]
            combinations.add((gcm, rcm, var, freq))

        print(f"Found {len(combinations)} unique combinations for {experiment}")
        return combinations

    except pd.errors.EmptyDataError:
        print("Error: ESGF CSV file is empty or malformed")
        return set()
    except pd.errors.ParserError as e:
        print(f"Error: Could not parse ESGF CSV file: {e}")
        return set()
    except Exception as e:
        print(f"Warning: Could not fetch ESGF data for {experiment}: {e}")
        print("All tasks will be pruned - no combinations available")
        return set()

get_filename(variable, gcm_model, rcm_model, experiment, freq, year_start, year_end)

File name for the downloaded data based on the parameters provided

Return file name following the CDS API naming convention for CORDEX data explained at https://confluence.ecmwf.int/display/CKB/CORDEX%3A+Regional+climate+projections:

<variable>_<domain>_<driving-model>_<experiment>_<ensemble_member>_<rcm-model>_<rcm-run>_<time-frequency>_<temporal-range>.nc

Where
  • <variable> is a short variable name, e.g. “tas” for “temperature at the surface”
  • <domain> is "EUR-11" for EURO-CORDEX data
  • <driving-model> is the GCM model that produced the boundary conditions
  • <experiment> is the name of the experiment used to extract the boundary conditions (historical, rcp45, rcp85)
  • <ensemble_member> is the ensemble identifier in the form “r<X>i<Y>p<Z>”, where X, Y and Z are integers
  • <rcm-model> is the name of the model that produced the data
  • <rcm-run> is the version run of the model in the form "vX", where X is an integer
  • <time-frequency> is the time series frequency (e.g., monthly, daily, seasonal)
  • <temporal-range> is in the form YYYYMM[DDHH]-YYYY[MMDDHH], where Y is year, M is the month, D is day and H is hour. Note that day and hour are optional (indicated by the square brackets) and are only used if needed by the frequency of the data. For example, daily data from the 1st of January 1980 to the 31st of December 2010 would be written 19800101-20101231.
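Going the other way, a file name that follows this convention can be split back into its nine fields. The parser below is a minimal sketch that assumes no field contains an underscore; CordexFilename and parse_cordex_filename are hypothetical names, and the example file name is illustrative rather than taken from an actual download.

```python
from typing import NamedTuple


class CordexFilename(NamedTuple):
    variable: str
    domain: str
    driving_model: str
    experiment: str
    ensemble_member: str
    rcm_model: str
    rcm_run: str
    frequency: str
    temporal_range: str


def parse_cordex_filename(filename: str) -> CordexFilename:
    """Split a CORDEX file name into its underscore-separated fields."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) != 9:
        raise ValueError(f"Expected 9 fields, got {len(parts)}: {filename}")
    return CordexFilename(*parts)


# Illustrative file name, not taken from an actual download
example = "tas_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_GERICS-REMO2015_v1_day_20710101-20751231.nc"
parsed = parse_cordex_filename(example)

if __name__ == "__main__":
    print(parsed.driving_model, parsed.rcm_model, parsed.temporal_range)
```

Model names use hyphens rather than underscores, which is what makes a plain split on "_" sufficient here.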
Source code in clima_data/cordex.py
def get_filename(
    variable: str,
    gcm_model: str,
    rcm_model: str,
    experiment: str,
    freq: str,
    year_start: int | None,
    year_end: int | None,
) -> str:
    """File name for the downloaded data based on the parameters provided

    Return file name following the CDS API naming convention for CORDEX data explained
    at <https://confluence.ecmwf.int/display/CKB/CORDEX%3A+Regional+climate+projections>:

    `<variable>_<domain>_<driving-model>_<experiment>_<ensemble_member>_<rcm-model>_<rcm-run>_<time-frequency>_<temporal-range>.nc`

    Where:
        - <variable> is a short variable name, e.g. “tas” for “temperature at the surface”
        - <domain> is "EUR-11" for EURO-CORDEX data
        - <driving-model> is the GCM model that produced the boundary conditions
        - <experiment> is the name of the experiment used to extract the boundary conditions (historical, rcp45, rcp85)
        - <ensemble_member> is the ensemble identifier in the form “r<X>i<Y>p<Z>”, where X, Y and Z are integers
        - <rcm-model> is the name of the model that produced the data
        - <rcm-run> is the version run of the model in the form of "vX" where X is integer
        - <time-frequency> is the time series frequency (e.g., monthly, daily, seasonal)
        - the <temporal-range> is in the form YYYYMM[DDHH]-YYYY[MMDDHH], where Y is year, M is the month, D is day and H is hour. Note that day and hour are optional (indicated by the square brackets) and are only used if needed by the frequency of the data. For example daily data from the 1st of January 1980 to the 31st of December 2010 would be written 19800101-20101231.
    """
    assert variable in CordexNames.VARIABLES, f"Invalid variable name: {variable}"
    domain = "EUR-11"

    # Convert to ESGF format for filename (returns input if not found)
    if gcm_model != "*":
        gcm_model = CordexNames.get_gcm_name(gcm_model, "esgf")

    assert experiment in CordexNames.EXPERIMENTS, f"Invalid experiment name: {experiment}"
    ensemble = "r1i1p1"

    if rcm_model != "*":
        rcm_model = CordexNames.get_rcm_name(rcm_model, "input")

    rcm_run = "v*"

    assert freq in CordexNames.FREQUENCIES, f"Invalid frequency name: {freq}"
    if freq in ["3hr", "6hr", "day"]:
        day_start, day_end = "0101", "1231"
    elif freq == "mon":
        day_start, day_end = "01", "12"
    elif freq == "sea":
        day_start, day_end = "12", "11"
    else:
        raise ValueError(f"Invalid frequency: {freq}")

    if year_start is None:
        year_start = "*"  # type: ignore[assignment]
    if year_end is None:
        year_end = "*"  # type: ignore[assignment]

    return f"{variable}_{domain}_{gcm_model}_{experiment}_{ensemble}_{rcm_model}*_{rcm_run}_{freq}_{year_start}{day_start}-{year_end}{day_end}"
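
Because wildcards are substituted for unspecified fields, the returned string is a glob pattern rather than a concrete file name. The sketch below illustrates this with the standard-library `fnmatch` module. The pattern is what the return statement above would produce for `variable="tas"`, `experiment="rcp85"`, `freq="day"`, both models set to `"*"` and both years set to `None` (assuming those values pass the validation checks), with `.nc` appended. The candidate file names are hypothetical.

```python
from fnmatch import fnmatch

# Pattern of the shape produced when both models are "*" and no years are given,
# with the ".nc" extension appended. Note that "{rcm_model}*" becomes "**".
pattern = "tas_EUR-11_*_rcp85_r1i1p1_**_v*_day_*0101-*1231.nc"

# Hypothetical file names, for illustration only
candidates = [
    "tas_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_day_20060101-20101231.nc",
    "tas_EUR-11_MPI-M-MPI-ESM-LR_rcp45_r1i1p1_CLMcom-CCLM4-8-17_v1_day_20060101-20101231.nc",
    "pr_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_day_20060101-20101231.nc",
]

# Only the first candidate has the requested variable and experiment
matches = [name for name in candidates if fnmatch(name, pattern)]
print(len(matches))  # 1
```
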

get_files(data_dir, variable, experiment, freq, year_start=None)

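Get file paths matching the specified variable, experiment, and frequency, for any GCM and RCM model. If year_start is given, the end year is derived from it (year_start + 4 for daily data, year_start + 9 otherwise); if it is None, files for all periods are matched.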

Source code in clima_data/cordex.py
def get_files(
    data_dir: str,
    variable: str,
    experiment: str,
    freq: str,
    year_start: int | None = None,
) -> list:
    """Get file paths for the specified variable, experiment, frequency, and year range, for any given RCM model."""
    dir_path = get_dirpath(data_dir, variable, experiment, freq)
    if isinstance(year_start, int):
        year_end = year_start + 4 if freq == "day" else year_start + 9
    elif year_start is None:
        year_end = None
    else:
        raise ValueError("year_start must be an integer or None")
    filename = get_filename(variable, "*", "*", experiment, freq, year_start, year_end)
    return glob.glob(dir_path + f"/{filename}.nc")
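
The caller supplies only the start year; the end year and the month or day suffixes follow from the frequency. The sketch below combines the two pieces of logic shown in the listings above into one standalone function so the resulting temporal-range string can be inspected. The `temporal_range` function name is an assumption made for this example and does not exist in `clima_data`.

```python
def temporal_range(freq: str, year_start: int) -> str:
    """Build the temporal-range part of a file name (hypothetical helper)."""
    # As in get_files: daily files span 5 years, all other frequencies 10 years
    year_end = year_start + 4 if freq == "day" else year_start + 9

    # As in get_filename: the suffix appended to each year depends on the frequency
    if freq in ("3hr", "6hr", "day"):
        suffix_start, suffix_end = "0101", "1231"
    elif freq == "mon":
        suffix_start, suffix_end = "01", "12"
    elif freq == "sea":
        suffix_start, suffix_end = "12", "11"
    else:
        raise ValueError(f"Invalid frequency: {freq}")

    return f"{year_start}{suffix_start}-{year_end}{suffix_end}"


print(temporal_range("day", 2006))  # 20060101-20101231
print(temporal_range("mon", 2011))  # 201101-202012
```
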

prune_invalid_tasks(tasks)

Prune tasks that are not available in ESGF.

This function validates GCM-RCM combinations against ESGF availability for each experiment using the EURO-CORDEX ESGF table. Note that ESGF availability does not guarantee CDS API availability: some combinations may exist in ESGF but not be accessible through the CDS API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tasks | list | List of task dictionaries with keys: variable, gcm_model, rcm_model, experiment, freq, year_start, year_end | required |

Returns:

| Type | Description |
| --- | --- |
| list | List of pruned tasks with only valid combinations according to ESGF |

Source code in clima_data/cordex.py
def prune_invalid_tasks(tasks: list) -> list:
    """
    Prune tasks that are not available in ESGF.

    This function validates GCM-RCM combinations against ESGF availability
    for each experiment using the EURO-CORDEX ESGF table. Note that ESGF
    availability does not guarantee CDS API availability - some combinations
    may exist in ESGF but not be accessible through the CDS API.

    Args:
        tasks: List of task dictionaries with keys: variable, gcm_model, rcm_model, experiment, freq, year_start, year_end

    Returns:
        List of pruned tasks with only valid combinations according to ESGF
    """

    # Group tasks by experiment, variable, and frequency for more efficient ESGF queries
    tasks_by_criteria = defaultdict(list)
    for task in tasks:
        # Get ESGF frequency name (same as input for frequencies)
        esgf_freq = task["freq"]

        criteria = (task["experiment"], task["variable"], esgf_freq)
        tasks_by_criteria[criteria].append(task)

    pruned_tasks = []
    total_pruned = 0

    for (experiment, variable, frequency), exp_tasks in tasks_by_criteria.items():
        print(f"\nProcessing {len(exp_tasks)} tasks for: {experiment}, {variable}, {frequency}")

        # Get available combinations for this specific criteria set
        available_combinations = get_esgf_combinations_for_experiment(
            experiment, variable, frequency
        )

        if not available_combinations:
            print(f"No valid combinations found, skipping all {len(exp_tasks)} tasks")
            total_pruned += len(exp_tasks)
            continue

        exp_pruned, exp_kept = 0, 0

        # Track which combinations we're looking for vs what's available
        requested_combinations = set()

        for task in exp_tasks:
            gcm_norm = CordexNames.get_gcm_name(task["gcm_model"], "esgf")
            rcm_norm = CordexNames.get_rcm_name(task["rcm_model"], "esgf")

            # Create tuple to match the enhanced function's return format
            requested_combo = (gcm_norm, rcm_norm, variable, frequency)
            requested_combinations.add(requested_combo)

            if requested_combo in available_combinations:
                pruned_tasks.append(task)
                exp_kept += 1
            else:
                exp_pruned += 1

        print(f"  Requested {len(requested_combinations)} unique combinations")
        print(f"  Available {len(available_combinations)} combinations in ESGF")
        print(f"  Kept {exp_kept} tasks, pruned {exp_pruned} tasks")

        # Show which combinations were requested but not available
        missing_combinations = requested_combinations - available_combinations
        if missing_combinations:
            print("  Missing combinations:")
            for combo in sorted(missing_combinations):
                print(f"    {combo[0]} + {combo[1]} ({combo[2]}, {combo[3]})")

        total_pruned += exp_pruned

    print(f"ESGF pruning removed {total_pruned} tasks not available in ESGF")
    print(f"Final: {len(pruned_tasks)} valid tasks out of {len(tasks)} original tasks")

    return pruned_tasks
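
The core of the pruning logic is a group-then-filter pass. The following is a simplified, self-contained sketch of that idea, not the function above: the `AVAILABLE` table is a hypothetical stand-in for the ESGF lookup, the `prune` function name is invented for the example, and the model-name normalization and progress reporting are left out.

```python
from collections import defaultdict

# Hypothetical availability table keyed by (experiment, variable, frequency).
# In the real function this information comes from the ESGF lookup.
AVAILABLE = {
    ("rcp85", "tas", "day"): {
        ("MPI-M-MPI-ESM-LR", "CLMcom-CCLM4-8-17", "tas", "day"),
    },
}


def prune(tasks: list[dict]) -> list[dict]:
    """Keep only tasks whose (GCM, RCM, variable, frequency) tuple is available."""
    # Group tasks so that each criteria set needs only one availability lookup
    grouped = defaultdict(list)
    for task in tasks:
        criteria = (task["experiment"], task["variable"], task["freq"])
        grouped[criteria].append(task)

    kept = []
    for criteria, group in grouped.items():
        available = AVAILABLE.get(criteria, set())
        _, variable, freq = criteria
        for task in group:
            combo = (task["gcm_model"], task["rcm_model"], variable, freq)
            if combo in available:
                kept.append(task)
    return kept


tasks = [
    {"variable": "tas", "gcm_model": "MPI-M-MPI-ESM-LR",
     "rcm_model": "CLMcom-CCLM4-8-17", "experiment": "rcp85", "freq": "day"},
    {"variable": "tas", "gcm_model": "MPI-M-MPI-ESM-LR",
     "rcm_model": "UNKNOWN-RCM", "experiment": "rcp85", "freq": "day"},
    {"variable": "pr", "gcm_model": "MPI-M-MPI-ESM-LR",
     "rcm_model": "CLMcom-CCLM4-8-17", "experiment": "rcp45", "freq": "mon"},
]
valid = prune(tasks)
print(len(valid))  # 1
```

Grouping first means the availability lookup runs once per (experiment, variable, frequency) set rather than once per task, which matters when each lookup is a remote query.
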

  1. Diez-Sierra J, Iturbide M, Gutiérrez JM, et al. (2022) The Worldwide C3S CORDEX Grand Ensemble: A Major Contribution to Assess Regional Climate Change in the IPCC AR6 Atlas. Bulletin of the American Meteorological Society 103:E2804–E2826. https://doi.org/10.1175/BAMS-D-22-0111.1

  2. Jury MW, Prein AF, Truhetz H, Gobiet A (2015) Evaluation of CMIP5 models in the context of dynamical downscaling over Europe. Journal of Climate 28:5575–5582. https://doi.org/10.1175/JCLI-D-14-00430.1

  3. Kotlarski S, Lüthi D, Schär C (2015) The elevation dependency of 21st century European climate change: An RCM ensemble perspective. International Journal of Climatology 35:3902–3920. https://doi.org/10.1002/joc.4254

  4. Coppola E, Stocchi P, Pichelli E, et al. (2021) Non-hydrostatic RegCM4 (RegCM4-NH): Model description and case studies over multiple domains. Geoscientific Model Development 14:7705–7723. https://doi.org/10.5194/gmd-14-7705-2021