Introduction
Overview
Teaching: 5 min
Exercises: 10 min
Questions
What is ESMValTool?
Who are the people behind ESMValTool?
Objectives
Become familiar with ESMValTool
Synchronize expectations
What is ESMValTool?
This tutorial is a first introduction to ESMValTool. Before diving into the technical steps, let’s talk about what ESMValTool is all about.
What is ESMValTool?
What do you already know about or expect from ESMValTool?
ESMValTool is…
ESMValTool is many things, but in this tutorial we will focus on the following traits:
✓ A tool to analyse climate data
✓ A collection of diagnostics for reproducible climate science
✓ A community effort
A tool to analyse climate data
ESMValTool takes care of finding, opening, checking, fixing, concatenating, and preprocessing CMIP data and several other supported datasets.
The central component of ESMValTool that we will see in this tutorial is the recipe. Any ESMValTool recipe is basically a set of instructions to reproduce a certain result. The basic structure of a recipe is as follows:
- Documentation with relevant (citation) information
- Datasets that should be analysed
- Preprocessor steps that must be applied
- Diagnostic scripts performing more specific evaluation steps
An example recipe could look like this:
documentation:
  title: This is an example recipe.
  description: Example recipe
  authors:
    - lastname_firstname

datasets:
  - {dataset: HadGEM2-ES, project: CMIP5, exp: historical, mip: Amon,
     ensemble: r1i1p1, start_year: 1960, end_year: 2005}

preprocessors:
  global_mean:
    area_statistics:
      operator: mean

diagnostics:
  hockeystick_plot:
    description: plot of global mean temperature change
    variables:
      temperature:
        short_name: tas
        preprocessor: global_mean
    scripts: hockeystick.py
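The four-part structure described above can also be checked programmatically. The following is a minimal sketch, not part of ESMValTool: it mirrors the example recipe as a plain Python dictionary and reports which of the four sections are absent. The function name missing_sections is hypothetical and exists only for this illustration.

```python
# Illustrative only: the example recipe written as a Python dict.
# ESMValTool itself reads the YAML file; nothing here calls ESMValTool.
EXPECTED_SECTIONS = ("documentation", "datasets", "preprocessors", "diagnostics")

recipe = {
    "documentation": {
        "title": "This is an example recipe.",
        "description": "Example recipe",
        "authors": ["lastname_firstname"],
    },
    "datasets": [
        {
            "dataset": "HadGEM2-ES", "project": "CMIP5", "exp": "historical",
            "mip": "Amon", "ensemble": "r1i1p1",
            "start_year": 1960, "end_year": 2005,
        },
    ],
    "preprocessors": {
        "global_mean": {"area_statistics": {"operator": "mean"}},
    },
    "diagnostics": {
        "hockeystick_plot": {
            "description": "plot of global mean temperature change",
            "variables": {
                "temperature": {"short_name": "tas", "preprocessor": "global_mean"},
            },
            "scripts": "hockeystick.py",
        },
    },
}


def missing_sections(recipe_dict):
    """Return the expected top-level sections that the recipe lacks (hypothetical helper)."""
    return [name for name in EXPECTED_SECTIONS if name not in recipe_dict]


print(missing_sections(recipe))
```

For the example recipe the list is empty, because all four sections are present.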
Understanding the different sections of the recipe
Try to figure out the meaning of the different dataset keys. Hint: they can be found in the documentation of ESMValTool.
Solution
The keys are explained in the ESMValTool documentation, in the Recipe section, under datasets.
A collection of diagnostics for reproducible climate science
More than a tool, ESMValTool is a collection of publicly available recipes and diagnostic scripts. This makes it possible to easily reproduce important results.
Explore the available recipes
Go to the ESMValTool Documentation webpage and explore the Available recipes section. Which recipe(s) would you like to try?
A community effort
ESMValTool is built and maintained by an active community of scientists and software engineers. It is an open source project to which anyone can contribute. Many of the interactions take place on GitHub. Here, we briefly introduce you to some of the most important pages.
Meet the ESMValGroup
Go to github.com/ESMValGroup. This is the GitHub page of our ‘organization’. Have a look around. How many collaborators are there? Do you know any of them?
Near the top of the page there are 2 pinned repositories: ESMValTool and ESMValCore. Visit each of the repositories. How many people have contributed to each of them? Can you also find out how many people have contributed to this tutorial?
Issues and pull requests
Go back to the repository pages of ESMValTool or ESMValCore. There are tabs for ‘issues’ and ‘pull requests’. You can use the labels to navigate them a bit more. How many open issues are about enhancements of ESMValTool? And how many bugs have been fixed in ESMValCore? There is also an ‘insights’ tab, where you can see a summary of recent activity. How many issues have been opened and closed in the past month?
Conclusion
This concludes the introduction of the tutorial. You now have a basic knowledge of ESMValTool and its community. The following episodes will walk you through the installation, configuration and running your first recipes.
Key Points
ESMValTool provides a reliable interface to analyse and evaluate climate data
A large collection of recipes and diagnostic scripts is already available
ESMValTool is built and maintained by an active community of scientists and developers
Quickstart guide
Overview
Teaching: 2 min
Exercises: 8 min
Questions
What is the purpose of the quickstart guide?
How do I load and check the ESMValTool environment?
How do I configure ESMValTool?
How do I run a recipe?
Objectives
Understand the purpose of the quickstart guide
Load and check the ESMValTool environment
Configure ESMValTool
Run a recipe
What is the purpose of the quickstart guide?
- The purpose of the quickstart guide is to enable a user of ESMValTool to run ESMValTool as quickly as possible by making the bare minimum number of changes.
How do I load and check the ESMValTool environment?
For this quickstart guide, an assumption is made that ESMValTool has already been installed at the site where ESMValTool will be run. If this is not the case, see the Installation episode in this tutorial.
Load the ESMValTool environment by following the instructions at ESMValTool: Pre-installed versions on HPC clusters / other servers.
Check the ESMValTool environment by accessing the help for ESMValTool:
esmvaltool --help
How do I configure ESMValTool?
Create the ESMValTool user configuration file (the file is written by default to ~/.esmvaltool/config-user.yml):
esmvaltool config get_config_user
Edit the ESMValTool user configuration file using your favourite text editor to uncomment the lines relating to the site where ESMValTool will be run.
For more details about the ESMValTool user configuration file see the Configuration episode in this tutorial.
How do I run a recipe?
Run the example Python recipe:
esmvaltool run examples/recipe_python.yml
Wait for the recipe to complete. If the recipe completes successfully, the last line printed to screen at the end of the log will look something like:
YYYY-MM-DD HH:mm:SS, NNN UTC [NNNNN] INFO Run was successful
View the output of the recipe by opening the HTML file produced by ESMValTool (the location of this file is printed to screen near the end of the log):
YYYY-MM-DD HH:mm:SS, NNN UTC [NNNNN] INFO Wrote recipe output to: file:///$HOME/esmvaltool_output/recipe_python_<date>_<time>/index.html
For more details about running recipes see the Running your first recipe episode in this tutorial.
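When a run is saved to a file or captured from the terminal, the two log messages mentioned above can be picked out automatically. The sketch below is an assumption-laden illustration rather than an ESMValTool feature: the function summarise_log is hypothetical, and the sample log text is made up to resemble the messages quoted in this guide.

```python
import re

# Messages quoted in this guide; the exact wording may differ between versions.
SUCCESS_MARKER = "Run was successful"
OUTPUT_PATTERN = re.compile(r"Wrote recipe output to: (file://\S+)")


def summarise_log(log_text):
    """Return (succeeded, output_url) found in the log text (hypothetical helper)."""
    succeeded = SUCCESS_MARKER in log_text
    match = OUTPUT_PATTERN.search(log_text)
    output_url = match.group(1) if match else None
    return succeeded, output_url


# Made-up sample resembling the end of a log; not real ESMValTool output.
sample_log = (
    "2024-05-15 07:10:00,000 UTC [1234] INFO Wrote recipe output to: "
    "file:///home/user/esmvaltool_output/recipe_python_20240515_070408/index.html\n"
    "2024-05-15 07:10:00,001 UTC [1234] INFO Run was successful\n"
)

succeeded, output_url = summarise_log(sample_log)
print(succeeded, output_url)
```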
Key Points
The purpose of the quickstart guide is to enable a user of ESMValTool to run ESMValTool as quickly as possible without having to go through the whole tutorial
Use the module load command to load the ESMValTool environment (see the Installation episode for more details) and use esmvaltool --help to check the ESMValTool environment
Use esmvaltool config get_config_user to create the ESMValTool user configuration file
Use esmvaltool run <recipe>.yml to run a recipe
Installation
Overview
Teaching: 10 min
Exercises: 10 min
Questions
What are the prerequisites for installing ESMValTool?
How do I confirm that the installation was successful?
Objectives
Install ESMValTool
Demonstrate that the installation was successful
Overview
The instructions help with the installation of ESMValTool on operating systems like Linux/MacOSX/Windows. We use the Mamba package manager to install the ESMValTool. Other installation methods are also available; they can be found in the documentation. We will first install Mamba, and then ESMValTool. We end this chapter by testing that the installation was successful.
Before we begin, here are all the possible ways in which you can use ESMValTool depending on your level of expertise or involvement with ESMValTool and associated software such as GitHub and Mamba.
- If you have access to a server where ESMValTool is already installed as a module, e.g. the CEDA JASMIN server, you can simply load the module with the following command:
module load esmvaltool
After loading esmvaltool, we can start using ESMValTool right away. Please see the next lesson.
- If you would like to install ESMValTool as a mamba package, then this lesson will tell you how!
- If you would like to start experimenting with existing diagnostics or contributing to ESMValTool, please see the instructions for source installation in the lesson Development and contribution and in the documentation.
Install ESMValTool on Windows
ESMValTool does not directly support Windows, but successful usage has been reported through the Windows Subsystem for Linux (WSL), available in Windows 10. To install the WSL, please follow the instructions on the Windows Documentation page. After installing the WSL, ESMValTool can be installed using the same instructions as for Linux/MacOSX.
Install ESMValTool on Linux/MacOSX
Install Mamba
ESMValTool is distributed using Mamba.
To install mamba on Linux or MacOSX, follow the instructions below:
- Please download the installation file for the latest Mamba version here.
- Next, run the installer from the place where you downloaded it:
On Linux:
bash Mambaforge-Linux-x86_64.sh
On MacOSX:
bash Mambaforge-MacOSX-x86_64.sh
- Follow the instructions in the installer. The defaults should normally suffice.
- You will need to restart your terminal for the changes to take effect.
- We recommend updating mamba before the esmvaltool installation. To do so, run:
mamba update --name base mamba
- Verify you have a working mamba installation by running:
which mamba
This should show the path to your mamba executable, e.g. ~/mambaforge/bin/mamba.
For more information about installing mamba, see the mamba installation documentation.
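The same check that which mamba performs can be done from Python, which is handy inside setup scripts. This is a minimal sketch using only the standard library; the helper name find_executable is hypothetical.

```python
import shutil


def find_executable(name="mamba"):
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


path = find_executable()
if path is None:
    # Typical causes: the terminal was not restarted, or the installer did not finish.
    print("mamba was not found on PATH")
else:
    print(f"mamba found at {path}")
```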
Install the ESMValTool package
The ESMValTool package contains diagnostic scripts in four languages: R, Python, Julia and NCL. This introduces a lot of dependencies, and therefore the installation can take quite a long time. It is, however, possible to install ‘subpackages’ for each of the languages. The following (sub)packages are available:
esmvaltool-python
esmvaltool-ncl
esmvaltool-r
esmvaltool -> the complete package, i.e. the combination of the above.
For the tutorial, we will install the complete package. Thus, to install the ESMValTool package, run
mamba create --name esmvaltool esmvaltool
On MacOSX ESMValTool functionalities in Julia, NCL, and R are not supported. To install a Mamba environment on MacOSX, please refer to specific information.
This will create a new Mamba environment called esmvaltool, with the ESMValTool package and all of its dependencies installed in it.
Common issues
You can find a list of common installation problems and their solutions in the documentation.
Install Julia
Some ESMValTool diagnostics are written in the Julia programming language. If you want a full installation of ESMValTool including Julia diagnostics, you need to make sure Julia is installed before installing ESMValTool.
In this tutorial, we will not use Julia, but for reference, we have listed the steps to install Julia below. Complete instructions for installing Julia can be found on the Julia installation page.
Julia installation instructions
First, open a bash terminal and activate the newly created esmvaltool environment:
conda activate esmvaltool
Next, to install Julia via mamba, you can use the following command:
mamba install julia
To check that the Julia executable can be found, run
which julia
to display the path to the Julia executable; it should be
~/mambaforge/envs/esmvaltool/bin/julia
To test that Julia is installed correctly, run
julia
to start the interactive Julia interpreter. Press Ctrl+D to exit.
Test that the installation was successful
To test that the installation was successful, run
conda activate esmvaltool
to activate the conda environment called esmvaltool. In the shell prompt, the active conda environment should have changed from (base) to (esmvaltool).
Next, run
esmvaltool --help
to display the command line help.
Version of ESMValTool
Can you figure out which version of ESMValTool has been installed?
Solution
The esmvaltool --help command lists version as a command to get the version. When you run
esmvaltool version
the version of ESMValTool installed should be displayed on the screen as:
ESMValCore: 2.11.0
ESMValTool: 2.11.0
Note that on HPC servers such as JASMIN, sometimes a more recent development version may be displayed for ESMValTool, e.g.
ESMValTool: 2.11.0.dev71+g2c60b4d97
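Release versions and development versions like the ones above can be told apart with a few lines of code. The sketch below assumes the version strings follow the two shapes shown here (for example 2.11.0 and 2.11.0.dev71+g2c60b4d97); the helper names release_tuple and is_development_version are hypothetical.

```python
import re

# Matches "X.Y.Z" optionally followed by a ".devN+hash" suffix, as in the examples above.
VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.dev\d+\+\w+)?$")


def release_tuple(version):
    """Return (major, minor, patch) as integers for a version string (hypothetical helper)."""
    match = VERSION_PATTERN.match(version.strip())
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_development_version(version):
    """Return True if the version string carries a .dev suffix."""
    return ".dev" in version


print(release_tuple("2.11.0.dev71+g2c60b4d97"))
print(is_development_version("2.11.0.dev71+g2c60b4d97"))
```

Comparing the tuples, e.g. release_tuple("2.10.0") < release_tuple("2.11.0"), gives a simple way to check that an installation is at least as new as the version a tutorial was written for.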
Key Points
All the required packages can be installed using mamba.
You can find more information about installation in the documentation.
Configuration
Overview
Teaching: 10 min
Exercises: 10 min
Questions
What is the user configuration file and how should I use it?
Objectives
Understand the contents of the config-user.yml file
Prepare a personalized config-user.yml file
Configure ESMValTool to use some settings
The configuration file
For the purposes of this tutorial, we will create a directory in our home directory called esmvaltool_tutorial and use that as our working directory. The following steps should do that:
mkdir esmvaltool_tutorial
cd esmvaltool_tutorial
The config-user.yml configuration file contains all the global level information needed by ESMValTool to run. This is a YAML file.
You can get the default configuration file by running:
esmvaltool config get_config_user --path=<target_dir>
The default configuration file will be written to the directory specified with the --path option. For instance, you can provide the path to your working directory as the target_dir. If this option is not used, the file will be saved to the default location: ~/.esmvaltool/config-user.yml, where ~ is the path to your home directory. Note that files and directories starting with a period are “hidden”; to see the .esmvaltool directory in the terminal, use ls -la ~. Note that if a configuration file by that name already exists in the default location, the get_config_user command will not overwrite it. You will have to move the file first if you want an updated copy of the user configuration file.
We run a text editor called nano to have a look inside the configuration file and then modify it if needed:
nano ~/.esmvaltool/config-user.yml
Any other editor can be used, e.g. vim.
This file contains the information for:
- Output settings
- Destination directory
- Auxiliary data directory
- Number of tasks that can be run in parallel
- Rootpath to input data
- Directory structure for the data from different projects
Text editor side note
No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. We use nano in examples here because it is one of the least complex text editors. Press ctrl + O to save the file, and then ctrl + X to exit nano.
Output settings
The configuration file starts with output settings that inform ESMValTool about your preference for output. You can turn each setting on or off with the values true or false. Most of these settings are fairly self-explanatory.
Saving preprocessed data
Later in this tutorial, we will want to look at the contents of the preproc folder. This folder contains preprocessed data and is removed by default when ESMValTool is run. In the configuration file, which settings can be modified to prevent this from happening?
Solution
If the option remove_preproc_dir is set to false, then the preproc/ directory contains all the pre-processed data and the metadata interface files. If the option save_intermediary_cubes is set to true, then data will also be saved after each preprocessor step in the folder preproc. Note that saving all intermediate results to file will result in a considerable slowdown, and can quickly fill your disk.
Destination directory
The destination directory is the rootpath where ESMValTool will store its output folders containing e.g. figures, data, logs, etc. With every run, ESMValTool automatically generates a new output folder determined by recipe name, and date and time using the format: YYYYMMDD_HHMMSS.
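The naming scheme described above (recipe name followed by date and time as YYYYMMDD_HHMMSS) can be reproduced with the standard library, for example to predict or search for an output folder in a script. This is an illustrative sketch; the function name output_folder_name is hypothetical and ESMValTool builds the name internally.

```python
from datetime import datetime


def output_folder_name(recipe_name, when):
    """Build '<recipe>_YYYYMMDD_HHMMSS' from a recipe name and a datetime (hypothetical helper)."""
    return f"{recipe_name}_{when.strftime('%Y%m%d_%H%M%S')}"


# A run of recipe_python started on 15 May 2024 at 07:04:08.
print(output_folder_name("recipe_python", datetime(2024, 5, 15, 7, 4, 8)))
# prints recipe_python_20240515_070408
```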
Set the destination directory
Let’s name our destination directory esmvaltool_output in the working directory. ESMValTool should write the output to this path, so make sure you have the disk space to write output to this directory. How do we set this in the config-user.yml?
Solution
We use the output_dir entry in the config-user.yml file as:
output_dir: ./esmvaltool_output
If the esmvaltool_output directory does not exist, ESMValTool will generate it for you.
Rootpath to input data
ESMValTool uses several categories (in ESMValTool, this is referred to as projects) for input data based on their source. The current categories in the configuration file are mentioned below. For example, CMIP is used for a dataset from the Climate Model Intercomparison Project whereas OBS may be used for an observational dataset. More information about the projects used in ESMValTool is available in the documentation. When using ESMValTool on your own machine, you can create a directory to download climate model data or observation data sets and let the tool use data from there. It is also possible to ask ESMValTool to download climate model data as needed. This can be done by specifying a download directory and by setting the option to download data as shown below.
# Directory for storing downloaded climate data
download_dir: ~/climate_data
search_esgf: always
If you are working offline or do not want to download the data then set the option above to never. If you want to download data only when the necessary files are missing at the usual location, you can set the option to when_missing.
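The three values of search_esgf can be read as a small decision rule. The sketch below is only a reading of the description above and is not ESMValTool code; in particular, treating always as "search regardless of what is found locally" is an assumption, and the function name should_search_esgf is hypothetical.

```python
VALID_MODES = ("never", "when_missing", "always")


def should_search_esgf(mode, local_files_found):
    """Decide whether ESGF would be consulted for a dataset (illustrative reading only)."""
    if mode not in VALID_MODES:
        raise ValueError(f"search_esgf must be one of {VALID_MODES}, got {mode!r}")
    if mode == "never":
        # Offline use: rely on local files only.
        return False
    if mode == "when_missing":
        # Only fall back to ESGF when nothing was found at the usual location.
        return not local_files_found
    # mode == "always" (assumed to mean: search ESGF in every case)
    return True


for mode in VALID_MODES:
    print(mode, should_search_esgf(mode, local_files_found=True))
```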
The rootpath specifies the directories where ESMValTool will look for input data. For each category, you can define either one path or several paths as a list. For example:
rootpath:
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
  OBS: ~/obs_inputpath
  RAWOBS: ~/rawobs_inputpath
  default: ~/climate_data
These are typically available in the default configuration file you downloaded, so simply removing the machine specific lines should be sufficient to access input data.
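Because each rootpath entry may be either a single path or a list of paths, code that inspects the setting usually normalises both forms first. The following is a minimal sketch of that idea using the example values above; it is not how ESMValTool itself is implemented, and the helper name normalise_rootpath is hypothetical.

```python
from pathlib import Path

# The example rootpath from above, written as a Python dict.
example_rootpath = {
    "CMIP5": ["~/cmip5_inputpath1", "~/cmip5_inputpath2"],
    "OBS": "~/obs_inputpath",
    "RAWOBS": "~/rawobs_inputpath",
    "default": "~/climate_data",
}


def normalise_rootpath(rootpath):
    """Map every project to a list of expanded Path objects (hypothetical helper)."""
    normalised = {}
    for project, paths in rootpath.items():
        if isinstance(paths, (str, Path)):
            paths = [paths]  # a single path becomes a one-element list
        normalised[project] = [Path(p).expanduser() for p in paths]
    return normalised


for project, paths in normalise_rootpath(example_rootpath).items():
    status = ["exists" if p.is_dir() else "missing" for p in paths]
    print(project, [str(p) for p in paths], status)
```

Printing whether each directory exists is a quick way to spot a mistyped path before running a recipe.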
Set the correct rootpath
In this tutorial, we will work with data from CMIP5 and CMIP6. How can we modify the rootpath to make sure the data path is set correctly for both CMIP5 and CMIP6? Note: to get the data, check the instructions in Setup.
Solution
- Are you working on your own local machine? You need to add the root path of the folder where the data is available to the config-user.yml file as:
rootpath:
  ...
  CMIP5: ~/esmvaltool_tutorial/data
  CMIP6: ~/esmvaltool_tutorial/data
- Are you working on your local machine and have downloaded data using ESMValTool? You need to add the root path of the folder where the data has been downloaded to, as specified in the download_dir:
rootpath:
  ...
  CMIP5: ~/climate_data
  CMIP6: ~/climate_data
- Are you working on a computer cluster like JASMIN or DKRZ? Site-specific paths to the data for JASMIN/DKRZ/ETH/IPSL are already listed at the end of the config-user.yml file. You need to uncomment the related lines. For example, on JASMIN:
auxiliary_data_dir: /gws/nopw/j04/esmeval/aux_data/AUX
rootpath:
  CMIP6: /badc/cmip6/data/CMIP6
  CMIP5: /badc/cmip5/data/cmip5/output1
  OBS: /gws/nopw/j04/esmeval/obsdata-v2
  OBS6: /gws/nopw/j04/esmeval/obsdata-v2
  obs4MIPs: /gws/nopw/j04/esmeval/obsdata-v2
  ana4mips: /gws/nopw/j04/esmeval/obsdata-v2
  default: /gws/nopw/j04/esmeval/obsdata-v2
- For more information about setting the rootpath, see also the ESMValTool documentation.
Directory structure for the data from different projects
Input data can be from various models, observations and reanalysis data that adhere to the CF/CMOR standard. The drs setting describes the file structure for several projects (e.g. CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC). For more information about drs, you can visit the ESMValTool documentation on Data Reference Syntax (DRS).
Set the correct drs
In this lesson, we will work with data from CMIP5 and CMIP6. How can we set the correct drs?
Solution
- Are you working on your own local machine? You need to set the drs of the data in the config-user.yml file as:
drs:
  CMIP5: default
  CMIP6: default
- Are you asking ESMValTool to download the data for use with your diagnostics? You need to set the drs of the data in the config-user.yml file as:
drs:
  CMIP5: ESGF
  CMIP6: ESGF
  CORDEX: ESGF
  obs4MIPs: ESGF
- Are you working on a computer cluster like JASMIN or DKRZ? Site-specific drs of the data are already listed at the end of the config-user.yml file. You need to uncomment the related lines. For example, on JASMIN:
# Site-specific entries: Jasmin
# Uncomment the lines below to locate data on JASMIN
drs:
  CMIP6: BADC
  CMIP5: BADC
  OBS: default
  OBS6: default
  obs4mips: default
  ana4mips: default
Explain the default drs (if working on local machine)
- In the previous exercise, we set the drs of CMIP5 data to default. Can you explain why?
- Have a look at the directory structure of the OBS data. There is a folder called Tier1. What does it mean?
Solution
- drs: default is one way to retrieve data from a ROOT directory that has no DRS-like structure. default indicates that all the files are in a folder without any structure.
- Observational data are organized in Tiers depending on their level of public availability. Therefore the default directory must be structured accordingly with sub-directories TierX, e.g. Tier1, Tier2 or Tier3, even when drs: default. More details can be found in the documentation.
Other settings
Auxiliary data directory
The auxiliary_data_dir setting is the path where any required additional auxiliary data files are stored. This location allows us to tell the diagnostic script where to find the files if they cannot be downloaded at runtime. This option should not be used for model or observational datasets, but for additional data files that you want to feed to your recipe, e.g. shape files or the coastline descriptions used in plotting.
auxiliary_data_dir: ~/auxiliary_data
See more information in the ESMValTool documentation.
Number of parallel tasks
This option enables you to perform parallel processing. You can choose the number of tasks in parallel as 1/2/3/4/… or you can set it to null. That tells ESMValTool to use the maximum number of available CPUs. For the purpose of the tutorial, please set ESMValTool to use only 1 CPU:
max_parallel_tasks: 1
In general, if you run out of memory, try setting max_parallel_tasks to 1. Then, check the amount of memory you need for that by inspecting the file run/resource_usage.txt in the output directory. Using the number there, you can increase the number of parallel tasks again to a reasonable number for the amount of memory available in your system.
Make your own configuration file
It is possible to have several configuration files with different purposes, for example: config-user_formalised_runs.yml, config-user_debugging.yml. In this case, you have to pass the path of your own configuration file as a command-line option when running the ESMValTool. We will learn how to do this in the next lesson.
Key Points
The config-user.yml tells ESMValTool where to find input data.
output_dir defines the destination directory.
rootpath defines the root path of the data.
drs defines the directory structure of the data.
Running your first recipe
Overview
Teaching: 15 min
Exercises: 15 min
Questions
How do I run a recipe?
What happens when I run a recipe?
Objectives
Run an existing ESMValTool recipe
Examine the log information
Navigate the output created by ESMValTool
Make small adjustments to an existing recipe
This episode describes how ESMValTool recipes work, how to run a recipe and how to explore the recipe output. By the end of this episode, you should be able to run your first recipe, look at the recipe output, and make small modifications.
Running an existing recipe
The recipe format has briefly been introduced in the Introduction episode. To see all the recipes that are shipped with ESMValTool, type
esmvaltool recipes list
We will start by running examples/recipe_python.yml
esmvaltool run examples/recipe_python.yml
or if you have the user configuration file in your current directory then
esmvaltool run --config_file ./config-user.yml examples/recipe_python.yml
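If you want to launch runs from a script, the two command forms above can be assembled as an argument list. This is a minimal sketch that only builds the command; the helper name build_run_command is hypothetical, and the only flag used is the --config_file option shown above.

```python
def build_run_command(recipe, config_file=None):
    """Return the esmvaltool command as a list of arguments (hypothetical helper)."""
    command = ["esmvaltool", "run"]
    if config_file is not None:
        # Same option as in the second command above.
        command += ["--config_file", str(config_file)]
    command.append(str(recipe))
    return command


print(" ".join(build_run_command("examples/recipe_python.yml")))
print(" ".join(build_run_command("examples/recipe_python.yml", "./config-user.yml")))
```

With ESMValTool installed and its environment active, the resulting list could be passed to subprocess.run(command, check=True) from the standard library.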
If everything is okay, you should see that ESMValTool is printing a lot of output to the command line. The final message should be “Run was successful”. The exact output varies depending on your machine, but it should look something like the example log output on terminal below.
Example output
2024-05-15 07:04:08,041 UTC [134535] INFO
______________________________________________________________________
          _____ ____  __  ____     __    _ _____           _
         | ____/ ___||  \/  \ \   / /_ _| |_   _|__   ___ | |
         |  _| \___ \| |\/| |\ \ / / _` | | | |/ _ \ / _ \| |
         | |___ ___) | |  | | \ V / (_| | | | | (_) | (_) | |
         |_____|____/|_|  |_|  \_/ \__,_|_| |_|\___/ \___/|_|
______________________________________________________________________

ESMValTool - Earth System Model Evaluation Tool.

http://www.esmvaltool.org

CORE DEVELOPMENT TEAM AND CONTACTS:
  Birgit Hassler (Co-PI; DLR, Germany - birgit.hassler@dlr.de)
  Alistair Sellar (Co-PI; Met Office, UK - alistair.sellar@metoffice.gov.uk)
  Bouwe Andela (Netherlands eScience Center, The Netherlands - b.andela@esciencecenter.nl)
  Lee de Mora (PML, UK - ledm@pml.ac.uk)
  Niels Drost (Netherlands eScience Center, The Netherlands - n.drost@esciencecenter.nl)
  Veronika Eyring (DLR, Germany - veronika.eyring@dlr.de)
  Bettina Gier (UBremen, Germany - gier@uni-bremen.de)
  Remi Kazeroni (DLR, Germany - remi.kazeroni@dlr.de)
  Nikolay Koldunov (AWI, Germany - nikolay.koldunov@awi.de)
  Axel Lauer (DLR, Germany - axel.lauer@dlr.de)
  Saskia Loosveldt-Tomas (BSC, Spain - saskia.loosveldt@bsc.es)
  Ruth Lorenz (ETH Zurich, Switzerland - ruth.lorenz@env.ethz.ch)
  Benjamin Mueller (LMU, Germany - b.mueller@iggf.geo.uni-muenchen.de)
  Valeriu Predoi (URead, UK - valeriu.predoi@ncas.ac.uk)
  Mattia Righi (DLR, Germany - mattia.righi@dlr.de)
  Manuel Schlund (DLR, Germany - manuel.schlund@dlr.de)
  Breixo Solino Fernandez (DLR, Germany - breixo.solinofernandez@dlr.de)
  Javier Vegas-Regidor (BSC, Spain - javier.vegas@bsc.es)
  Klaus Zimmermann (SMHI, Sweden - klaus.zimmermann@smhi.se)

For further help, please read the documentation at
http://docs.esmvaltool.org. Have fun!

2024-05-15 07:04:08,044 UTC [134535] INFO Package versions
2024-05-15 07:04:08,044 UTC [134535] INFO ----------------
2024-05-15 07:04:08,044 UTC [134535] INFO ESMValCore: 2.10.0
2024-05-15 07:04:08,044 UTC [134535] INFO ESMValTool: 2.10.0
2024-05-15 07:04:08,044 UTC [134535] INFO ----------------
2024-05-15 07:04:08,044 UTC [134535] INFO Using config file /pfs/lustrep1/users/username/esmvaltool_tutorial/config-user.yml
2024-05-15 07:04:08,044 UTC [134535] INFO Writing program log files to:
/users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/main_log.txt
/users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/main_log_debug.txt
2024-05-15 07:04:08,503 UTC [134535] INFO Using default ESGF configuration, configuration file /users/username/.esmvaltool/esgf-pyclient.yml not present.
2024-05-15 07:04:08,504 UTC [134535] WARNING ESGF credentials missing, only data that is accessible without logging in will be available.

See https://esgf.github.io/esgf-user-support/user_guide.html for instructions on how to create an account if you do not have one yet.

Next, configure your system so esmvaltool can use your credentials. This can be done using the keyring package, or you can just enter them in /users/username/.esmvaltool/esgf-pyclient.yml.

keyring
=======
First install the keyring package (requires a supported backend, see https://pypi.org/project/keyring/):
$ pip install keyring

Next, set your username and password by running the commands:
$ keyring set ESGF hostname
$ keyring set ESGF username
$ keyring set ESGF password

To check that you entered your credentials correctly, run:
$ keyring get ESGF hostname
$ keyring get ESGF username
$ keyring get ESGF password

configuration file
==================
You can store the hostname, username, and password or your OpenID account in a plain text in the file /users/username/.esmvaltool/esgf-pyclient.yml like this:

logon:
  hostname: "your-hostname"
  username: "your-username"
  password: "your-password"

or your can configure an interactive log in:

logon:
  interactive: true

Note that storing your password in plain text in the configuration file is less secure. On shared systems, make sure the permissions of the file are set so only you can read it, i.e.

$ ls -l /users/username/.esmvaltool/esgf-pyclient.yml

shows permissions -rw-------.

2024-05-15 07:04:09,067 UTC [134535] INFO Starting the Earth System Model Evaluation Tool at time: 2024-05-15 07:04:09 UTC
2024-05-15 07:04:09,068 UTC [134535] INFO ----------------------------------------------------------------------
2024-05-15 07:04:09,068 UTC [134535] INFO RECIPE = /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/esmvaltool/recipes/examples/recipe_python.yml
2024-05-15 07:04:09,068 UTC [134535] INFO RUNDIR = /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run
2024-05-15 07:04:09,069 UTC [134535] INFO WORKDIR = /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/work
2024-05-15 07:04:09,069 UTC [134535] INFO PREPROCDIR = /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/preproc
2024-05-15 07:04:09,069 UTC [134535] INFO PLOTDIR = /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/plots
2024-05-15 07:04:09,069 UTC [134535] INFO ----------------------------------------------------------------------
2024-05-15 07:04:09,069 UTC [134535] INFO Running tasks using at most 256 processes
2024-05-15 07:04:09,069 UTC [134535] INFO If your system hangs during execution, it may not have enough memory for keeping this number of tasks in memory.
2024-05-15 07:04:09,070 UTC [134535] INFO If you experience memory problems, try reducing 'max_parallel_tasks' in your user configuration file.
2024-05-15 07:04:09,070 UTC [134535] WARNING Using the Dask basic scheduler. This may lead to slow computations and out-of-memory errors. Note that the basic scheduler may still be the best choice for preprocessor functions that are not lazy. In that case, you can safely ignore this warning. See https://docs.esmvaltool.org/projects/ESMValCore/en/latest/quickstart/configure.html#dask-distributed-configuration for more information.
2024-05-15 07:04:09,113 UTC [134535] WARNING 'default' rootpaths '/users/username/climate_data' set in config-user.yml do not exist
2024-05-15 07:04:10,648 UTC [134535] INFO Creating tasks from recipe
2024-05-15 07:04:10,648 UTC [134535] INFO Creating tasks for diagnostic map
2024-05-15 07:04:10,648 UTC [134535] INFO Creating diagnostic task map/script1
2024-05-15 07:04:10,649 UTC [134535] INFO Creating preprocessor task map/tas
2024-05-15 07:04:10,649 UTC [134535] INFO Creating preprocessor 'to_degrees_c' task for variable 'tas'
2024-05-15 07:04:11,066 UTC [134535] INFO Found input files for Dataset: tas, Amon, CMIP6, BCC-ESM1, CMIP, historical, r1i1p1f1, gn, v20181214
2024-05-15 07:04:11,405 UTC [134535] INFO Found input files for Dataset: tas, Amon, CMIP5, bcc-csm1-1, historical, r1i1p1, v1
2024-05-15 07:04:11,406 UTC [134535] INFO PreprocessingTask map/tas created.
2024-05-15 07:04:11,406 UTC [134535] INFO Creating tasks for diagnostic timeseries
2024-05-15 07:04:11,406 UTC [134535] INFO Creating diagnostic task timeseries/script1
2024-05-15 07:04:11,406 UTC [134535] INFO Creating preprocessor task timeseries/tas_amsterdam
2024-05-15 07:04:11,406 UTC [134535] INFO Creating preprocessor 'annual_mean_amsterdam' task for variable 'tas_amsterdam'
2024-05-15 07:04:11,428 UTC [134535] INFO Found input files for Dataset: tas, Amon, CMIP6, BCC-ESM1, CMIP, historical, r1i1p1f1, gn, v20181214
2024-05-15 07:04:11,452 UTC [134535] INFO Found input files for Dataset: tas, Amon, CMIP5, bcc-csm1-1, historical, r1i1p1, v1
2024-05-15 07:04:11,455 UTC [134535] INFO PreprocessingTask timeseries/tas_amsterdam created.
2024-05-15 07:04:11,455 UTC [134535] INFO Creating preprocessor task timeseries/tas_global 2024-05-15 07:04:11,455 UTC [134535] INFO Creating preprocessor 'annual_mean_global' task for variable 'tas_global' 2024-05-15 07:04:11,814 UTC [134535] INFO Found input files for Dataset: tas, Amon, CMIP6, BCC-ESM1, CMIP, historical, r1i1p1f1, gn, v20181214, supplementaries: areacella, fx, 1pctCO2, v20190613 2024-05-15 07:04:12,184 UTC [134535] INFO Found input files for Dataset: tas, Amon, CMIP5, bcc-csm1-1, historical, r1i1p1, v1, supplementaries: areacella, fx, r0i0p0 2024-05-15 07:04:12,186 UTC [134535] INFO PreprocessingTask timeseries/tas_global created. 2024-05-15 07:04:12,187 UTC [134535] INFO These tasks will be executed: timeseries/script1, timeseries/tas_global, map/script1, map/tas, timeseries/tas_amsterdam 2024-05-15 07:04:12,204 UTC [134535] INFO Wrote recipe with version numbers and wildcards to: file:///users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/recipe_python_filled.yml 2024-05-15 07:04:12,204 UTC [134535] INFO Will download 129.2 MB Will download the following files: 50.85 KB ESGFFile:CMIP6/CMIP/BCC/BCC-ESM1/1pctCO2/r1i1p1f1/fx/areacella/gn/v20190613/areacella_fx_BCC-ESM1_1pctCO2_r1i1p1f1_gn.nc on hosts ['aims3.llnl.gov', 'cmip.bcc.cma.cn', 'esgf-data04.diasjp.net', 'esgf.nci.org.au', 'esgf3.dkrz.de'] 64.95 MB ESGFFile:CMIP6/CMIP/BCC/BCC-ESM1/historical/r1i1p1f1/Amon/tas/gn/v20181214/tas_Amon_BCC-ESM1_historical_r1i1p1f1_gn_185001-201412.nc on hosts ['aims3.llnl.gov', 'cmip.bcc.cma.cn', 'esgf-data04.diasjp.net', 'esgf.ceda.ac.uk', 'esgf.nci.org.au', 'esgf3.dkrz.de'] 44.4 KB ESGFFile:cmip5/output1/BCC/bcc-csm1-1/historical/fx/atmos/fx/r0i0p0/v1/areacella_fx_bcc-csm1-1_historical_r0i0p0.nc on hosts ['aims3.llnl.gov', 'esgf.ceda.ac.uk', 'esgf2.dkrz.de'] 64.15 MB ESGFFile:cmip5/output1/BCC/bcc-csm1-1/historical/mon/atmos/Amon/r1i1p1/v1/tas_Amon_bcc-csm1-1_historical_r1i1p1_185001-201212.nc on hosts ['aims3.llnl.gov', 
'esgf.ceda.ac.uk', 'esgf2.dkrz.de'] Downloading 129.2 MB.. 2024-05-15 07:04:14,074 UTC [134535] INFO Downloaded /users/username/climate_data/cmip5/output1/BCC/bcc-csm1-1/historical/fx/atmos/fx/r0i0p0/v1/areacella_fx_bcc-csm1-1_historical_r0i0p0.nc (44.4 KB) in 1.84 seconds (24.09 KB/s) from aims3.llnl.gov 2024-05-15 07:04:14,109 UTC [134535] INFO Downloaded /users/username/climate_data/CMIP6/CMIP/BCC/BCC-ESM1/1pctCO2/r1i1p1f1/fx/areacella/gn/v20190613/areacella_fx_BCC-ESM1_1pctCO2_r1i1p1f1_gn.nc (50.85 KB) in 1.88 seconds (27 KB/s) from aims3.llnl.gov 2024-05-15 07:04:20,505 UTC [134535] INFO Downloaded /users/username/climate_data/CMIP6/CMIP/BCC/BCC-ESM1/historical/r1i1p1f1/Amon/tas/gn/v20181214/tas_Amon_BCC-ESM1_historical_r1i1p1f1_gn_185001-201412.nc (64.95 MB) in 8.27 seconds (7.85 MB/s) from aims3.llnl.gov 2024-05-15 07:04:25,862 UTC [134535] INFO Downloaded /users/username/climate_data/cmip5/output1/BCC/bcc-csm1-1/historical/mon/atmos/Amon/r1i1p1/v1/tas_Amon_bcc-csm1-1_historical_r1i1p1_185001-201212.nc (64.15 MB) in 13.63 seconds (4.71 MB/s) from aims3.llnl.gov 2024-05-15 07:04:25,870 UTC [134535] INFO Downloaded 129.2 MB in 13.67 seconds (9.45 MB/s) 2024-05-15 07:04:25,870 UTC [134535] INFO Successfully downloaded all requested files. 2024-05-15 07:04:25,871 UTC [134535] INFO Using the Dask basic scheduler. 
2024-05-15 07:04:25,871 UTC [134535] INFO Running 5 tasks using 5 processes 2024-05-15 07:04:25,956 UTC [144507] INFO Starting task map/tas in process [144507] 2024-05-15 07:04:25,956 UTC [144522] INFO Starting task timeseries/tas_amsterdam in process [144522] 2024-05-15 07:04:25,957 UTC [144534] INFO Starting task timeseries/tas_global in process [144534] 2024-05-15 07:04:26,049 UTC [134535] INFO Progress: 3 tasks running, 2 tasks waiting for ancestors, 0/5 done 2024-05-15 07:04:26,457 UTC [144534] WARNING Long name changed from 'Grid-Cell Area for Atmospheric Variables' to 'Grid-Cell Area for Atmospheric Grid Variables' (for file /users/username/climate_data/CMIP6/CMIP/BCC/BCC-ESM1/1pctCO2/r1i1p1f1/fx/areacella/gn/v20190613/areacella_fx_BCC-ESM1_1pctCO2_r1i1p1f1_gn.nc) 2024-05-15 07:04:26,461 UTC [144507] WARNING /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/iris/fileformats/netcdf/saver.py:2670: IrisDeprecation: Saving to netcdf with legacy-style attribute handling for backwards compatibility. This mode is deprecated since Iris 3.8, and will eventually be removed. Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.save_split_attrs = True'. warn_deprecated(message) 2024-05-15 07:04:26,856 UTC [144522] INFO Extracting data for Amsterdam, Noord-Holland, Nederland (52.3730796 °N, 4.8924534 °E) 2024-05-15 07:04:27,081 UTC [144507] WARNING /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/iris/fileformats/netcdf/saver.py:2670: IrisDeprecation: Saving to netcdf with legacy-style attribute handling for backwards compatibility. This mode is deprecated since Iris 3.8, and will eventually be removed. Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.save_split_attrs = True'. 
warn_deprecated(message) 2024-05-15 07:04:27,085 UTC [144534] WARNING /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/iris/fileformats/netcdf/saver.py:2670: IrisDeprecation: Saving to netcdf with legacy-style attribute handling for backwards compatibility. This mode is deprecated since Iris 3.8, and will eventually be removed. Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.save_split_attrs = True'. warn_deprecated(message) 2024-05-15 07:04:40,666 UTC [144507] INFO Successfully completed task map/tas (priority 1) in 0:00:14.709864 2024-05-15 07:04:40,805 UTC [134535] INFO Progress: 2 tasks running, 2 tasks waiting for ancestors, 1/5 done 2024-05-15 07:04:40,813 UTC [144547] INFO Starting task map/script1 in process [144547] 2024-05-15 07:04:40,821 UTC [144547] INFO Running command ['/LUMI_TYKKY_D1Npoag/miniconda/envs/env1/bin/python', '/LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/esmvaltool/diag_scripts/examples/diagnostic.py', '/users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/map/script1/settings.yml'] 2024-05-15 07:04:40,822 UTC [144547] INFO Writing output to /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/work/map/script1 2024-05-15 07:04:40,822 UTC [144547] INFO Writing plots to /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/plots/map/script1 2024-05-15 07:04:40,822 UTC [144547] INFO Writing log to /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/map/script1/log.txt 2024-05-15 07:04:40,822 UTC [144547] INFO To re-run this diagnostic script, run: cd /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/map/script1; MPLBACKEND="Agg" /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/bin/python /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/esmvaltool/diag_scripts/examples/diagnostic.py 
/users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/map/script1/settings.yml 2024-05-15 07:04:40,906 UTC [134535] INFO Progress: 3 tasks running, 1 tasks waiting for ancestors, 1/5 done 2024-05-15 07:04:47,225 UTC [144522] INFO Extracting data for Amsterdam, Noord-Holland, Nederland (52.3730796 °N, 4.8924534 °E) 2024-05-15 07:04:47,308 UTC [144534] WARNING /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/iris/fileformats/netcdf/saver.py:2670: IrisDeprecation: Saving to netcdf with legacy-style attribute handling for backwards compatibility. This mode is deprecated since Iris 3.8, and will eventually be removed. Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.save_split_attrs = True'. warn_deprecated(message) 2024-05-15 07:04:47,697 UTC [144534] INFO Successfully completed task timeseries/tas_global (priority 4) in 0:00:21.738941 2024-05-15 07:04:47,845 UTC [134535] INFO Progress: 2 tasks running, 1 tasks waiting for ancestors, 2/5 done 2024-05-15 07:04:48,053 UTC [144522] INFO Generated PreprocessorFile: /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/preproc/timeseries/tas_amsterdam/MultiModelMean_historical_Amon_tas_1850-2000.nc 2024-05-15 07:04:48,058 UTC [144522] WARNING /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/iris/fileformats/netcdf/saver.py:2670: IrisDeprecation: Saving to netcdf with legacy-style attribute handling for backwards compatibility. This mode is deprecated since Iris 3.8, and will eventually be removed. Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.save_split_attrs = True'. 
warn_deprecated(message) 2024-05-15 07:04:48,228 UTC [144522] INFO Successfully completed task timeseries/tas_amsterdam (priority 3) in 0:00:22.271045 2024-05-15 07:04:48,346 UTC [134535] INFO Progress: 1 tasks running, 1 tasks waiting for ancestors, 3/5 done 2024-05-15 07:04:48,358 UTC [144558] INFO Starting task timeseries/script1 in process [144558] 2024-05-15 07:04:48,364 UTC [144558] INFO Running command ['/LUMI_TYKKY_D1Npoag/miniconda/envs/env1/bin/python', '/LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/esmvaltool/diag_scripts/examples/diagnostic.py', '/users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/timeseries/script1/settings.yml'] 2024-05-15 07:04:48,365 UTC [144558] INFO Writing output to /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/work/timeseries/script1 2024-05-15 07:04:48,365 UTC [144558] INFO Writing plots to /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/plots/timeseries/script1 2024-05-15 07:04:48,365 UTC [144558] INFO Writing log to /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/timeseries/script1/log.txt 2024-05-15 07:04:48,365 UTC [144558] INFO To re-run this diagnostic script, run: cd /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/timeseries/script1; MPLBACKEND="Agg" /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/bin/python /LUMI_TYKKY_D1Npoag/miniconda/envs/env1/lib/python3.11/site-packages/esmvaltool/diag_scripts/examples/diagnostic.py /users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/timeseries/script1/settings.yml 2024-05-15 07:04:48,447 UTC [134535] INFO Progress: 2 tasks running, 0 tasks waiting for ancestors, 3/5 done 2024-05-15 07:04:54,019 UTC [144547] INFO Maximum memory used (estimate): 0.4 GB 2024-05-15 07:04:54,021 UTC [144547] INFO Sampled every second. 
It may be inaccurate if short but high spikes in memory consumption occur. 2024-05-15 07:04:55,174 UTC [144547] INFO Successfully completed task map/script1 (priority 0) in 0:00:14.360271 2024-05-15 07:04:55,366 UTC [144558] INFO Maximum memory used (estimate): 0.4 GB 2024-05-15 07:04:55,368 UTC [144558] INFO Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur. 2024-05-15 07:04:55,566 UTC [134535] INFO Progress: 1 tasks running, 0 tasks waiting for ancestors, 4/5 done 2024-05-15 07:04:56,958 UTC [144558] INFO Successfully completed task timeseries/script1 (priority 2) in 0:00:08.599797 2024-05-15 07:04:57,072 UTC [134535] INFO Progress: 0 tasks running, 0 tasks waiting for ancestors, 5/5 done 2024-05-15 07:04:57,072 UTC [134535] INFO Successfully completed all tasks. 2024-05-15 07:04:57,134 UTC [134535] INFO Wrote recipe with version numbers and wildcards to: file:///users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/run/recipe_python_filled.yml 2024-05-15 07:04:57,399 UTC [134535] INFO Wrote recipe output to: file:///users/username/esmvaltool_tutorial/esmvaltool_output/recipe_python_20240515_070408/index.html 2024-05-15 07:04:57,399 UTC [134535] INFO Ending the Earth System Model Evaluation Tool at time: 2024-05-15 07:04:57 UTC 2024-05-15 07:04:57,400 UTC [134535] INFO Time for running the recipe was: 0:00:48.332409 2024-05-15 07:04:57,756 UTC [134535] INFO Maximum memory used (estimate): 2.5 GB 2024-05-15 07:04:57,757 UTC [134535] INFO Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur. 2024-05-15 07:04:57,759 UTC [134535] INFO Removing `preproc` directory containing preprocessed data 2024-05-15 07:04:57,759 UTC [134535] INFO If this data is further needed, then set `remove_preproc_dir` to `false` in your user configuration file 2024-05-15 07:04:57,782 UTC [134535] INFO Run was successful
Pro tip: ESMValTool search paths
You might wonder how ESMValTool was able to find the recipe file, even though it’s not in your working directory. All the recipe paths printed by the command esmvaltool recipes list are relative to ESMValTool’s installation location. This is where ESMValTool will look if it cannot find the file by following the path from your working directory.
Investigating the log messages
Let’s dissect what’s happening here.
Output files and directories
After the banner and general information, the output starts with some important locations.
- Did ESMValTool use the right config file?
- What is the path to the example recipe?
- What is the main output folder generated by ESMValTool?
- Can you guess what the different output directories are for?
- ESMValTool creates two log files. What is the difference?
Answers
- The config file should be the one we edited in the previous episode, something like /home/<username>/.esmvaltool/config-user.yml or ~/esmvaltool_tutorial/config-user.yml.
- ESMValTool found the recipe in its installation directory, something like /home/users/username/mambaforge/envs/esmvaltool/bin/esmvaltool/recipes/examples/ or, if you are using a pre-installed module on a server, something like /apps/jasmin/community/esmvaltool/ESMValTool_<version>/esmvaltool/recipes/examples/recipe_python.yml, where <version> is the latest release.
- ESMValTool creates a time-stamped output directory for every run. In this case, it should be something like recipe_python_YYYYMMDD_HHMMSS. This folder is made inside the output directory specified in the previous episode: ~/esmvaltool_tutorial/esmvaltool_output.
- There should be four output folders:
  - plots/: this is where output figures are stored.
  - preproc/: this is where pre-processed data are stored.
  - run/: this is where esmvaltool stores general information about the run, such as log messages and a copy of the recipe file.
  - work/: this is where output files (not figures) are stored.
- The log files are:
  - main_log.txt is a copy of the command-line output.
  - main_log_debug.txt contains more detailed information that may be useful for debugging.
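The time-stamped directory name mentioned in the answers can be decoded with a few lines of Python. This is only an illustrative sketch: the helper name parse_run_dir is hypothetical and not part of ESMValTool, and it assumes the recipe_name_YYYYMMDD_HHMMSS pattern seen in the log above.

```python
from datetime import datetime


def parse_run_dir(name):
    """Split a run directory name into the recipe name and the start time.

    Hypothetical helper, assuming the pattern <recipe>_YYYYMMDD_HHMMSS.
    """
    # The recipe name may itself contain underscores, so split from the right.
    recipe, date_part, time_part = name.rsplit("_", 2)
    started = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return recipe, started


recipe, started = parse_run_dir("recipe_python_20240515_070408")
print(recipe, started.isoformat())
```

Sorting run directories by the parsed time is one way to find the most recent run when the output folder fills up.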
Debugging: No ‘preproc’ directory?
If you’re missing the preproc directory, then your config-user.yml file has the value remove_preproc_dir set to true (this is used to save disk space). Please set this value to false and run the recipe again.
After the output locations, there are two main sections that can be distinguished in the log messages:
- Creating tasks
- Executing tasks
Analyse the tasks
List all the tasks that ESMValTool is executing for this recipe. Can you guess what this recipe does?
Answer
Just after all the ‘creating tasks’ and before ‘executing tasks’, we find the following line in the output:
[134535] INFO These tasks will be executed: map/tas, timeseries/tas_global, timeseries/script1, map/script1, timeseries/tas_amsterdam
So there are three tasks related to timeseries: global temperature, Amsterdam temperature, and a script (tas: near-surface air temperature). And then there are two tasks related to a map: something with temperature, and again a script.
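The grouping described in this answer can also be done mechanically. The sketch below takes the log line quoted above and groups the task names by diagnostic; the function name group_tasks is hypothetical, and the parsing assumes the exact wording of that log line.

```python
from collections import defaultdict

LOG_LINE = (
    "[134535] INFO These tasks will be executed: map/tas, "
    "timeseries/tas_global, timeseries/script1, map/script1, "
    "timeseries/tas_amsterdam"
)


def group_tasks(line):
    """Return {diagnostic: sorted task names} for a 'tasks will be executed' line."""
    # Everything after the fixed phrase is a comma-separated list of tasks.
    _, _, task_list = line.partition("These tasks will be executed:")
    groups = defaultdict(list)
    for task in task_list.split(","):
        diagnostic, name = task.strip().split("/", 1)
        groups[diagnostic].append(name)
    return {diag: sorted(names) for diag, names in groups.items()}


tasks_by_diagnostic = group_tasks(LOG_LINE)
for diag, names in sorted(tasks_by_diagnostic.items()):
    print(diag, names)
```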
Examining the recipe file
To get more insight into what is happening, we will have a look at the recipe file itself. Use the following command to copy the recipe to your working directory:
esmvaltool recipes get examples/recipe_python.yml
Now you should see the recipe file in your working directory (type ls to verify). Use the nano editor to open this file:
nano recipe_python.yml
For reference, you can also view the recipe by unfolding the box below.
recipe_python.yml
# ESMValTool
# recipe_python.yml
#
# See https://docs.esmvaltool.org/en/latest/recipes/recipe_examples.html
# for a description of this recipe.
#
# See https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/overview.html
# for a description of the recipe format.
---
documentation:
  description: |
    Example recipe that plots a map and timeseries of temperature.

  title: Recipe that runs an example diagnostic written in Python.

  authors:
    - andela_bouwe
    - righi_mattia

  maintainer:
    - schlund_manuel

  references:
    - acknow_project

  projects:
    - esmval
    - c3s-magic

datasets:
  - {dataset: BCC-ESM1, project: CMIP6, exp: historical, ensemble: r1i1p1f1, grid: gn}
  - {dataset: bcc-csm1-1, project: CMIP5, exp: historical, ensemble: r1i1p1}

preprocessors:
  # See https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html
  # for a description of the preprocessor functions.

  to_degrees_c:
    convert_units:
      units: degrees_C

  annual_mean_amsterdam:
    extract_location:
      location: Amsterdam
      scheme: linear
    annual_statistics:
      operator: mean
    multi_model_statistics:
      statistics:
        - mean
      span: overlap
    convert_units:
      units: degrees_C

  annual_mean_global:
    area_statistics:
      operator: mean
    annual_statistics:
      operator: mean
    convert_units:
      units: degrees_C

diagnostics:

  map:
    description: Global map of temperature in January 2000.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas:
        mip: Amon
        preprocessor: to_degrees_c
        timerange: 2000/P1M
        caption: |
          Global map of {long_name} in January 2000 according to {dataset}.
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: pcolormesh
          cmap: Reds

  timeseries:
    description: Annual mean temperature in Amsterdam and global mean since 1850.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas_amsterdam:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_amsterdam
        timerange: 1850/2000
        caption: Annual mean {long_name} in Amsterdam according to {dataset}.
      tas_global:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_global
        timerange: 1850/2000
        caption: Annual global mean {long_name} according to {dataset}.
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: plot
Do you recognize the basic recipe structure that was introduced in episode 1?
- Documentation with relevant (citation) information
- Datasets that should be analysed
- Preprocessors: groups of common preprocessing steps
- Diagnostics: scripts performing more specific evaluation steps
Analyse the recipe
Try to answer the following questions:
- Who wrote this recipe?
- Who should be approached if there is a problem with this recipe?
- How many datasets are analyzed?
- What does the preprocessor called annual_mean_global do?
- Which script is applied for the diagnostic called map?
- Can you link specific lines in the recipe to the tasks that we saw before?
- How is the location of the city specified?
- How is the temporal range of the data specified?
Answers
- The example recipe is written by Bouwe Andela and Mattia Righi.
- Manuel Schlund is listed as the maintainer of this recipe.
- Two datasets are analysed:
- CMIP6 data from the model BCC-ESM1
- CMIP5 data from the model bcc-csm1-1
- The preprocessor annual_mean_global computes an area mean as well as annual means, and converts the result to degrees Celsius.
- The diagnostic called map executes a script referred to as script1. This is a Python script named examples/diagnostic.py.
- There are two diagnostics: map and timeseries. Under the diagnostic map we find two tasks:
  - a preprocessor task called tas, applying the preprocessor called to_degrees_c to the variable tas.
  - a diagnostic task called script1, applying the script examples/diagnostic.py to the preprocessed data (map/tas).
  Under the diagnostic timeseries we find three tasks:
  - a preprocessor task called tas_amsterdam, applying the preprocessor called annual_mean_amsterdam to the variable tas.
  - a preprocessor task called tas_global, applying the preprocessor called annual_mean_global to the variable tas.
  - a diagnostic task called script1, applying the script examples/diagnostic.py to the preprocessed data (timeseries/tas_global and timeseries/tas_amsterdam).
- The extract_location preprocessor is used to get data for a specific location here. ESMValTool interpolates to the location based on the chosen scheme. Can you tell the scheme used here? For more ways to extract areas, see the Area operations page.
- The timerange tag is used to extract data from a specific time period here. For the map diagnostic, the start time is 01/01/2000 and the length of the extracted period is 1 month, given by the duration P1M. For more options on how to specify time ranges, see the timerange documentation.
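To make the two timerange notations used in this recipe (2000/P1M and 1850/2000) concrete, here is a small sketch that resolves them to a start date and an exclusive end date. It is not ESMValCore's parser: the function resolve_timerange is hypothetical, it only understands a four-digit start year followed by either an end year or a PnY/PnM duration, and it assumes the end year is included in full.

```python
import re
from datetime import date


def resolve_timerange(timerange):
    """Resolve 'YYYY/PnM', 'YYYY/PnY' or 'YYYY/YYYY' to (start, exclusive end).

    Hypothetical helper for illustration; real timeranges allow more forms.
    """
    start_text, end_text = timerange.split("/")
    start = date(int(start_text), 1, 1)
    match = re.fullmatch(r"P(\d+)([YM])", end_text)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        months = count * 12 if unit == "Y" else count
        # Count months from the start and convert back to year and month.
        total = start.month - 1 + months
        end = date(start.year + total // 12, total % 12 + 1, 1)
    else:
        # Assumption: a plain end year is included up to 31 December.
        end = date(int(end_text) + 1, 1, 1)
    return start, end


map_range = resolve_timerange("2000/P1M")
series_range = resolve_timerange("1850/2000")
print(map_range, series_range)
```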
Pro tip: short names and variable groups
The preprocessor tasks in ESMValTool are called ‘variable groups’. For the diagnostic timeseries, we have two variable groups: tas_amsterdam and tas_global. Both of them operate on the variable tas (as indicated by the short_name), but they apply different preprocessors. For the diagnostic map the variable group itself is named tas, and you’ll notice that we do not explicitly provide the short_name. This is a shorthand built into ESMValTool.
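The link between variable groups, short names and task names can be sketched in plain Python. The dictionary below is a trimmed-down, hand-written copy of the diagnostics section of the recipe, and list_tasks is a hypothetical helper that mimics the naming convention seen in the log; it is not how ESMValCore builds its tasks internally.

```python
# Hand-written excerpt of the 'diagnostics' section of recipe_python.yml.
diagnostics = {
    "map": {
        "variables": {"tas": {"preprocessor": "to_degrees_c"}},
        "scripts": {"script1": {"script": "examples/diagnostic.py"}},
    },
    "timeseries": {
        "variables": {
            "tas_amsterdam": {
                "short_name": "tas",
                "preprocessor": "annual_mean_amsterdam",
            },
            "tas_global": {
                "short_name": "tas",
                "preprocessor": "annual_mean_global",
            },
        },
        "scripts": {"script1": {"script": "examples/diagnostic.py"}},
    },
}


def list_tasks(diagnostics):
    """Return (task name, description) pairs; illustrative only."""
    tasks = []
    for diag_name, diag in diagnostics.items():
        for group, settings in diag["variables"].items():
            # When short_name is omitted, the variable group name is used.
            short_name = settings.get("short_name", group)
            detail = f"{settings['preprocessor']} on {short_name}"
            tasks.append((f"{diag_name}/{group}", detail))
        for script_name, settings in diag["scripts"].items():
            tasks.append((f"{diag_name}/{script_name}", settings["script"]))
    return tasks


tasks = list_tasks(diagnostics)
for name, detail in tasks:
    print(f"{name}: {detail}")
```

The five names produced this way are the same five tasks listed in the log message.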
Output files
Have another look at the output directory created by the ESMValTool run.
Which files/folders are created by each task?
Answer
- map/tas: creates /preproc/map/tas, which contains preprocessed data for each of the input datasets, a file called metadata.yml describing the contents of these datasets and provenance information in the form of .xml files.
- timeseries/tas_global: creates /preproc/timeseries/tas_global, which contains preprocessed data for each of the input datasets, a metadata.yml file and provenance information in the form of .xml files.
- timeseries/tas_amsterdam: creates /preproc/timeseries/tas_amsterdam, which contains preprocessed data for each of the input datasets, plus a combined MultiModelMean, a metadata.yml file and provenance files.
- map/script1: creates /run/map/script1 with general information and a log of the diagnostic script run. It also creates /plots/map/script1/ and /work/map/script1, which contain output figures and output datasets, respectively. For each output file, there is also corresponding provenance information in the form of .xml, .bibtex and .txt files.
- timeseries/script1: creates /run/timeseries/script1 with general information and a log of the diagnostic script run. It also creates /plots/timeseries/script1 and /work/timeseries/script1, which contain output figures and output datasets, respectively. For each output file, there is also corresponding provenance information in the form of .xml, .bibtex and .txt files.
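The mapping from task names to folders in this answer follows a regular pattern, which the sketch below spells out. The helper expected_dirs is hypothetical; it only encodes the layout described above (preprocessor tasks write to preproc/, diagnostic scripts to run/, plots/ and work/) and does not inspect a real output directory.

```python
from pathlib import PurePosixPath


def expected_dirs(run_output, task, is_script):
    """Return the directories a task is expected to write to.

    Hypothetical helper based on the folder layout described in this episode.
    """
    root = PurePosixPath(run_output)
    kinds = ("run", "plots", "work") if is_script else ("preproc",)
    return [root / kind / task for kind in kinds]


run_output = "recipe_python_20240515_070408"
preproc_dirs = expected_dirs(run_output, "map/tas", is_script=False)
script_dirs = expected_dirs(run_output, "map/script1", is_script=True)
for path in preproc_dirs + script_dirs:
    print(path)
```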
Pro tip: diagnostic logs
When you run ESMValTool, any log messages from the diagnostic script are not printed on the terminal. But they are written to the log.txt files in the folder /run/<diag_name>/log.txt.
ESMValTool does print a command that can be used to re-run a diagnostic script. When you use this, the output will be printed to the command line.
Modifying the example recipe
Let’s make a small modification to the example recipe. Notice that now that you have copied and edited the recipe, you can use the command esmvaltool run recipe_python.yml to refer to your local file rather than the default version shipped with ESMValTool.
Change your location
Modify and run the recipe to analyse the temperature for your own location.
Solution
In principle, you only have to modify the location in the preprocessor called annual_mean_amsterdam. However, it is good practice to also replace all instances of amsterdam with the correct name of your location. Otherwise the log messages and output will be confusing. You are free to modify the names of preprocessors or diagnostics.
In the diff file below you will see the changes we have made to the file. The top 2 lines are the filenames and the lines like @@ -39,9 +39,9 @@ represent the line numbers in the original and modified file, respectively. For more info on this format, see here.
--- recipe_python.yml
+++ recipe_python_london.yml
@@ -39,9 +39,9 @@
     convert_units:
       units: degrees_C
 
-  annual_mean_amsterdam:
+  annual_mean_london:
     extract_location:
-      location: Amsterdam
+      location: London
       scheme: linear
     annual_statistics:
       operator: mean
@@ -83,7 +83,7 @@
           cmap: Reds
 
   timeseries:
-    description: Annual mean temperature in Amsterdam and global mean since 1850.
+    description: Annual mean temperature in London and global mean since 1850.
     themes:
       - phys
     realms:
@@ -92,9 +92,9 @@
       tas_amsterdam:
         short_name: tas
         mip: Amon
-        preprocessor: annual_mean_amsterdam
+        preprocessor: annual_mean_london
         timerange: 1850/2000
-        caption: Annual mean {long_name} in Amsterdam according to {dataset}.
+        caption: Annual mean {long_name} in London according to {dataset}.
       tas_global:
         short_name: tas
         mip: Amon
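If you want to see how such a diff is produced, Python's standard difflib module can generate the same unified format. The sketch below compares a three-line excerpt rather than the full recipe, so the hunk header it prints differs from the one in the solution; the file names are only labels passed to difflib, and no files are read.

```python
import difflib

# Three-line excerpt of the recipe before and after the change.
original = [
    "    extract_location:\n",
    "      location: Amsterdam\n",
    "      scheme: linear\n",
]
modified = [
    "    extract_location:\n",
    "      location: London\n",
    "      scheme: linear\n",
]

diff = list(
    difflib.unified_diff(
        original,
        modified,
        fromfile="recipe_python.yml",
        tofile="recipe_python_london.yml",
    )
)
print("".join(diff), end="")
```

Lines starting with - exist only in the original, lines starting with + exist only in the modified version, and lines starting with a space are unchanged context.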
Key Points
ESMValTool recipes work ‘out of the box’ (if input data is available)
There are strong links between the recipe, log file, and output folders
Recipes can easily be modified to re-use existing code for your own use case
Conclusion of the basic tutorial
Overview
Teaching: 10 min
Exercises: 0 min
Questions
What do I do now?
Where can I get help?
What if I find a bug?
Where can I find more information about ESMValTool?
How can I cite ESMValTool?
Objectives
Breathe - you’re finished now!
Congratulations & Thanks!
Find out about the mini-tutorials, and what to do next.
Congratulations!
Congratulations on completing the ESMValTool tutorial! You should now be ready to start using ESMValTool independently.
The rest of this tutorial contains individual mini-tutorials to help work through a specific issue (not developed yet).
What next?
From here, there are lots of ways that you can continue to use ESMValTool.
- You can start from the list of existing recipes and run one of those.
- You can learn how to write your own diagnostics and recipes.
- You can contribute your recipe and diagnostics back into ESMValTool.
- You can learn how to prepare observational datasets to be suitable for use by ESMValTool.
Exercise: What do you want to do next?
- Think about what you want to do with ESMValTool.
- Decide what datasets and variables you want to use.
- Is any observational data available?
- How will you preprocess the data?
- What will your diagnostic script need to do?
- What will your final figure show?
Where can I get more information on ESMValTool?
Additional resources:
Where can I get more help?
There are lots of resources available to assist you in using ESMValTool.
The ESMValTool Discussions page is a good place to find information on general issues, or check if your question has already been addressed. If you have a GitHub account, you can also post your questions there.
If you encounter difficulties, a great starting point is to visit the issue page to check whether your issue has already been reported. If it has, suggestions provided by developers can help you to solve it. Note that you will need a GitHub account for this.
Additionally, there is an ESMValTool email list. Please see information on how to subscribe to the user mailing list.
What if I find a bug?
If you find a bug, please report it to the ESMValTool team. This will help us fix issues, ensuring not only your uninterrupted workflow but also contributing to the overall stability of ESMValTool for all users.
To report a bug, please create a new issue using the issue page.
In your bug report, please describe the problem as clearly and as completely as possible. You may need to include a recipe or the output log as well.
How do I cite the Tutorial?
Please use citation information available at https://doi.org/10.5281/zenodo.3974591.
Key Points
Individual mini-tutorials help work through a specific issue (not developed yet).
We are constantly improving this tutorial.
Writing your own recipe
Overview
Teaching: 15 min
Exercises: 30 min
Questions
How do I create a new recipe?
Can I use different preprocessors for different variables?
Can I use different datasets for different variables?
How can I combine different preprocessor functions?
Can I run the same recipe for multiple ensemble members?
Objectives
Create a recipe with multiple preprocessors
Use different preprocessors for different variables
Run a recipe with variables from different datasets
Introduction
One of the key strengths of ESMValTool is in making complex analyses reusable and reproducible. But that doesn’t mean everything in ESMValTool needs to be complex. Sometimes, the biggest challenge is in keeping things simple. You probably know the ‘warming stripes’ visualization by Professor Ed Hawkins. On the site https://showyourstripes.info you can find the same visualization for many regions in the world.
Shared by Ed Hawkins under a Creative Commons 4.0 Attribution International licence. Source: https://showyourstripes.info
In this episode, we will reproduce and extend this functionality with ESMValTool. We have prepared a small Python script that takes a NetCDF file with timeseries data, and visualizes it in the form of our desired warming stripes figure.
The diagnostic script that we will use is called warming_stripes.py and can be downloaded here.
Download the file and store it in your working directory. If you want, you may also have a look at the contents, but it is not necessary to do so for this lesson.
We will write an ESMValTool recipe that takes some data, performs the necessary preprocessing, and then runs this Python script.
Drawing up a plan
Previously, we saw that running ESMValTool executes a number of tasks. What tasks do you think we will need to execute and what should each of these tasks do to generate the warming stripes?
Answer
In this episode, we will need to do the following two tasks:
- A preprocessing task that converts the gridded temperature data to a timeseries of global temperature anomalies
- A diagnostic task that calls our Python script, taking our preprocessed timeseries data as input.
Building a recipe from scratch
The easiest way to make a new recipe is to start from an existing one, and modify it until it does exactly what you need. However, in this episode we will start from scratch. This forces us to think about all the steps involved in processing the data. We will also deal with commonly occurring errors through the development of the recipe.
Remember the basic structure of a recipe, and notice that each component is extensively described in the documentation under the section, “Overview”:
This is the first place to look for help if you get stuck.
Open a new file called recipe_warming_stripes.yml:
nano recipe_warming_stripes.yml
Let’s add the standard header comments (these do not do anything), and a first description.
# ESMValTool
# recipe_warming_stripes.yml
---
documentation:
  description: Reproducing Ed Hawkins' warming stripes visualization
  title: Reproducing Ed Hawkins' warming stripes visualization.
Notice that YAML uses indentation to express the different levels; in ESMValTool recipes each level is indented by two spaces. Pressing ctrl+o will save the file. Verify the filename at the bottom and press enter. Then use ctrl+x to exit the editor.
We will try to run the recipe after every modification we make, to see if it (still) works!
esmvaltool run recipe_warming_stripes.yml
In this case, it gives an error. Below you see the last few lines of the error message.
...
yamale.yamale_error.YamaleError:
Error validating data '/home/users/username/esmvaltool_tutorial/recipe_warming_stripes.yml'
with schema
'/apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/lib/python3.11/
site-packages/esmvalcore/_recipe/recipe_schema.yml'
documentation.authors: Required field missing
2024-05-27 13:21:23,805 UTC [41924] INFO
If you have a question or need help, please start a new discussion on
https://github.com/ESMValGroup/ESMValTool/discussions
If you suspect this is a bug, please open an issue on
https://github.com/ESMValGroup/ESMValTool/issues
To make it easier to find out what the problem is, please consider attaching the
files run/recipe_*.yml and run/main_log_debug.txt from the output directory.
We can use the log message above to understand why ESMValTool failed. Here, this is because we missed a required field with author names. The text documentation.authors: Required field missing tells us that. We see that ESMValTool always tries to validate the recipe at an early stage. Note also the suggestion to start a GitHub discussion (or to open an issue if you suspect a bug) if you need help debugging the error message. This is something most users do when they cannot understand the error or are not able to fix it on their own.
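The idea of this early validation step can be pictured with a small Python sketch. It is not how ESMValTool validates recipes (the error message shows that yamale and a schema file do that job); the REQUIRED_DOCUMENTATION_FIELDS tuple and the missing_fields helper are hypothetical names, and the recipe is written as a plain dictionary instead of being read from the YAML file.

```python
# Hypothetical list of required keys, assumed for this sketch only;
# the authoritative list lives in ESMValCore's recipe_schema.yml.
REQUIRED_DOCUMENTATION_FIELDS = ("title", "description", "authors")


def missing_fields(recipe):
    """Return dotted names of required documentation fields that are absent."""
    documentation = recipe.get("documentation", {})
    return [
        f"documentation.{key}"
        for key in REQUIRED_DOCUMENTATION_FIELDS
        if key not in documentation
    ]


# Our first recipe, written as a dictionary: it has no authors entry yet.
recipe = {
    "documentation": {
        "description": "Reproducing Ed Hawkins' warming stripes visualization",
        "title": "Reproducing Ed Hawkins' warming stripes visualization.",
    }
}

print(missing_fields(recipe))
```

Checking the structure before any data is touched means that a mistake in the recipe is reported within seconds, not after a long preprocessing run.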
Let’s add some additional information to the recipe. Open the recipe file again, and add an authors section below the description. ESMValTool expects the authors as a list, like so:
  authors:
    - lastname_firstname
To bypass a number of similar error messages, add a minimal diagnostics section below the documentation. The file should now look like:
# ESMValTool
# recipe_warming_stripes.yml
---
documentation:
  description: Reproducing Ed Hawkins' warming stripes visualization
  title: Reproducing Ed Hawkins' warming stripes visualization.
  authors:
    - doe_john
diagnostics:
  dummy_diagnostic_1:
    scripts: null
This is the minimal recipe layout that is required by ESMValTool. If we now run the recipe again, you will probably see the following error:
ValueError: Tag 'doe_john' does not exist in section
'authors' of /apps/jasmin/community/esmvaltool/ESMValTool_2.10.0/esmvaltool/config-references.yml
Pro tip: config-references.yml
The error message above points to a file named config-references.yml. This is where ESMValTool stores all its citation information. To add yourself as an author, add your name in the form lastname_firstname in alphabetical order following the existing entries, under the # Development team section. See the List of authors section in the ESMValTool documentation for more information.
For now, let’s just use one of the existing references. Change the author field to righi_mattia, who cannot receive enough credit for all the effort he put into ESMValTool. If you now run the recipe again, you should see the final message
ERROR No tasks to run!
Although there is no actual error in the recipe, ESMValTool assumes you mistakenly left out a variable name to process and alerts you with this error message.
Adding a dataset entry
Let’s add a datasets section.
Filling in the dataset keys
Use the paths specified in the configuration file to explore the data directory, and look at the explanation of the dataset entry in the ESMValTool documentation. For both the datasets, write down the following properties:
- project
- variable (short name)
- CMIP table
- dataset (model name or obs/reanalysis dataset)
- experiment
- ensemble member
- grid
- start year
- end year
Answers
key           file 1             file 2
project       CMIP6              CMIP5
short name    tas                tas
CMIP table    Amon               Amon
dataset       BCC-ESM1           bcc-csm1-1
experiment    historical         historical
ensemble      r1i1p1f1           r1i1p1
grid          gn (native grid)   N/A
start year    1850               1850
end year      2014               2005
Note that the grid key is only required for CMIP6 data, and that the extent of the historical period has changed between CMIP5 and CMIP6.
Let us start with the BCC-ESM1 dataset and add a datasets section to the recipe, listing this single dataset, as shown below. Note that key fields such as mip or start_year are included in the datasets section here but are part of the diagnostic section in the recipe example seen in Running your first recipe.
# ESMValTool
# recipe_warming_stripes.yml
---
documentation:
  description: Reproducing Ed Hawkins' warming stripes visualization
  title: Reproducing Ed Hawkins' warming stripes visualization.
  authors:
    - righi_mattia
datasets:
  - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical,
     ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}
diagnostics:
  dummy_diagnostic_1:
    scripts: null
The recipe should run but produce the same message as in the previous case since we still have not included a variable to actually process. We have not included the short name of the variable in this dataset section because this allows us to reuse this dataset entry with different variable names later on. This is not really necessary for our simple use case, but it is common practice in ESMValTool.
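To see why leaving the short name out makes the entry reusable, here is a minimal Python sketch in which the dataset entry and the variable settings are plain dictionaries. The combine helper is a hypothetical name, and the sketch makes no claim about how ESMValTool itself merges these settings or which side wins when keys overlap.

```python
# The dataset entry from the recipe, without a short_name.
dataset = {
    "dataset": "BCC-ESM1",
    "project": "CMIP6",
    "mip": "Amon",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
    "grid": "gn",
    "start_year": 1850,
    "end_year": 2014,
}


def combine(dataset_entry, variable_settings):
    """Return a new dictionary holding the dataset keys plus the variable keys.

    Hypothetical helper for illustration; the keys used here do not overlap.
    """
    return {**dataset_entry, **variable_settings}


# The same dataset entry serves two different variables.
tas = combine(dataset, {"short_name": "tas"})
pr = combine(dataset, {"short_name": "pr"})

print(tas["dataset"], tas["short_name"])
print(pr["dataset"], pr["short_name"])
```

Because the dataset dictionary itself is never modified, it can be paired with as many variables as a recipe needs.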
Pro-tip: Automatically populating a recipe with all available datasets
You can select all available models for processing using glob patterns or wildcards. An example datasets section that uses all available CMIP6 models and ensemble members for the historical experiment is available here. Note that you will have to set the search_esgf option in the config_file to always so that you can download data from ESGF nodes as needed.
Adding the preprocessor section
Above, we already described the preprocessing task that needs to convert the standard, gridded temperature data to a timeseries of temperature anomalies.
Defining the preprocessor
Have a look at the available preprocessors in the documentation. Write down
- Which preprocessor functions do you think we should use?
- What are the parameters that we can pass to these functions?
- What do you think should be the order of the preprocessors?
- A suitable name for the overall preprocessor
Solution
We need to calculate anomalies and global means. There is an anomalies preprocessor which takes as arguments a time period, a reference period, and whether or not to standardize the data. The global means can be calculated with the area_statistics preprocessor, which takes an operator as argument (in our case we want to compute the mean).
The default order in which these preprocessors are applied can be seen here: area_statistics comes before anomalies. If you want to change this, you can use the custom_order preprocessor as described here. For this example, we will keep the default order.
Let’s name our preprocessor global_anomalies.
Add the following block to your recipe file between the datasets and diagnostics blocks:
preprocessors:
  global_anomalies:
    area_statistics:
      operator: mean
    anomalies:
      period: month
      reference:
        start_year: 1981
        start_month: 1
        start_day: 1
        end_year: 2010
        end_month: 12
        end_day: 31
      standardize: false
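What the anomalies step with period: month and a reference period amounts to can be sketched in a few lines of plain Python. This is a conceptual illustration under simplifying assumptions, not the ESMValCore implementation: the monthly_anomalies function is a hypothetical name, the data are a made-up list of (year, month, value) tuples, and only whole reference years are handled.

```python
from collections import defaultdict


def monthly_anomalies(series, ref_start, ref_end):
    """Subtract each calendar month's mean over the reference years.

    series is a list of (year, month, value) tuples. Every calendar month
    in the series must also occur inside the reference period.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, value in series:
        if ref_start <= year <= ref_end:
            sums[month] += value
            counts[month] += 1
    climatology = {month: sums[month] / counts[month] for month in sums}
    return [(year, month, value - climatology[month])
            for year, month, value in series]


# Made-up global mean values for January and July of three years.
series = [
    (1981, 1, 1.0), (1981, 7, 11.0),
    (1982, 1, 3.0), (1982, 7, 13.0),
    (1983, 1, 5.0), (1983, 7, 15.0),
]

# Use 1981-1982 as the reference period.
anomalies = monthly_anomalies(series, 1981, 1982)
for year, month, value in anomalies:
    print(year, month, value)
```

Computing the reference mean per calendar month removes the seasonal cycle, so that January values are compared with January values and July values with July values.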
Completing the diagnostics section
We are now ready to finish our diagnostics section. Remember that we want to create two tasks: a preprocessor task, and a diagnostic task. To illustrate that we can also pass settings to the diagnostic script, we add the option to specify a custom colormap.
Fill in the blanks
Extend the diagnostics section in your recipe by filling in the blanks in the following template:
diagnostics:
  <... (suitable name for our diagnostic)>:
    description: <...>
    variables:
      <... (suitable name for the preprocessed variable)>:
        short_name: <...>
        preprocessor: <...>
    scripts:
      <... (suitable name for our python script)>:
        script: <full path to python script>
        colormap: <... choose from matplotlib colormaps>
Solution
diagnostics:
  diagnostic_warming_stripes:
    description: visualize global temperature anomalies as warming stripes
    variables:
      global_temperature_anomalies_global:
        short_name: tas
        preprocessor: global_anomalies
    scripts:
      warming_stripes_script:
        script: ~/esmvaltool_tutorial/warming_stripes.py
        colormap: 'bwr'
You should now be able to run the recipe to get your own warming stripes.
Note: for the purpose of simplicity in this episode, we have not added logging or provenance tracking in the diagnostic script. Once you start to develop your own diagnostic scripts and want to add them to the ESMValTool repositories, this will be required. Writing your own diagnostic script is discussed in a later episode.
Bonus exercises
Below are a few exercises to practice modifying an ESMValTool recipe. For your reference, here’s a copy of the recipe at this point. This will be the point of departure for each of the modifications we’ll make below.
Specific location selection
On showyourstripes.org, you can download stripes for specific locations. Here we show how this can be done with ESMValTool. Instead of the global mean, we can pick a location to plot the stripes for. Can you find a suitable preprocessor to do this?
Solution
You can use extract_point or extract_region to select a location. We used extract_point. Here’s a copy of the recipe at this point and this is the difference from the previous recipe:
--- recipe_warming_stripes.yml
+++ recipe_warming_stripes_local.yml
@@ -10,9 +10,11 @@
   - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical,
      ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}
 preprocessors:
-  global_anomalies:
-    area_statistics:
-      operator: mean
+  anomalies_amsterdam:
+    extract_point:
+      latitude: 52.379189
+      longitude: 4.899431
+      scheme: linear
     anomalies:
       period: month
       reference:
@@ -27,9 +29,9 @@
 diagnostics:
   diagnostic_warming_stripes:
     variables:
-      global_temperature_anomalies:
+      temperature_anomalies_amsterdam:
         short_name: tas
-        preprocessor: global_anomalies
+        preprocessor: anomalies_amsterdam
     scripts:
       warming_stripes_script:
         script: ~/esmvaltool_tutorial/warming_stripes.py
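To get a feeling for what a linear scheme means when a point lies between grid cells, here is a small bilinear interpolation sketch in plain Python. It only illustrates the idea on a regular grid and is not the code that extract_point runs; interpolate_point is a hypothetical name, the grid values are made up, and the point is assumed to lie inside the grid.

```python
def interpolate_point(lats, lons, field, lat, lon):
    """Bilinear interpolation on a regular grid.

    lats and lons are ascending lists; field[i][j] is the value at
    (lats[i], lons[j]). The target point must lie inside the grid.
    """
    # Index of the grid cell edge at or just below the target point.
    i = max(k for k in range(len(lats) - 1) if lats[k] <= lat)
    j = max(k for k in range(len(lons) - 1) if lons[k] <= lon)
    # Fractional position of the point inside the cell (0 to 1).
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude first, then along latitude.
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty


# A made-up 2 x 2 patch of temperatures around the target location.
lats = [50.0, 55.0]
lons = [0.0, 5.0]
field = [[10.0, 12.0],
         [14.0, 16.0]]

value = interpolate_point(lats, lons, field, 52.5, 2.5)
print(value)
```

A point in the exact centre of a cell receives the average of the four surrounding values; closer to one corner, that corner weighs more heavily.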
Different time periods
Split the diagnostic in two with two different time periods for the same variable. You can choose the time periods yourself. In the example below, we have chosen the recent past and the 20th century and have used variable grouping.
Solution
Here’s a copy of the recipe at this point and this is the difference with the previous recipe:
--- recipe_warming_stripes_local.yml
+++ recipe_warming_stripes_periods.yml
@@ -7,7 +7,7 @@
 datasets:
-  - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical,
-     ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}
+  - {dataset: BCC-ESM1, project: CMIP6, mip: Amon, exp: historical,
+     ensemble: r1i1p1f1, grid: gn}
 preprocessors:
   anomalies_amsterdam:
@@ -29,9 +29,16 @@
 diagnostics:
   diagnostic_warming_stripes:
     variables:
-      temperature_anomalies_amsterdam:
+      temperature_anomalies_recent:
         short_name: tas
         preprocessor: anomalies_amsterdam
+        start_year: 1950
+        end_year: 2014
+      temperature_anomalies_20th_century:
+        short_name: tas
+        preprocessor: anomalies_amsterdam
+        start_year: 1900
+        end_year: 1999
     scripts:
       warming_stripes_script:
         script: ~/esmvaltool_tutorial/warming_stripes.py
Different preprocessors
Now that you have different variable groups, we can also use different preprocessors. Add a second preprocessor to add another location of your choosing.
Solution
Here’s a copy of the recipe at this point and this is the difference with the previous recipe:
--- recipe_warming_stripes_periods.yml
+++ recipe_warming_stripes_multiple_locations.yml
@@ -15,7 +15,7 @@
       latitude: 52.379189
       longitude: 4.899431
       scheme: linear
-    anomalies:
+    anomalies: &anomalies
       period: month
       reference:
         start_year: 1981
@@ -25,18 +25,24 @@
         end_month: 12
         end_day: 31
       standardize: false
+  anomalies_london:
+    extract_point:
+      latitude: 51.5074
+      longitude: 0.1278
+      scheme: linear
+    anomalies: *anomalies
 diagnostics:
   diagnostic_warming_stripes:
     variables:
-      temperature_anomalies_recent:
+      temperature_anomalies_recent_amsterdam:
         short_name: tas
         preprocessor: anomalies_amsterdam
         start_year: 1950
         end_year: 2014
-      temperature_anomalies_20th_century:
+      temperature_anomalies_20th_century_london:
         short_name: tas
-        preprocessor: anomalies_amsterdam
+        preprocessor: anomalies_london
         start_year: 1900
         end_year: 1999
     scripts:
Pro-tip: YAML anchors
If you want to avoid retyping the arguments used in your preprocessor, you can use YAML anchors as seen in the anomalies preprocessor specifications in the recipe above.
Additional datasets
So far we have defined the datasets in the datasets section of the recipe. However, it’s also possible to add specific datasets only for specific variables or variable groups. Take a look at the documentation to learn about the additional_datasets keyword here, and add a second dataset only for one of the variable groups.
Solution
Here’s a copy of the recipe at this point and this is the difference with the previous recipe:
--- recipe_warming_stripes_multiple_locations.yml
+++ recipe_warming_stripes_additional_datasets.yml
@@ -45,6 +45,8 @@
         preprocessor: anomalies_london
         start_year: 1900
         end_year: 1999
+        additional_datasets:
+          - {dataset: CanESM2, project: CMIP5, mip: Amon, exp: historical, ensemble: r1i1p1}
     scripts:
       warming_stripes_script:
         script: ~/esmvaltool_tutorial/warming_stripes.py
Multiple ensemble members
You can choose data from multiple ensemble members for a model in a single line.
Solution
The datasets section allows you to choose more than one ensemble member. Here’s a copy of the changed recipe to do that. Changes made are shown in the diff output below:
--- recipe_warming_stripes.yml 2024-05-27 15:37:52.340358967 +0100
+++ recipe_warming_stripes_multiens.yml 2024-05-27 22:18:42.035558837 +0100
@@ -10,7 +10,7 @@
-     ensemble: r1i1p1f1, grid: gn, start_year: 1850, end_year: 2014}
+     ensemble: "r(1:2)i1p1f1", grid: gn, start_year: 1850, end_year: 2014}
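The range shorthand in the ensemble key can be read as a compact way of writing several ensemble members. The Python sketch below shows the expansion we have in mind; expand_ensemble is a hypothetical helper that handles a single (start:end) range only, and it is not the parser that ESMValTool uses.

```python
import re


def expand_ensemble(tag):
    """Expand a tag such as 'r(1:2)i1p1f1' into individual member names.

    Only one '(start:end)' range is supported in this sketch; a tag
    without a range is returned unchanged as a one-element list.
    """
    match = re.search(r"\((\d+):(\d+)\)", tag)
    if match is None:
        return [tag]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = tag[:match.start()], tag[match.end():]
    return [f"{prefix}{number}{suffix}" for number in range(start, end + 1)]


members = expand_ensemble("r(1:2)i1p1f1")
print(members)
```

Each expanded member is then processed as a dataset of its own.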
Pro-tip: Concatenating datasets
Check out the section on a different way to use multiple ensemble members or even multiple experiments at Concatenating data corresponding to multiple facets.
Key Points
A recipe can work with different preprocessors at the same time.
The setting additional_datasets can be used to add a different dataset.
Variable groups are useful for defining different settings for different variables.
Multiple ensemble members and experiments can be analysed in a single recipe through concatenation.
Development and contribution
Overview
Teaching: 10 min
Exercises: 20 min
Questions
What is a development installation?
How can I test new or improved code?
How can I incorporate my contributions into ESMValTool?
Objectives
Execute a successful ESMValTool installation from the source code.
Contribute to ESMValTool development.
We now know how ESMValTool works, but how do we develop it? ESMValTool is an open-source project in ESMValGroup. We can contribute to its development by:
- adding a new or updated recipe script, see lesson on Writing your own recipe
- adding a new or updated diagnostics script, see lesson on Writing your own diagnostic script
- adding a new or updated cmorizer script, see lesson on CMORization: Using observational datasets
- helping with the review process of pull requests, see ESMValTool documentation on Review of pull requests
In this lesson, we first show how to set up a development installation of ESMValTool so you can make changes or additions. We then explain how you can contribute these changes to the community.
Git knowledge
For this episode, you need some knowledge of Git. You can refresh your knowledge in the corresponding Git carpentries course.
Development installation
We’ll explore how ESMValTool can be installed in develop mode. Even if you aren’t collaborating with the community, this installation is needed to run your new code with ESMValTool. Let’s get started.
1 Source code
The ESMValTool source code is available on a public GitHub repository: https://github.com/ESMValGroup/ESMValTool. To obtain the code, there are two options:
- Download the code from the repository. A ZIP file called ESMValTool-main.zip is downloaded. To continue the installation, unzip the file, move to the ESMValTool-main directory and then follow the sequence of steps starting from the section on ESMValTool dependencies below.
- Clone the repository if you want to contribute to the ESMValTool development:
git clone https://github.com/ESMValGroup/ESMValTool.git
This command will ask for your GitHub username and a personal token as password. Please follow instructions on GitHub token authentication requirements to create a personal access token. Alternatively, you could generate a new SSH key and add it to your GitHub account. After the authentication, the output might look like:
Cloning into 'ESMValTool'...
remote: Enumerating objects: 163, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 95049 (delta 84), reused 76 (delta 30), pack-reused 94886
Receiving objects: 100% (95049/95049), 175.16 MiB | 5.48 MiB/s, done.
Resolving deltas: 100% (68808/68808), done.
Now, a folder called ESMValTool has been created in your working directory. This folder contains the source code of the tool. To continue the installation, we move into the ESMValTool directory:
cd ESMValTool
Note that the main branch is checked out by default. We can see this if we run:
git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
2 ESMValTool dependencies
If an esmvaltool environment was already created following the lesson Installation, remember to choose another name for the new environment in this lesson.
ESMValTool now uses mamba instead of conda for the recommended installation. For a minimal mamba installation, see section Install Mamba in lesson Installation.
It is good practice to update the version of mamba and conda on your machine before setting up ESMValTool. This can be done as follows:
mamba update --name base mamba conda
To simplify the installation process, an environment file environment.yml is provided in the ESMValTool directory. We create an environment by running:
mamba env create --name esmvaltool --file environment.yml
The environment is called esmvaltool by default. If an esmvaltool environment is already created following the lesson Installation, we should choose another name for the new environment in this lesson by running:
mamba env create --name a_new_name --file environment.yml
This will create a new conda environment and install ESMValTool (with all dependencies that are needed for development purposes) into it with a single command.
For more information see conda managing environments.
Now, we should activate the environment:
conda activate esmvaltool
where esmvaltool is the name of the environment (replace by a_new_name in case another environment name was used).
3 ESMValTool installation
ESMValTool can be installed in develop mode by running:
pip install --editable '.[develop]'
This will add the esmvaltool directory to the Python path in editable mode and install the development dependencies. We should check if the installation works properly. To do this, run the tool with:
esmvaltool --help
If the installation is successful, ESMValTool prints a help message to the console.
Checking the development installation
We can use the command mamba list to list installed packages in the esmvaltool environment. Use this command to check that ESMValTool is installed in develop mode.
Tip: see the documentation on conda list.
Solution
Run:
mamba list esmvaltool
# Name        Version                  Build    Channel
esmvaltool    2.10.0.dev3+g2dbc2cfcc   pypi_0   pypi
4 Updating ESMValTool
The main branch has the latest features of ESMValTool. Please make sure that the source code on your machine is up-to-date. If you obtained the source code using git clone as explained in step 1 Source code, you can run git pull to update the source code. The ESMValTool installation will then be updated with changes from the main branch.
Contribution
We have seen how to install ESMValTool in develop mode. Now, we try to contribute to its development. Let’s see how this can be achieved.
We first discuss our ideas in an issue in the ESMValTool repository. This can avoid disappointment at a later stage, for example, if more people are doing the same thing. It also gives other people an early opportunity to provide input and suggestions, which results in more valuable contributions.
Then, we create a new branch locally and start developing new code. To create a new branch:
git checkout -b your_branch_name
If needed, a link to a git tutorial can be found in Setup.
Once our development is finished, we can initiate a pull request. To this end, we encourage you to join the ESMValTool development team. For more extensive documentation on contributing code, including a section on the GitHub Workflow, please see the Contributing code and documentation section in the ESMValTool documentation.
Review process
The pull request will be tested, discussed and merged as part of the “review process”. The process will take some effort and time to learn. However, a few (command line) tools can get you a long way, and we’ll cover those essentials in the next sections.
Tip: we encourage you to keep the pull requests small. Reviewing small incremental changes is more efficient.
Working example
We saw the ‘warming stripes’ diagnostic in lesson Writing your own recipe. Imagine the following task: you want to contribute the warming stripes recipe and diagnostic to ESMValTool. You have to add the diagnostic warming_stripes.py and the recipe recipe_warming_stripes.yml to their locations in the ESMValTool directory. After these changes, you should also check if everything works fine. This is where we take advantage of the tools that are introduced later.
Let’s get started. Note that since this is an exercise to get familiar with the development and contribution process, we will not create a GitHub issue at this time but proceed as though it has been done.
Check code quality
We aim to adhere to best practices and coding standards. There are several tools that check our code against those standards like:
- flake8 for checking against the PEP8 style guide
- yapf to ensure consistent formatting for the whole project
- isort to consistently sort the import statements
- yamllint to ensure there are no syntax errors in our recipes and config files
- lintr for diagnostic scripts written in R
- codespell to check spelling
The good news is that pre-commit was already installed when we chose the development installation. pre-commit is a command-line tool that runs all of those tools. It also fixes some of those errors. To explore other tools, have a look at the ESMValTool documentation on Code style.
Using pre-commit
Let’s check out our local branch and add the script warming_stripes.py to the esmvaltool/diag_scripts directory.
cd ESMValTool
git checkout your_branch_name
cp path_of_warming_stripes.py esmvaltool/diag_scripts/
By default, pre-commit only runs on the files that have been staged in git:
git status
git add esmvaltool/diag_scripts/warming_stripes.py
pre-commit run --files esmvaltool/diag_scripts/warming_stripes.py
Inspect the output of pre-commit and fix the remaining errors.
Solution
The tail of the output of pre-commit:
Check for added large files..............................................Passed
Check python ast.........................................................Passed
Check for case conflicts.................................................Passed
Check for merge conflicts................................................Passed
Debug Statements (Python)................................................Passed
Fix End of Files.........................................................Passed
Trim Trailing Whitespace.................................................Passed
yamllint.............................................(no files to check)Skipped
nclcodestyle.........................................(no files to check)Skipped
style-files..........................................(no files to check)Skipped
lintr................................................(no files to check)Skipped
codespell................................................................Passed
isort....................................................................Passed
yapf.....................................................................Passed
docformatter.............................................................Failed
- hook id: docformatter
- files were modified by this hook
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
esmvaltool/diag_scripts/warming_stripes.py:20:5: F841 local variable 'nx' is assigned to but never used
As can be seen above, there are two Failed checks:
- docformatter: it is mentioned that “files were modified by this hook”. We run git diff to see the modifications. The output includes the following:
+in the form of the popular warming stripes figure by Ed Hawkins."""
The closing """ at the end of the docstring is moved by one line. Shifting it to the next line should fix this error.
- flake8: the error message is about an unused local variable nx. We should check our code regarding the usage of nx. For now, let’s assume that it was added by mistake and remove it. Note that you have to run git add again to re-stage the file. Then rerun pre-commit and check that it passes.
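The flake8 message F841 is easy to reproduce in isolation. The two functions below are hypothetical and unrelated to the real warming_stripes.py; they only show the kind of code that triggers the warning and what it looks like once the unused assignment is removed.

```python
def n_rows_flagged(rows):
    """Version that flake8 would flag with F841."""
    nx = len(rows[0])  # assigned to but never used: this is what F841 reports
    return len(rows)


def n_rows_fixed(rows):
    """Same behaviour, with the unused local variable removed."""
    return len(rows)


data = [[1, 2, 3], [4, 5, 6]]
print(n_rows_flagged(data), n_rows_fixed(data))
```

Both versions return the same result, which is why such a warning is easy to overlook without a tool like flake8. Before deleting a flagged variable, it is worth checking whether it was meant to be used somewhere.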
Run unit tests
The previous section introduced some tools to check code style and quality. What is still missing is a mechanism to determine whether or not our code is getting the right answer. To achieve that, we need to write and run tests for widely-used functions. ESMValTool comes with a lot of tests that are in the folder tests. To run tests, first we make sure that the working directory is ESMValTool and our local branch is checked out. Then, we can run tests using pytest locally:
pytest
Tests will also be run automatically by CircleCI, when you submit a pull request.
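If you have not written a test before, the sketch below shows the general shape of one. The anomaly function and its test are hypothetical examples that are not part of ESMValTool; pytest collects functions whose names start with test_ and reports a failure when an assert statement does not hold.

```python
def anomaly(values, reference_mean):
    """Subtract a reference mean from every value (hypothetical function)."""
    return [value - reference_mean for value in values]


def test_anomaly_subtracts_reference_mean():
    """Check the function against a result we worked out by hand."""
    assert anomaly([1.0, 2.0, 3.0], 2.0) == [-1.0, 0.0, 1.0]


# pytest would discover and call the test for us; here we call it directly
# so that the example also runs as a plain script.
test_anomaly_subtracts_reference_mean()
print("test passed")
```

A test like this compares the output of a function with a known answer, which is exactly the check that the style tools cannot provide.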
Running tests
Make sure our local branch is checked out and add the recipe recipe_warming_stripes.yml to the esmvaltool/recipes directory:
cp path_of_recipe_warming_stripes.yml esmvaltool/recipes/
Run pytest and inspect the results; this might take a few minutes. If a test fails, try to fix it.
Solution
Run:
pytest
When the pytest run is complete, you can inspect the test reports that are printed in the console. Have a look at the second section of the report, FAILURES:
================================ FAILURES ==========================================
______________ test_recipe_valid[recipe_warming_stripes.yml] ______________
The test message shows that the recipe recipe_warming_stripes.yml is not a valid recipe. Look for a line that starts with an E in the rest of the message:
E esmvalcore._task.DiagnosticError: Cannot execute script '~/esmvaltool_tutorial/warming_stripes.py' (~/esmvaltool_tutorial/warming_stripes.py): file does not exist.
To fix the recipe, we need to edit the path of the diagnostic script as warming_stripes.py:
scripts:
  warming_stripes_script:
    script: warming_stripes.py
For details, see lesson Writing your own diagnostic script.
Build documentation
When we add or update code, we also update its corresponding documentation. The ESMValTool documentation is available on docs.esmvaltool.org. The source files are located in ESMValTool/doc/sphinx/source/. To build documentation locally, first we make sure that the working directory is ESMValTool and our local branch is checked out. Then, we run:
sphinx-build -Ea doc/sphinx/source/ doc/sphinx/build/
Similar to code, documentation should be well written and adhere to standards. If the documentation is built properly, the previous command prints a message to the console:
build succeeded.
The HTML pages are in doc/sphinx/build.
The main page of the documentation has been built into index.html in the doc/sphinx/build/ directory.
To preview this page locally, we open the file in a web browser:
xdg-open doc/sphinx/build/index.html
Creating a documentation
In previous exercises, we added the recipe recipe_warming_stripes.yml to ESMValTool. Now, we create a documentation file recipe_warming_stripes.rst for this recipe:
nano doc/sphinx/source/recipes/recipe_warming_stripes.rst
Add a reference i.e. .. _recipe_warming_stripes:, a section title and some text about the recipe like:
.. _recipe_warming_stripes:

Reproducing Ed Hawkins' warming stripes visualization
======================================================

This recipe produces warming stripes plots.
Save and close the file. We can think of this file as one page of a book. Examples of documentation pages can be found in the folder ESMValTool/doc/sphinx/source/recipes. Then, we need to decide where this page should be located inside the book. The table of contents is defined by index.rst. Let’s have a look at the content:
nano doc/sphinx/source/recipes/index.rst
Add the recipe name i.e. recipe_warming_stripes to the section Other in this file and preview the recipe documentation page locally.
Solution
First, we add the recipe name recipe_warming_stripes to the section Other:
Other
^^^^^
.. toctree::
   :maxdepth: 1

   ...
   ...
   recipe_warming_stripes
Then, we build and preview the documentation page:
sphinx-build -Ea doc/sphinx/source/ doc/sphinx/build/
xdg-open doc/sphinx/build/recipes/recipe_warming_stripes.html
Congratulations! You are now ready to make a pull request.
Key Points
A development installation is needed if you want to incorporate your code into ESMValTool.
Contributions include adding a new or improved script or helping with a review process.
There are several tools to help improve the quality of your code.
It is possible to run tests on your machine.
You can preview documentation pages locally.
Writing your own diagnostic script
Overview
Teaching: 20 min
Exercises: 30 min
Questions
How do I write a new diagnostic in ESMValTool?
How do I use the preprocessor output in a Python diagnostic?
Objectives
Write a new Python diagnostic script.
Explain how a diagnostic script reads the preprocessor output.
Introduction
The diagnostic script is an important component of ESMValTool and it is where the scientific analysis or performance metric is implemented. With ESMValTool, you can adapt an existing diagnostic or write a new script from scratch. Diagnostics can be written in a number of open source languages such as Python, R, Julia and NCL but we will focus on understanding and writing Python diagnostics in this lesson.
In this lesson, we will explain how to find an existing diagnostic and run it using ESMValTool installed in editable/development mode. For a development installation, see the instructions in the lesson Development and contribution. Also, we will work with the recipe recipe_python.yml and the diagnostic script diagnostic.py called by this recipe that we have seen in the lesson Running your first recipe.
Let’s get started!
Understanding an existing Python diagnostic
If you clone the ESMValTool repository, a folder called ESMValTool is created in your home/working directory, see the instructions in the lesson Development and contribution. The folder ESMValTool contains the source code of the tool. We can find the recipe recipe_python.yml and the Python script diagnostic.py in these directories:
- ~/ESMValTool/esmvaltool/recipes/examples/recipe_python.yml
- ~/ESMValTool/esmvaltool/diag_scripts/examples/diagnostic.py
Let’s have a look at the code in diagnostic.py.
For reference, we show the diagnostic code in the dropdown box below.
There are four main sections in the script:
- A description i.e. the docstring (line 1).
- Import statements (line 2-16).
- Functions that implement our analysis (line 21-102).
- A typical Python top-level script i.e. if __name__ == '__main__' (line 105-108).
diagnostic.py
1: """Python example diagnostic."""
2: import logging
3: from pathlib import Path
4: from pprint import pformat
5:
6: import iris
7:
8: from esmvaltool.diag_scripts.shared import (
9:     group_metadata,
10:     run_diagnostic,
11:     save_data,
12:     save_figure,
13:     select_metadata,
14:     sorted_metadata,
15: )
16: from esmvaltool.diag_scripts.shared.plot import quickplot
17:
18: logger = logging.getLogger(Path(__file__).stem)
19:
20:
21: def get_provenance_record(attributes, ancestor_files):
22:     """Create a provenance record describing the diagnostic data and plot."""
23:     caption = caption = attributes['caption'].format(**attributes)
24:
25:     record = {
26:         'caption': caption,
27:         'statistics': ['mean'],
28:         'domains': ['global'],
29:         'plot_types': ['zonal'],
30:         'authors': [
31:             'andela_bouwe',
32:             'righi_mattia',
33:         ],
34:         'references': [
35:             'acknow_project',
36:         ],
37:         'ancestors': ancestor_files,
38:     }
39:     return record
40:
41:
42: def compute_diagnostic(filename):
43:     """Compute an example diagnostic."""
44:     logger.debug("Loading %s", filename)
45:     cube = iris.load_cube(filename)
46:
47:     logger.debug("Running example computation")
48:     cube = iris.util.squeeze(cube)
49:     return cube
50:
51:
52: def plot_diagnostic(cube, basename, provenance_record, cfg):
53:     """Create diagnostic data and plot it."""
54:
55:     # Save the data used for the plot
56:     save_data(basename, provenance_record, cfg, cube)
57:
58:     if cfg.get('quickplot'):
59:         # Create the plot
60:         quickplot(cube, **cfg['quickplot'])
61:         # And save the plot
62:         save_figure(basename, provenance_record, cfg)
63:
64:
65: def main(cfg):
66:     """Compute the time average for each input dataset."""
67:     # Get a description of the preprocessed data that we will use as input.
68:     input_data = cfg['input_data'].values()
69:
70:     # Demonstrate use of metadata access convenience functions.
71:     selection = select_metadata(input_data, short_name='tas', project='CMIP5')
72:     logger.info("Example of how to select only CMIP5 temperature data:\n%s",
73:                 pformat(selection))
74:
75:     selection = sorted_metadata(selection, sort='dataset')
76:     logger.info("Example of how to sort this selection by dataset:\n%s",
77:                 pformat(selection))
78:
79:     grouped_input_data = group_metadata(input_data,
80:                                         'variable_group',
81:                                         sort='dataset')
82:     logger.info(
83:         "Example of how to group and sort input data by variable groups from "
84:         "the recipe:\n%s", pformat(grouped_input_data))
85:
86:     # Example of how to loop over variables/datasets in alphabetical order
87:     groups = group_metadata(input_data, 'variable_group', sort='dataset')
88:     for group_name in groups:
89:         logger.info("Processing variable %s", group_name)
90:         for attributes in groups[group_name]:
91:             logger.info("Processing dataset %s", attributes['dataset'])
92:             input_file = attributes['filename']
93:             cube = compute_diagnostic(input_file)
94:
95:             output_basename = Path(input_file).stem
96:             if group_name != attributes['short_name']:
97:                 output_basename = group_name + '_' + output_basename
98:             if "caption" not in attributes:
99:                 attributes['caption'] = input_file
100:             provenance_record = get_provenance_record(
101:                 attributes, ancestor_files=[input_file])
102:             plot_diagnostic(cube, output_basename, provenance_record, cfg)
103:
104:
105: if __name__ == '__main__':
106:
107:     with run_diagnostic() as config:
108:         main(config)
What is the starting point of a diagnostic?
- Can you spot a function called main in the code above?
- What are its input arguments?
- How many times is this function mentioned?
Answer
- The main function is defined in line 65 as main(cfg).
- The input argument to this function is the variable cfg, a Python dictionary that holds all the information needed to run the diagnostic script, such as the location of input data and various settings. We will next parse this cfg variable in the main function and extract information as needed to do our analyses (e.g. in line 68).
- The main function is called near the very end, on line 108. So it is mentioned twice in our code: once where it is defined and once where it is called by the top-level Python script.
The function run_diagnostic
The function run_diagnostic (line 107) is a context manager provided with ESMValTool and is the main entry point for most Python diagnostics.
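To illustrate what "context manager" means here, the following is a minimal sketch using only the Python standard library. The names run_diagnostic_sketch, events and the settings dictionary are hypothetical stand-ins for illustration; this is not the ESMValTool implementation, which does considerably more work before handing over the configuration.

```python
from contextlib import contextmanager


@contextmanager
def run_diagnostic_sketch(settings):
    """Hypothetical stand-in for run_diagnostic (illustration only)."""
    events.append("setup")         # work done before the diagnostic runs
    try:
        yield dict(settings)       # this value is bound by `with ... as config`
    finally:
        events.append("teardown")  # always runs, even if main() raises


def main(cfg):
    """Pretend diagnostic: report which plot type was requested."""
    events.append("main")
    return cfg["quickplot"]["plot_type"]


events = []
with run_diagnostic_sketch({"quickplot": {"plot_type": "pcolormesh"}}) as config:
    result = main(config)
```

The point of the pattern is that the diagnostic author only writes main(cfg); everything that has to happen before and after it is wrapped around the with block.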
Preprocessor-diagnostic interface
In the previous exercise, we have seen that the variable cfg
is the input
argument of the main
function. The first argument passed to the diagnostic
via the cfg
dictionary is a path to a file called settings.yml
.
The ESMValTool documentation page provides an overview of what is in this file, see
Diagnostic script interfaces.
What information do I need when writing a diagnostic script?
From the lesson Configuration, we saw how to change the configuration settings before running a recipe. First we set the option remove_preproc_dir to false in the configuration file, then run the recipe recipe_python.yml:
esmvaltool run examples/recipe_python.yml
- Find one example of the file settings.yml in the run directory.
- Open the file settings.yml and look at the input_files list. It contains paths to some files named metadata.yml. What information do you think is saved in those files?
Answer
- One example of settings.yml can be found in the directory: path_to_recipe_output/run/map/script1/settings.yml
- The metadata.yml files hold information about the preprocessed data. There is one file for each variable, with detailed information on your data including project (e.g., CMIP6, CMIP5), dataset names (e.g., BCC-ESM1, CanESM2), variable attributes (e.g., standard_name, units), the preprocessor applied, and the time range of the data. You can use all of this information in your own diagnostic.
Diagnostic shared functions
Looking at the code in diagnostic.py
, we see that input_data
is
read from the cfg
dictionary (line 68). Now we can group the input_data
according to some criteria such as the model or experiment. To do so,
ESMValTool provides many functions such as select_metadata
(line 71),
sorted_metadata
(line 75), and group_metadata
(line 79). As you can see in line 8, these functions are imported from esmvaltool.diag_scripts.shared,
which means they are shared across several diagnostic scripts. A list of
available functions and their descriptions can be found in
The ESMValTool Diagnostic API reference.
Extracting information needed for analyses
We have seen the functions used for selecting, sorting and grouping data in the script. What do these functions do?
Answer
There is a statement after the use of select_metadata, sorted_metadata and group_metadata that starts with logger.info (lines 72, 76 and 82). These lines print output to the log files. In the previous exercise, we ran the recipe recipe_python.yml. If you look at the log file recipe_python_#_#/run/map/script1/log.txt in the esmvaltool_output directory, you can see the output from each of these functions, for example:
2023-06-28 12:47:14,038 [2548510] INFO diagnostic,106 Example of how to group and sort input data by variable groups from the recipe:
{'tas': [{'alias': 'CMIP5',
          'caption': 'Global map of {long_name} in January 2000 according to '
                     '{dataset}.\n',
          'dataset': 'bcc-csm1-1',
          'diagnostic': 'map',
          'end_year': 2000,
          'ensemble': 'r1i1p1',
          'exp': 'historical',
          'filename': '~/recipe_python_20230628_124639/preproc/map/tas/CMIP5_bcc-csm1-1_Amon_historical_r1i1p1_tas_2000-P1M.nc',
          'frequency': 'mon',
          'institute': ['BCC'],
          'long_name': 'Near-Surface Air Temperature',
          'mip': 'Amon',
          'modeling_realm': ['atmos'],
          'preprocessor': 'to_degrees_c',
          'product': ['output1', 'output2'],
          'project': 'CMIP5',
          'recipe_dataset_index': 1,
          'short_name': 'tas',
          'standard_name': 'air_temperature',
          'start_year': 2000,
          'timerange': '2000/P1M',
          'units': 'degrees_C',
          'variable_group': 'tas',
          'version': 'v1'},
         {'activity': 'CMIP',
          'alias': 'CMIP6',
          'caption': 'Global map of {long_name} in January 2000 according to '
                     '{dataset}.\n',
          'dataset': 'BCC-ESM1',
          'diagnostic': 'map',
          'end_year': 2000,
          'ensemble': 'r1i1p1f1',
          'exp': 'historical',
          'filename': '~/recipe_python_20230628_124639/preproc/map/tas/CMIP6_BCC-ESM1_Amon_historical_r1i1p1f1_tas_gn_2000-P1M.nc',
          'frequency': 'mon',
          'grid': 'gn',
          'institute': ['BCC'],
          'long_name': 'Near-Surface Air Temperature',
          'mip': 'Amon',
          'modeling_realm': ['atmos'],
          'preprocessor': 'to_degrees_c',
          'project': 'CMIP6',
          'recipe_dataset_index': 0,
          'short_name': 'tas',
          'standard_name': 'air_temperature',
          'start_year': 2000,
          'timerange': '2000/P1M',
          'units': 'degrees_C',
          'variable_group': 'tas',
          'version': 'v20181214'}]}
This is how we can access preprocessed data within our diagnostic.
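To make the idea of selecting and grouping concrete, here is a plain-Python sketch that mimics the behaviour on a list of metadata dictionaries. The helpers select and group are hypothetical stand-ins written for this illustration (including the case-insensitive sort, which is an assumption chosen to reproduce the order seen in the log); in a real diagnostic you would import select_metadata, sorted_metadata and group_metadata from esmvaltool.diag_scripts.shared instead.

```python
from collections import defaultdict

# Two made-up entries shaped like the metadata dictionaries in the log above
input_data = [
    {"dataset": "BCC-ESM1", "project": "CMIP6", "short_name": "tas",
     "variable_group": "tas"},
    {"dataset": "bcc-csm1-1", "project": "CMIP5", "short_name": "tas",
     "variable_group": "tas"},
]


def select(metadata, **attributes):
    """Keep only the entries whose keys match all given attribute values."""
    return [item for item in metadata
            if all(item.get(key) == value
                   for key, value in attributes.items())]


def group(metadata, attribute, sort=None):
    """Collect entries into a dict keyed by the value of `attribute`."""
    groups = defaultdict(list)
    for item in metadata:
        groups[item.get(attribute)].append(item)
    if sort:
        # Assumption of this sketch: sort ignoring upper/lower case
        for items in groups.values():
            items.sort(key=lambda item: str(item.get(sort, "")).lower())
    return dict(groups)


cmip5_only = select(input_data, short_name="tas", project="CMIP5")
by_group = group(input_data, "variable_group", sort="dataset")
```

Each entry stays an ordinary dictionary throughout, so after selecting or grouping you can still read keys such as filename or dataset from it.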
Diagnostic computation
After grouping and selecting data, we can read individual attributes (such as filename)
of each item. Here, we have grouped the input data by variables
,
so we loop over the variables (line 88). Following this is a call to the
function compute_diagnostic
(line 93). Let’s look at the
definition of this function in line 42, where the actual analysis of the data is done.
Note that output from the ESMValCore preprocessor is in the form of NetCDF files.
Here, compute_diagnostic
uses
Iris to read data
from a netCDF file and performs an operation squeeze
to remove any dimensions
of length one. We can adapt this function to add our own analysis. As an example,
here we use Iris cubes to calculate the bias relative to the average of the data.
def compute_diagnostic(filename):
    """Compute an example diagnostic."""
    logger.debug("Loading %s", filename)
    cube = iris.load_cube(filename)
    logger.debug("Running example computation")
    cube = iris.util.squeeze(cube)
    # Calculate a bias using the average of data
    cube.data = cube.core_data() - cube.core_data().mean()
    return cube
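The arithmetic behind this bias calculation can be checked on a handful of plain numbers. The sketch below uses made-up values and the standard library only; it stands in for the array held by the cube and is not part of the diagnostic.

```python
from statistics import mean

# Made-up temperatures in degrees C, standing in for the values of a cube
values = [14.0, 15.5, 13.5, 17.0]

average = mean(values)
# Subtract the overall average from every value
bias = [value - average for value in values]
```

By construction the resulting values sum to zero, which is a quick sanity check you can also apply to the data of the cube returned by compute_diagnostic.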
Iris cubes
Iris reads data from NetCDF files into data structures called cubes. The data in these cubes can be modified, combined with other cubes’ data or plotted.
Reading data using xarray
Alternatively, you can use xarray to read the data instead of Iris.
Answer
First, import the xarray package at the top of the script as:
import xarray as xr
Then, change compute_diagnostic as:
def compute_diagnostic(filename):
    """Compute an example diagnostic."""
    logger.debug("Loading %s", filename)
    dataset = xr.open_dataset(filename)
    # do your analyses on the data here
    return dataset
Caution: If you read data using xarray, remember to adapt the other functions in the diagnostic accordingly, since they currently work with Iris cubes.
Reading data using the netCDF4 package
Yet another option to read the NetCDF file data is to use the netCDF-4 Python interface to the netCDF C library.
Answer
First, import the netCDF4 package at the top of the script as:
import netCDF4
Then, change compute_diagnostic as:
def compute_diagnostic(filename):
    """Compute an example diagnostic."""
    logger.debug("Loading %s", filename)
    nc_data = netCDF4.Dataset(filename, 'r')
    # do your analyses on the data here
    return nc_data
Caution: If you read data using netCDF4, remember to adapt the other functions in the diagnostic accordingly, since they currently work with Iris cubes.
Diagnostic output
Plotting the output
Often, the end product of a diagnostic script is a plot or figure. The Iris cube
returned from the compute_diagnostic
function (line 93) is passed to the
plot_diagnostic
function (line 102). Let’s have a look at the definition of
this function in line 52. This is where we would plug in our plotting routine in the
diagnostic script.
More specifically, the quickplot
function (line 60) can be replaced with the
function of our choice. As can be seen, this function uses
**cfg['quickplot']
as an input argument. If you look at the diagnostic
section in the recipe recipe_python.yml
, you see quickplot
is a key
there:
script1:
  script: examples/diagnostic.py
  quickplot:
    plot_type: pcolormesh
    cmap: Reds
This way, we can pass arguments such as the type of plot (plot_type: pcolormesh)
and the colormap (cmap: Reds) from the recipe to the quickplot
function in the diagnostic.
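The mechanism that carries these recipe settings into the function call is ordinary Python keyword-argument unpacking. The sketch below shows it with quickplot_sketch, a hypothetical stand-in that only builds a message instead of drawing anything; the cfg dictionary is written by hand here, whereas in a diagnostic it comes from run_diagnostic.

```python
def quickplot_sketch(cube, plot_type, **kwargs):
    """Hypothetical stand-in for quickplot: report what it would draw."""
    return f"{plot_type} of {cube} with options {kwargs}"


# What the diagnostic would find in cfg for the recipe settings shown above
cfg = {"quickplot": {"plot_type": "pcolormesh", "cmap": "Reds"}}

if cfg.get("quickplot"):
    # ** unpacks the dictionary into keyword arguments, so this call equals
    # quickplot_sketch("tas", plot_type="pcolormesh", cmap="Reds")
    message = quickplot_sketch("tas", **cfg["quickplot"])
```

Any extra key you add under quickplot in the recipe arrives in the function as an additional keyword argument, so the diagnostic code itself does not need to change.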
Passing arguments from the recipe to the diagnostic
Change the type of the plot and its colormap and inspect the output figure.
Answer
In the recipe recipe_python.yml, you could change plot_type and cmap. As an example, we choose plot_type: pcolor and cmap: BuGn:
script1:
  script: examples/diagnostic.py
  quickplot:
    plot_type: pcolor
    cmap: BuGn
The plot can be found at path_to_recipe_output/plots/map/script1/png.
ESMValTool gallery
ESMValTool makes it possible to produce a wide array of plots and figures as seen in the gallery.
Saving the output
In our example, the function save_data
in line 56 is used to save the Iris
cube. The saved files can be found under the work
directory in a .nc
format.
There is also the function save_figure
in line 62 to save the plots under the
plots
directory in .png
format (or the preferred format specified in your
configuration settings). Again, you may choose your own method
of saving the output.
Recording the provenance
When developing a diagnostic script, it is good practice to record
provenance. To do so, we use the function get_provenance_record
(line 100).
Let us have a look at the definition of this function in line 21 where we
describe the diagnostic data and plot. Using the dictionary record
, it is
possible to add custom provenance to our diagnostics output.
Provenance is stored in the W3C PROV XML
format and also in an SVG file under the work
and plots
directories. For
more information, see recording provenance.
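The caption in the provenance record is built with Python string formatting (line 23 of the diagnostic). Here is a small sketch of that step with made-up attributes shaped like the metadata shown earlier; the ancestor path is a placeholder, not a real file.

```python
# Made-up attributes, shaped like one entry of the preprocessor metadata
attributes = {
    "caption": ("Global map of {long_name} in January 2000 "
                "according to {dataset}."),
    "long_name": "Near-Surface Air Temperature",
    "dataset": "BCC-ESM1",
}

# str.format(**attributes) replaces each {key} with attributes[key];
# keys that the template does not mention are simply ignored.
caption = attributes["caption"].format(**attributes)

record = {
    "caption": caption,
    "ancestors": ["path/to/input_file.nc"],  # placeholder path
}
```

This is why the caption written in the recipe can contain placeholders such as {dataset}: they are filled in separately for every dataset that the diagnostic processes.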
Congratulations!
You now know the basic diagnostic script structure and some available tools for putting together your own diagnostics. Have a look at existing recipes and diagnostics in the repository for more examples of functions you can use in your diagnostics!
Key Points
ESMValTool provides helper functions to interface a Python diagnostic script with preprocessor output.
Existing diagnostics can be used as templates and modified to write new diagnostics.
Helper functions can be imported from
esmvaltool.diag_scripts.shared
and used in your own diagnostic script.
CMORization: adding new datasets to ESMValTool
Overview
Teaching: 15 min
Exercises: 45 min
Questions
CMORization: what is it and why do we need it?
How to use the existing CMORizer scripts shipped with ESMValTool?
How to add support for new (observational) datasets?
Objectives
Understand what CMORization is and why it is necessary.
Use existing scripts to CMORize your data.
Write a new CMORizer script to support additional data.
Introduction
This episode deals with “CMORization”. ESMValTool is designed to work with data that follow the CMOR standards. Unfortunately, not all datasets follow these standards. In order to use such datasets in ESMValTool we first need to reformat the data. This process is called “CMORization”.
What are the CMOR standards?
The name “CMOR” originates from a tool: the Climate Model Output Rewriter. This tool is used to create “CF-Compliant netCDF files for use in the CMIP projects”. So CMOR extends the CF-standard with additional requirements for the Coupled Model Intercomparison Projects (see e.g. here).
Concretely, the CMOR standards dictate e.g. the variable names and units, coordinate information, how the data should be structured (e.g. 1 variable per file), additional metadata requirements, and file naming conventions a.k.a. the data reference syntax (DRS). All this information is stored in so-called CMOR tables. For example, the CMOR tables for the CMIP6 project can be found here.
ESMValTool offers two ways to CMORize data:
- A reformatting script can be used to create a CMOR-compliant copy. CMORizer scripts for several popular datasets are included in ESMValTool, and ESMValTool also provides a convenient way to execute them.
- ESMValCore can execute CMOR fixes ‘on the fly’. The advantage is that you don’t need to store an additional, reformatted copy of the data. The disadvantage is that these fixes should be implemented inside ESMValCore, which is beyond the scope of this tutorial.
In this lesson, we will re-implement a CMORizer script for the FLUXCOM dataset that contains observations of the Gross Primary Production (GPP), a variable that is important for calculating components of the global carbon cycle. See the next section on how to obtain data.
As in the previous episode (Development and Contribution episode), we will be using the development installation of ESMValTool.
Obtaining the data
The data for this episode is available via the FluxCom Data
Portal. First you’ll need to
register. After registration, in the dropdown boxes, select FLUXCOM as the data
choice and click download. Three files will be displayed. Click the download
button on the “FLUXCOM (RS+METEO) Global Land Carbon Fluxes using CRUNCEP
climate data”. You’ll receive an email with the FTP address to access the
server. Connect to the server, follow the path in your email, and look for the
file raw/monthly/GPP.ANN.CRUNCEPv6.monthly.2000.nc
. Download that file and
save it in a folder called ~/data/RAWOBS/Tier3/FLUXCOM
.
Note: you’ll need a user-friendly FTP client. On Linux, ncftp
works okay.
What is the deal with those “tiers”?
Many datasets come with access restrictions. In this way the data providers can keep track of how their data is used. In many cases “restricted access” just means that one has to register with an email address and accept the terms of use, which typically ask that you acknowledge the data providers.
There are also datasets available that do not need a registration. The “obs4MIPs” or “ana4MIPs” datasets, for example, are specifically produced to facilitate comparisons with model simulations.
To reflect these different levels of access restriction, the ESMValTool team has created a tier system. The tiers are defined as follows:
- Tier1: obs4MIPs and ana4MIPs datasets (can be used directly with the ESMValTool)
- Tier2: other freely available datasets (most of them will need some kind of cmorization)
- Tier3: datasets with access restrictions (most of these datasets will also need some kind of cmorization)
These access restrictions are also why the ESMValTool developers cannot distribute copies or automate downloading of all observations and reanalysis data used in the recipes. As a compromise, we provide the CMORization scripts so that each user can CMORize their own copy of the access restricted datasets if needed.
Run the existing CMORizer script
Before we develop our own CMORizer script, let’s first see what happens when we run the existing one. There is a specific command available in the ESMValTool to run the CMORizer scripts:
esmvaltool data format --config_file <path to config-user.yml> <dataset-name>
The config-user.yml
is the file in which we define the different data
paths, see the episode on Configuration.
In the rootpath
of your config-user.yml
, make sure to add the right
directory for “RAWOBS” data in which you downloaded the FLUXCOM dataset:
rootpath:
  RAWOBS: ~/data/RAWOBS
This enables ESMValTool to find the raw observational datasets stored in the
“RAWOBS” folder. The dataset-name
needs to be identical to the folder
name that was created to store the raw observation data files, i.e.
RAWOBS/TierX/dataset-name
. In our case this would be “FLUXCOM”.
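If you want to check the folder layout before running the CMORizer, a few lines of Python are enough. This is a sketch only: find_raw_dataset_dir is a hypothetical helper written for this illustration, not part of ESMValTool, and it runs against a throwaway directory that mimics ~/data/RAWOBS.

```python
import tempfile
from pathlib import Path


def find_raw_dataset_dir(rawobs_root, dataset_name):
    """Return the TierX folder holding `dataset_name`, or None if missing."""
    matches = sorted(Path(rawobs_root).glob(f"Tier*/{dataset_name}"))
    return matches[0] if matches else None


# Build a temporary directory tree that mimics RAWOBS/Tier3/FLUXCOM
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Tier3" / "FLUXCOM").mkdir(parents=True)
    found = find_raw_dataset_dir(tmp, "FLUXCOM")
    found_parts = found.relative_to(tmp).parts
    missing = find_raw_dataset_dir(tmp, "SOME_OTHER_DATASET")
```

With the folder in place, run the esmvaltool data format command shown above.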
If everything is okay, the output should look something like this:
...
... Starting the CMORization Tool at time: 2022-07-26 14:02:16 UTC
... ----------------------------------------------------------------------
... input_dir = /home/peter/data/RAWOBS
... output_dir = /home/peter/esmvaltool_output/data_formatting_20220726_140216
... ----------------------------------------------------------------------
... Running the CMORization scripts.
... Processing datasets ['FLUXCOM']
... Input data from: /home/peter/data/RAWOBS/Tier3/FLUXCOM
... Output will be written to: /home/peter/esmvaltool_output/
data_formatting_20220726_140216/Tier3/FLUXCOM
... Reformat script: /home/peter/mambaforge/envs/esmvaltool/lib/python3.9/
site-packages/esmvaltool/cmorizers/data/formatters/datasets/fluxcom
... CMORizing dataset FLUXCOM using Python script /home/peter/mambaforge/envs/
esmvaltool/lib/python3.9/site-packages/esmvaltool/cmorizers/data/formatters/
datasets/fluxcom.py
... Found input file '/home/peter/data/RAWOBS/Tier3/FLUXCOM/GPP.ANN.CRUNCEPv6.monthly.*.nc'
... CMORizing variable 'gpp'
... Lmon
... Var is gpp
... ... UserWarning: Ignoring netCDF variable 'GPP' invalid units 'gC m-2 day-1'
... Fixing time...
... Fixing latitude...
... Fixing longitude...
... Flipping dimensional coordinate latitude...
... Saving file
... Saving: /home/peter/esmvaltool_output/data_formatting_20220726_140216/Tier3/
FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
... Cube has lazy data [lazy is preferred]
... CMORization of dataset FLUXCOM finished!
... Formatting successful for dataset FLUXCOM
So you can see that several fixes are applied, and the CMORized file is written
to the ESMValTool output directory, i.e.
~/esmvaltool_output/data_formatting_YYYYMMDD_HHMMSS/TierX/dataset-name/filename.nc
In order to use it, we’ll have to copy it from the output directory to a folder
called ~/data/OBS/Tier3/FLUXCOM
and make sure the path to OBS
is set
correctly in our config-user file:
rootpath:
  OBS: ~/data/OBS
You can also see the path where ESMValTool stores the reformatting script:
~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py.
You may have a look at this file if you want. The script also uses a configuration file:
~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml.
Make a test recipe
To verify that the data is correctly CMORized, we will make a simple test recipe. As illustrated in the figure at the top of this episode, one of the steps that ESMValTool executes is a CMOR-check. If the data is not correctly CMORized, ESMValTool will give a warning or error.
Create a test recipe
Create a simple recipe called recipe_check_fluxcom.yml that loads the FLUXCOM data. It should include a datasets section with a single entry for the “FLUXCOM” dataset with the correct dataset keys, and a diagnostics section with a single variable: gpp. We don’t need any preprocessors or scripts (set scripts: null), but we have to add a documentation section with a description, authors and maintainer, otherwise the recipe will fail.
Use the following dataset keys:
- project: OBS
- dataset: FLUXCOM
- type: reanaly
- version: ANN-v1
- mip: Lmon
- start_year: 2000
- end_year: 2000
- tier: 3
Some of these dataset keys are further explained in the callout boxes in this episode.
Answer
Here’s an example recipe:
documentation:
  description: Test recipe for FLUXCOM data
  title: This is a test recipe for the FLUXCOM data.
  authors:
    - kalverla_peter
  maintainer:
    - kalverla_peter
datasets:
  - {project: OBS, dataset: FLUXCOM, mip: Lmon, tier: 3, start_year: 2000, end_year: 2000, type: reanaly, version: ANN-v1}
diagnostics:
  check_fluxcom:
    description: Check that ESMValTool can load the cmorized fluxnet data without errors.
    variables:
      gpp:
    scripts: null
To learn more about writing a recipe, please refer to Writing your own recipe.
Try to run the example recipe with
esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
If everything is okay, the recipe should run without problems.
Starting from scratch
Now that you’ve seen how to use an existing CMORizer script, let’s think about adding a new one. We will remove the existing CMORizer script, and re-implement it from scratch. This exercise allows us to point out all the details of what’s going on. We’ll also remove the CMORized data that we’ve just created, so our test recipe will not be able to use it anymore.
rm ~/data/OBS/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
rm ~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
rm ~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml
If you now run the test recipe again it should fail, and somewhere in the output you should find something like:
No input files found for ...
Looked for files matching: /home/peter/data/OBS/Tier3/
FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc
From this we can see that the first thing our CMORizer should do is to rename the file so that it follows the CMOR filename conventions.
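To see why the raw file is not picked up, it helps to spell out the naming convention. The sketch below rebuilds the CMORized file name from the dataset keys and tests both names against the search pattern from the error message. obs_filename is a hypothetical helper, and using fnmatchcase to imitate the file search is an assumption of this illustration, not a description of how ESMValTool matches files internally.

```python
from fnmatch import fnmatchcase


def obs_filename(project, dataset, data_type, version, mip, short_name,
                 start, end):
    """Join the dataset keys with underscores, as in the file names above."""
    stem = "_".join([project, dataset, data_type, version, mip, short_name])
    return f"{stem}_{start}-{end}.nc"


cmor_name = obs_filename("OBS", "FLUXCOM", "reanaly", "ANN-v1", "Lmon", "gpp",
                         "200001", "200012")
raw_name = "GPP.ANN.CRUNCEPv6.monthly.2000.nc"

# Search pattern copied from the "Looked for files matching" message
pattern = "OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc"
cmor_matches = fnmatchcase(cmor_name, pattern)
raw_matches = fnmatchcase(raw_name, pattern)
```

Only the name built from the dataset keys matches the pattern, which is why renaming is the first job of the CMORizer.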
Create a new CMORizer script and a corresponding config file
The first step now is to create a new file in the right folder that will contain
our new CMORizer instructions. Create a file called fluxcom.py
nano ~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
and fill it with the following boilerplate code:
"""ESMValTool CMORizer for FLUXCOM GPP data.
<We will add some useful info here later>
"""
import logging
from esmvaltool.cmorizers.data import utilities as utils
logger = logging.getLogger(__name__)
def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
    """Cmorize the dataset."""
    # This is where you'll add the cmorization code
    # 1. find the input data
    # 2. apply the necessary fixes
    # 3. store the data with the correct filename
Here, in_dir
corresponds to the input directory of the raw files,
out_dir
to the output directory of final reformatted data set and cfg
to
a configuration dictionary given by a configuration file that we will get to
shortly. The last three arguments will not be considered in this script but
can be used in other cases. cfg_user
corresponds to the user configuration
file, start_date
to the start of the period to format, and end_date
to
the end of the period to format. When you type the command esmvaltool data format
in the terminal, ESMValTool will call this function with the settings found in
your configuration files.
The ESMValTool CMORizer also needs a dataset configuration file. Create a file
called ~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml
and fill it with the following boilerplate:
---
# filename: ???
attributes:
  project_id: OBS6
  # dataset_id: ???
  # version: ???
  # tier: ???
  # modeling_realm: ???
  # source: ???
  # reference: ???
  # comment: ???
# variables:
#   ???:
#     mip: ???
Note: the name of this file must be identical to dataset-name
.
As you can see, the configuration file contains information about the original filename of the dataset, and some additional metadata that you might recognize from the CMOR filename structure. It also contains a list of variables that are available for this dataset. We’ll add this information step by step in the following sections.
RAWOBS, OBS, OBS6!?
In the configuration above we’ve already filled in the project_id. ESMValTool uses these project IDs to find the data on your hard drive, and also to find more information about the data. The RAWOBS and OBS projects refer to external data before and after CMORization, respectively. Historically, most external data were observations, hence the naming.
In going from CMIP5 to CMIP6, the CMOR standards changed a bit. For example, some variables were renamed, which posed a dilemma: should CMORization reformat to the CMIP5 or CMIP6 definition? To solve this, the OBS6 project was created. So OBS6 data follow the CMIP6 standards, and that’s what we’ll use for the new CMORizer.
You can try running the CMORizer at this point, and it should work without errors. However, it doesn’t produce any output yet:
esmvaltool data format --config_file <path to config-user.yml> FLUXCOM
1. Find the input data
First we’ll get the CMORizer script to locate our FLUXCOM data. We can use the
information from the in_dir
and cfg
variables. Add the following snippet to
your CMORizer script:
# 1. find the input data
logger.info("in_dir: '%s'", in_dir)
logger.info("cfg: '%s'", cfg)
If you run the CMORizer again, it will print out the content of these variables and the output should contain something like this:
... in_dir: '/home/peter/data/RAWOBS/Tier3/FLUXCOM'
... cfg: '{'attributes': {'project_id': 'OBS6', 'comment': ''},
'cmor_table': <esmvalcore.cmor.table.CMIP6Info object at 0x7fbd0a0f6bf0>}'
Load the data
Try to locate the input data inside the CMORizer script and load it (we’ll use iris because ESMValTool includes helper utilities for iris cubes). Confirm that you’ve loaded the data by logging the correct path and (part of the) file content.
Solution
There are many ways to do it. In any case, you should have added the original filename to the configuration file (and un-commented this line): filename: 'GPP.ANN.CRUNCEPv6.monthly.*.nc'. Note the *: this is a useful shorthand to find multiple files for different years. In a similar way we can also look for multiple variables, etc.
Here’s an example solution (inserted directly under the original comment):
# 1. find the input data
filename_pattern = cfg['filename']
matches = Path(in_dir).glob(filename_pattern)
for match in matches:
    input_file = str(match)
    logger.info("found: %s", input_file)
    cube = iris.load_cube(input_file)
    logger.info("content: %s", cube)
To make this work we’ve added import iris and from pathlib import Path at the top of the file. Note that we’ve started a loop, since we may find multiple files if there’s more than one year of data available.
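If you want to convince yourself how the * wildcard behaves without touching the real data, you can try the pattern on a few empty placeholder files. This sketch uses a temporary directory; the 2001 file and the README are made up for the illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as in_dir:
    # Create empty placeholder files; only two of them follow the pattern
    for name in ["GPP.ANN.CRUNCEPv6.monthly.2000.nc",
                 "GPP.ANN.CRUNCEPv6.monthly.2001.nc",
                 "README.txt"]:
        (Path(in_dir) / name).touch()

    # glob() does not promise any particular order, so sort the result
    matches = sorted(Path(in_dir).glob("GPP.ANN.CRUNCEPv6.monthly.*.nc"))
    matched_names = [match.name for match in matches]
```

Sorting the matches is a small design choice worth copying into a CMORizer when the order of the years matters, for example when files are later concatenated.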
2. Save the data with the correct filename
Before we start adding fixes, we’ll first make sure that our CMORizer can also write output files with the correct name. This will enable us to use the test recipe for the CMOR compatibility check.
We can use the save function from the utils module that we imported at the top. The
call signature looks like this:
utils.save_variable(cube, var, outdir, attrs, **kwargs).
We already have the cube
and the outdir
. The variable short name (var
) and
attributes (attrs
) are set through the configuration file. So we need to find
out what the correct short name and attributes are.
The standard attributes for CMIP variables are defined in the CMIP tables. These tables are differentiated according to the “MIP” they belong to. The tables are a copy of the PCMDI guidelines.
Find the variable “gpp” in a CMOR table
Check the available CMOR tables to find the variable “gpp” with the following characteristics:
- standard_name:
gross_primary_productivity_of_biomass_expressed_as_carbon
- frequency:
mon
- modeling_realm:
land
Answers
The variable “gpp” belongs to the land variables. The temporal resolution that we are looking for is “monthly”. This information points to the “Lmon” CMIP table. And indeed, the variable “gpp” can be found in the file here.
If the variable you are interested in is not available in the standard CMOR tables, you could write a custom CMOR table entry for the variable. This, however, is beyond the scope of this tutorial.
Fill the configuration file
Uncomment the following entries in your configuration file and fill them with appropriate values:
- dataset_id
- version
- tier
- modeling_realm
- short_name (the ??? immediately under variables)
- mip
Answers
The configuration file should now look something like this:
---
filename: 'GPP.ANN.CRUNCEPv6.monthly.*.nc'
attributes:
  project_id: OBS6
  dataset_id: FLUXCOM
  version: 'ANN-v1'
  tier: 3
  modeling_realm: reanaly
  source: ''
  reference: ''
  comment: ''
variables:
  gpp:
    mip: Lmon
Now that we have set this information correctly in the config file, we can call the save function. Add the following python code to your CMORizer script:
# 3. store the data with the correct filename
attributes = cfg['attributes']
variables = cfg['variables']
for short_name, variable_info in variables.items():
    all_attributes = {**attributes, **variable_info}  # add the mip to the other attributes
    utils.save_variable(cube=cube, var=short_name, outdir=out_dir, attrs=all_attributes)
Since we only have one variable (gpp), the loop is not strictly necessary. However, this makes it possible to add more variables later on.
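The expression {**attributes, **variable_info} merges two dictionaries into a new one. The sketch below shows the effect on a trimmed-down, hand-written version of the configuration; it does not call any ESMValTool function.

```python
# Trimmed-down stand-ins for cfg['attributes'] and cfg['variables']
attributes = {"project_id": "OBS6", "dataset_id": "FLUXCOM", "tier": 3}
variables = {"gpp": {"mip": "Lmon"}}

merged = {}
for short_name, variable_info in variables.items():
    # Build a new dict; if a key appears in both, the later one wins
    merged[short_name] = {**attributes, **variable_info}
```

Because a new dictionary is created for every variable, the shared attributes stay untouched and can be reused when more variables are added to the configuration file.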
Was the CMORization successful so far?
If you run the CMORizer again, you should see that it creates an output file named OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-yyyy12.nc, stored in your ESMValTool output directory ~/esmvaltool_output/data_formatting_YYYYMMDD_HHMMSS/Tier3/FLUXCOM/. The “xxxx” and “yyyy” represent the start and end year of the data.
Great! So we have produced a NetCDF file with the CMORizer that follows the naming convention for ESMValTool datasets. Let’s have a look at the NetCDF file as it was written with the very basic CMORizer from above.
ncdump -h OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
netcdf OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012 {
dimensions:
time = 12 ;
lat = 360 ;
lon = 720 ;
variables:
float GPP(time, lat, lon) ;
GPP:_FillValue = 1.e+20f ;
GPP:long_name = "GPP" ;
double time(time) ;
time:axis = "T" ;
time:units = "days since 1582-10-15 00:00:00" ;
time:standard_name = "time" ;
time:calendar = "gregorian" ;
double lat(lat) ;
double lon(lon) ;
// global attributes:
:_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.10.6" ;
:created_by = "Fabian Gans [fgans@bgc-jena.mpg.de], Ulrich Weber
[uweber@bgc-jena.mpg.de]" ;
:flux = "GPP" ;
:forcing = "CRUNCEPv6" ;
:institution = "MPI-BGC-BGI" ;
:invalid_units = "gC m-2 day-1" ;
:method = "Artificial Neural Networks" ;
:provided_by = "Martin Jung [mjung@bgc-jena.mpg.de] on behalf of FLUXCOM team" ;
:reference = "Jung et al. 2016, Nature; Tramontana et al. 2016, Biogeosciences" ;
:temporal_resolution = "monthly" ;
:title = "GPP based on FLUXCOM RS+METEO with CRUNCEPv6 climate " ;
:version = "v1" ;
:Conventions = "CF-1.7" ;
}
The file contains a variable named “GPP” that has three dimensions: “time”,
“lat”, “lon”. Notice the strange time units, and the invalid_units
in the
global attributes section. Also, it seems that there is no information available
about the lat and lon coordinates. These are just some of the things we’ll
address in the next section.
3. Implementing additional fixes
Copy the output of the CMORizer to your folder ~/data/OBS6/Tier3/FLUXCOM/
and change the test recipe to look for OBS6 data instead of OBS (note: we’re
upgrading the CMORizer to newer standards here!). Make sure the path to OBS6
is set correctly in our config-user file:
rootpath:
  OBS6: ~/data/OBS6
If we now run the test recipe on our newly ‘CMORized’ data,
esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
it should be able to find the correct file, but it does not succeed yet. The first thing that the ESMValTool CMOR checker brings up is:
iris.exceptions.UnitConversionError: Cannot convert from unknown units. The
"units" attribute may be set directly.
If you look closely at the error messages, you can see that this error concerns the units of the coordinates. ESMValTool tries to fix them automatically, but since no units are defined on the coordinates, this fails.
The cmorizer utilities also include a function called fix_coords, but before
we can use it, we’ll also need to make sure the coordinates have the correct
standard name. Add the following code to your cmorizer:
# 2. Apply the necessary fixes
# 2a. Fix/add coordinate information and metadata
cube.coord('lat').standard_name = 'latitude'
cube.coord('lon').standard_name = 'longitude'
utils.fix_coords(cube)
With some additional refactoring, our cmorization function might then look something like this:
def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
    """Cmorize the dataset."""
    # Get general information from the config file
    attributes = cfg['attributes']
    variables = cfg['variables']

    for short_name, variable_info in variables.items():
        logger.info("CMORizing variable: %s", short_name)

        # 1a. Find the input data (one file for each year)
        filename_pattern = cfg['filename']
        matches = Path(in_dir).glob(filename_pattern)

        for match in matches:
            # 1b. Load the input data
            input_file = str(match)
            logger.info("found: %s", input_file)
            cube = iris.load_cube(input_file)

            # 2. Apply the necessary fixes
            # 2a. Fix/add coordinate information and metadata
            cube.coord('lat').standard_name = 'latitude'
            cube.coord('lon').standard_name = 'longitude'
            utils.fix_coords(cube)

            # 3. Save the CMORized data
            all_attributes = {**attributes, **variable_info}
            utils.save_variable(cube=cube, var=short_name, outdir=out_dir, attrs=all_attributes)
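Step 1a of the function above relies on `Path.glob` to find one input file per year. The following is a minimal, self-contained sketch of that step only, using a temporary directory. The pattern is the one quoted in the download instructions of this lesson; the year-stamped file names are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def find_input_files(in_dir, filename_pattern):
    """Return all files in in_dir that match the pattern, sorted by name."""
    return sorted(Path(in_dir).glob(filename_pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw files: one per year, plus an unrelated file.
    for name in ("GPP.ANN.CRUNCEPv6.monthly.2000.nc",
                 "GPP.ANN.CRUNCEPv6.monthly.2001.nc",
                 "README.txt"):
        (Path(tmp) / name).touch()

    found = [path.name for path in
             find_input_files(tmp, "GPP.ANN.CRUNCEPv6.monthly.*.nc")]

print(found)
```

Sorting the matches is not required by the CMORizer, but it makes the order in which the yearly files are processed predictable, which helps when reading the log.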
Run the CMORizer script once more. Have a look at the netCDF file, and confirm that the coordinates now have much more metadata added to them. Then, run the test recipe again with the latest CMORizer output. The next error is:
esmvalcore.cmor.check.CMORCheckError: There were errors in variable GPP:
Variable GPP units unknown can not be converted to kg m-2 s-1 in cube:
Okay, so let’s fix the units of the “GPP” variable in the CMORizer. Remember that you can find the correct units in the CMOR table. Add the following three lines to our CMORizer:
# 2b. Fix gpp units
logger.info("Changing units for gpp from gc/m2/day to kg/m2/s")
cube.data = cube.core_data() / (1000 * 86400)
cube.units = 'kg m-2 s-1'
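The factor used in the snippet above follows from two conversions: 1 kg equals 1000 g, and 1 day equals 86400 s. A small sketch with plain numbers (no iris cube involved, and a hypothetical function name) shows the arithmetic:

```python
GRAMS_PER_KILOGRAM = 1000
SECONDS_PER_DAY = 86400


def gc_per_day_to_kg_per_second(value):
    """Convert a flux from gC m-2 day-1 to kg m-2 s-1."""
    return value / (GRAMS_PER_KILOGRAM * SECONDS_PER_DAY)


# A flux of 8.64 gC m-2 day-1 corresponds to roughly 1e-7 kg m-2 s-1.
example = gc_per_day_to_kg_per_second(8.64)
print(example)
```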
If everything is okay, the test recipe should now pass. We’re getting there. Looking through the output though, there’s still a warning.
WARNING There were warnings in variable GPP:
Standard name for GPP changed from None to gross_primary_productivity_of_biomass_expressed_as_carbon
Long name for GPP changed from GPP to Carbon Mass Flux out of Atmosphere Due to
Gross Primary Production on Land [kgC m-2 s-1]
ESMValTool is able to apply automatic fixes here, but if we are running a CMORizer script anyway, we might as well fix it immediately.
Add the following snippet:
# 2c. Fix metadata
cmor_table = cfg['cmor_table']
cmor_info = cmor_table.get_variable(variable_info['mip'], short_name)
utils.fix_var_metadata(cube, cmor_info)
You can see that we’re using the CMOR table here. This was passed on by
ESMValTool as part of the cfg input variable. So here we’re making sure that
we’re updating the cube’s metadata to conform to the CMOR table.
Finally, the test recipe should run without errors or warnings.
4. Finalizing the CMORizer
Once everything works as expected, there are a couple of things that we can still do.
- Add download instructions. The header of the CMORizer contains information about where to obtain the data, when it was accessed the last time, which ESMValTool “tier” it is associated with, and more detailed information about the necessary downloading and processing steps.
Fill out the header for the “FLUXCOM” dataset
Fill out the header of the new CMORizer. The different parts that need to be present in the header are the following:
- Caption: the first line of the docstring should summarize what the script does.
- Tier
- Source
- Last access
- Download and processing instructions
Answers
The header for the “FLUXCOM” dataset could look something like this:
"""ESMValTool CMORizer for FLUXCOM GPP data.

Tier
    Tier 3: restricted dataset.

Source
    http://www.bgc-jena.mpg.de/geodb/BGI/Home

Last access
    20190727

Download and processing instructions
    From the website, select FLUXCOM as the data choice and click download.
    Two files will be displayed. One for Land Carbon Fluxes and one for
    Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
    CRUNCEP data file has several data files for different variables.
    The data for GPP generated using the
    Artificial Neural Network Method will be in files with name:
    GPP.ANN.CRUNCEPv6.monthly.\*.nc
    A registration is required for downloading the data.
    Users in the UK with a CEDA-JASMIN account may request access to the jules
    workspace and access the data.
    Note : This data may require rechunking of the netcdf files.
    This constraint will not exist once iris is updated to
    version 2.3.0 Aug 2019
"""
- Fill the dataset information list. The file datasets.yml contains the ESMValTool “tier”, the data source, the last access time and download instructions for all supported datasets in ESMValTool. You can simply reuse the information written in the header of the CMORizer.
Fill out the FLUXCOM entry in datasets.yml
Fill out the FLUXCOM entry in datasets.yml. The different parts that need to be present in the entry are the following:
- Dataset-name
- Tier
- Source
- Last access
- Download and processing instructions
Answers
The entry for the “FLUXCOM” dataset should look like:
FLUXCOM:
  tier: 3
  source: http://www.bgc-jena.mpg.de/geodb/BGI/Home
  last_access: 2019-07-27
  info: |
    From the website, select FLUXCOM as the data choice and click download.
    Two files will be displayed. One for Land Carbon Fluxes and one for
    Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
    CRUNCEP data file has several data files for different variables.
    The data for GPP generated using the
    Artificial Neural Network Method will be in files with name:
    GPP.ANN.CRUNCEPv6.monthly.*.nc
    A registration is required for downloading the data.
    Users in the UK with a CEDA-JASMIN account may request access to the jules
    workspace and access the data.
    Note : This data may require rechunking of the netcdf files.
    This constraint will not exist once iris is updated to
    version 2.3.0 Aug 2019
Once the datasets.yml file is filled, you can check that ESMValTool can
display information about the added dataset with:
esmvaltool data info FLUXCOM
If everything is okay, the output should look something like this:
$ esmvaltool data info FLUXCOM
FLUXCOM
Tier: 3
Source: http://www.bgc-jena.mpg.de/geodb/BGI/Home
Automatic download: No
From the website, select FLUXCOM as the data choice and click download.
Two files will be displayed. One for Land Carbon Fluxes and one for
Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
CRUNCEP data file has several data files for different variables.
The data for GPP generated using the
Artificial Neural Network Method will be in files with name:
GPP.ANN.CRUNCEPv6.monthly.*.nc
A registration is required for downloading the data.
Users in the UK with a CEDA-JASMIN account may request access to the jules
workspace and access the data.
Note : This data may require rechunking of the netcdf files.
This constraint will not exist once iris is updated to
version 2.3.0 Aug 2019
Note that Automatic download: No means that no automatic downloading script
is available in ESMValTool for this dataset. The implementation of such a
script is beyond the scope of this tutorial. To find out which datasets come
with an automatic download script, you can run esmvaltool data list to
list all datasets supported in ESMValTool. More information about the usage
of automatic downloading scripts can be found in the User Guide.
- Complete the metadata in the config file. We have left a few fields empty in the configuration file, such as ‘source’. By filling out these fields we can make sure the relevant metadata is passed on as attributes in the CMORized data. To make this work, add the following line to the CMORizer script:
# 2d. Update the cube's metadata with all info from the config file
utils.set_global_atts(cube, attributes)
- Add a reference. Make sure that there is a reference file available for the dataset, see the instruction here.
- Make a pull request. Since you have gone through all the trouble to reformat the dataset so that the ESMValTool can work with it, it would be great if you could provide the CMORizer, and ultimately with that the dataset, to the rest of the community. For more information, see the episode on Development and contribution.
- Add documentation. Make sure that you have added the info of your dataset to the User Guide section Obtaining input data, so that people know it is available for the ESMValTool.
Some final comments
Congratulations! You have just added support for a new dataset to ESMValTool! Adding a new CMORizer is definitely already an advanced task when working with the ESMValTool. You need to have a basic understanding of how the ESMValTool works and what its internal structure looks like. In addition, you need to have a basic understanding of NetCDF files and a programming language. In our example we used Python for the CMORizing script since we advocate for focusing the code development on only a few different programming languages. This helps to maintain the code and to ensure the compatibility of the code with possible fundamental changes to the structure of the ESMValTool and ESMValCore.
More information about adding observations to the ESMValTool can be found in the documentation.
Key Points
CMORizers are dataset-specific scripts that can be run once to generate CMOR-compliant data.
ESMValTool comes with a set of CMORizers readily available, but you can also add your own.
Debugging
Overview
Teaching: 30 min
Exercises: 15 min
Compatibility:
Questions
How can I handle errors/warnings?
Objectives
Fix a broken recipe
Every user encounters errors. Once you know why you get certain types of errors, they become much easier to fix. The good news is that ESMValTool creates a record of the output messages and stores them in log files. They can be used for debugging or monitoring the process. This lesson helps you understand the different types of errors and when you are likely to encounter them.
Log files
Each time we run ESMValTool, it will produce a new output directory. This
directory should contain the run folder that is automatically generated by
ESMValTool. To examine this, we run the recipe recipe_python.yml, which can be
found in the lesson Running your first recipe. See the lesson Configuration for
how to set paths.
In a new terminal, run the recipe:
cd esmvaltool_tutorial
esmvaltool run examples/recipe_python.yml
esmvaltool: command not found
The shell reports this error because the conda environment esmvaltool
has not been activated. To fix the error, activate the environment before
running the recipe:
conda activate esmvaltool
conda environment
More information about the conda environment can be found at Installation.
Let’s list the files in the run directory:
ls esmvaltool_output/recipe_python_#_#/run
main_log_debug.txt main_log.txt map recipe_python.yml resource_usage.txt
timeseries
In main_log_debug.txt and main_log.txt, ESMValTool writes the output
messages, warnings and possible errors that might occur during pre-processing.
To inspect them, we can look inside the files. For example:
cat esmvaltool_output/recipe_python_#_#/run/main_log.txt
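Because every run creates a new output directory, it can be handy to locate the most recent one programmatically. The sketch below assumes that the `#_#` part of the directory name is a date and time stamp of the form YYYYMMDD_HHMMSS, so that sorting the names alphabetically also sorts them by time. The helper `latest_run_dir` and the example stamps are hypothetical.

```python
import tempfile
from pathlib import Path


def latest_run_dir(output_dir, recipe_name):
    """Return the newest <recipe_name>_<stamp> directory, or None."""
    candidates = sorted(
        path for path in Path(output_dir).glob(f"{recipe_name}_*")
        if path.is_dir()
    )
    return candidates[-1] if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    # Two fake output directories with made-up time stamps.
    for stamp in ("20200629_180956", "20221018_114234"):
        (Path(tmp) / f"recipe_python_{stamp}" / "run").mkdir(parents=True)

    newest = latest_run_dir(tmp, "recipe_python")
    newest_name = newest.name
    # Path of the main log inside the newest run directory.
    main_log_parts = (newest / "run" / "main_log.txt").parts[-3:]

print(newest_name)
```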
Now, let’s have a look inside the folder timeseries/script1:
ls esmvaltool_output/recipe_python_#_#/run/timeseries/script1/
diagnostic_provenance.yml log.txt resource_usage.txt settings.yml
In log.txt, ESMValTool writes the output messages,
warnings and possible errors that are related to the diagnostic script.
If you encounter an error and don’t know what it means, it is important to read the log information. Sometimes knowing where the error occurred is enough to fix it, even if you don’t entirely understand the message. However, note that you may not always be able to find the error or fix it. In that case, the ESMValTool community can help you figure out what went wrong.
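When a log file is long, filtering it for warnings and errors is a quick first step. The sketch below assumes the line layout seen in the messages quoted in this lesson (date, time, `UTC`, process id in brackets, log level, message). The timestamps of the WARNING and ERROR sample lines are made up, and `problem_lines` is a hypothetical helper.

```python
import re

# Assumed layout: "<date> <time> UTC [<pid>] <LEVEL> <message>"
LOG_LINE = re.compile(
    r"^\S+ \S+ UTC \[\d+\] (?P<level>[A-Z]+)\s+(?P<message>.*)$")


def problem_lines(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return (level, message) pairs for lines at the given log levels."""
    problems = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            problems.append((match.group("level"), match.group("message")))
    return problems


sample_log = [
    "2020-06-29 18:09:56,641 UTC [46055] INFO    If you have a question "
    "or need help, please start a new discussion",
    "2020-06-29 18:09:57,102 UTC [46055] WARNING There were warnings in "
    "variable GPP:",
    "2020-06-29 18:09:58,310 UTC [46055] ERROR   Set 'log_level' to "
    "'debug' to get more information",
]

problems = problem_lines(sample_log)
for level, message in problems:
    print(level, message)
```

To use this on a real run, read the lines of main_log.txt from the run directory and pass them to the function.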
Different log files
In the run directory, there are two log files, main_log_debug.txt and main_log.txt. What are their differences?
Solution
The main_log_debug.txt contains the output messages from the pre-processor, whereas the main_log.txt shows general errors and warnings that might happen in running the recipe and diagnostic script.
Let’s change some settings in the recipe to run a regional pre-processor.
We use a text editor called nano to open the recipe file:
nano recipe_python.yml
Text editor side note
No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. We use nano in examples here because it is one of the least complex text editors. Press ctrl + O to save the file, and then ctrl + X to exit nano.
See the recipe_python.yml
# ESMValTool
# recipe_python.yml
---
# See https://docs.esmvaltool.org/en/latest/recipes/recipe_examples.html
# for a description of this recipe.
# See https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/overview.html
# for a description of the recipe format.
---
documentation:
  description: |
    Example recipe that plots a map and timeseries of temperature.
  title: Recipe that runs an example diagnostic written in Python.
  authors:
    - andela_bouwe
    - righi_mattia
  maintainer:
    - schlund_manuel
  references:
    - acknow_project
  projects:
    - esmval
    - c3s-magic

datasets:
  - {dataset: BCC-ESM1, project: CMIP6, exp: historical, ensemble: r1i1p1f1, grid: gn}
  - {dataset: bcc-csm1-1, version: v1, project: CMIP5, exp: historical, ensemble: r1i1p1}

preprocessors:
  # See https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html
  # for a description of the preprocessor functions.
  to_degrees_c:
    convert_units:
      units: degrees_C

  annual_mean_amsterdam:
    extract_location:
      location: Amsterdam
      scheme: linear
    annual_statistics:
      operator: mean
    multi_model_statistics:
      statistics:
        - mean
      span: overlap
    convert_units:
      units: degrees_C

  annual_mean_global:
    area_statistics:
      operator: mean
    annual_statistics:
      operator: mean
    convert_units:
      units: degrees_C

diagnostics:
  map:
    description: Global map of temperature in January 2000.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas:
        mip: Amon
        preprocessor: to_degrees_c
        timerange: 2000/P1M
        caption: |
          Global map of {long_name} in January 2000 according to {dataset}.
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: pcolormesh
          cmap: Reds

  timeseries:
    description: Annual mean temperature in Amsterdam and global mean since 1850.
    themes:
      - phys
    realms:
      - atmos
    variables:
      tas_amsterdam:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_amsterdam
        timerange: 1850/2000
        caption: Annual mean {long_name} in Amsterdam according to {dataset}.
      tas_global:
        short_name: tas
        mip: Amon
        preprocessor: annual_mean_global
        timerange: 1850/2000
        caption: Annual global mean {long_name} according to {dataset}.
    scripts:
      script1:
        script: examples/diagnostic.py
        quickplot:
          plot_type: plot
Keys and values in recipe settings
The ESMValTool pre-processors
cover a broad range of operations on the input data, like time manipulation,
area manipulation, land-sea masking, variable derivation, etc. Let’s add the
preprocessor extract_region to a new section annual_mean_regional:
preprocessors:
  annual_mean_regional:
    annual_statistics:
      operator: mean
    extract_region:
      start_longitude: -10
      end_longitude: 40
      start_latitude: 27
      end_latitude: 70
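To get a feeling for which locations fall inside the box defined by these four values, here is a simplified point-in-box test in Python. It is only an illustration of the geometry, not the way ESMValCore implements extract_region; the function `in_region` is hypothetical. Longitudes are wrapped to the 0 to 360 range so that a box starting at a negative longitude is handled correctly.

```python
def in_region(lat, lon, start_longitude, end_longitude,
              start_latitude, end_latitude):
    """Return True if the point (lat, lon) lies inside the box."""
    lon = lon % 360
    start = start_longitude % 360
    end = end_longitude % 360
    if start <= end:
        lon_inside = start <= lon <= end
    else:
        # The box crosses the 0 degree meridian, e.g. from -10 to 40.
        lon_inside = lon >= start or lon <= end
    return lon_inside and start_latitude <= lat <= end_latitude


region = dict(start_longitude=-10, end_longitude=40,
              start_latitude=27, end_latitude=70)

amsterdam_inside = in_region(52.37, 4.90, **region)
new_york_inside = in_region(40.71, -74.01, **region)
print(amsterdam_inside, new_york_inside)
```

With the values from the recipe, Amsterdam lies inside the region while New York does not.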
Also, we change the projects value esmval to tutorial:
projects:
  - tutorial
  - c3s-magic
Then, we save the file and run the recipe:
esmvaltool run recipe_python.yml
ValueError: Tag 'tutorial' does not exist in section 'projects' of
esmvaltool/config-references.yml 2020-06-29 18:09:56,641 UTC [46055] INFO If you
have a question or need help, please start a new discussion on
https://github.com/ESMValGroup/ESMValTool/discussions If you suspect this is a
bug, please open an issue on https://github.com/ESMValGroup/ESMValTool/issues To
make it easier to find out what the problem is, please consider attaching the
files run/recipe_*.yml and run/main_log_debug.txt from the output directory.
The values for the keys authors, maintainer, projects and references in the recipe should be known by ESMValTool:
- A list of ESMValTool authors, maintainers, and projects can be found in the config-references.yml.
- ESMValTool references in BibTeX format can be found in the ESMValTool/esmvaltool/references directory.
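The check behind the error above can be pictured as a simple lookup of every tag in a table of known tags. The sketch below uses a small hand-written dictionary as a stand-in for config-references.yml. Both the dictionary layout and the helper `unknown_tags` are assumptions made for illustration, not the ESMValTool implementation.

```python
# Hypothetical excerpt of the known tags, keyed by recipe section.
KNOWN_TAGS = {
    "authors": {"andela_bouwe", "righi_mattia", "schlund_manuel"},
    "maintainer": {"andela_bouwe", "righi_mattia", "schlund_manuel"},
    "projects": {"esmval", "c3s-magic"},
}


def unknown_tags(documentation, known_tags):
    """Return (section, tag) pairs that are not in the known tags."""
    missing = []
    for section, tags in documentation.items():
        if section not in known_tags:
            continue  # e.g. title and description are free text
        for tag in tags:
            if tag not in known_tags[section]:
                missing.append((section, tag))
    return missing


documentation = {
    "title": "Recipe that runs an example diagnostic written in Python.",
    "authors": ["andela_bouwe", "righi_mattia"],
    "maintainer": ["schlund_manuel"],
    "projects": ["tutorial", "c3s-magic"],
}

missing = unknown_tags(documentation, KNOWN_TAGS)
print(missing)
```

Here the only unknown tag is the project tutorial, which matches the ValueError reported by ESMValTool.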
ESMValTool can’t locate the data
You are assisting a colleague with ESMValTool. The colleague replaces the CanESM2 entry in dataset: CanESM2, project: CMIP5 with ACCESS1-3 and runs the recipe. However, ESMValTool encounters an error like:
ERROR No input files found for variable {'short_name': 'tas', 'mip': 'Amon', 'preprocessor': 'annual_mean_amsterdam', 'variable_group': 'tas_amsterdam', 'diagnostic': 'timeseries', 'dataset': 'ACCESS1-3', 'project': 'CMIP5', 'exp': 'historical', 'ensemble': 'r1i1p1', 'recipe_dataset_index': 1, 'institute': ['CSIRO-BOM'], 'product': ['output1', 'output2'], 'timerange': '1850/2000', 'alias': 'CMIP5', 'original_short_name': 'tas', 'standard_name': 'air_temperature', 'long_name': 'Near-Surface Air Temperature', 'units': 'K', 'modeling_realm': ['atmos'], 'frequency': 'mon', 'start_year': 1850, 'end_year': 2000}
ERROR Looked for files matching ['tas_Amon_ACCESS1-3_historical_r1i1p1*.nc'], but did not find any existing input directory
ERROR Set 'log_level' to 'debug' to get more information
What suggestions would you give the researcher for fixing the error?
Solution
- Check config-user.yml to see if the correct directory for input data is set
- Check the available data, regarding exp, mip, ensemble, start_year, and end_year
- Check the variable names in the diagnostics section in the recipe
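One way to follow the second suggestion is to compare the pattern from the error message with the names of the files that are actually available. The sketch below does this with the standard library; the list of available files is invented for the example.

```python
from fnmatch import fnmatchcase


def matching_files(available, pattern):
    """Return the file names that match a shell-style pattern."""
    return [name for name in available if fnmatchcase(name, pattern)]


# Hypothetical content of the input directory.
available = [
    "tas_Amon_bcc-csm1-1_historical_r1i1p1_185001-201212.nc",
    "pr_Amon_ACCESS1-3_historical_r1i1p1_185001-200512.nc",
]

# Pattern copied from the error message above.
access_matches = matching_files(
    available, "tas_Amon_ACCESS1-3_historical_r1i1p1*.nc")
bcc_matches = matching_files(
    available, "tas_Amon_bcc-csm1-1_historical_r1i1p1*.nc")

print(access_matches)
print(bcc_matches)
```

An empty result for the pattern from the error message shows that the requested combination of variable, dataset, experiment and ensemble member is not among the available files.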
Check pre-processed data
The setting save_intermediary_cubes in the configuration file can be used to
save the pre-processed data. More information about this setting can be found at
Configuration.
save_intermediary_cubes
Note that this setting should only be used for debugging, as it significantly slows down the recipe and increases disk usage because a lot of output files need to be stored.
Check diagnostic script path
The result of the pre-processor is passed to the examples/diagnostic.py
script, which is specified in the recipe as:
scripts:
  script1:
    script: examples/diagnostic.py
The diagnostic scripts are located in the folder diag_scripts in the
ESMValTool installation directory <path_to_esmvaltool>. To find
where ESMValTool is located on your system, see Installation.
Let’s see what happens if we change the script path to:
scripts:
  script1:
    script: diag_scripts/ocean/diagnostic_timeseries.py
esmvaltool run examples/recipe_python.yml
esmvalcore._task.DiagnosticError: Cannot execute script
'diag_scripts/ocean/diagnostic_timeseries.py'
(~/mambaforge/envs/esmvaltool2.6/lib/python3.10/site-packages/esmvaltool/
diag_scripts/diag_scripts/ocean/diagnostic_timeseries.py):
file does not exist. 2022-10-18 11:42:34,136 UTC [39323] INFO If you have a
question or need help, please start a new discussion on
https://github.com/ESMValGroup/ESMValTool/discussions If you suspect this is a
bug, please open an issue on https://github.com/ESMValGroup/ESMValTool/issues To
make it easier to find out what the problem is, please consider attaching the
files run/recipe_*.yml and run/main_log_debug.txt from the output directory.
The script path should be relative to the diag_scripts directory. The script
diagnostic_timeseries.py is located in <path_to_esmvaltool>/diag_scripts/ocean/,
so the correct relative path is ocean/diagnostic_timeseries.py.
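The doubled diag_scripts/diag_scripts in the error message follows directly from this rule. The sketch below is a simplified model of the lookup, not the code used by ESMValCore: a relative script path is appended to the diag_scripts directory, while an absolute path is used unchanged. The installation path `/opt/esmvaltool` and the home directory are placeholders.

```python
from pathlib import PurePosixPath


def resolve_script(script, diag_scripts_dir):
    """Resolve a script entry from the recipe to a full path."""
    path = PurePosixPath(script)
    if path.is_absolute():
        return path
    return PurePosixPath(diag_scripts_dir) / path


diag_scripts_dir = "/opt/esmvaltool/diag_scripts"  # placeholder location

wrong = resolve_script(
    "diag_scripts/ocean/diagnostic_timeseries.py", diag_scripts_dir)
right = resolve_script("ocean/diagnostic_timeseries.py", diag_scripts_dir)
absolute = resolve_script(
    "/home/user/diagnostic_timeseries.py", diag_scripts_dir)

print(wrong)
print(right)
print(absolute)
```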
Alternatively, the script path can be an absolute path. To examine this, we can
download the script from the ESMValTool repository:
wget https://raw.githubusercontent.com/ESMValGroup/ESMValTool/main/esmvaltool/\
diag_scripts/ocean/diagnostic_timeseries.py
One way to get the absolute path is to run:
readlink -f diagnostic_timeseries.py
Then we can update the script path:
scripts:
  script1:
    script: <path_to_script>/diagnostic_timeseries.py
Then, run the recipe again and examine the output to see Run was successful!
Available recipe and diagnostic scripts
ESMValTool provides a broad suite of recipes and diagnostic scripts for different disciplines like atmosphere, climate metrics, future projections, IPCC, land, ocean, ….
Re-running a diagnostic
Look at the main_log.txt file and answer the following question: How to re-run the diagnostic script?
Solution
The main_log.txt file contains information on how to re-run the diagnostic script without re-running the pre-processors:
2020-06-29 20:36:32,844 UTC [52810] INFO To re-run this diagnostic script, run:
If you run the command in a terminal, you will be able to re-run the diagnostic.
Memory issues
If you run out of memory, try setting max_parallel_tasks to 1 in the configuration file. Then, check the amount of memory you need for that by inspecting the file run/resource_usage.txt in the output directory. Using that number, you can then increase the number of parallel tasks again to a reasonable number for the amount of memory available in your system.
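As a rough rule of thumb, the number of tasks that fit in memory is the available memory divided by the peak memory of a single task, rounded down. The sketch below encodes that estimate. It assumes that all tasks need about the same amount of memory, which is a simplification, and the numbers and the function name are made up for illustration.

```python
def affordable_parallel_tasks(available_gb, peak_gb_per_task):
    """Estimate how many tasks fit in memory, with a minimum of one."""
    if peak_gb_per_task <= 0:
        raise ValueError("peak memory per task must be positive")
    return max(1, int(available_gb // peak_gb_per_task))


# Example: 16 GB available, a single task peaked at 5 GB.
tasks = affordable_parallel_tasks(16, 5)
print(tasks)
```

It is sensible to stay somewhat below this estimate, since other processes on the system also need memory.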
Key Points
There are three different kinds of log files: main_log.txt, main_log_debug.txt and log.txt.