EarthCARE: Access and Visualize

Authors: Harshini Girish (UAH), Rajat Shinde (UAH), Alex Mandel (Development Seed), Saskia Brose (ESA)

Date: October 15, 2025

Description: This notebook shows a fast EarthCARE workflow on ESA MAAP: connect to the STAC catalogue, filter by product (e.g., MSI/CPR) and AOI/time, and list matching items. We preview each granule via quicklook images to sanity-check coverage without downloading data. Then we pick the HDF5 asset (h5_url), authenticate with an EO Sign-In token, and stream it over HTTPS using fsspec + xarray. Finally, we visualize key variables directly from the streamed dataset, enabling quick QA and analysis with no bulk downloads.

About the Data

EarthCARE

EarthCARE (ESA/JAXA, launched 2024) studies how clouds and aerosols interact with radiation to improve weather and climate models. It flies four co-aligned instruments: ATLID (lidar), CPR (Cloud Profiling Radar), MSI (Multi-Spectral Imager), and BBR (BroadBand Radiometer). Together they provide global coverage with vertical profiles, cloud/aerosol properties, and top-of-atmosphere radiative fluxes from a sun-synchronous orbit.

MSI COP 2A

MSI COP 2A is an MSI Level-2A product that retrieves cloud optical thickness, effective radius/particle size, and derived fields such as cloud water path, plus ancillary flags/quality layers. Data are organized in along-track frames and published per granule with quicklook JPEGs and a main HDF5 science file for analysis (the enclosure_h5 asset in the item inspected below; some products expose it as enclosure_1 or enclosure_2). Typical uses are rapid cloud screening, scene classification, and radiative studies, often paired with CPR/ATLID for vertical context and BBR for flux closure.

Source: EarthCARE mission

Run this notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the Getting started with the MAAP section of our documentation.

Disclaimer: It is highly recommended to run this tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.

Importing Packages

[8]:
import os
import stat
import getpass
from pathlib import Path
from pystac_client import Client
import fsspec
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Image, display
from tqdm import tqdm
import requests

Connect to ESA MAAP STAC

Opens the ESA MAAP catalogue endpoint and creates a client handle.

[9]:
catalog_url = 'https://catalog.maap.eo.esa.int/catalogue/'
catalog = Client.open(catalog_url)
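
The cells that follow rely on a `search` object and an `items` list produced by a catalogue search. A minimal sketch of how such a search could be parameterized is given here. The collection ID, bounding box, and date range are hypothetical placeholders, and the CQL2 filter on `product:type` assumes the catalogue supports the STAC filter extension for that property (the property name is taken from the asset table columns used later in this notebook).

```python
from datetime import datetime, timezone


def build_search_kwargs(collection_id, product_type, bbox, start, end, max_items=5):
    """Assemble keyword arguments for pystac_client.Client.search()."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # CQL2-JSON filter on the product type (assumes the server supports filtering)
    product_filter = {
        "op": "=",
        "args": [{"property": "product:type"}, product_type],
    }
    return {
        "collections": [collection_id],
        "bbox": list(bbox),  # [west, south, east, north] in degrees
        "datetime": f"{start.strftime(fmt)}/{end.strftime(fmt)}",
        "filter": product_filter,
        "max_items": max_items,
    }


kwargs = build_search_kwargs(
    collection_id="EXAMPLE_COLLECTION_ID",  # hypothetical placeholder, not a real ID
    product_type="MSI_COP_2A",
    bbox=(-10.0, 35.0, 30.0, 60.0),  # hypothetical AOI
    start=datetime(2025, 3, 1, tzinfo=timezone.utc),
    end=datetime(2025, 4, 10, tzinfo=timezone.utc),
)
print(kwargs["datetime"])

# With the catalogue handle from the cell above, the search itself would be:
# search = catalog.search(**kwargs)
# items = list(search.items())
```

Keeping the parameters in a plain dictionary makes it easy to print and review the query before sending it to the catalogue.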

Inspect matched item

Selects the first item from the filtered results and prints its ID so we know exactly which granule we’re working with. Then lists all asset keys on that item to choose the right file for preview or download.

[12]:
# Access the first item only
item = items[0]

print(f"Item 0 — ID: {item.id if hasattr(item, 'id') else item.get('id')}")

# If item is a pystac.Item
try:
    assets = item.assets
except AttributeError:
    # If item is a dict
    assets = item.get("assets", {})

if assets:
    print("  Available asset keys:")
    for key in assets.keys():
        print("   -", key)
else:
    print("  No assets found for this item.")
Item 0 — ID: ECA_EXBA_MSI_COP_2A_20250305T134540Z_20250912T202910Z_04366E
  Available asset keys:
   - thumbnail
   - product
   - metadata_ogc_10_157r4
   - enclosure_h5
   - metadata_ogc_17_003r2
   - metadata_iso_19139
   - enclosure_hdr
   - quicklook

Build asset table

Converts the STAC ItemCollection to a dict and flattens features with pd.json_normalize, selecting just IDs, product type, timestamps, and key asset HREFs. Renames columns to readable labels and sorts by id for easy scanning.

[13]:
# Using Pandas dataframes for ease of use.
data = search.item_collection_as_dict()

df = pd.json_normalize(data, record_path=['features'])[
    [
        "id",
        "properties.product:type",
        "properties.updated",
        "assets.product.href",
        "assets.quicklook.href",
        "assets.enclosure_h5.href",
        "assets.enclosure_hdr.href",
    ]
]

# Rename the columns to readable labels
df.rename(columns={
    'properties.product:type': 'product_type',
    'properties.updated': 'last_modified',
    'assets.product.href': 'Zipped Product',
    'assets.quicklook.href': 'quicklook_url',
    'assets.enclosure_h5.href': 'h5_url',
    'assets.enclosure_hdr.href': 'HDR_url',
}, inplace=True)

df.sort_values(by='id', ascending=True, inplace=True)
print(f"Asset table: {len(df):,} rows × {len(df.columns)} columns")
display(df.head(10))  # increase/decrease as you like
Asset table: 5 rows × 7 columns
|   | id | product_type | last_modified | Zipped Product | quicklook_url | h5_url | HDR_url |
|---|----|--------------|---------------|----------------|---------------|--------|---------|
| 0 | ECA_EXBA_MSI_COP_2A_20250305T134540Z_20250912T... | MSI_COP_2A | 2025-10-13T15:53:59Z | https://catalog.maap.eo.esa.int/data/zipper/ea... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... |
| 2 | ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T... | MSI_COP_2A | 2025-10-13T16:01:17Z | https://catalog.maap.eo.esa.int/data/zipper/ea... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... |
| 1 | ECA_EXBA_MSI_COP_2A_20250318T131931Z_20250912T... | MSI_COP_2A | 2025-10-13T16:02:33Z | https://catalog.maap.eo.esa.int/data/zipper/ea... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... |
| 3 | ECA_EXBA_MSI_COP_2A_20250330T134519Z_20250912T... | MSI_COP_2A | 2025-10-13T16:10:24Z | https://catalog.maap.eo.esa.int/data/zipper/ea... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... |
| 4 | ECA_EXBA_MSI_COP_2A_20250407T130051Z_20250914T... | MSI_COP_2A | 2025-10-13T16:15:26Z | https://catalog.maap.eo.esa.int/data/zipper/ea... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... | https://catalog.maap.eo.esa.int/data/earthcare... |

Note: For some JAXA-processed Level-2 products (e.g., AC__CLP_2B, ATL_CLA_2A), the HDF5 product and its header are swapped between enclosure_1 and enclosure_2, or between enclosure_h5 and enclosure_hdr. For such products the h5_url and HDR_url columns above may therefore be swapped as well. The HDF5 selection cell further below guards against this by checking file endings and only accepting a URL that ends in .h5, regardless of which enclosure holds it.
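
Normalizing by file ending, as the note describes, might look like the following minimal sketch. The function name and the example URLs are hypothetical; only the column names h5_url and HDR_url come from the table above.

```python
def normalize_enclosures(href_a, href_b):
    """Return (h5_url, hdr_url) regardless of the order the two hrefs arrive in."""
    h5_url, hdr_url = None, None
    for href in (href_a, href_b):
        if not isinstance(href, str):
            continue  # missing asset (None or NaN in the table)
        lower = href.lower()
        if lower.endswith(".h5"):
            h5_url = href
        elif lower.endswith(".hdr"):
            hdr_url = href
    return h5_url, hdr_url


# Hypothetical hrefs where the header and the product are swapped
swapped = normalize_enclosures(
    "https://example.org/granule/product.HDR",
    "https://example.org/granule/product.h5",
)
print(swapped)

# Applied to the asset table, this could look like:
# pairs = [normalize_enclosures(a, b) for a, b in zip(df["h5_url"], df["HDR_url"])]
# df["h5_url"], df["HDR_url"] = zip(*pairs)
```

Matching on the lower-cased ending keeps the check independent of whether the catalogue publishes `.HDR` or `.hdr`.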

Choose an item and plot

Select a row from the asset table by its index label (the leftmost column, which keeps its original value after sorting) and grab its quicklook_url so you can visually verify the granule without any heavy download. Rendering the JPEG inline provides a fast QA check (correct product, time window, and coverage) before streaming the .h5 data.

[14]:
# Choose the file you want to view/stream/download
fileno = 2  # Index label in df; adjust as desired
ql_url = df.loc[fileno, "quicklook_url"]

display(Image(url=ql_url))

Note: Not all EarthCARE products include a quicklook. When it is absent, use the thumbnail instead; it is the same image at a lower resolution.
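
The fallback described in the note could be sketched as below. The helper name is hypothetical; the asset keys `quicklook` and `thumbnail` are the ones listed for the item earlier in this notebook. The function accepts either plain dictionaries or objects with an `href` attribute, since assets may arrive in either form.

```python
def pick_preview_href(assets):
    """Return the quicklook href if present, else the thumbnail href, else None."""
    for key in ("quicklook", "thumbnail"):
        asset = assets.get(key)
        if asset is None:
            continue
        # Support both dict-style assets and objects exposing .href
        href = asset.get("href") if isinstance(asset, dict) else getattr(asset, "href", None)
        if href:
            return href
    return None


# Hypothetical asset dictionaries for illustration
with_quicklook = {
    "quicklook": {"href": "https://example.org/ql.jpg"},
    "thumbnail": {"href": "https://example.org/thumb.jpg"},
}
thumbnail_only = {"thumbnail": {"href": "https://example.org/thumb.jpg"}}

print(pick_preview_href(with_quicklook))
print(pick_preview_href(thumbnail_only))
```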

Secure HDF5 access via fsspec

Getting the Token from the ESA MAAP portal

This section explains how to retrieve a long-lived (offline) token from the ESA MAAP portal using your browser and NASA EDL login.

Prerequisites

  • An active ESA MAAP portal account with access initialized.

  • OAuth2 client credentials for your ESA realm.

  • Python packages: requests for the token request; pystac-client, fsspec, xarray, h5netcdf, pandas, and matplotlib for the rest of this notebook.

Open the token page in your browser: https://portal.maap.eo.esa.int/ini/services/auth/token/

Steps

  1. Navigate to the URL above.

  2. Choose NASA Earthdata Login (EDL) when prompted and authorize access.

  3. After successful authorization you will see a token page showing your long‑lived access token string. Copy this token.

  4. Create a credentials.txt file in your home directory. If the file already exists, do not create a new one; check that the stored token is still valid, replace it with a new token if it is not, and proceed directly to the token cell below.

  5. Copy the following snippet into credentials.txt and replace the OFFLINE_TOKEN placeholder with the token from step 3.

CLIENT_ID=offline-token
CLIENT_SECRET=p1eL7uonXs6MDxtGbgKdPVRAmnGxHpVE
OFFLINE_TOKEN=your_esamaap_offline_token_here

  6. Run the cell below to exchange the offline token for a short-lived access token.

Notes

  • Treat tokens as secrets. Do not commit them to version control or share publicly.

[16]:
# --- Path to credentials.txt ---
CREDENTIALS_FILE = (Path.home() / "credentials.txt").resolve()   # Insert the .txt path

def load_credentials(file_path=CREDENTIALS_FILE):
    """Read key-value pairs from a credentials file into a dictionary."""
    creds = {}
    if not file_path.exists():
        raise FileNotFoundError(f"Credentials file not found: {file_path}")
    with open(file_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            creds[key.strip()] = value.strip()
    return creds


# --- ESA MAAP API ---

def get_token():
    """Use OFFLINE_TOKEN to fetch a short-lived access token."""
    creds = load_credentials()

    OFFLINE_TOKEN = creds.get("OFFLINE_TOKEN")
    CLIENT_ID = creds.get("CLIENT_ID")
    CLIENT_SECRET = creds.get("CLIENT_SECRET")

    if not all([OFFLINE_TOKEN, CLIENT_ID, CLIENT_SECRET]):
        raise ValueError("Missing OFFLINE_TOKEN, CLIENT_ID, or CLIENT_SECRET in credentials file")

    url = "https://iam.maap.eo.esa.int/realms/esa-maap/protocol/openid-connect/token"
    data = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "grant_type": "refresh_token",
        "refresh_token": OFFLINE_TOKEN,
        "scope": "offline_access openid"
    }

    # Use a timeout so a stalled connection fails instead of hanging the notebook
    response = requests.post(url, data=data, timeout=30)
    response.raise_for_status()

    response_json = response.json()
    access_token = response_json.get('access_token')

    if not access_token:
        raise RuntimeError("Failed to retrieve access token from IAM response")

    return access_token

ESA_TOKEN = get_token()
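
Because credentials.txt holds a long-lived token, it may be worth restricting the file to its owner. The sketch below shows one way to do that on a POSIX system; the helper names are hypothetical, and the demonstration runs on a temporary file rather than on the real credentials file.

```python
import os
import stat
import tempfile
from pathlib import Path


def restrict_to_owner(path):
    """Set permissions to read/write for the owner only (mode 0o600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path):
    """True if neither group nor others have any permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstration on a throwaway file with a placeholder value
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "credentials.txt"
    demo.write_text("OFFLINE_TOKEN=placeholder\n")
    restrict_to_owner(demo)
    demo_is_private = is_owner_only(demo)

print(demo_is_private)

# For the real file this would be:
# restrict_to_owner(Path.home() / "credentials.txt")
```

On Windows, `os.chmod` only toggles the read-only flag, so this check is meaningful on Linux and macOS only.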

Select the HDF5 asset URL

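Resolves the HDF5 URL to stream. If a STAC item named `it` is defined, its enclosure assets are checked for an href ending in .h5; otherwise the code falls back to the h5_url of the row chosen above via fileno. An error is raised if neither yields a .h5 file.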

[17]:
# Asset keys that may hold the HDF5 file (the two can be swapped for some products)
enclosures = ["enclosure_h5", "enclosure_hdr"]
ds_url = None

# Preferred: look for a .h5 href on a STAC item named `it`, if one is defined
try:
    for enc in enclosures:
        asset = getattr(it, "assets", {}).get(enc)
        if asset and getattr(asset, "href", None) and asset.href.lower().endswith(".h5"):
            ds_url = asset.href
            print(f"Found .h5 file in {enc}: {ds_url}")
            break
except NameError:
    # `it` is not defined in this session; use the asset table instead
    pass

# Fallback: take the h5_url of the row selected via `fileno`
if not ds_url and "h5_url" in df.columns:
    cand = df.loc[fileno, "h5_url"]
    if isinstance(cand, str) and cand.lower().endswith(".h5"):
        ds_url = cand
        print(f"Falling back to df['h5_url']: {ds_url}")

if not ds_url:
    raise FileNotFoundError("No .h5 file found via enclosure_h5/enclosure_hdr or df['h5_url'].")

print("Using HDF5 URL:", ds_url)

Falling back to df['h5_url']: https://catalog.maap.eo.esa.int/data/earthcare-pdgs-01/EarthCARE/MSI_COP_2A/BA/2025/03/16/ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T203345Z_04537E/ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T203345Z_04537E/ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T203345Z_04537E.h5
Using HDF5 URL: https://catalog.maap.eo.esa.int/data/earthcare-pdgs-01/EarthCARE/MSI_COP_2A/BA/2025/03/16/ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T203345Z_04537E/ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T203345Z_04537E/ECA_EXBA_MSI_COP_2A_20250316T133058Z_20250912T203345Z_04537E.h5

Tune streaming caches

Defines lightweight buffering for remote HDF5 reads: a blockcache in fsspec with 8 MiB blocks and h5py’s raw-data chunk cache set to ~8 MiB. This improves HTTPS streaming and random access while keeping memory use modest; bump the sizes if you see many re-reads.

[18]:
io_params = {
    "fsspec_params": {
        "cache_type": "blockcache",
        "block_size": 8 * 1024 * 1024
    },
    "h5py_params": {
        "driver_kwds": {
            "rdcc_nbytes": 8 * 1024 * 1024
        }
    }
}
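
To get a feel for what an 8 MiB block size means for remote reads, the arithmetic can be sketched as follows. This is a simplified model of fixed-size block caching, not fsspec's actual implementation: it only computes which block indices a byte range touches, and therefore how many bytes would be fetched if none of those blocks were cached yet. The function name is hypothetical.

```python
MIB = 1024 * 1024


def blocks_touched(offset, length, block_size=8 * MIB):
    """Indices of the fixed-size blocks covered by a read of `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


# A tiny read still pulls in one whole block
small = blocks_touched(offset=0, length=100)

# A 20-byte read straddling a block boundary pulls in two blocks
straddle = blocks_touched(offset=8 * MIB - 10, length=20)

# A 20 MiB read starting at 4 MiB covers blocks 0, 1 and 2
large = blocks_touched(offset=4 * MIB, length=20 * MIB)
fetched_mib = len(large) * 8  # MiB transferred if nothing is cached yet

print(small, straddle, large, fetched_mib)  # [0] [0, 1] [0, 1, 2] 24
```

Larger blocks mean fewer HTTPS requests for sequential reads but more wasted transfer for small scattered reads, which is the trade-off behind the advice to adjust the sizes.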

Stream HDF5 with token and plot variables

Creates an HTTPS fsspec filesystem with your ESA Bearer token, streams the selected .h5 into an xarray Dataset, and then plots two variables side-by-side. Swap variable names (e.g., cloud_water_path, cloud_top_temperature) to what your file exposes, and drop the group argument if that group isn’t present.

[19]:
fs = fsspec.filesystem(
    "https",
    headers={"Authorization": f"Bearer {ESA_TOKEN}"},
    **io_params["fsspec_params"],
)

# Open the file and read it into an xarray Dataset
with fs.open(ds_url, "rb") as f:
    ds = xr.open_dataset(f,
                         engine="h5netcdf",
                         **io_params["h5py_params"],
                         group="ScienceData")

    # Do something with ds! Here we plot two variables as an example.
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # Plot Cloud Water Path
    ds["cloud_water_path"].plot(ax=axes[0], cmap="Blues")
    axes[0].set_title("Cloud Water Path")

    # Plot Cloud Top Temperature
    ds["cloud_top_temperature"].plot(ax=axes[1], cmap="plasma")
    axes[1].set_title("Cloud Top Temperature")

    plt.tight_layout()
    plt.show()


[Figure: Cloud Water Path (left) and Cloud Top Temperature (right), plotted from the streamed dataset]
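
Since variable names differ between products, it can help to check which of the names you intend to plot are actually present before calling `.plot()`. The helper below is a hypothetical sketch in plain Python. The list of available names is made up for illustration; in the notebook it would come from the opened dataset.

```python
def split_available(preferred, available):
    """Split `preferred` names into those present in `available` and those missing."""
    available = set(available)
    present = [name for name in preferred if name in available]
    missing = [name for name in preferred if name not in available]
    return present, missing


# Hypothetical variable listing; the third name is invented for illustration
example_vars = ["cloud_water_path", "cloud_top_temperature", "some_quality_flag"]

present, missing = split_available(
    ["cloud_water_path", "cloud_top_height"],
    example_vars,
)
print("plot:", present)
print("not in file:", missing)

# Inside the `with fs.open(...)` block, the real names would be used:
# present, missing = split_available(["cloud_water_path", "cloud_top_temperature"], ds.data_vars)
```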