ESA CCI v4 Access and Visualize
Authors: Emile Tenezakis (DevSeed), Rajat Shinde (UAH)
Date: October 2, 2023
Description: In this tutorial, we explore accessing and visualizing ESA CCI Version 4 data from the MAAP STAC catalog. We use the stackstac package to turn a STAC item collection retrieved from the catalog with pystac_client (https://pystac-client.readthedocs.io/en/stable/) into an xarray DataArray, and we plot the time series of the mean aboveground biomass for a selected tile of the dataset across the available temporal range.
Run This Notebook
To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.
About The Dataset
This dataset comprises estimates of forest Above-Ground Biomass (AGB) for the years 2017, 2018, 2019 and 2020, version 4. They are derived from a combination of Earth Observation (EO) data, depending on the year, from the Copernicus Sentinel-1 mission, Envisat’s ASAR instrument and JAXA’s Advanced Land Observing Satellite (ALOS-1 and ALOS-2), along with additional information from Earth observation sources. The data has been produced as part of the European Space Agency’s (ESA’s) Climate Change Initiative (CCI) programme by the Biomass CCI team.
Additional Resources
Importing and Installing Packages
First off, we will install and import the required Python packages if they are not already installed in the workspace.
[1]:
# !mamba install -y -c conda-forge stackstac
# !pip install pystac_client
[2]:
from stackstac import stack, mosaic
import pystac_client
Accessing and Filtering the Items
After installing the required packages, we create a client to access the MAAP STAC catalog.
[3]:
URL = "https://stac.maap-project.org"
catalog = pystac_client.Client.open(URL)
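As a quick optional check (a minimal sketch, not required for the rest of the tutorial), we can list the collection IDs exposed by the catalog to confirm that the connection works and to see how the biomass collection is named:

# Optional: list the collection IDs available in the MAAP STAC catalog
for collection in catalog.get_collections():
    print(collection.id)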
Creating an AOI
Now, we define a bounding box of interest to find the tile that covers a small region around Manaus, Brazil (Amazon rainforest).
[4]:
# BBox for filtering the items in the collection
bbox = [-55,-6,-54.8,-5.8]
We proceed to an item search in the catalog using pystac-client, filtering for items that cover our area of interest.
[5]:
stac_collection = catalog.search(
collections=["ESACCI_Biomass_L4_AGB_V4_100m"],
bbox=bbox
)
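Before looking at individual assets, here is a small sketch (the items variable below is introduced only for this check) to confirm how many items the search returned; with this bounding box we expect one tile per year, i.e. four items:

# Materialize the search results and count them
items = stac_collection.item_collection()
print(f"Found {len(items)} item(s)")
for item in items:
    print(item.id)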
Let’s take a quick look at the results of the search. We can access the location where the data is stored using the href attribute of an asset, as shown below.
[6]:
stac_collection.item_collection()[0].assets['estimates'].href
[6]:
's3://nasa-maap-data-store/file-staging/nasa-map/ESACCI_Biomass_L4_AGB_V4_100m_2020/N00W060_ESACCI-BIOMASS-L4-AGB-MERGED-100m-2020-fv4.0.tif'
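As a side note, a single asset like this one can also be opened directly, for example with rioxarray. The sketch below assumes rioxarray is installed and that the environment has credentials to read from the nasa-maap-data-store S3 bucket (as the MAAP ADE does):

import rioxarray

# Lazily open the Cloud Optimized GeoTIFF of the first returned item as a dask-backed array
href = stac_collection.item_collection()[0].assets['estimates'].href
single_year_agb = rioxarray.open_rasterio(href, chunks=True)
single_year_agb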
We turn the resulting set of items (we expect one tile per year here, i.e. four items, given the size of our bounding box) into an xarray DataArray using stackstac.
[7]:
# Creating a stack of the filtered items
arr = stack(stac_collection.item_collection())
/opt/conda/lib/python3.10/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
times = pd.to_datetime(
We can see that our array has the following dimensions: time (four data points, one for each year: 2017, 2018, 2019 and 2020), y (latitude), x (longitude) and band (two bands: estimates, holding the AGB estimate values, and standard_deviation, holding the AGB standard deviation values).
[8]:
arr
[8]:
<xarray.DataArray 'stackstac-5517ee4c53a7b2c13f76bf727c00d1fc' (time: 4, band: 2, y: 11250, x: 11251)>
dask.array<fetch_raster_window, shape=(4, 2, 11250, 11251), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/13)
  * time            (time) datetime64[ns] 2017-01-01 2018-01-01 ... 2020-01-01
    id              (time) <U52 'N00W060_ESACCI-BIOMASS-L4-AGB-MERGED-100m-20...
  * band            (band) <U18 'estimates' 'standard_deviation'
  * x               (x) float64 -60.0 -60.0 -60.0 -60.0 ... -50.0 -50.0 -50.0
  * y               (y) float64 0.0 -0.0008889 -0.001778 ... -9.998 -9.999
    proj:epsg       int64 4326
    ...              ...
    proj:transform  object {0.0, 1.0, -60.0, -0.00088888888888, 0.00088888888...
    proj:shape      object {11250}
    proj:geometry   object {'type': 'Polygon', 'coordinates': [[[-60.0, -9.99...
    title           (band) <U49 'Cloud Optimized GeoTIFF of AGB estimates' 'C...
    description     (band) <U49 'Cloud Optimized GeoTIFF of AGB estimates' 'C...
    epsg            int64 4326
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-60.00088888828888, -9.9999999...
    crs:         epsg:4326
    transform:   | 0.00, 0.00,-60.00|
                 | 0.00,-0.00, 0.00|
                 | 0.00, 0.00, 1.00|
    resolution:  0.00088888888888
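Note that stackstac.stack also accepts parameters to restrict what gets stacked up front. A minimal sketch (parameter values are illustrative) that keeps only the estimates asset and clips to our bounding box at stacking time, instead of subsetting afterwards:

# Sketch: restrict the stack to the 'estimates' asset and to our AOI while stacking
arr_estimates_only = stack(
    stac_collection.item_collection(),
    assets=['estimates'],   # keep only the AGB estimates band
    bounds_latlon=bbox,     # clip to the bounding box (min_lon, min_lat, max_lon, max_lat)
    epsg=4326,              # the source tiles are already in EPSG:4326
)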
Now, we will perform a spatial subsetting operation, selecting a portion of the stack based on our bounding box to use for the rest of the analysis.
[9]:
min_x = -55
max_y = -5.8
max_x = -54.8
min_y = -6.0
arr_sub = arr.sel(x=slice(min_x, max_x), y=slice(max_y, min_y))
arr_sub
[9]:
<xarray.DataArray 'stackstac-5517ee4c53a7b2c13f76bf727c00d1fc' (time: 4, band: 2, y: 225, x: 225)>
dask.array<getitem, shape=(4, 2, 225, 225), dtype=float64, chunksize=(1, 1, 225, 225), chunktype=numpy.ndarray>
Coordinates: (12/13)
  * time            (time) datetime64[ns] 2017-01-01 2018-01-01 ... 2020-01-01
    id              (time) <U52 'N00W060_ESACCI-BIOMASS-L4-AGB-MERGED-100m-20...
  * band            (band) <U18 'estimates' 'standard_deviation'
  * x               (x) float64 -55.0 -55.0 -55.0 -55.0 ... -54.8 -54.8 -54.8
  * y               (y) float64 -5.801 -5.802 -5.803 ... -5.998 -5.999 -6.0
    proj:epsg       int64 4326
    ...              ...
    proj:transform  object {0.0, 1.0, -60.0, -0.00088888888888, 0.00088888888...
    proj:shape      object {11250}
    proj:geometry   object {'type': 'Polygon', 'coordinates': [[[-60.0, -9.99...
    title           (band) <U49 'Cloud Optimized GeoTIFF of AGB estimates' 'C...
    description     (band) <U49 'Cloud Optimized GeoTIFF of AGB estimates' 'C...
    epsg            int64 4326
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-60.00088888828888, -9.9999999...
    crs:         epsg:4326
    transform:   | 0.00, 0.00,-60.00|
                 | 0.00,-0.00, 0.00|
                 | 0.00, 0.00, 1.00|
    resolution:  0.00088888888888
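Since the stack is backed by a lazy dask array, the subset is not read from storage until we actually request values. A small sketch to check its in-memory footprint and, if desired, load it eagerly:

# The subset is still lazy; check its size before loading it into memory
print(f"Subset size: {arr_sub.nbytes / 1e6:.1f} MB")

# Optionally load the subset eagerly (this reads the underlying COG windows)
# arr_sub = arr_sub.compute()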
Accessing the Subset for Further Processing
As is evident from the output above, the size of the subset xarray is much smaller, and it can be used for further analysis. In this tutorial, we are going to perform the following two operations:
Masking some values based on a condition
Plotting Mean time-series for the AGB estimates
Masking Values Based on a Condition
Below, we explore how to mask some values based on a condition and plot the time series of the filtered array.
[10]:
# Select the AGB estimates band from the subset
arr_est = arr_sub.sel(band='estimates')
Now, we are going to mask the values that are less than 10 (keeping values greater than or equal to 10) using the DataArray.where method. The .where(<condition>, <other>) method checks the condition elementwise: where the condition is True, the original values are kept, and where it is False, they are replaced by <other>. In the resulting xarray, all the values less than 10 become nan (the default replacement when the condition is False). We can replace nan by any other value by passing it as <other>, as sketched after the output below.
[11]:
arr_masked_lt = arr_est.where(arr_est >= 10)
arr_masked_lt.values
[11]:
array([[[222., 132., 193., ..., 360., 246., 271.],
[153., 113., 152., ..., 331., 180., 261.],
[189., 112., 173., ..., 381., 269., 260.],
...,
[231., 193., 310., ..., 145., 392., 392.],
[333., 255., 297., ..., 133., 392., 348.],
[370., 189., 266., ..., 251., 293., 338.]],
[[219., 100., 138., ..., 363., 271., 198.],
[188., 114., 101., ..., 345., 198., 254.],
[142., 144., 179., ..., 325., 306., 372.],
...,
[353., 302., 379., ..., 162., 385., 385.],
[332., 348., 321., ..., 151., 385., 314.],
[375., 243., 283., ..., 201., 336., 342.]],
[[271., 255., 219., ..., 363., 278., 286.],
[253., 321., 116., ..., 226., 333., 233.],
[222., 277., 176., ..., 169., 355., 190.],
...,
[304., 336., 275., ..., 318., 308., 234.],
[340., 234., 348., ..., 315., 356., 330.],
[243., 323., 264., ..., 288., 307., 318.]],
[[157., 217., 171., ..., 206., 161., 253.],
[190., 229., 226., ..., 214., 210., 314.],
[269., 212., 152., ..., 268., 402., 324.],
...,
[247., 292., 246., ..., 373., 321., 159.],
[329., 200., 318., ..., 278., 284., 255.],
[237., 361., 238., ..., 245., 292., 316.]]])
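As mentioned above, a different replacement value can be passed as the second argument of where. A minimal sketch replacing the masked values with 0 instead of nan:

# Keep values >= 10 and replace everything below 10 with 0 instead of nan
arr_masked_zero = arr_est.where(arr_est >= 10, 0)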
Similarly, we could mask the values greater than 10 (keeping values less than or equal to 10) by following the same approach and updating the condition to arr_est.where(arr_est <= 10).
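Conditions can also be combined. For example, a sketch keeping only values within a given range (the 10 and 400 thresholds are purely illustrative):

# Keep AGB estimates between 10 and 400; everything else becomes nan
arr_masked_range = arr_est.where((arr_est >= 10) & (arr_est <= 400))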
Plotting Mean Time-Series for the AGB Estimates
Below we plot the mean time-series of the masked array.
[12]:
# Average over the spatial dimensions (x, y) and plot the resulting yearly time series
arr_masked_lt.mean(['x','y']).plot()
[12]:
[<matplotlib.lines.Line2D at 0x7fd668b54e80>]
[Plot: mean AGB estimate of the subset for each year, 2017-2020]
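Beyond the time series, we could also look at the data spatially. A minimal sketch plotting the AGB estimates of the subset for the last available year (2020):

# Plot the AGB estimates map of the subset for the last time step (2020)
arr_sub.sel(band='estimates').isel(time=-1).plot.imshow(robust=True)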