Searching the STAC Catalog

This tutorial provides a basic introduction to searching the MAAP STAC catalog (https://stac.maap-project.org/) using pystac-client.

Another method of searching the STAC catalog is via the STAC browser.


About the STAC Catalog

At this time, the STAC catalog provides discovery of a subset of MAAP datasets. These datasets were selected because MAAP CMR analytics showed they were the most frequently searched. The data files were not moved when the datasets were published to STAC.

Data will continue to be added to the STAC catalog, with priority given to datasets known to be in use by MAAP UWG members, as determined through CMR metrics, S3 metrics, direct collaboration with data team members, and user requests.

Prerequisites

  • pystac-client
  • rioxarray (for opening a raster as an xarray dataset)
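Before running the cells below, you can check whether the prerequisites are importable. This is a minimal sketch using only the standard library. `check_prerequisites` is a hypothetical helper, and it assumes the import names `pystac_client` and `rioxarray` (the `pystac-client` package is imported as `pystac_client`).

```python
import importlib.util
from typing import Dict, Iterable


def check_prerequisites(
    modules: Iterable[str] = ("pystac_client", "rioxarray"),
) -> Dict[str, bool]:
    """Hypothetical helper: report whether each module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in check_prerequisites().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

`find_spec` only locates the module, so this check is cheap and has no import side effects.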

Authorship

[1]:
# Uncomment the next line to install pystac-client if you haven't already.
# !pip install pystac-client
[2]:
from pystac_client import Client

STAC Client

We first connect to an API by retrieving the root catalog, or landing page, of the API with the Client.open function.

[3]:
# STAC API root URL
URL = 'https://stac.maap-project.org/'

# custom HTTP headers to send with each request (a dict; empty here)
headers = {}

cat = Client.open(URL, headers=headers)
cat
[3]:
<Client id=stac-fastapi>
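Opening the client starts with an HTTP GET of the root URL (the landing page), with any custom headers attached. The sketch below builds, but does not send, such a request with the standard library to show why `headers` is a dict of name/value pairs. `build_root_request` is a hypothetical helper, not part of pystac-client, and the default `Accept` header is an assumption for illustration.

```python
import urllib.request
from typing import Dict, Optional

URL = "https://stac.maap-project.org/"


def build_root_request(
    url: str, headers: Optional[Dict[str, str]] = None
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a GET for the API landing page."""
    merged = {"Accept": "application/json"}  # assumed default, for illustration only
    merged.update(headers or {})  # custom headers override the default
    return urllib.request.Request(url, headers=merged, method="GET")


req = build_root_request(URL)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would fetch the landing-page JSON; the sketch stops short of that so it runs offline.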

CollectionClient

As with a static catalog, the get_collections function iterates through the Collections in the Catalog. Because this is an API, it can get all the Collections through a single call rather than fetching each one individually.

[4]:
for collection in cat.get_collections():
    print(collection)
<CollectionClient id=AfriSAR_UAVSAR_Coreg_SLC>
<CollectionClient id=Landsat8_SurfaceReflectance>
<CollectionClient id=AfriSAR_AGB_Maps_1681>
<CollectionClient id=ABLVIS1B>
<CollectionClient id=GEDI02_B>
<CollectionClient id=AFLVIS2>
<CollectionClient id=BIOSAR1>
<CollectionClient id=GEDI02_A>
<CollectionClient id=AFRISAR_DLR>
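In the STAC API, `GET /collections` returns one JSON document with a `collections` array, which is why a single request can list them (a large catalog may still split the list across pages linked with `rel="next"`). The sketch below extracts collection IDs from such a payload using only the standard library. `sample_response` is a trimmed, hypothetical payload containing two of the IDs listed above, not real output from the MAAP API.

```python
import json
from typing import List

# Hypothetical, trimmed /collections payload; real responses carry many more fields.
sample_response = json.dumps({
    "collections": [
        {"id": "AFRISAR_DLR", "type": "Collection"},
        {"id": "GEDI02_A", "type": "Collection"},
    ],
    "links": [{"rel": "self", "href": "https://stac.maap-project.org/collections"}],
})


def collection_ids(payload: str) -> List[str]:
    """Return the id of every collection in a /collections JSON response."""
    return [c["id"] for c in json.loads(payload)["collections"]]


print(collection_ids(sample_response))
```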
[5]:
collection = cat.get_collection('AFRISAR_DLR')
collection
[5]:
<CollectionClient id=AFRISAR_DLR>
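`get_collection` fetches one collection by ID, and `get_item` (used further down) fetches one item. In STAC API terms these correspond to `GET /collections/{collectionId}` and `GET /collections/{collectionId}/items/{itemId}` under the root URL. The hypothetical helpers below only build those URLs, as a sketch of the requests involved; they are not pystac-client's internal code.

```python
from urllib.parse import quote


def collection_url(root: str, collection_id: str) -> str:
    """Hypothetical helper: URL of one collection under a STAC API root."""
    # strip a trailing slash so the result never contains '//collections'
    return f"{root.rstrip('/')}/collections/{quote(collection_id, safe='')}"


def item_url(root: str, collection_id: str, item_id: str) -> str:
    """Hypothetical helper: URL of one item inside a collection."""
    return f"{collection_url(root, collection_id)}/items/{quote(item_id, safe='')}"


print(collection_url("https://stac.maap-project.org/", "AFRISAR_DLR"))
print(item_url("https://stac.maap-project.org/", "AFRISAR_DLR", "afrisar_dlr_H4-2_SLC_VV"))
```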

STAC Items

The main functions for getting items return iterators, and pystac-client handles retrieval of additional pages as needed. Note that one request is made for the first ten items, then a second request for the next ten.

[6]:
items = collection.get_items()

# flush stdout so we can see the exact order that things happen
def get_ten_items(items):
    for i, item in enumerate(items):
        print(f"{i}: {item}", flush=True)
        if i == 9:
            return

print('First page', flush=True)
get_ten_items(items)

print('Second page', flush=True)
get_ten_items(items)
First page
0: <Item id=afrisar_dlr_roi_RAB100q>
1: <Item id=afrisar_dlr_roi_RAB099q>
2: <Item id=afrisar_dlr_roi_RAB098q>
3: <Item id=afrisar_dlr_roi_RAB097q>
4: <Item id=afrisar_dlr_roi_RAB096q>
5: <Item id=afrisar_dlr_roi_RAB095q>
6: <Item id=afrisar_dlr_roi_RAB094q>
7: <Item id=afrisar_dlr_roi_RAB093q>
8: <Item id=afrisar_dlr_roi_RAB092q>
9: <Item id=afrisar_dlr_roi_RAB091q>
Second page
0: <Item id=afrisar_dlr_roi_RAB090q>
1: <Item id=afrisar_dlr_roi_RAB089q>
2: <Item id=afrisar_dlr_roi_RAB088q>
3: <Item id=afrisar_dlr_roi_RAB087q>
4: <Item id=afrisar_dlr_roi_RAB086q>
5: <Item id=afrisar_dlr_roi_RAB085q>
6: <Item id=afrisar_dlr_roi_RAB084q>
7: <Item id=afrisar_dlr_roi_RAB083q>
8: <Item id=afrisar_dlr_roi_RAB082q>
9: <Item id=afrisar_dlr_roi_RAB081q>
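The page-by-page behaviour above can be modelled with a generator that follows `rel="next"` links, the pagination pattern STAC APIs use. This is a simplified sketch, not pystac-client's implementation: the in-memory pages, the `BASE` URL and the `fetch` function are hypothetical stand-ins for HTTP responses, and `itertools.islice` replaces `get_ten_items` for taking ten items at a time.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Optional

BASE = "https://example.com/items"  # hypothetical endpoint, never contacted


def make_fake_pages(n_items: int, page_size: int = 10) -> Dict[str, dict]:
    """Build in-memory pages shaped like STAC ItemCollection responses."""
    pages: Dict[str, dict] = {}
    for page_no, start in enumerate(range(0, n_items, page_size)):
        links = []
        if start + page_size < n_items:
            links.append({"rel": "next", "href": f"{BASE}?page={page_no + 1}"})
        end = min(start + page_size, n_items)
        features = [{"id": f"item_{i:03d}"} for i in range(start, end)]
        pages[f"{BASE}?page={page_no}"] = {"features": features, "links": links}
    return pages


def iter_items(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield items one at a time, fetching the next page only when the current one runs out."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page["features"]
        url = next((link["href"] for link in page["links"] if link["rel"] == "next"), None)


pages = make_fake_pages(25)
requests_made: List[str] = []


def fetch(url: str) -> dict:
    requests_made.append(url)  # record each simulated HTTP request
    return pages[url]


items = iter_items(f"{BASE}?page=0", fetch)
first_page = list(islice(items, 10))
requests_after_first = len(requests_made)
print("after first ten:", requests_after_first, "request(s)")
second_page = list(islice(items, 10))
print("after second ten:", len(requests_made), "request(s)")
```

Because `islice` stops pulling after ten items, the second page is requested only when the eleventh item is needed.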

Discover the URL of one item to open with xarray

[7]:
item = collection.get_item('afrisar_dlr_H4-2_SLC_VV')
item.assets['data'].href
[7]:
'https://bmap-catalogue-data.oss.eu-west-0.prod-cloud-ocb.orange-business.com/Campaign_data/afrisar_dlr/afrisar_dlr_H4-2_SLC_VV.tiff'
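In the raw item JSON, `item.assets['data'].href` corresponds to `item['assets']['data']['href']`. The sketch below reads it from a plain dict and splits out the file name, for example to name a local download. `asset_filename` is a hypothetical helper, and the dict is trimmed to the ID and href shown above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical, trimmed item JSON: only the id and the 'data' asset href shown above.
item_json = {
    "id": "afrisar_dlr_H4-2_SLC_VV",
    "assets": {
        "data": {
            "href": "https://bmap-catalogue-data.oss.eu-west-0.prod-cloud-ocb.orange-business.com/Campaign_data/afrisar_dlr/afrisar_dlr_H4-2_SLC_VV.tiff",
        },
    },
}


def asset_filename(item: dict, key: str = "data") -> str:
    """Return the file name at the end of an asset's href (any query string is ignored)."""
    href = item["assets"][key]["href"]
    return PurePosixPath(urlparse(href).path).name


print(asset_filename(item_json))
```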