Searching the STAC Catalog

Authors: Aimee Barciauskas (Development Seed)

Date: December 13, 2022

Description: This tutorial provides a basic introduction to searching the MAAP STAC catalog using pystac-client.

Another method of searching the STAC catalog is via the STAC browser.

About the STAC Catalog

The MAAP STAC catalog provides discovery of a subset of MAAP datasets. These collections are hosted specifically through the MAAP STAC catalog and are typically not available on NASA’s CMR. The data files have not been moved at all in the process of publishing datasets to STAC.

Data will continue to be added to the STAC catalog. Priority goes to datasets known to be in use by MAAP UWG members, as identified through S3 metrics, direct collaboration with data team members, and user requests.

Additional Resources

Importing and Installing Packages

To run this notebook, you’ll need the pystac-client package:

[1]:
%%capture
%pip install -U pystac-client
[2]:
from pystac_client import Client

STAC Client

We first connect to an API by retrieving the root catalog, or landing page, of the API with the Client.open function.

[3]:
# STAC API root URL
URL = 'https://stac.maap-project.org/'
cat = Client.open(URL)
cat

Searching Collections

As with a static catalog, the get_collections function iterates through the Collections in the Catalog. Because this is an API, it can retrieve all the Collections in a single call rather than fetching each one individually.

[4]:
stac_collections = list(cat.get_collections())
stac_collections
[4]:
[<CollectionClient id=Landsat8_SurfaceReflectance>,
 <CollectionClient id=Global_PALSAR2_PALSAR_FNF>,
 <CollectionClient id=Global_Forest_Change_2000-2017>,
 <CollectionClient id=AFRISAR_DLR2>,
 <CollectionClient id=AfriSAR_UAVSAR_KZ>,
 <CollectionClient id=AfriSAR_UAVSAR_Ungeocoded_Covariance>,
 <CollectionClient id=AfriSAR_UAVSAR_Normalization_Area>,
 <CollectionClient id=AfriSAR_UAVSAR_Geocoded_SLC>,
 <CollectionClient id=AfriSAR_UAVSAR_Geocoded_Covariance>,
 <CollectionClient id=GlobCover_09>,
 <CollectionClient id=GlobCover_05_06>,
 <CollectionClient id=GEDI_CalVal_Field_Data>,
 <CollectionClient id=AfriSAR_UAVSAR_Coreg_SLC>,
 <CollectionClient id=GEDI_CalVal_Lidar_Data_Compressed>,
 <CollectionClient id=GEDI_CalVal_Lidar_Data>,
 <CollectionClient id=ABoVE_UAVSAR_PALSAR>,
 <CollectionClient id=AFRISAR_DLR>,
 <CollectionClient id=BIOSAR1>,
 <CollectionClient id=icesat2-boreal>,
 <CollectionClient id=ICESat2_Boreal_AGB_tindex_average>,
 <CollectionClient id=NCEO_Africa_AGB_100m_2017>,
 <CollectionClient id=Paraguay_Country_Pilot>,
 <CollectionClient id=ESACCI_Biomass_L4_AGB_V4_100m>,
 <CollectionClient id=NASA_JPL_global_agb_mean_2020>,
 <CollectionClient id=SRTMGL1_COD>]
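
The list above shows only IDs. As a minimal sketch, a small helper can pair each ID with its title for easier scanning. The helper name `summarize_collections` and the stand-in objects are hypothetical; the sketch assumes only that each collection exposes `id` and `title` attributes, as pystac collections do.

```python
from types import SimpleNamespace


def summarize_collections(collections):
    """Return (id, title) pairs sorted by id; a missing title becomes ''."""
    return sorted(
        (c.id, getattr(c, "title", None) or "") for c in collections
    )


# Stand-in objects so the sketch runs offline. In the notebook,
# pass `stac_collections` from the cell above instead.
demo_collections = [
    SimpleNamespace(id="icesat2-boreal", title="Hypothetical title"),
    SimpleNamespace(id="BIOSAR1", title=None),
]

for collection_id, title in summarize_collections(demo_collections):
    print(f"{collection_id}: {title}")
```

In the notebook, `summarize_collections(stac_collections)` would produce the same kind of listing for the real catalog.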
[5]:
collection = cat.get_collection(stac_collections[0].id)
collection

Searching STAC Items

Query the /search endpoint of the STAC catalog to find items in our collection. The search() method returns an ItemSearch instance, whose items() we can then turn into a list.

Read more about additional parameters to the search() method at pystac-client.readthedocs.io.

[6]:
collection_items = list(cat.search(collections=[collection.id], max_items=10).items())
collection_items
[6]:
[<Item id=LC080090662019122401T1-SC20200127151508.tar>,
 <Item id=LC080090652019122401T1-SC20200127151451.tar>,
 <Item id=LC080090642019122401T2-SC20200127163402.tar>,
 <Item id=LC080080662019121701T2-SC20200127163324.tar>,
 <Item id=LC080080652019121701T2-SC20200127163702.tar>,
 <Item id=LC080080642019121701T1-SC20200127162047.tar>,
 <Item id=LC080090662019120801T1-SC20200127162000.tar>,
 <Item id=LC080090652019120801T1-SC20200127151500.tar>,
 <Item id=LC080090642019120801T1-SC20200127163722.tar>,
 <Item id=LC080080662019120101T2-SC20200127161947.tar>]
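
Beyond `collections` and `max_items`, `search()` also accepts spatial and temporal filters such as `bbox` and `datetime`. The sketch below assembles those keyword arguments in a plain dictionary so it can be inspected before any request is made. The helper name `build_search_kwargs` and the bounding box and date range values are assumptions chosen for illustration, not values taken from the catalog.

```python
def build_search_kwargs(collection_id, bbox=None, datetime=None, max_items=10):
    """Assemble keyword arguments for Client.search, omitting unset filters."""
    kwargs = {"collections": [collection_id], "max_items": max_items}
    if bbox is not None:
        # Bounding box order: [west, south, east, north] in degrees.
        kwargs["bbox"] = list(bbox)
    if datetime is not None:
        # A single timestamp or a "start/end" interval string.
        kwargs["datetime"] = datetime
    return kwargs


# Hypothetical filter values, for illustration only.
search_kwargs = build_search_kwargs(
    "Landsat8_SurfaceReflectance",
    bbox=[-80.0, -10.0, -70.0, 0.0],
    datetime="2019-12-01/2019-12-31",
)
print(search_kwargs)

# In the notebook, with `cat` opened as above:
# filtered_items = list(cat.search(**search_kwargs).items())
```

Keeping the filters in a dictionary makes it easy to reuse the same query across collections by changing only the collection ID.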

We can get a specific item by passing one of the IDs from the previous search results to the collection’s get_item method. We can then read the HREF of the first asset in that item.

[7]:
item = collection.get_item(collection_items[0].id)
# HREF of the first asset in the item
next(iter(item.assets.values())).href
[7]:
's3://nasa-maap-data-store/file-staging/nasa-map/Landsat8_SurfaceReflectance___1/LC080090662019122401T1-SC20200127151508.tar.gz'
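
The HREF above is an S3 URI rather than an HTTP URL, so most S3 tooling needs it split into a bucket name and an object key. The following sketch does that with the standard library only. The helper name `split_s3_href` is hypothetical, and actually reading the object would additionally require an S3 client and suitable credentials, which are not shown here.

```python
from urllib.parse import urlparse


def split_s3_href(href):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {href!r}")
    # netloc holds the bucket; the path (minus its leading slash) is the key.
    return parsed.netloc, parsed.path.lstrip("/")


href = (
    "s3://nasa-maap-data-store/file-staging/nasa-map/"
    "Landsat8_SurfaceReflectance___1/"
    "LC080090662019122401T1-SC20200127151508.tar.gz"
)
bucket, key = split_s3_href(href)
print(bucket)
print(key)
```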

Here’s a simplified example that uses known collection and item IDs directly:

[ ]:
# Retrieve a specific collection
collection = cat.get_collection("ESACCI_Biomass_L4_AGB_V4_100m")

# Search for items in the collection
collection_items = list(cat.search(collections=["ESACCI_Biomass_L4_AGB_V4_100m"], max_items=10).items())

# Retrieve a specific item
item = collection.get_item("S50W080_ESACCI-BIOMASS-L4-AGB-MERGED-100m-2020-fv4.0")

# List the item's asset href
item.assets["estimates"].href
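
When the asset key (such as "estimates") is not known in advance, it can help to list every asset an item carries. This is a minimal sketch that relies only on `item.assets` being a dictionary of objects with an `href` attribute, as in pystac. The helper name `asset_hrefs`, the second asset key, and the stand-in HREFs are hypothetical.

```python
from types import SimpleNamespace


def asset_hrefs(item):
    """Map each asset key of an item to its HREF."""
    return {key: asset.href for key, asset in item.assets.items()}


# Stand-in item so the sketch runs offline. In the notebook,
# pass the `item` retrieved above instead.
demo_item = SimpleNamespace(
    assets={
        "estimates": SimpleNamespace(href="s3://example-bucket/estimates.tif"),
        "uncertainty": SimpleNamespace(href="s3://example-bucket/uncertainty.tif"),
    }
)

for key, href in asset_hrefs(demo_item).items():
    print(f"{key}: {href}")
```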