BETA - Collection Discovery: searching for collections across multiple APIs using the Federated Collection Discovery API

Author: Henry Rodman (DevSeed)

Date: September 13, 2024

Description: These examples show how to use the Federated Collection Discovery API to search for collections across multiple STAC APIs and/or CMR APIs. An interactive search application built on the API is also available at https://discover.maap-project.org.

Note: The Federated Collection Discovery API is not mature and is not yet supported by standard clients like pystac_client! Work has begun to upstream the collection filtering capabilities into pystac_client, though.

Background

It can be challenging to find the data that you need for an analysis when any of the following are true:

- you don’t know the collection ID for a collection that you know exists
- you don’t know which exact API the data can be accessed from
- you don’t know which collections you even need

Fear not! The Federated Collection Discovery API can help you find the data you need by running your search for collections across several STAC APIs and/or CMR APIs at once.

Additional resources

Federated Collection Discovery app

Federated Collection Discovery API

The Federated Collection Discovery API provides a STAC API-esque interface for finding collections that match your search criteria.

The application will query a list of STAC APIs and/or CMR APIs and, if the Collection Search STAC API extension is not implemented, it will do a client-side filter that mimics the filters proposed by that extension.
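As a rough illustration of what a client-side filter of this kind could look like, the sketch below checks whether a collection's spatial and temporal extents overlap a query. The function names and the two sample bounding boxes are hypothetical, and this is not the API's actual implementation; it also assumes bounding boxes do not cross the antimeridian.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def bbox_intersects(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if two (west, south, east, north) boxes overlap.

    Assumption: neither box crosses the antimeridian.
    """
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (
        a_west <= b_east
        and b_west <= a_east
        and a_south <= b_north
        and b_south <= a_north
    )


def interval_overlaps(
    start: Optional[datetime],
    end: Optional[datetime],
    query_start: Optional[datetime],
    query_end: Optional[datetime],
) -> bool:
    """Return True if [start, end] overlaps [query_start, query_end].

    None stands for an open-ended bound on either interval.
    """
    starts_in_time = start is None or query_end is None or start <= query_end
    ends_in_time = end is None or query_start is None or end >= query_start
    return starts_in_time and ends_in_time


finland_bbox = (18.061, 59.348, 31.181, 70.576)
inside_box = (20.0, 60.0, 25.0, 65.0)    # hypothetical collection extent
outside_box = (-10.0, 35.0, 5.0, 45.0)   # hypothetical collection extent

print(bbox_intersects(finland_bbox, inside_box))   # True
print(bbox_intersects(finland_bbox, outside_box))  # False

query_start = datetime(2024, 9, 15, tzinfo=timezone.utc)
open_ended_start = datetime(2015, 7, 1, tzinfo=timezone.utc)
print(interval_overlaps(open_ended_start, None, query_start, None))  # True
```

A collection whose temporal extent has no end date is treated as still being updated, so it overlaps any query that starts after its first timestamp.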

search parameters:

- bbox: bounding box coordinates (EPSG:4326)
- datetime: datetime extent
- q: free-text search

other parameters:

- hint_lang: programming language for item-level search hint (only python is supported right now)
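The parameters above can be assembled with a small helper like the one sketched below. The function `build_search_params` is hypothetical (it is not part of the API or of any client library); it simply skips unset values and serializes the bounding box as a comma-separated string, which is how the requests later in this notebook pass it.

```python
from typing import Optional, Sequence


def build_search_params(
    q: Optional[str] = None,
    bbox: Optional[Sequence[float]] = None,
    datetime_range: Optional[str] = None,
    hint_lang: Optional[str] = None,
) -> dict:
    """Assemble query parameters for a collection search, skipping unset values."""
    params = {}
    if q:
        params["q"] = q
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must have four values: west, south, east, north")
        # The API expects a single comma-separated string of coordinates
        params["bbox"] = ",".join(str(coord) for coord in bbox)
    if datetime_range:
        params["datetime"] = datetime_range
    if hint_lang:
        params["hint_lang"] = hint_lang
    return params


params = build_search_params(
    q="biomass",
    bbox=(18.061, 59.348, 31.181, 70.576),
    hint_lang="python",
)
print(params)
# {'q': 'biomass', 'bbox': '18.061,59.348,31.181,70.576', 'hint_lang': 'python'}
```

The resulting dictionary can be passed as the `params` argument of `httpx.get`.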

[1]:
from datetime import datetime, timezone

import httpx
import pandas as pd
from IPython.display import display, HTML

API_URL = "https://discover-api.maap-project.org"

The API is configured to search across several STAC APIs by default:

[2]:
default_api_urls = httpx.get(f"{API_URL}/apis", timeout=20).json()
default_api_urls
[2]:
{'stac_api': ['https://stac.maap-project.org/',
  'https://openveda.cloud/api/stac/',
  'https://catalogue.dataspace.copernicus.eu/stac'],
 'cmr': []}

free-text filter

Perform a search with a free-text filter for collections that include ‘elevation’ OR ‘DEM’ but not ‘biomass’. The API will scan the ‘title’, ‘description’, and ‘keywords’ attributes of all of the collections in the catalogs.

The free-text query parameter will follow the logic outlined in the STAC API free-text extension. Here is a table that outlines the types of queries that are possible (borrowed from the STAC API free-text extension readme):

| q | Summary | Detail |
| --- | --- | --- |
| sentinel | Free-text query against all properties | This will search for any matching items that CONTAIN "sentinel" |
| "climate model" | Free-text search using exact | This will search for any matching items that CONTAIN the exact phrase "climate model" |
| climate model | Using OR term match (Default) | This will search for any matching items that CONTAIN "climate" OR "model" |
| climate OR model | Using OR term match (Default) | This will search for any matching items that CONTAIN "climate" OR "model" |
| climate AND model | Using AND term match | This will search for any matching items that CONTAIN "climate" AND "model" |
| (quick OR brown) AND fox | Parentheses can be used to group terms | This will search for matching items that CONTAIN "quick" OR "brown" AND "fox" |
| quick +brown -fox | Indicate included and excluded terms using +/- | This will search for items that INCLUDES "brown" EXCLUDES "fox" OR CONTAIN "quick" |
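To make the default OR behaviour concrete, here is a deliberately simplified sketch of term matching against the title, description, and keywords fields. It handles only bare terms joined by an implicit or explicit OR and ignores quoted phrases, AND, parentheses, and +/-. The function name and the two sample records are hypothetical, and this is not how the API itself evaluates queries.

```python
def matches_any_term(collection: dict, q: str) -> bool:
    """Return True if any term in q appears in the collection's text fields.

    Simplification: case-insensitive substring match, OR logic only.
    """
    haystack = " ".join(
        [
            collection.get("title") or "",
            collection.get("description") or "",
            " ".join(collection.get("keywords") or []),
        ]
    ).lower()
    # Drop the literal OR operator so "climate OR model" behaves like "climate model"
    terms = [term for term in q.lower().split() if term != "or"]
    return any(term in haystack for term in terms)


# Hypothetical records for illustration only
climate_record = {
    "title": "Global climate reanalysis",
    "description": "Hourly atmospheric fields",
    "keywords": ["atmosphere"],
}
land_record = {
    "title": "Land cover map",
    "description": "Annual land cover classes",
    "keywords": ["land"],
}

print(matches_any_term(climate_record, "climate model"))     # True
print(matches_any_term(climate_record, "climate OR model"))  # True
print(matches_any_term(land_record, "climate model"))        # False
```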

[3]:
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "q": "(elevation OR DEM) -biomass",
        "hint_lang": "python",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()

results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
id catalog_url title
0 ABoVE_UAVSAR_PALSAR https://stac.maap-project.org/ Arctic-Boreal Vulnerability Experiment Uninhabited Aerial Vehicle Synthetic Aperture Radar Polarimetric SAR
1 SRTMGL1_COD https://stac.maap-project.org/ NASA Shuttle Radar Topography Mission Global 1
2 COP-DEM https://catalogue.dataspace.copernicus.eu/stac COP-DEM

The results contain a list of collection-level metadata with some basic properties that you can review further.
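Because the results come from several catalogs, a quick tally per catalog_url can be a useful first look. The sketch below uses only the id and catalog_url values from the three results shown above; the helper name `count_by_catalog` is hypothetical.

```python
from collections import Counter

# id and catalog_url of the three results shown above
results = [
    {"id": "ABoVE_UAVSAR_PALSAR", "catalog_url": "https://stac.maap-project.org/"},
    {"id": "SRTMGL1_COD", "catalog_url": "https://stac.maap-project.org/"},
    {"id": "COP-DEM", "catalog_url": "https://catalogue.dataspace.copernicus.eu/stac"},
]


def count_by_catalog(records: list) -> Counter:
    """Count how many collections each catalog contributed."""
    return Counter(record["catalog_url"] for record in records)


counts = count_by_catalog(results)
for url, n in counts.most_common():
    print(f"{url}: {n}")
# https://stac.maap-project.org/: 2
# https://catalogue.dataspace.copernicus.eu/stac: 1
```

With the DataFrame built in the cell above, `results_df["catalog_url"].value_counts()` should give the same tally.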

[4]:
collection_info = results_df.iloc[0]
print(collection_info)

print("\ndescription:\n", collection_info.description)
id                                               ABoVE_UAVSAR_PALSAR
catalog_url                           https://stac.maap-project.org/
title              Arctic-Boreal Vulnerability Experiment Uninhab...
spatial_extent     [[-166.788382, 69.708769, -110.947561, 59.7293...
temporal_extent       [[2017-06-13T22:03:35Z, 2017-06-22T19:25:35Z]]
short_name                                                      None
description        The Arctic-Boreal Vulnerability Experiment (AB...
keywords                                                          []
hint               import pystac_client\n\ncatalog = pystac_clien...
Name: 0, dtype: object

description:
 The Arctic-Boreal Vulnerability Experiment (ABoVE) is a NASA Terrestrial Ecology Program field campaign conducted from June through September 2017 over Alaska and Western Canada. ABoVE is a large-scale study of environmental change and to assess the vulnerability and resilience of Arctic and boreal ecosystems and provide scientific bases for societal response decision making.  ABoVE utilized the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) Polarimetric SAR (PALSAR) instrument to provide image transects to survey the land surface, hydrological systems and vegetation.  SAR products in this collection include the Digital Elevation Model (DEM), the local incidence angle, the terrain slope product, ground projected complex cross products, the compressed stokes matrix, pauli decompositions, multi-look cross products, and scene annotation files.

You can also get a code snippet for performing an item-level search against the home API for a particular collection if you provide the hint_lang parameter in the request.

[5]:
print(collection_info.hint)
import pystac_client

catalog = pystac_client.Client.open("https://stac.maap-project.org/")
search = catalog.search(collections="ABoVE_UAVSAR_PALSAR")
item_collection = search.item_collection()

bbox filter

Perform a search for collections that intersect Finland’s bounding box with a free-text filter for ‘biomass’.

[6]:
finland_bbox = (18.061, 59.348, 31.181, 70.576)
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "q": "biomass",
        "bbox": ",".join(str(coord) for coord in finland_bbox),
        "hint_lang": "python",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()

results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
id catalog_url title
0 GEDI_CalVal_Field_Data https://stac.maap-project.org/ Global Ecosystem Dynamics Investigation (GEDI) Calibration/Validation Field Survey Dataset
1 BIOSAR1 https://stac.maap-project.org/ BIOSAR1
2 ICESat2_Boreal_AGB_tindex_average https://stac.maap-project.org/ ICESat2-Boreal Above Ground Biomass T-Index Average
3 ESACCI_Biomass_L4_AGB_V4_100m https://stac.maap-project.org/ ESA CCI Above-Ground Biomass Product Level 4 Version 4
4 icesat2-boreal https://stac.maap-project.org/ Gridded Boreal Aboveground Biomass Density c.2020 at 30m resolution
[7]:
collection_info = results_df.iloc[4]

print(collection_info.hint)
import pystac_client

catalog = pystac_client.Client.open("https://stac.maap-project.org/")
search = catalog.search(
    collections="icesat2-boreal",
    bbox=(18.061, 59.348, 31.181, 70.576),
)
item_collection = search.item_collection()

datetime filter

You can use the datetime parameter to filter down to collections with temporal extents that overlap a provided range. For example, to find collections that include the term ‘spectral’ and have data as recent as September 15, 2024, you can run the following search:

[8]:
recent_date = datetime(year=2024, month=9, day=15, tzinfo=timezone.utc)

search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "datetime": f"{recent_date.isoformat()}/..",
        "q": "spectral",
        "hint_lang": "python",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()

results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "temporal_extent"]].to_html()))
id catalog_url temporal_extent
0 TERRAAQUA https://catalogue.dataspace.copernicus.eu/stac [[2000-02-16T00:00:00Z, None]]
1 LANDSAT-8 https://catalogue.dataspace.copernicus.eu/stac [[2013-03-24T00:00:00Z, None]]
2 SENTINEL-2 https://catalogue.dataspace.copernicus.eu/stac [[2015-07-01T00:00:00Z, None]]

Note: For open datetime ranges, use .. to represent either the beginning or ending timestamp.
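A small helper can build these interval strings so that the `..` placeholder is never mistyped. The function `to_interval` below is a hypothetical sketch; it formats timestamps with `datetime.isoformat()`, as the search request above does.

```python
from datetime import datetime, timezone
from typing import Optional


def to_interval(start: Optional[datetime] = None, end: Optional[datetime] = None) -> str:
    """Format a start/end pair as an interval string, using '..' for an open bound."""
    if start is None and end is None:
        raise ValueError("at least one of start or end must be provided")

    def fmt(value: Optional[datetime]) -> str:
        # Timezone-aware datetimes keep their UTC offset in the output
        return value.isoformat() if value is not None else ".."

    return f"{fmt(start)}/{fmt(end)}"


recent_date = datetime(2024, 9, 15, tzinfo=timezone.utc)
print(to_interval(start=recent_date))  # 2024-09-15T00:00:00+00:00/..
print(to_interval(end=recent_date))    # ../2024-09-15T00:00:00+00:00
```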

[9]:
collection_info = results_df.iloc[2]

print(collection_info.hint)
import pystac_client

catalog = pystac_client.Client.open("https://catalogue.dataspace.copernicus.eu/stac")
search = catalog.search(
    collections="SENTINEL-2",
    datetime="2024-09-15T00:00:00Z/..",
)
item_collection = search.item_collection()

Specify APIs with stac_api_urls and/or cmr_urls

You can specify a set of different STAC APIs to search through with the stac_api_urls parameter. This will override the default STAC API URLs.

[15]:
additional_stac_api_urls = [
    "https://stac.eoapi.dev",
    "https://earth-search.aws.element84.com/v1"
]
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "stac_api_urls": ",".join(additional_stac_api_urls),
        "q": "fire"
    },
    timeout=30,
)
search_request.raise_for_status()
search_results = search_request.json()

results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
id catalog_url title
0 MAXAR_Marshall_Fire_21_Update https://stac.eoapi.dev Marshall Fire
1 MAXAR_McDougallCreekWildfire_BC_Canada_Aug_23 https://stac.eoapi.dev McDougall Creek Wildfire
2 MAXAR_NWT_Canada_Aug_23 https://stac.eoapi.dev Northwest Territories Fires

By adding the NASA Operational CMR Search API URL to the cmr_urls parameter, you can include the entire CMR catalog in your search.

[16]:
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "cmr_urls": "https://cmr.earthdata.nasa.gov/search/",
        "q": "HLS"
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()

results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
id catalog_url title
0 hls-ndvi https://openveda.cloud/api/stac/ Normalized difference vegetation index from HLS
1 hls-l30-002-ej-reprocessed https://openveda.cloud/api/stac/ HLSL30.002 Environmental Justice Events
2 darnah-flood https://openveda.cloud/api/stac/ False Color Pre and Post Flood
3 hls-s30-002-ej-reprocessed https://openveda.cloud/api/stac/ HLSS30.002 Environmental Justice Events
4 hls-ndvi-difference https://openveda.cloud/api/stac/ HLS-derived NDVI difference for Assessing Impacts from Hurricane Iann
5 hls-entropy-difference https://openveda.cloud/api/stac/ HLS-derived entropy difference for Assessing impacts from Hurricane Ian
6 hls-bais2-v2 https://openveda.cloud/api/stac/ HLS-calculated BAIS2 burned area
7 hls-swir-falsecolor-composite https://openveda.cloud/api/stac/ HLS SWIR FalseColor Composite
8 C2021957657-LPCLOUD https://cmr.earthdata.nasa.gov/search/ HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0
9 C2021957295-LPCLOUD https://cmr.earthdata.nasa.gov/search/ HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30m v2.0
10 C2746980408-LPCLOUD https://cmr.earthdata.nasa.gov/search/ OPERA Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 product (Version 1)
11 C2617126679-POCLOUD https://cmr.earthdata.nasa.gov/search/ OPERA Dynamic Surface Water Extent from Harmonized Landsat Sentinel-2 product (Version 1)
12 C2076106409-LPCLOUD https://cmr.earthdata.nasa.gov/search/ ECOSTRESS Tiled Evapotranspiration Instantaneous and Daytime L3 Global 70 m V002
13 C2074877891-LPCLOUD https://cmr.earthdata.nasa.gov/search/ ECOSTRESS Tiled Downscaled Meteorology Instantaneous L3 Global 70 m V002
14 C2074852168-LPCLOUD https://cmr.earthdata.nasa.gov/search/ ECOSTRESS Tiled Surface Energy Balance Instantaneous L3 Global 70 m V002
15 C2519119034-LPCLOUD https://cmr.earthdata.nasa.gov/search/ OPERA Land Surface Disturbance Annual from Harmonized Landsat Sentinel-2 product (Version 1)
16 C2090073749-LPCLOUD https://cmr.earthdata.nasa.gov/search/ ECOSTRESS Tiled Ancillary NDVI and Albedo L2 Global 70 m V002
17 C2595678301-LPCLOUD https://cmr.earthdata.nasa.gov/search/ ECOSTRESS Tiled Top of Atmosphere Calibrated Radiance Instantaneous L1C Global 70 m V002
18 C2756302505-ORNL_CLOUD https://cmr.earthdata.nasa.gov/search/ Aboveground Biomass Density for High Latitude Forests from ICESat-2, 2020
19 C2775078742-ORNL_CLOUD https://cmr.earthdata.nasa.gov/search/ Phenology derived from Satellite Data and PhenoCam across CONUS and Alaska, 2019-2020
20 C2102664483-LPDAAC_ECS https://cmr.earthdata.nasa.gov/search/ MuSLI Multi-Source Land Surface Phenology Yearly North America 30 m V011