BETA - Collection Discovery: searching for collections across multiple APIs using the Federated Collection Discovery API
Author: Henry Rodman (DevSeed)
Date: September 13, 2024
Description: These examples show how to use the Federated Collection Discovery API to search for collections across multiple STAC APIs and/or CMR APIs. An interactive search application for the API is also available at https://discover.maap-project.org.
Note: The Federated Collection Discovery API is not mature and is not yet supported by standard clients like pystac_client! Work has begun to upstream the collection filtering capabilities into pystac_client, though.
Background
It can be challenging to find the data that you need for an analysis when any of the following are true:
- you don’t know the collection ID for a collection that you know exists
- you don’t know which exact API the data can be accessed from
- you don’t know which collections you even need
Fear not! The Federated Collection Discovery API can help you find the data you need by running your search for collections across several STAC APIs and/or CMR APIs at once.
Additional resources
Federated Collection Discovery API
The Federated Collection Discovery API provides a STAC API-esque interface for finding collections that match your search criteria.
The application queries a list of STAC APIs and/or CMR APIs. If an API does not implement the Collection Search STAC API extension, the application applies a client-side filter that mimics the filters proposed by that extension.
search parameters:
- bbox: bounding box coordinates (EPSG:4326)
- datetime: datetime extent
- q: free-text search

other parameters:
- hint_lang: programming language for the item-level search hint (only python is supported right now)
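The parameters above are all passed as plain query-string values. As a rough sketch of how they might be assembled, the hypothetical helper below (`build_search_params` is not part of the API or of any client library) formats a bounding box as a comma-separated string and a datetime range as `start/end`, using `..` for an open side:

```python
from datetime import datetime, timezone


def build_search_params(q=None, bbox=None, start=None, end=None, hint_lang=None):
    """Assemble query parameters for a collection search, skipping unset filters.

    Hypothetical helper for illustration only.
    """
    params = {}
    if q:
        params["q"] = q
    if bbox:
        # bbox is sent as "west,south,east,north" in EPSG:4326
        params["bbox"] = ",".join(str(coord) for coord in bbox)
    if start or end:
        # ".." marks the open side of a datetime range
        params["datetime"] = "/".join(
            value.isoformat() if value else ".." for value in (start, end)
        )
    if hint_lang:
        params["hint_lang"] = hint_lang
    return params


params = build_search_params(
    q="biomass",
    bbox=(18.061, 59.348, 31.181, 70.576),
    start=datetime(2024, 9, 15, tzinfo=timezone.utc),
)
print(params)
```

The resulting dictionary can be passed as the `params` argument of an HTTP client call such as `httpx.get`.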
[1]:
from datetime import datetime, timezone
import httpx
import pandas as pd
from IPython.display import display, HTML
API_URL = "https://discover-api.maap-project.org"
The API is configured to search across several STAC APIs by default:
[2]:
default_api_urls = httpx.get(f"{API_URL}/apis", timeout=20).json()
default_api_urls
[2]:
{'stac_api': ['https://stac.maap-project.org/',
'https://openveda.cloud/api/stac/',
'https://catalogue.dataspace.copernicus.eu/stac'],
'cmr': []}
free-text filter
Perform a search with a free-text filter for collections that include ‘elevation’ OR ‘DEM’ but not ‘biomass’. The API will scan the ‘title’, ‘description’, and ‘keywords’ attributes of all of the collections in the catalogs.
The free-text query parameter will follow the logic outlined in the STAC API free-text extension. Here is a table that outlines the types of queries that are possible (borrowed from the STAC API free-text extension readme):

| q | Summary | Detail |
| --- | --- | --- |
| sentinel | Free-text query against all properties | This will search for any matching items that CONTAIN "sentinel" |
| "climate model" | Free-text search using exact | This will search for any matching items that CONTAIN the exact phrase "climate model" |
| climate model | Using OR term match (Default) | This will search for any matching items that CONTAIN "climate" OR "model" |
| climate OR model | Using OR term match (Default) | This will search for any matching items that CONTAIN "climate" OR "model" |
| climate AND model | Using AND term match | This will search for any matching items that CONTAIN "climate" AND "model" |
| (quick OR brown) AND fox | Parentheses can be used to group terms | This will search for matching items that CONTAIN "quick" OR "brown" AND "fox" |
| quick +brown -fox | Indicate included and excluded terms using +/- | This will search for items that INCLUDES "brown" EXCLUDES "fox" OR CONTAIN "quick" |
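To make a few of the rows above concrete, here is a deliberately simplified matcher. It is only a sketch and not the API's implementation: it handles bare terms (OR-ed together), quoted phrases, and `+`/`-` prefixes, and it does not support the `AND`/`OR` keywords or parentheses. The function name `matches_free_text` is hypothetical, and the treatment of bare terms as optional once a `+term` is present is an assumption.

```python
import re


def matches_free_text(q: str, text: str) -> bool:
    """Case-insensitive substring match for a subset of the free-text syntax."""
    haystack = text.lower()
    required, excluded, optional = [], [], []
    # Each token is a quoted phrase or a whitespace-delimited word,
    # optionally prefixed with + (required) or - (excluded).
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', q):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    if any(term in haystack for term in excluded):
        return False
    if not all(term in haystack for term in required):
        return False
    # Assumption: bare terms only constrain the match when no +term is given.
    if required or not optional:
        return True
    return any(term in haystack for term in optional)


print(matches_free_text("climate model", "a model of rainfall"))
print(matches_free_text('"climate model"', "a model of climate"))
print(matches_free_text("quick +brown -fox", "the quick brown fox"))
```

The first call matches because one of the OR-ed terms is present, the second fails because the exact phrase is absent, and the third fails because the excluded term appears in the text.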
[3]:
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "q": "(elevation OR DEM) -biomass",
        "hint_lang": "python",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()
results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
| | id | catalog_url | title |
---|---|---|---|
0 | ABoVE_UAVSAR_PALSAR | https://stac.maap-project.org/ | Arctic-Boreal Vulnerability Experiment Uninhabited Aerial Vehicle Synthetic Aperture Radar Polarimetric SAR |
1 | SRTMGL1_COD | https://stac.maap-project.org/ | NASA Shuttle Radar Topography Mission Global 1 |
2 | COP-DEM | https://catalogue.dataspace.copernicus.eu/stac | COP-DEM |
The results contain a list of collection-level metadata with some basic properties that you can review further.
[4]:
collection_info = results_df.iloc[0]
print(collection_info)
print("\ndescription:\n", collection_info.description)
id ABoVE_UAVSAR_PALSAR
catalog_url https://stac.maap-project.org/
title Arctic-Boreal Vulnerability Experiment Uninhab...
spatial_extent [[-166.788382, 69.708769, -110.947561, 59.7293...
temporal_extent [[2017-06-13T22:03:35Z, 2017-06-22T19:25:35Z]]
short_name None
description The Arctic-Boreal Vulnerability Experiment (AB...
keywords []
hint import pystac_client\n\ncatalog = pystac_clien...
Name: 0, dtype: object
description:
The Arctic-Boreal Vulnerability Experiment (ABoVE) is a NASA Terrestrial Ecology Program field campaign conducted from June through September 2017 over Alaska and Western Canada. ABoVE is a large-scale study of environmental change and to assess the vulnerability and resilience of Arctic and boreal ecosystems and provide scientific bases for societal response decision making. ABoVE utilized the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) Polarimetric SAR (PALSAR) instrument to provide image transects to survey the land surface, hydrological systems and vegetation. SAR products in this collection include the Digital Elevation Model (DEM), the local incidence angle, the terrain slope product, ground projected complex cross products, the compressed stokes matrix, pauli decompositions, multi-look cross products, and scene annotation files.
You can also get a code snippet for performing an item-level search against the home API for a particular collection if you provide the hint_lang parameter in the request.
[5]:
print(collection_info.hint)
import pystac_client
catalog = pystac_client.Client.open("https://stac.maap-project.org/")
search = catalog.search(collections="ABoVE_UAVSAR_PALSAR")
item_collection = search.item_collection()
bbox filter
Perform a search for collections that intersect Finland’s bounding box, with a free-text filter for ‘biomass’.
[6]:
finland_bbox = (18.061, 59.348, 31.181, 70.576)
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "q": "biomass",
        "bbox": ",".join(str(coord) for coord in finland_bbox),
        "hint_lang": "python",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()
results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
| | id | catalog_url | title |
---|---|---|---|
0 | GEDI_CalVal_Field_Data | https://stac.maap-project.org/ | Global Ecosystem Dynamics Investigation (GEDI) Calibration/Validation Field Survey Dataset |
1 | BIOSAR1 | https://stac.maap-project.org/ | BIOSAR1 |
2 | ICESat2_Boreal_AGB_tindex_average | https://stac.maap-project.org/ | ICESat2-Boreal Above Ground Biomass T-Index Average |
3 | ESACCI_Biomass_L4_AGB_V4_100m | https://stac.maap-project.org/ | ESA CCI Above-Ground Biomass Product Level 4 Version 4 |
4 | icesat2-boreal | https://stac.maap-project.org/ | Gridded Boreal Aboveground Biomass Density c.2020 at 30m resolution |
[7]:
collection_info = results_df.iloc[4]
print(collection_info.hint)
import pystac_client
catalog = pystac_client.Client.open("https://stac.maap-project.org/")
search = catalog.search(
    collections="icesat2-boreal",
    bbox=(18.061, 59.348, 31.181, 70.576),
)
item_collection = search.item_collection()
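The bbox filter keeps collections whose spatial extent intersects the requested box. As a rough illustration of that test (not the API's actual implementation), the sketch below assumes boxes are `(west, south, east, north)` tuples in EPSG:4326 that do not cross the antimeridian. The function `bboxes_intersect` and the two comparison boxes are hypothetical.

```python
def bboxes_intersect(a, b):
    """Return True when two (west, south, east, north) boxes overlap or touch."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    overlaps_in_longitude = a_west <= b_east and b_west <= a_east
    overlaps_in_latitude = a_south <= b_north and b_south <= a_north
    return overlaps_in_longitude and overlaps_in_latitude


finland_bbox = (18.061, 59.348, 31.181, 70.576)
global_bbox = (-180.0, -90.0, 180.0, 90.0)      # hypothetical global collection
north_america_bbox = (-125.0, 24.0, -66.0, 50.0)  # hypothetical regional collection

print(bboxes_intersect(finland_bbox, global_bbox))         # True
print(bboxes_intersect(finland_bbox, north_america_bbox))  # False
```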
datetime filter
You can use the datetime parameter to filter down to collections with temporal extents that overlap a provided range. For example, to find collections that match the term ‘spectral’ and have data as recent as September 15, 2024, you can run the following search:
[8]:
recent_date = datetime(year=2024, month=9, day=15, tzinfo=timezone.utc)
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "datetime": f"{recent_date.isoformat()}/..",
        "q": "spectral",
        "hint_lang": "python",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()
results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "temporal_extent"]].to_html()))
| | id | catalog_url | temporal_extent |
---|---|---|---|
0 | TERRAAQUA | https://catalogue.dataspace.copernicus.eu/stac | [[2000-02-16T00:00:00Z, None]] |
1 | LANDSAT-8 | https://catalogue.dataspace.copernicus.eu/stac | [[2013-03-24T00:00:00Z, None]] |
2 | SENTINEL-2 | https://catalogue.dataspace.copernicus.eu/stac | [[2015-07-01T00:00:00Z, None]] |
Note: For open datetime ranges, use .. to represent either the beginning or ending timestamp.
[9]:
collection_info = results_df.iloc[2]
print(collection_info.hint)
import pystac_client
catalog = pystac_client.Client.open("https://catalogue.dataspace.copernicus.eu/stac")
search = catalog.search(
    collections="SENTINEL-2",
    datetime="2024-09-15T00:00:00Z/..",
)
item_collection = search.item_collection()
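The datetime filter is an overlap test between the requested range and each collection's temporal extent, where either side may be open. The sketch below illustrates that logic under stated assumptions; it is not the API's implementation, and the helper names (`parse_timestamp`, `parse_interval`, `intervals_overlap`) are hypothetical. The two extents are copied from the outputs shown earlier in this notebook.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; None, '' or '..' means the side is open."""
    if value in (None, "", ".."):
        return None
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_interval(interval):
    """Split 'start/end' into a (start, end) tuple of datetimes or None."""
    if "/" not in interval:
        instant = parse_timestamp(interval)
        return instant, instant
    start, end = interval.split("/", 1)
    return parse_timestamp(start), parse_timestamp(end)


def intervals_overlap(a, b):
    """True when two (start, end) intervals share at least one instant."""
    a_start, a_end = a
    b_start, b_end = b
    a_starts_in_time = a_start is None or b_end is None or a_start <= b_end
    b_starts_in_time = b_start is None or a_end is None or b_start <= a_end
    return a_starts_in_time and b_starts_in_time


query = parse_interval("2024-09-15T00:00:00Z/..")
sentinel_2 = (parse_timestamp("2015-07-01T00:00:00Z"), None)  # open-ended extent
above_uavsar = parse_interval("2017-06-13T22:03:35Z/2017-06-22T19:25:35Z")

print(intervals_overlap(query, sentinel_2))    # True
print(intervals_overlap(query, above_uavsar))  # False
```

An extent with an open end, such as SENTINEL-2's, overlaps any query range that starts after the extent begins, which is why collections that are still being updated appear in searches for recent data.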
Specify APIs with stac_api_urls and/or cmr_urls
You can specify a set of different STAC APIs to search through with the stac_api_urls parameter. This will override the default STAC API URLs.
[15]:
additional_stac_api_urls = [
    "https://stac.eoapi.dev",
    "https://earth-search.aws.element84.com/v1",
]
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "stac_api_urls": ",".join(additional_stac_api_urls),
        "q": "fire",
    },
    timeout=30,
)
search_request.raise_for_status()
search_results = search_request.json()
results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
| | id | catalog_url | title |
---|---|---|---|
0 | MAXAR_Marshall_Fire_21_Update | https://stac.eoapi.dev | Marshall Fire |
1 | MAXAR_McDougallCreekWildfire_BC_Canada_Aug_23 | https://stac.eoapi.dev | McDougall Creek Wildfire |
2 | MAXAR_NWT_Canada_Aug_23 | https://stac.eoapi.dev | Northwest Territories Fires |
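Because stac_api_urls overrides the defaults, searching the default APIs together with extra ones means sending the combined list. A minimal sketch of building that value follows; the default URLs are hard-coded from the /apis response shown earlier, and `combine_urls` is a hypothetical helper that drops duplicates differing only by a trailing slash.

```python
default_api_urls = {
    "stac_api": [
        "https://stac.maap-project.org/",
        "https://openveda.cloud/api/stac/",
        "https://catalogue.dataspace.copernicus.eu/stac",
    ],
    "cmr": [],
}
additional_stac_api_urls = [
    "https://stac.eoapi.dev",
    "https://earth-search.aws.element84.com/v1",
]


def combine_urls(*url_lists):
    """Merge URL lists in order, skipping duplicates (ignoring a trailing slash)."""
    seen = set()
    combined = []
    for urls in url_lists:
        for url in urls:
            key = url.rstrip("/")
            if key not in seen:
                seen.add(key)
                combined.append(url)
    return combined


# Comma-separated value suitable for the stac_api_urls query parameter
stac_api_urls_param = ",".join(
    combine_urls(default_api_urls["stac_api"], additional_stac_api_urls)
)
print(stac_api_urls_param)
```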
By adding the NASA Operational CMR Search API URL in the cmr_urls parameter, you can include the entire CMR catalog in your search.
[16]:
search_request = httpx.get(
    f"{API_URL}/search",
    params={
        "cmr_urls": "https://cmr.earthdata.nasa.gov/search/",
        "q": "HLS",
    },
    timeout=20,
)
search_request.raise_for_status()
search_results = search_request.json()
results_df = pd.DataFrame(search_results["results"])
display(HTML(results_df[["id", "catalog_url", "title"]].to_html()))
| | id | catalog_url | title |
---|---|---|---|
0 | hls-ndvi | https://openveda.cloud/api/stac/ | Normalized difference vegetation index from HLS |
1 | hls-l30-002-ej-reprocessed | https://openveda.cloud/api/stac/ | HLSL30.002 Environmental Justice Events |
2 | darnah-flood | https://openveda.cloud/api/stac/ | False Color Pre and Post Flood |
3 | hls-s30-002-ej-reprocessed | https://openveda.cloud/api/stac/ | HLSS30.002 Environmental Justice Events |
4 | hls-ndvi-difference | https://openveda.cloud/api/stac/ | HLS-derived NDVI difference for Assessing Impacts from Hurricane Iann |
5 | hls-entropy-difference | https://openveda.cloud/api/stac/ | HLS-derived entropy difference for Assessing impacts from Hurricane Ian |
6 | hls-bais2-v2 | https://openveda.cloud/api/stac/ | HLS-calculated BAIS2 burned area |
7 | hls-swir-falsecolor-composite | https://openveda.cloud/api/stac/ | HLS SWIR FalseColor Composite |
8 | C2021957657-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | HLS Landsat Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0 |
9 | C2021957295-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30m v2.0 |
10 | C2746980408-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | OPERA Land Surface Disturbance Alert from Harmonized Landsat Sentinel-2 product (Version 1) |
11 | C2617126679-POCLOUD | https://cmr.earthdata.nasa.gov/search/ | OPERA Dynamic Surface Water Extent from Harmonized Landsat Sentinel-2 product (Version 1) |
12 | C2076106409-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | ECOSTRESS Tiled Evapotranspiration Instantaneous and Daytime L3 Global 70 m V002 |
13 | C2074877891-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | ECOSTRESS Tiled Downscaled Meteorology Instantaneous L3 Global 70 m V002 |
14 | C2074852168-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | ECOSTRESS Tiled Surface Energy Balance Instantaneous L3 Global 70 m V002 |
15 | C2519119034-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | OPERA Land Surface Disturbance Annual from Harmonized Landsat Sentinel-2 product (Version 1) |
16 | C2090073749-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | ECOSTRESS Tiled Ancillary NDVI and Albedo L2 Global 70 m V002 |
17 | C2595678301-LPCLOUD | https://cmr.earthdata.nasa.gov/search/ | ECOSTRESS Tiled Top of Atmosphere Calibrated Radiance Instantaneous L1C Global 70 m V002 |
18 | C2756302505-ORNL_CLOUD | https://cmr.earthdata.nasa.gov/search/ | Aboveground Biomass Density for High Latitude Forests from ICESat-2, 2020 |
19 | C2775078742-ORNL_CLOUD | https://cmr.earthdata.nasa.gov/search/ | Phenology derived from Satellite Data and PhenoCam across CONUS and Alaska, 2019-2020 |
20 | C2102664483-LPDAAC_ECS | https://cmr.earthdata.nasa.gov/search/ | MuSLI Multi-Source Land Surface Phenology Yearly North America 30 m V011 |
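When results come back from several catalogs at once, it can help to group them by their source API before deciding where to run an item-level search. The sketch below shows one way to do that with the standard library; `group_ids_by_catalog` is a hypothetical helper, and `sample_results` is a small excerpt of the rows above, reduced to the two fields used here.

```python
from collections import defaultdict


def group_ids_by_catalog(results):
    """Map each catalog_url to the list of collection IDs it returned."""
    grouped = defaultdict(list)
    for collection in results:
        grouped[collection["catalog_url"]].append(collection["id"])
    return dict(grouped)


sample_results = [
    {"id": "hls-ndvi", "catalog_url": "https://openveda.cloud/api/stac/"},
    {"id": "C2021957657-LPCLOUD", "catalog_url": "https://cmr.earthdata.nasa.gov/search/"},
    {"id": "C2021957295-LPCLOUD", "catalog_url": "https://cmr.earthdata.nasa.gov/search/"},
]

for catalog_url, collection_ids in group_ids_by_catalog(sample_results).items():
    print(catalog_url, collection_ids)
```

With a full response, the same function can be applied to the list stored under the "results" key.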