Searching for Granules in NASA’s Operational CMR using maap-py

Authors: Kel Markert (UAH), Katrina Virts (UAH), Samuel Ayers (UAH), Alex Mandel (DevSeed)

Date: February 27, 2020 (updated in 2022)

Description: These examples walk through using the MAAP API to search for granules within a collection in NASA’s Common Metadata Repository (CMR) based on specific parameters. Granules are individual files from a sensor, and a group of granules makes up a collection within CMR. The granules are the raw data that will be used for processing.

Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.

Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.

Additional Resources

Importing and Installing Packages

We begin by importing the maap and pprint packages. We then invoke the MAAP constructor, setting the maap_host argument to 'api.maap-project.org'.

[1]:
# import the MAAP package
from maap.maap import MAAP

# import printing package to help display outputs
from pprint import pprint

# invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.maap-project.org')

About searchGranule

Here we view the specific arguments and keywords for the maap.searchGranule function.

[2]:
help(maap.searchGranule)
Help on method searchGranule in module maap.maap:

searchGranule(limit=20, **kwargs) method of maap.maap.MAAP instance
    Search the CMR granules

    :param limit: limit of the number of results
    :param kwargs: search parameters
    :return: list of results (<Instance of Result>)

As we can see from the result, maap.searchGranule accepts a limit keyword which limits the number of results from CMR. maap.searchGranule() also accepts any additional search parameters that are included in CMR. For a list of accepted parameters, please refer to the CMR Search Granules API reference.

It is important to note that the default limit on results from the MAAP API is 20. To increase the number of results we will specify a variable and use it in later queries.

[3]:
# get at max 500 results from CMR
MAX_RESULTS = 500

In this example we will explore search options including:

  1. Searching by collection concept ID

  2. Searching by temporal filter

  3. Searching by spatial filter

  4. Using the results from one search as inputs into another

  5. Searching by additional attributes

For the next couple of examples, we will focus on the ICESat-2/ATLAS Land and Vegetation Height dataset.

Searching by Collection Short Name, Version

Here we will search by a short name and version, which should uniquely identify a collection in CMR. However, some datasets exist both in the cloud and on-prem, so in the following example we actually get 2 results.

[4]:
atl08_collections = maap.searchCollection(
    short_name='ATL08',
    version='005',
    cmr_host='cmr.earthdata.nasa.gov'
)
len(atl08_collections)
[4]:
2

If you inspect the results, you will see that the second result has distribution information which points to an S3 bucket location. You can see this information with the following code: atl08_collections[1]['Collection']['DirectDistributionInformation'].
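As a minimal sketch of that inspection, the snippet below filters a list of collection records down to those that carry a DirectDistributionInformation entry. The helper name has_direct_distribution and the sample records are hypothetical stand-ins (their concept IDs and field values are placeholders, not real CMR output); it also assumes each result behaves like a nested dictionary, as the indexing above suggests.

```python
def has_direct_distribution(collection_record):
    """Return True if the record carries DirectDistributionInformation."""
    collection = collection_record.get('Collection', {})
    return bool(collection.get('DirectDistributionInformation'))


# Hypothetical stand-ins for the two search results; all values are placeholders
sample_collections = [
    {'concept-id': 'C-ONPREM-EXAMPLE',
     'Collection': {'ShortName': 'ATL08'}},
    {'concept-id': 'C-CLOUD-EXAMPLE',
     'Collection': {'ShortName': 'ATL08',
                    'DirectDistributionInformation': {'Region': 'placeholder'}}},
]

# Keep only the records that point to a direct (cloud) distribution
cloud_only = [c for c in sample_collections if has_direct_distribution(c)]
print([c['concept-id'] for c in cloud_only])
```

In the ADE, the same helper could be applied to atl08_collections instead of the sample list.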

A simpler solution to finding just the cloud-hosted dataset is to add the cloud_hosted="true" parameter to our search.

[5]:
atl08_collections = maap.searchCollection(
    short_name='ATL08',
    version='005',
    cmr_host='cmr.earthdata.nasa.gov',
    cloud_hosted="true"
)
len(atl08_collections)
[5]:
1

Now we can look up the collection concept id to find only granules in the cloud-hosted ATL08 v005 dataset.

[6]:
COLLECTION_ID = atl08_collections[0]['concept-id']

results = maap.searchGranule(
    concept_id=COLLECTION_ID,
    cmr_host='cmr.earthdata.nasa.gov',
    limit=MAX_RESULTS)
pprint(f'Got {len(results)} results')
'Got 500 results'

We were able to get 500 results! The collection most likely contains more than 500 granules, but remember that we limited the results to 500. The result from the MAAP API is a list of granules where each element in the list is the metadata for that particular granule.

Now let’s look at the metadata for the first result.

[7]:
# print the first granule's metadata
# the depth parameter sets how many nested levels of metadata are displayed
# depth=1 displays only the top-level keys: Granule, collection concept ID, concept ID, format, and revision ID
# depth=2 (used here) also shows the keys inside Granule; use a larger value (6) to view all of the metadata
pprint(results[0], depth=2)
{'Granule': {'AssociatedBrowseImageUrls': {...},
             'Collection': {...},
             'DataGranule': {...},
             'GranuleUR': 'ATL08_20181014001049_02350102_005_01.h5',
             'InsertTime': '2021-11-14T23:43:07.741Z',
             'LastUpdate': '2021-11-14T23:43:07.741Z',
             'OnlineAccessURLs': {...},
             'OnlineResources': {...},
             'OrbitCalculatedSpatialDomains': {...},
             'Spatial': {...},
             'Temporal': {...}},
 'collection-concept-id': 'C2153574670-NSIDC_CPRD',
 'concept-id': 'G2166182816-NSIDC_CPRD',
 'format': 'application/echo10+xml',
 'revision-id': '1'}

There is a lot of information in the metadata so let’s break it down…

The Granule key has all of the granule information, including attributes, browse imagery URLs, and spatial and temporal information. The collection-concept-id should match what you searched by and be the same for each granule. Lastly, the granule-specific concept-id is a unique identifier for this granule. This information can be used to further refine search results from CMR, specifically the granule information.
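The breakdown above can be sketched as a small helper that pulls the identifying fields out of one result. The function name summarize_granule is hypothetical, and the sample record is a trimmed copy of the metadata printed above; the sketch assumes each result can be read like a nested dictionary.

```python
from pprint import pprint


def summarize_granule(record):
    """Collect the identifying fields of one granule result."""
    granule = record.get('Granule', {})
    return {
        'granule_ur': granule.get('GranuleUR'),
        'last_update': granule.get('LastUpdate'),
        'concept_id': record.get('concept-id'),
        'collection_concept_id': record.get('collection-concept-id'),
    }


# Trimmed copy of the first result shown above
sample_record = {
    'Granule': {'GranuleUR': 'ATL08_20181014001049_02350102_005_01.h5',
                'LastUpdate': '2021-11-14T23:43:07.741Z'},
    'collection-concept-id': 'C2153574670-NSIDC_CPRD',
    'concept-id': 'G2166182816-NSIDC_CPRD',
}

summary = summarize_granule(sample_record)
pprint(summary)
```

Using .get rather than direct indexing means a missing key yields None instead of raising an error, which may be convenient when looping over many granules.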

Searching by Temporal Filter

Here we will combine the earlier search by Collection Concept ID with a temporal filter, using the temporal keyword to fine-tune our results.

The temporal keyword takes datetime information in a specific format. The date format used is YYYY-MM-DDThh:mm:ssZ and temporal search criteria may be either a single date or a date range. If only one date is provided, it can be treated as either a start date or an end date. To define a start date and return all granules after the date, put a comma after the date (YYYY-MM-DDThh:mm:ssZ,). To define an end date and return all granules prior to the date, put a comma before the date (,YYYY-MM-DDThh:mm:ssZ). Lastly, to get a date range, provide the start date and end date separated by a comma (YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ).

In this example we will search for one month of data.

[8]:
date_range = '2018-12-01T00:00:00Z,2018-12-31T23:59:59Z' # specify a date range to search for data for Dec. 2018

results = maap.searchGranule(
    temporal=date_range,
    concept_id=COLLECTION_ID,
    limit=MAX_RESULTS,
    cmr_host="cmr.earthdata.nasa.gov"
)
pprint(f'Got {len(results)} results')
'Got 500 results'
[9]:
granuleFilename = results[0]['Granule']['DataGranule']['ProducerGranuleId'] # get the granule file name
granuleDate = results[0]['Granule']['Temporal']['RangeDateTime']['BeginningDateTime'] # get the granule start time

pprint(f'Granule {granuleFilename} was acquired starting at {granuleDate}',width=100)
'Granule ATL08_20181201001339_09680103_005_01.h5 was acquired starting at 2018-12-01T00:13:48.477Z'

It looks like the first result correctly matches with the beginning temporal search parameter. Keep in mind that the results are limited to 500 so the final granule returned may not match the end date that was searched for.
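The three temporal string forms described in this section (start only, end only, and a full range) can be built from Python datetime objects. The following is a minimal sketch; the helper name temporal_filter is hypothetical, and it assumes the datetimes passed in already represent UTC.

```python
from datetime import datetime

# Date format expected by the temporal keyword: YYYY-MM-DDThh:mm:ssZ
CMR_TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def temporal_filter(start=None, end=None):
    """Build a temporal string from an optional start and end datetime."""
    if start is None and end is None:
        raise ValueError('Provide a start date, an end date, or both')
    if start is not None and end is not None and start > end:
        raise ValueError('start must not be later than end')
    start_text = start.strftime(CMR_TIME_FORMAT) if start is not None else ''
    end_text = end.strftime(CMR_TIME_FORMAT) if end is not None else ''
    # The comma position distinguishes "after", "before", and "between"
    return f'{start_text},{end_text}'


december_2018 = temporal_filter(datetime(2018, 12, 1),
                                datetime(2018, 12, 31, 23, 59, 59))
print(december_2018)                              # 2018-12-01T00:00:00Z,2018-12-31T23:59:59Z
print(temporal_filter(start=datetime(2018, 12, 1)))  # 2018-12-01T00:00:00Z,
print(temporal_filter(end=datetime(2018, 12, 1)))    # ,2018-12-01T00:00:00Z
```

The resulting string can be passed as the temporal argument in place of a hand-typed date range.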

Searching by Spatial Filter

Here we will illustrate how to search for granules by a spatial filter. There are several spatial filters available to search by in CMR, including point, line, polygon, and bounding box. The simplest to use is the bounding box, which is a sequence of four longitude and latitude values in the order [W,S,E,N]. In this example, we are going to search for data over Gabon using the bounding_box keyword.

[10]:
granule_bbox = '8.79799563969,-3.97882659263,14.4254557634,2.32675751384' # specify bounding box to search by

COLLECTION_ID = 'C1000000240-LPDAAC_ECS' # Collection title: "NASA Shuttle Radar Topography Mission Global 1 arc second V003"

results = maap.searchGranule(
    concept_id=COLLECTION_ID,
    bounding_box=granule_bbox,
    cmr_host="cmr.earthdata.nasa.gov"
)
pprint(f'Got {len(results)} results')
'Got 20 results'
[11]:
granule_filename = results[0]['Granule']['DataGranule']['ProducerGranuleId'] # get the granule file name
geometry = results[0]['Granule']['Spatial']['HorizontalSpatialDomain']['Geometry'] # grab the spatial information from granule

pprint(f'Granule {granule_filename} was acquired within the following geometry: ', width=100)
pprint(geometry)
'Granule S03E012.SRTMGL1.hgt.zip was acquired within the following geometry: '
{'BoundingRectangle': {'EastBoundingCoordinate': '13.00027778',
                       'NorthBoundingCoordinate': '-1.99972222',
                       'SouthBoundingCoordinate': '-3.00027778',
                       'WestBoundingCoordinate': '11.99972222'}}

We can see from the first granule that the spatial coordinates of the granule intersect our search box.
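Rather than comparing the coordinates by eye, the intersection can be checked in code. The sketch below uses the bounding box string and the BoundingRectangle values shown above; the function names are hypothetical, and the check assumes neither rectangle crosses the antimeridian.

```python
def parse_bbox(bbox_text):
    """Split a 'W,S,E,N' string into four floats."""
    west, south, east, north = (float(value) for value in bbox_text.split(','))
    return west, south, east, north


def rectangle_intersects_bbox(rectangle, bbox_text):
    """Return True if a granule BoundingRectangle overlaps the search box."""
    west, south, east, north = parse_bbox(bbox_text)
    g_west = float(rectangle['WestBoundingCoordinate'])
    g_south = float(rectangle['SouthBoundingCoordinate'])
    g_east = float(rectangle['EastBoundingCoordinate'])
    g_north = float(rectangle['NorthBoundingCoordinate'])
    # Two rectangles overlap when they overlap on both axes
    return (g_west <= east and g_east >= west
            and g_south <= north and g_north >= south)


search_bbox = '8.79799563969,-3.97882659263,14.4254557634,2.32675751384'
granule_rectangle = {'EastBoundingCoordinate': '13.00027778',
                     'NorthBoundingCoordinate': '-1.99972222',
                     'SouthBoundingCoordinate': '-3.00027778',
                     'WestBoundingCoordinate': '11.99972222'}

print(rectangle_intersects_bbox(granule_rectangle, search_bbox))  # True
```

With live results, the rectangle would come from geometry['BoundingRectangle'] for each granule.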

The MAAP API provides rich functionality to interact with the CMR instance within the MAAP platform. Users can search datasets programmatically by many parameters and even combine parameters such as spatial and temporal filters to refine results.
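As a sketch of combining filters, the helper below assembles the keyword arguments for a single search that uses both a temporal and a spatial filter. The function build_granule_query is hypothetical and only builds a dictionary; the parameter names (concept_id, temporal, bounding_box, limit) are the ones used in the searches above, and the values are reused from earlier cells. Whether this particular combination returns any granules is not shown here.

```python
def build_granule_query(concept_id, temporal=None, bounding_box=None, limit=20):
    """Assemble keyword arguments for a granule search, skipping unset filters."""
    query = {'concept_id': concept_id, 'limit': limit}
    if temporal is not None:
        query['temporal'] = temporal
    if bounding_box is not None:
        query['bounding_box'] = bounding_box
    return query


query = build_granule_query(
    concept_id='C2153574670-NSIDC_CPRD',  # cloud-hosted ATL08 v005 collection from above
    temporal='2018-12-01T00:00:00Z,2018-12-31T23:59:59Z',
    bounding_box='8.79799563969,-3.97882659263,14.4254557634,2.32675751384',
    limit=500,
)
print(query)

# In the ADE, the dictionary could be unpacked into the search call:
# results = maap.searchGranule(cmr_host='cmr.earthdata.nasa.gov', **query)
```

Keeping the filters in a dictionary makes it easy to add or drop one filter at a time while refining a search.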

Generating ID List from Search Results

Each element in the results list contains the metadata for one of the granules returned by the search. Within this metadata is the key concept-id, which is the unique identifier for each granule. To create a list of granule IDs, we create a new list and add the concept-id from each element of results to that list.

[12]:
granuleID_list = [result['concept-id'] for result in results]

# View some of the results
granuleID_list[:10]
[12]:
['G1004577874-LPDAAC_ECS',
 'G1004578009-LPDAAC_ECS',
 'G1004578073-LPDAAC_ECS',
 'G1004578089-LPDAAC_ECS',
 'G1004578257-LPDAAC_ECS',
 'G1004578334-LPDAAC_ECS',
 'G1004578381-LPDAAC_ECS',
 'G1004578586-LPDAAC_ECS',
 'G1004578726-LPDAAC_ECS',
 'G1004578728-LPDAAC_ECS']