Searching for Collections in NASA’s Operational CMR using maap-py
These examples walk through the MAAP API functionality of searching for collections based on specific parameters. Collections are groupings of files that share the same product specification. Searching for collections can be useful for finding individual files, known as granules, which are used for processing.
We begin by importing the maap package and creating an instance of the MAAP class.
[1]:
# import the MAAP package to handle queries
from maap.maap import MAAP
# import printing package to help display outputs
from pprint import pprint
# invoke the MAAP search client
maap = MAAP()
We can use the maap.searchCollection function to return a list of desired collections. Before using this function, let’s use the help function to view the specific arguments and keywords for maap.searchCollection.
[2]:
# view help for the searchCollection function
help(maap.searchCollection)
Help on method searchCollection in module maap.maap:
searchCollection(limit=100, **kwargs) method of maap.maap.MAAP instance
Search the CMR collections
:param limit: limit of the number of results
:param kwargs: search parameters
:return: list of results (<Instance of Result>)
The help text shows that maap.searchCollection accepts a limit and search parameters. The limit parameter caps the number of collections returned by maap.searchCollection. Note that limit=100 means that the default limit is 100 results. maap.searchCollection also accepts any additional search parameters that the CMR supports. For a list of accepted parameters, please refer to the CMR Search Collections API reference.
In this example we will explore search options including:
- Finding all Collections
- Searching by temporal filter
- Searching by spatial filter
- Using the results from one search as inputs into another
- Searching by additional attributes
Finding all Collections
Here we demonstrate how to search the collections in the CMR without any filters. To do this, we call the maap.searchCollection function without any additional search parameters, passing only cmr_host to point the search at NASA's operational CMR.
[3]:
# search all collections
results = maap.searchCollection(cmr_host="cmr.earthdata.nasa.gov")
# print the number of collections
pprint(f'Got {len(results)} results')
'Got 100 results'
We get 100 results because of the default limit. The result from the MAAP API is a list of collections in which each element is the metadata for that particular collection. To change the limit, pass limit= followed by a value inside the parentheses of maap.searchCollection().
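As a minimal sketch of changing the limit, the snippet below wraps the call in a small helper. The helper name search_with_limit and the StubClient class are hypothetical: the stub only mimics the signature shown by help() so the example runs without network access. With a real client you would pass the maap object created earlier.

```python
class StubClient:
    """Hypothetical offline stand-in that mimics searchCollection(limit=100, **kwargs)."""

    def searchCollection(self, limit=100, **kwargs):
        # Pretend 250 collections match and return at most `limit` of them.
        matches = [{"concept-id": f"C{i}-EXAMPLE"} for i in range(250)]
        return matches[:limit]


def search_with_limit(client, limit):
    """Search collections on the operational CMR, capping the number of results."""
    return client.searchCollection(cmr_host="cmr.earthdata.nasa.gov", limit=limit)


# With the real client this would be: search_with_limit(maap, 10)
results_small = search_with_limit(StubClient(), 10)
results_default = StubClient().searchCollection()

print(len(results_small))    # 10
print(len(results_default))  # 100
```

Because the stub ignores every search parameter, it only illustrates how limit truncates the returned list, not how the CMR selects collections.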
Let’s look at the metadata for the first collection in our list of results (results[0]) using pprint. For formatting purposes, we can use the depth parameter to control the number of levels of metadata detail to display. By default, there is no constraint on the depth. By setting a depth parameter (in this case depth=2), we ensure that anything nested more than two levels deep is replaced by an ellipsis.
[9]:
# print the metadata for the first collection
# we use the depth parameter to set the layer of metadata detail in the results, with (1) having the least detail
# (1) displays the concept ID, format, and revision ID
# adjust the depth to a larger value (6) if you would like to view all of the metadata
pprint(results[0], depth=2)
{'Collection': {'Campaigns': {...},
'CollectionState': 'COMPLETE',
'Contacts': {...},
'DOI': {...},
'DataSetId': '10 Days Synthesis of SPOT VEGETATION Images '
'(VGT-S10)',
'Description': 'The VGT-S10 are near-global or continental, '
'10-daily composite images which are '
"synthesised from the 'best available' "
'observations registered in the course of every '
"'dekad' by the orbiting earth observation "
'system SPOT-VEGETATION. The products provide '
'data from all spectral bands (SWIR, NIR, RED, '
'BLUE), the NDVI and auxiliary data on image '
'acquisition parameters. The VEGETATION system '
'allows operational and near real-time '
'applications, at global, continental and '
'regional scales, in very broad environmentally '
'and socio-economically critical fields. The '
'VEGETATION instrument is operational since '
'April 1998, first with VGT1, from March 2003 '
'onwards, with VGT2. More information is '
'available on: '
'https://docs.terrascope.be/#/DataProducts/SPOT-VGT/Level3/Level3',
'InsertTime': '2023-04-07T08:27:18.514Z',
'LastUpdate': '2023-04-07T08:27:18.514Z',
'LongName': 'Not provided',
'OnlineAccessURLs': {...},
'OnlineResources': {...},
'Platforms': {...},
'ProcessingLevelId': 'NA',
'ScienceKeywords': {...},
'ShortName': 'urn:ogc:def:EOP:VITO:VGT_S10',
'Spatial': {...},
'Temporal': {...},
'UseConstraints': {...},
'VersionId': '1'},
'concept-id': 'C2207472890-FEDEO',
'format': 'application/echo10+xml',
'revision-id': '5'}
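To see the effect of depth in isolation, here is a small sketch using pformat, which returns the same text that pprint would print. The record dictionary is made up for illustration and is only loosely shaped like a collection result.

```python
from pprint import pformat

# A small nested dictionary with made-up values, four levels deep.
record = {"Collection": {"Spatial": {"Geometry": {"CoordinateSystem": "CARTESIAN"}}}}

shallow = pformat(record, depth=1)  # only the top-level keys
medium = pformat(record, depth=2)   # top-level keys plus one nested level
full = pformat(record)              # no depth limit

print(shallow)  # {'Collection': {...}}
print(medium)   # {'Collection': {'Spatial': {...}}}
print(full)
```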
The Collection key has all of the collection information including attributes, the archive center, spatial, and temporal information. The concept-id is a unique identifier for this collection. It can be used to further refine search results from the CMR, such as when searching for granule information.
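The sketch below shows one way to pull the concept-id and short name out of each result so they can be reused in later searches. The summarize helper is hypothetical, and sample_results stands in for a real result list: its first record reuses values from the output above, while the second is made up.

```python
# Trimmed-down records that keep only the keys used below.
sample_results = [
    {
        "Collection": {"ShortName": "urn:ogc:def:EOP:VITO:VGT_S10"},
        "concept-id": "C2207472890-FEDEO",
    },
    {
        # Made-up record for illustration only.
        "Collection": {"ShortName": "EXAMPLE_SHORT_NAME"},
        "concept-id": "C0000000000-EXAMPLE",
    },
]


def summarize(results):
    """Map each collection's concept-id to its short name."""
    return {r["concept-id"]: r["Collection"]["ShortName"] for r in results}


summary = summarize(sample_results)
for concept_id, short_name in summary.items():
    print(concept_id, short_name)
```

With a real search, you would call summarize(results) on the list returned by maap.searchCollection.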
Searching by Temporal Filter
Here we use a temporal filter to narrow down our results using the temporal keyword in our search. The temporal keyword takes datetime information in a specific format, YYYY-MM-DDThh:mm:ssZ, and the search criteria may be either a single date or a date range. If a single date is provided, the position of the comma determines whether it is treated as the start or the end date. To define a start date and return all collections after that date, put a comma after the date (YYYY-MM-DDThh:mm:ssZ,). To define an end date and return all collections prior to that date, put a comma before the date (,YYYY-MM-DDThh:mm:ssZ). Lastly, to search a date range, provide the start date and end date separated by a comma (YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ).
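The three forms described above can be built from Python datetime objects rather than typed by hand. This is a minimal sketch: the temporal_filter helper is hypothetical (it is not part of maap-py), and it assumes the datetimes passed in already represent UTC.

```python
from datetime import datetime

CMR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDThh:mm:ssZ


def temporal_filter(start=None, end=None):
    """Build a temporal search string from optional start and end datetimes (assumed UTC)."""
    if start is None and end is None:
        raise ValueError("provide a start date, an end date, or both")
    if start is not None and end is not None and start > end:
        raise ValueError("start date must not be after end date")
    start_text = start.strftime(CMR_FORMAT) if start is not None else ""
    end_text = end.strftime(CMR_FORMAT) if end is not None else ""
    # The comma is always present; an empty side leaves that end of the range open.
    return f"{start_text},{end_text}"


january_start = datetime(2000, 1, 1)
january_end = datetime(2000, 1, 31, 23, 59, 59)

print(temporal_filter(start=january_start))                   # 2000-01-01T00:00:00Z,
print(temporal_filter(end=january_end))                       # ,2000-01-31T23:59:59Z
print(temporal_filter(start=january_start, end=january_end))  # 2000-01-01T00:00:00Z,2000-01-31T23:59:59Z
```

The returned string can be passed as the temporal keyword of maap.searchCollection.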
In this example we will search for one month of data.
[5]:
datetimeRange = '2000-01-01T00:00:00Z,2000-01-31T23:59:59Z' # specify datetime range to search for data in January 2000
results = maap.searchCollection(
cmr_host = "cmr.earthdata.nasa.gov",
temporal = datetimeRange
)
pprint(f'Got {len(results)} results')
'Got 100 results'
[6]:
collectionName = results[0]['Collection']['ShortName'] # get the collection short name
collectionDate = results[0]['Collection']['Temporal']['RangeDateTime']['BeginningDateTime'] # get the collection start time
pprint(
f'Collection {collectionName} was acquired starting at {collectionDate}', width=100)
'Collection GLDAS_NOAH025_3H was acquired starting at 2000-01-01T00:00:00.000Z'
The start time of the first result falls within our temporal search range. Keep in mind that the results are limited to 100, so the final collection returned may not extend to the end date that was searched for.
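Rather than comparing the timestamps by eye, the check can be scripted. The sketch below uses two hypothetical helpers, parse_cmr_time and starts_within, and assumes the search range has both a start and an end date. The timestamp and range are the ones from the cells above.

```python
from datetime import datetime


def parse_cmr_time(text):
    """Parse a UTC timestamp written with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def starts_within(begin_text, temporal_range):
    """Return True if a collection's start time lies inside a closed 'start,end' range."""
    start_text, end_text = temporal_range.split(",")
    begin = parse_cmr_time(begin_text)
    return parse_cmr_time(start_text) <= begin <= parse_cmr_time(end_text)


in_range = starts_within(
    "2000-01-01T00:00:00.000Z",
    "2000-01-01T00:00:00Z,2000-01-31T23:59:59Z",
)
print(in_range)  # True
```

A False result would not necessarily indicate a bad match, since a collection that begins before the range may still overlap it.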
Searching by Spatial Filter
Here we will illustrate how to search for collections by a spatial filter. There are several spatial filters available in the CMR, including point, line, polygon, and bounding box. In this example, we will explore filtering with a bounding box, which is a sequence of four longitude and latitude values in the order [W,S,E,N].
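Since the order of the four values is easy to mix up, a small helper with named arguments can build the string. This is a sketch only: bounding_box_filter is a hypothetical name, and its range checks are a simple sanity test rather than the CMR's own validation.

```python
def bounding_box_filter(west, south, east, north):
    """Build a 'W,S,E,N' bounding box string from named edges given in degrees."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return f"{west},{south},{east},{north}"


box = bounding_box_filter(west=-42, south=10, east=42, north=20)
print(box)  # -42,10,42,20
```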
[7]:
collectionDomain = '-42,10,42,20' # specify bounding box to search by
results = maap.searchCollection(
cmr_host = "cmr.earthdata.nasa.gov",
bounding_box = collectionDomain
)
pprint(f'Got {len(results)} results')
'Got 100 results'
[8]:
collectionName = results[3]['Collection']['ShortName'] # get a collection short name
collectionGeometry = results[3]['Collection']['Spatial']['HorizontalSpatialDomain']['Geometry'] # grab the spatial information from collection
pprint(f'Collection {collectionName} was acquired within the following geometry: ', width=100)
pprint(collectionGeometry)
'Collection gov.noaa.nodc:0000029 was acquired within the following geometry: '
{'BoundingRectangle': {'EastBoundingCoordinate': '-16.25',
'NorthBoundingCoordinate': '46.263167',
'SouthBoundingCoordinate': '0.766667',
'WestBoundingCoordinate': '-124.041667'},
'CoordinateSystem': 'CARTESIAN'}
We can see that the spatial coordinates of this collection (the fourth in our results, results[3]) intersect our search box.
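The intersection can also be verified in code. The following sketch compares the search box with the BoundingRectangle printed above. The boxes_intersect helper is hypothetical and assumes that neither box crosses the antimeridian.

```python
def boxes_intersect(search_box, rectangle):
    """Check whether a 'W,S,E,N' search box overlaps a BoundingRectangle dictionary."""
    west, south, east, north = (float(value) for value in search_box.split(","))
    r_west = float(rectangle["WestBoundingCoordinate"])
    r_south = float(rectangle["SouthBoundingCoordinate"])
    r_east = float(rectangle["EastBoundingCoordinate"])
    r_north = float(rectangle["NorthBoundingCoordinate"])
    # Two ranges overlap when each one starts before the other one ends.
    overlap_lon = west <= r_east and r_west <= east
    overlap_lat = south <= r_north and r_south <= north
    return overlap_lon and overlap_lat


# Values copied from the output above.
rectangle = {
    "EastBoundingCoordinate": "-16.25",
    "NorthBoundingCoordinate": "46.263167",
    "SouthBoundingCoordinate": "0.766667",
    "WestBoundingCoordinate": "-124.041667",
}

intersects = boxes_intersect("-42,10,42,20", rectangle)
print(intersects)  # True
```

Here the two boxes share longitudes from -42 to -16.25 and latitudes from 10 to 20.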