Searching for Data in NASA’s CMR in R
Authors: Sheyenne Kirkland (UAH), Alex Mandel (DevSeed), Henry Rodman (DevSeed), Zac Deziel (DevSeed)
Date: 11/21/24
Description: In this notebook, we’ll demonstrate how to search for data in NASA’s CMR from within R using maap-py. Users will learn how to search for collections, granules, and links, and then compile lists of granule IDs and links.
Run This Notebook
To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within the “R/Python” workspace.
Additional Resources
- How to get started with the package reticulate, which is used in this notebook. This package allows us to use Python-based libraries in R.
- Searching for Granules in NASA’s Operational CMR using maap-py: the Python version of this notebook, also published in the MAAP Docs.
- A resource from NASA Openscapes showing users how to search for NASA data in R and authenticate using the package earthdatalogin. It also shows users how to find data stored in NASA STACs (SpatioTemporal Asset Catalogs).
- Common Metadata Repository (CMR) API Documentation: a resource that shows users how to search for collections and granules by parameter with the NASA CMR API.
- NASA’s Operational CMR (MAAP Docs): a section in the MAAP Docs that provides general information and resources for searching and accessing NASA’s CMR.
Import/Install Packages
Let’s load the packages needed for this notebook.
[1]:
library(reticulate)
Search Collections
Before beginning our search, let’s import maap-py through reticulate and invoke the MAAP constructor. This allows us to use the Python-based maap-py library from R.
[2]:
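# Import the maap.maap Python module through reticulate
maap_py <- import("maap.maap")
# Create a MAAP client; its Python methods are called from R with $
maap <- maap_py$MAAP()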
Now let’s search for a collection. The collection we have in mind is ATL08, so we will search for collections with that short name. We also want data hosted in the cloud, so we add the parameter cloud_hosted='true'. At the time of writing, the current version of ATL08 is 006, so we pass version='006'; if you are not sure of the version, you can comment out that line.
[3]:
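atl08_collections <- maap$searchCollection(
  short_name='ATL08',                 # product short name
  version='006',                      # comment out to search all versions
  cmr_host='cmr.earthdata.nasa.gov',  # NASA's operational CMR
  cloud_hosted='true'                 # only cloud-hosted collections
)
# Number of matching collections
length(atl08_collections)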
One collection was returned. To get its concept ID, we’ll use the code in the following cell.
[4]:
# Take the first (and only) result and look up its 'concept-id' field on the Python object
collection_id <- atl08_collections[[1]]['concept-id']
print(collection_id)
[1] "C2613553260-NSIDC_CPRD"
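A CMR concept ID records both the kind of record it identifies and the provider that holds it. It consists of a prefix letter (C for a collection, G for a granule), a number, a hyphen, and the provider ID. The sketch below splits an ID into those parts with base R regular expressions, so it runs without MAAP. parse_concept_id() is a hypothetical helper written for this example and is not part of maap-py.

```r
# Hypothetical helper: split a CMR concept ID such as "C2613553260-NSIDC_CPRD"
# into its record-type letter, numeric part, and provider ID.
parse_concept_id <- function(concept_id) {
  stopifnot(is.character(concept_id), length(concept_id) == 1)
  pattern <- "^([A-Z]+)([0-9]+)-(.+)$"
  if (!grepl(pattern, concept_id)) {
    stop("Not a recognizable concept ID: ", concept_id)
  }
  list(
    type     = sub(pattern, "\\1", concept_id),  # "C" (collection), "G" (granule), ...
    number   = sub(pattern, "\\2", concept_id),
    provider = sub(pattern, "\\3", concept_id)   # data provider, e.g. "NSIDC_CPRD"
  )
}

parts <- parse_concept_id("C2613553260-NSIDC_CPRD")
print(parts$provider)
```

The same helper works on the granule IDs collected later in this notebook, which end in the provider LPCLOUD.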
Search Granules
Temporal Extent
Now that we have our collection ID, let’s search for granules within the collection. We’ll also add a temporal filter to our search. If you would like to search for granules without the temporal filter, simply comment out or remove the temporal=date_range line.
[5]:
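# Start and end of the search window (ISO 8601, UTC), separated by a comma
date_range <- '2018-12-01T00:00:00Z,2018-12-31T23:59:59Z'
results <- maap$searchGranule(
  temporal=date_range,
  concept_id=collection_id,
  # R numbers are doubles by default; as.integer() makes reticulate pass a Python int, not a float
  limit=as.integer(100),
  cmr_host='cmr.earthdata.nasa.gov'
)
# Number of granules returned (capped by limit)
length(results)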
100 results were returned. There are thousands of granules within this date range, but because we set our limit to 100, we only get 100 back.
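The temporal filter is a single string containing a start timestamp and an end timestamp in ISO 8601 UTC format, separated by a comma. If you generate date ranges programmatically, a small helper can keep that format consistent. make_temporal_range() below is a hypothetical helper for this example. It assumes its inputs are POSIXct times and formats them in UTC.

```r
# Hypothetical helper: build a CMR temporal string "start,end" with UTC timestamps.
make_temporal_range <- function(start, end) {
  stopifnot(inherits(start, "POSIXct"), inherits(end, "POSIXct"), start <= end)
  fmt <- "%Y-%m-%dT%H:%M:%SZ"
  paste(format(start, format = fmt, tz = "UTC"),
        format(end, format = fmt, tz = "UTC"),
        sep = ",")
}

date_range <- make_temporal_range(
  as.POSIXct("2018-12-01 00:00:00", tz = "UTC"),
  as.POSIXct("2018-12-31 23:59:59", tz = "UTC")
)
print(date_range)
```

This reproduces the December 2018 date_range string used in the search above.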
Spatial Extent
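Another filter we can apply is a spatial filter. For this example, we switch to a different collection, C2763266360-LPCLOUD. As the file names in the results show, this is the SRTMGL1.003 collection. We then search it with a bounding box.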
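CMR expects the bounding box as one comma-separated string ordered west, south, east, north (lower-left longitude and latitude, then upper-right longitude and latitude), in decimal degrees. make_bbox() below is a hypothetical helper that assembles that string from separate coordinates and checks that they are in range. The sketch does not reject west > east, on the assumption that such a box is meant to cross the antimeridian. Check the CMR API documentation before relying on that.

```r
# Hypothetical helper: build a CMR bounding-box string "west,south,east,north".
make_bbox <- function(west, south, east, north) {
  coords <- c(west, south, east, north)
  stopifnot(is.numeric(coords), length(coords) == 4, !anyNA(coords))
  stopifnot(
    abs(c(west, east)) <= 180,   # longitudes in decimal degrees
    abs(c(south, north)) <= 90,  # latitudes in decimal degrees
    south <= north               # southern edge may not lie north of the northern edge
  )
  # west > east is deliberately not rejected (assumed antimeridian crossing)
  paste(as.character(coords), collapse = ",")
}

granule_bbox <- make_bbox(8.79799563969, -3.97882659263, 14.4254557634, 2.32675751384)
print(granule_bbox)
```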
[6]:
# SRTMGL1.003 collection hosted by LP DAAC in the cloud
collection_id <- 'C2763266360-LPCLOUD'
# Bounding box to search by: "west,south,east,north" in decimal degrees
granule_bbox <- '8.79799563969,-3.97882659263,14.4254557634,2.32675751384'
results <- maap$searchGranule(
  concept_id=collection_id,
  bounding_box=granule_bbox,
  limit=as.integer(100),
  cmr_host="cmr.earthdata.nasa.gov"
)
# Number of granules matching the bounding box
length(results)
43 granules in the collection intersect our bounding box. Let’s grab the file name and geometry of the first granule.
[7]:
# Producer-assigned file name of the first granule
granule_filename <- results[[1]]['Granule']['DataGranule']['ProducerGranuleId']
print(granule_filename)
# Spatial extent of the first granule
geometry <- results[[1]]['Granule']['Spatial']['HorizontalSpatialDomain']['Geometry']
print(geometry)
[1] "N00E013.SRTMGL1.hgt"
{'BoundingRectangle': {'WestBoundingCoordinate': '12.99972222', 'NorthBoundingCoordinate': '1.00027778', 'EastBoundingCoordinate': '14.00027778', 'SouthBoundingCoordinate': '-0.00027778'}}
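The geometry is returned as a nested Python object whose coordinates are strings. To work with these values as numbers in R, one option is to convert the object with reticulate::py_to_r() and then coerce the values. This assumes the conversion yields a named R list. The sketch below substitutes a plain R list holding the values printed above, so it runs without MAAP. That list layout is an assumption based on the printed output.

```r
# Stand-in for the converted geometry, using the values printed above (assumed layout).
geometry_r <- list(
  BoundingRectangle = list(
    WestBoundingCoordinate  = "12.99972222",
    NorthBoundingCoordinate = "1.00027778",
    EastBoundingCoordinate  = "14.00027778",
    SouthBoundingCoordinate = "-0.00027778"
  )
)

# Reorder to west, south, east, north and convert the strings to numbers.
rect <- geometry_r$BoundingRectangle
keys <- c("WestBoundingCoordinate", "SouthBoundingCoordinate",
          "EastBoundingCoordinate", "NorthBoundingCoordinate")
granule_bounds <- setNames(as.numeric(unlist(rect[keys])),
                           c("west", "south", "east", "north"))
print(granule_bounds)
```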
Granule Link Search
After searching for your desired granule(s), you can also find the links for data access.
[8]:
# Online access links for the first granule
granule_link <- results[[1]]['Granule']['OnlineAccessURLs'][[1]]
print(granule_link)
[{'URL': 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip', 'URLDescription': 'Download N00E013.SRTMGL1.hgt.zip'}, {'URL': 's3://lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip', 'URLDescription': 'This link provides direct download access via S3 to the granule'}]
Notice that there are two links: one is an HTTPS URL and the other is an S3 URL. Let’s pull both URLs associated with this granule.
[9]:
# On a Python object, [ ] indexes from 0: [0] is the first link (HTTPS), [1] the second (S3)
granule_https <- granule_link[0]['URL']
granule_s3 <- granule_link[1]['URL']
print(granule_https)
print(granule_s3)
[1] "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip"
[1] "s3://lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip"
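The cell above selects each URL by its position, which assumes the HTTPS link always comes first. A somewhat sturdier approach is to select links by their URL scheme. The sketch below uses plain R lists in place of the link records printed above, so it runs without MAAP. pick_url() is a hypothetical helper written for this example.

```r
# Stand-ins for the two link records printed above.
links <- list(
  list(URL = "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip",
       URLDescription = "Download N00E013.SRTMGL1.hgt.zip"),
  list(URL = "s3://lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip",
       URLDescription = "This link provides direct download access via S3 to the granule")
)

# Hypothetical helper: return the first URL starting with the given prefix, or NA if none does.
pick_url <- function(links, prefix) {
  urls <- vapply(links, function(link) link[["URL"]], character(1))
  hits <- urls[startsWith(urls, prefix)]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

granule_https <- pick_url(links, "https://")
granule_s3 <- pick_url(links, "s3://")
print(granule_s3)
```

Returning NA instead of raising an error lets you skip granules that do not offer a particular kind of link.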
Granule ID List
If you need multiple granules, you can also compile a list of granule IDs from the search results.
[14]:
# Collect the concept ID of every granule returned by the spatial search
granule_list <- c()
for (result in results) {
  granule_list <- c(granule_list, result['concept-id'])
}
# Show the first five IDs
print(granule_list[1:5])
[1] "G2821018750-LPCLOUD" "G2821036920-LPCLOUD" "G2821037023-LPCLOUD"
[4] "G2821037092-LPCLOUD" "G2821037143-LPCLOUD"
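Growing a vector inside a loop works. Alternatively, vapply() builds the same vector in one step and checks that every element is a single string. The sketch below uses plain R lists in place of the search results, so it runs without MAAP. Only the concept-id field is modeled, and the values are IDs from the output above. With the real reticulate objects, you would presumably keep the result['concept-id'] accessor from the loop.

```r
# Stand-ins for granule search results; only the "concept-id" field is modeled.
mock_results <- list(
  list(`concept-id` = "G2821018750-LPCLOUD"),
  list(`concept-id` = "G2821036920-LPCLOUD"),
  list(`concept-id` = "G2821037023-LPCLOUD")
)

# One concept ID per result; vapply() errors if any entry is not a single string.
granule_ids <- vapply(mock_results, function(result) result[["concept-id"]], character(1))
print(granule_ids)
```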
Granule Link List
Similarly, let’s create a list of links to the granules. For this example, we’ll just compile a list of S3 URLs.
[13]:
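# Collect the S3 URL of every granule. [1] is Python's 0-based index for the second link,
# assuming each granule lists its HTTPS link first and its S3 link second, as above
link_list <- c()
for (result in results) {
  link_list <- c(link_list, result['Granule']['OnlineAccessURLs'][[1]][1]['URL'])
}
print(link_list[1:5])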
[1] "s3://lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip"
[2] "s3://lp-prod-protected/SRTMGL1.003/N02E011.SRTMGL1.hgt/N02E011.SRTMGL1.hgt.zip"
[3] "s3://lp-prod-protected/SRTMGL1.003/N02E010.SRTMGL1.hgt/N02E010.SRTMGL1.hgt.zip"
[4] "s3://lp-prod-protected/SRTMGL1.003/N01E014.SRTMGL1.hgt/N01E014.SRTMGL1.hgt.zip"
[5] "s3://lp-prod-protected/SRTMGL1.003/N02E012.SRTMGL1.hgt/N02E012.SRTMGL1.hgt.zip"
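Some S3 clients take the bucket name and the object key as separate arguments instead of a single s3:// URL. The sketch below splits S3 URLs with base R regular expressions and also derives each file name. split_s3_url() is a hypothetical helper written for this example, and the input reuses two URLs printed above.

```r
# Hypothetical helper: split "s3://bucket/key" URLs into bucket, key, and file name.
split_s3_url <- function(urls) {
  stopifnot(is.character(urls), all(grepl("^s3://[^/]+/.+", urls)))
  data.frame(
    bucket = sub("^s3://([^/]+)/.*$", "\\1", urls),  # text between "s3://" and the next "/"
    key    = sub("^s3://[^/]+/", "", urls),          # everything after the bucket
    file   = basename(urls),                         # last path component
    stringsAsFactors = FALSE
  )
}

s3_parts <- split_s3_url(c(
  "s3://lp-prod-protected/SRTMGL1.003/N00E013.SRTMGL1.hgt/N00E013.SRTMGL1.hgt.zip",
  "s3://lp-prod-protected/SRTMGL1.003/N02E011.SRTMGL1.hgt/N02E011.SRTMGL1.hgt.zip"
))
print(s3_parts)
```

If link_list is a character vector, split_s3_url(link_list) would tabulate every S3 link collected above.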