Accessing Data from NASA’s CMR in R
Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)
Date: March 26, 2025
Description: This notebook serves as a follow-up to “Searching for Data in NASA’s CMR in R”. In this guide, users will learn how to:
- Access data from a NASA Distributed Active Archive Center (DAAC) directly.
- Use paws to download data from a NASA DAAC locally.
Additional Resources
- Current R Documentation within the MAAP Docs.
- NASA’s Operational CMR (MAAP Docs): A section in the MAAP Docs offering an overview of resources for searching and accessing NASA’s CMR.
- `ncdf4 Reference Manual <https://cran.r-project.org/web/packages/ncdf4/ncdf4.pdf>`__: Documentation for reading and writing netCDF files using the ncdf4 package.
- A list of drivers for raster data.
- `paws Reference Manual <https://cran.r-project.org/web/packages/paws/paws.pdf>`__: Documentation for using the paws package.
Run This Notebook
To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an “R/Python” workspace.
Install and Load Required Libraries
Let’s load the packages needed for this notebook.
library("reticulate") # to call the Python maap-py library from R
library("paws") # to access S3 files
library("ncdf4") # to read netCDF (and HDF4) files locally
library("terra") # to open raster files
Additionally, we’ll invoke the MAAP constructor. This will allow us to use the Python-based maap-py library from R.
maap_py <- import("maap.maap")
maap <- maap_py$MAAP()
Searching for Data
In the example below, we’ll demonstrate searching for and accessing data from the ORNL DAAC. We’ll search for a GEDI L4B dataset, extract the links used to access the data, and then open a file.
For more information on searching for NASA CMR collections and granules in R, see “Searching for Data in NASA’s CMR in R”.
# Search for a dataset in NASA's CMR
gedi_collection <- maap$searchCollection(
  short_name = "GEDI_L4B_Gridded_Biomass_V2_1_2299",
  cmr_host = "cmr.earthdata.nasa.gov",
  cloud_hosted = "true"
)
# Extract the collection’s concept ID
collection_id <- gedi_collection[[1]]["concept-id"]
print(paste("Collection ID:", collection_id))
# Retrieve granules (up to 5 granules)
gedi_granules <- maap$searchGranule(
  concept_id = collection_id,
  limit = as.integer(5),
  cmr_host = "cmr.earthdata.nasa.gov"
)
# Extract the name (GranuleUR) of each granule
granule_names <- sapply(gedi_granules, function(granule) granule["Granule"]["GranuleUR"])
cat("Granules:\n")
print(granule_names)
[1] "Collection ID: C2792577683-ORNL_CLOUD"
Granules:
[1] "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"
[2] "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_V2.tif"
[3] "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_MU.tif"
[4] "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_QF.tif"
[5] "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_NS.tif"
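The first granule returned is the layer whose name ends in _SE, and the next steps simply use gedi_granules[[1]]. If you need a different layer, it is safer to select the granule by its name than to rely on the order of the search results. A minimal sketch using the names printed above; example_names and mu_index are illustrative names, and the _MU layer is only an example choice.

```r
# Granule names as printed by the search above
example_names <- c(
  "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif",
  "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_V2.tif",
  "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_MU.tif",
  "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_QF.tif",
  "GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_NS.tif"
)

# Locate a granule by the suffix of its name instead of by its position
mu_index <- which(endsWith(example_names, "_MU.tif"))
print(mu_index)

# With live search results, gedi_granules[[mu_index]] could then be used
# in place of gedi_granules[[1]] when extracting links.
```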
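Now that we have our granules, let’s extract the access URLs associated with the first granule. Each granule lists several URLs; here we take its https link and its S3 link. Note that single-bracket indexing on a Python object passes the index straight to Python, so the positions below are 0-based.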
https_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][0]["URL"]
print(paste("https Link:", https_link))
s3_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][2]["URL"]
print(paste("S3 Link:", s3_link))
[1] "https Link: https://data.ornldaac.earthdata.nasa.gov/protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"
[1] "S3 Link: s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"
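The positions used above ([0] for https, [2] for S3) depend on how the DAAC orders its links; the MODIS example later in this notebook finds its S3 link at [1]. A minimal sketch of selecting a link by its scheme instead, assuming the granule's URLs have already been collected into an R character vector (that conversion step is not shown). pick_url() is a hypothetical helper, not part of maap-py.

```r
# Hypothetical helper: return the first URL whose scheme matches,
# so the result does not depend on the order of the links.
pick_url <- function(urls, scheme = c("s3", "https")) {
  scheme <- match.arg(scheme)
  pattern <- paste0("^", scheme, "://")
  hits <- urls[grepl(pattern, urls)]
  if (length(hits) == 0) {
    stop("No ", scheme, ":// URL found")
  }
  hits[[1]]
}

# Example URLs copied from the output above
urls <- c(
  "https://data.ornldaac.earthdata.nasa.gov/protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif",
  "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"
)
pick_url(urls, "s3")
```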
Data Access
Direct Access
Let’s use the sf package’s gdal_utils() function to read the metadata of the GeoTIFF above. To read a file directly from S3, GDAL expects the path to start with its /vsis3/ virtual file system prefix instead of s3://, so we’ll use the sub function to replace s3:// with /vsis3/.
tiff_path <- sub("s3://", "/vsis3/", s3_link)
tiff_read <- sf::gdal_utils("info", tiff_path)
Driver: GTiff/GeoTIFF
Files: /vsis3/ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif
Size is 34704, 14616
Coordinate System is:
PROJCRS["WGS 84 / NSIDC EASE-Grid 2.0 Global",
BASEGEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["US NSIDC EASE-Grid 2.0 Global",
METHOD["Lambert Cylindrical Equal Area",
ID["EPSG",9835]],
PARAMETER["Latitude of 1st standard parallel",30,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8823]],
PARAMETER["Longitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["False easting",0,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["easting (X)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["northing (Y)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Environmental science - used as basis for EASE grid."],
AREA["World between 86°S and 86°N."],
BBOX[-86,-180,86,180]],
ID["EPSG",6933]]
Data axis to CRS axis mapping: 1,2
Origin = (-17367530.445161499083042,7314540.830638599582016)
Pixel Size = (1000.895023349667440,-1000.895023349667440)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left (-17367530.445, 7314540.831) (180d 0' 0.00"W, 85d 2'40.44"N)
Lower Left (-17367530.445,-7314540.831) (180d 0' 0.00"W, 85d 2'40.44"S)
Upper Right (17367530.445, 7314540.831) (180d 0' 0.00"E, 85d 2'40.44"N)
Lower Right (17367530.445,-7314540.831) (180d 0' 0.00"E, 85d 2'40.44"S)
Center ( 0.0000019, -0.0000008) ( 0d 0' 0.00"E, 0d 0' 0.00"S)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
NoData Value=-9999
Overviews: 17352x7308, 8676x3654, 4338x1827, 2169x914, 1085x457, 543x229
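The sub() call above handles S3 links only. A minimal sketch of a helper that maps either link type to a GDAL virtual file system path: /vsis3/ is GDAL's S3 handler and /vsicurl/ is its generic HTTP(S) handler. to_gdal_path() is a hypothetical name, and for the https link GDAL would generally also need Earthdata Login authentication, which this sketch does not configure.

```r
# Hypothetical helper: translate a remote URL into a GDAL virtual file path
to_gdal_path <- function(url) {
  if (startsWith(url, "s3://")) {
    # /vsis3/bucket/key replaces s3://bucket/key
    return(sub("^s3://", "/vsis3/", url))
  }
  if (startsWith(url, "https://") || startsWith(url, "http://")) {
    # /vsicurl/ is prefixed to the full URL
    return(paste0("/vsicurl/", url))
  }
  stop("Unsupported URL scheme: ", url)
}

to_gdal_path("s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif")
```

The S3 result matches the path shown on the "Files:" line of the gdalinfo output above.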
Since this is a GeoTIFF, we can open it with the terra package. terra::rast() reads the file through GDAL, so it accepts the same /vsis3/ path.
gedi_data <- terra::rast(tiff_path)
gedi_data
class : SpatRaster
dimensions : 14616, 34704, 1 (nrow, ncol, nlyr)
resolution : 1000.895, 1000.895 (x, y)
extent : -17367530, 17367530, -7314541, 7314541 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / NSIDC EASE-Grid 2.0 Global (EPSG:6933)
source : GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif
name : GEDI04_B_MW019MW223_02_002_02_R01000M_SE
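This layer covers the globe at 14616 rows by 34704 columns, so for regional work it is usually better to crop to an area of interest in the raster's own CRS (EPSG:6933, in metres) before computing anything; terra generally reads only the cells it needs for such an operation. A minimal sketch, assuming a synthetic raster with the same CRS and extent as a stand-in for gedi_data, and an arbitrary area of interest. With the real data the call would be crop(gedi_data, aoi).

```r
library(terra)

# Synthetic stand-in for the GEDI layer: same CRS and extent, far fewer cells
r <- rast(nrows = 10, ncols = 20,
          xmin = -17367530, xmax = 17367530,
          ymin = -7314541, ymax = 7314541,
          crs = "EPSG:6933")
values(r) <- seq_len(ncell(r))

# Hypothetical area of interest in EASE-Grid 2.0 metres (xmin, xmax, ymin, ymax)
aoi <- ext(0, 3473506, 0, 1462908)

# Keep only the cells that fall inside the area of interest
sub_r <- crop(r, aoi)

# Summarise the subset; na.rm = TRUE skips NA cells, which is how terra
# represents the file's NoData value (-9999)
global(sub_r, "mean", na.rm = TRUE)
```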
Download a File Locally
When data cannot or should not be accessed directly, the file can be downloaded locally instead. For guidance on when to access data directly and when to download it, see “MAAP AWS Access in R”. “When to ‘Cloud’” is a more general resource, but it also offers useful questions to ask yourself when deciding whether to use cloud access.
For this example, let’s search for a MODIS dataset (MOD13A1) provided by the LP DAAC. As above, we’ll search for the collection, retrieve the associated granules, and then extract the S3 link from the first granule.
# Search for a dataset in NASA's CMR
modis_collection <- maap$searchCollection(
  short_name = "MOD13A1",
  cmr_host = "cmr.earthdata.nasa.gov",
  cloud_hosted = "true"
)
# Extract the collection’s concept ID
collection_id <- modis_collection[[1]]["concept-id"]
# Retrieve granules (up to 5 granules)
modis_granules <- maap$searchGranule(
  concept_id = collection_id,
  limit = as.integer(5),
  cmr_host = "cmr.earthdata.nasa.gov"
)
# Retrieve S3 link
s3_link <- modis_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][1]["URL"]
print(paste("S3 Link:", s3_link))
[1] "S3 Link: s3://lp-prod-protected/MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf"
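The file name in this link follows the standard MODIS naming pattern: product, "A" plus acquisition year and day of year, horizontal/vertical tile, collection, production timestamp, and extension. For a 16-day composite such as MOD13A1, the acquisition date typically marks the first day of the compositing period. A minimal sketch of decoding it; parse_modis_name() is a hypothetical helper.

```r
# Hypothetical helper: split a MODIS file name into its main components
parse_modis_name <- function(path) {
  # e.g. "MOD13A1.A2000049.h02v06.061.2020041151125.hdf"
  parts <- strsplit(basename(path), ".", fixed = TRUE)[[1]]
  # parts[2] looks like "A" + 4-digit year + 3-digit day of year
  year <- as.integer(substr(parts[2], 2, 5))
  doy <- as.integer(substr(parts[2], 6, 8))
  list(
    product = parts[1],
    start_date = as.Date(sprintf("%d-01-01", year)) + (doy - 1),
    tile = parts[3],
    collection = parts[4]
  )
}

info <- parse_modis_name("s3://lp-prod-protected/MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf")
str(info)
```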
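To download the data locally, we need temporary AWS S3 credentials for the LP DAAC. Below, we request them from the LP DAAC’s s3credentials endpoint using maap-py, then use them to configure an S3 client with paws in the us-west-2 region.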
# Get AWS S3 credentials for LP DAAC
credentials <- maap$aws$earthdata_s3_credentials(
"https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials"
)
# Configure AWS S3 client using paws
s3 <- paws::s3(
  credentials = list(
    creds = list(
      access_key_id = credentials$accessKeyId,
      secret_access_key = credentials$secretAccessKey,
      session_token = credentials$sessionToken
    )
  ),
  region = "us-west-2"
)
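These credentials are temporary (NASA Earthdata Cloud credentials are typically valid for about one hour), so a long session may need to request new ones and rebuild the client. A minimal sketch of checking them before use, assuming the returned list carries an expiration field formatted like "2025-03-26 18:00:00+00:00" in UTC; both the field and its format are assumptions to verify against your own credentials. credentials_valid() is a hypothetical helper.

```r
# Hypothetical helper: TRUE if the credentials remain valid for at least
# `margin_mins` more minutes. Assumes an "expiration" field such as
# "2025-03-26 18:00:00+00:00" (UTC).
credentials_valid <- function(creds, margin_mins = 5, now = Sys.time()) {
  if (is.null(creds$expiration)) {
    stop("No 'expiration' field found in the credentials")
  }
  # Keep "YYYY-mm-dd HH:MM:SS" and drop the "+00:00" offset
  expires <- as.POSIXct(substr(creds$expiration, 1, 19),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  expires - as.difftime(margin_mins, units = "mins") > now
}

# Example with a made-up expiration time
fake_creds <- list(expiration = "2025-03-26 18:00:00+00:00")
credentials_valid(fake_creds, now = as.POSIXct("2025-03-26 17:00:00", tz = "UTC"))
```

If the check fails, calling maap$aws$earthdata_s3_credentials() again and recreating the paws client is one way to continue.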
Before downloading, let’s do some final preparation. First, we’ll create a directory to download our file to. Then, from our S3 link, we’ll get the bucket, the key (the object’s path within the bucket), and a file name.
# Create a local directory for downloads (if it doesn't already exist)
dir_name <- "./data"
if (!dir.exists(dir_name)) {
  dir.create(dir_name)
}
# Get bucket from file path
s3_parts <- strsplit(sub("s3://", "", s3_link), "/", fixed = TRUE)[[1]] # drop the s3:// prefix, then split on "/"
bucket <- s3_parts[1] # the 1st item is the bucket name
bucket
# Create file name for download
filename <- tail(s3_parts, n = 1) # the last part of the path
filename
# Get key from file path
key <- paste(tail(s3_parts, n = -1), collapse = "/") # everything in the path except the 1st item
key
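The same string handling can be bundled into one function, which is convenient when downloading several granules. A minimal sketch; parse_s3_uri() is a hypothetical helper, not part of paws or maap-py.

```r
# Hypothetical helper: split "s3://bucket/path/to/file" into its parts
parse_s3_uri <- function(uri) {
  if (!startsWith(uri, "s3://")) {
    stop("Not an S3 URI: ", uri)
  }
  parts <- strsplit(sub("^s3://", "", uri), "/", fixed = TRUE)[[1]]
  list(
    bucket = parts[1],                          # first path element
    key = paste(parts[-1], collapse = "/"),     # everything after the bucket
    filename = parts[length(parts)]             # last path element
  )
}

s3_info <- parse_s3_uri("s3://lp-prod-protected/MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf")
str(s3_info)
```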
Now we can download our file.
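# Download the object into the ./data directory
# (file.path() joins the pieces without inserting spaces, unlike paste())
s3$download_file(Bucket = bucket, Key = key, Filename = file.path(dir_name, filename))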
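When a notebook is rerun, it can be wasteful to download the same file again. A minimal sketch of skipping the download when the file already exists locally; download_if_missing() is a hypothetical helper, and the download step is passed in as a function so the helper does not depend on paws.

```r
# Hypothetical helper: call `download(local_path)` only if the file is absent
download_if_missing <- function(local_path, download) {
  if (file.exists(local_path)) {
    message("Using existing copy: ", local_path)
  } else {
    download(local_path)
  }
  invisible(local_path)
}

# Possible use with the paws client created above (not run here):
# download_if_missing(
#   file.path(dir_name, filename),
#   function(p) s3$download_file(Bucket = bucket, Key = key, Filename = p)
# )
```

Note that this only checks that a file exists; it does not verify that an earlier download completed.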
Access the Downloaded File
The data has been downloaded, so we can now open the file. Since this is an HDF4 file, we can use the ncdf4 package to open and work with it, provided the underlying netCDF-C library was built with HDF4 support.
modis_file <- nc_open(file.path(dir_name, filename))
The desired information can now be obtained from the opened file. For example, let’s print the variable names.
names(modis_file$var)
- '500m 16 days NDVI'
- '500m 16 days EVI'
- '500m 16 days VI Quality'
- '500m 16 days red reflectance'
- '500m 16 days NIR reflectance'
- '500m 16 days blue reflectance'
- '500m 16 days MIR reflectance'
- '500m 16 days view zenith angle'
- '500m 16 days sun zenith angle'
- '500m 16 days relative azimuth angle'
- '500m 16 days composite day of the year'
- '500m 16 days pixel reliability'
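To read the values of one of these variables, ncvar_get() takes the open file and a variable name, and nc_close() releases the file when you are done. A minimal sketch on a small synthetic netCDF file, so it runs without the MODIS download; the file, the variable name ndvi, and its values are made up. For the downloaded file the equivalent call would be ncvar_get(modis_file, "500m 16 days NDVI"). MODIS variables are typically stored as scaled integers, so inspect attributes such as scale_factor and _FillValue with ncatt_get() before interpreting the numbers.

```r
library(ncdf4)

# Build a tiny synthetic netCDF file so the example runs anywhere
path <- tempfile(fileext = ".nc")
x_dim <- ncdim_def("x", "index", 1:4)
y_dim <- ncdim_def("y", "index", 1:3)
ndvi_def <- ncvar_def("ndvi", "1", list(x_dim, y_dim), missval = -3000)
nc_out <- nc_create(path, list(ndvi_def))
ncvar_put(nc_out, ndvi_def, matrix(seq(0.1, 1.2, by = 0.1), nrow = 4))
nc_close(nc_out)

# Read the variable back, look at its fill value, then close the file
nc_in <- nc_open(path)
ndvi <- ncvar_get(nc_in, "ndvi")
fill <- ncatt_get(nc_in, "ndvi", "_FillValue")
nc_close(nc_in)
dim(ndvi)
```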