{ "cells": [ { "cell_type": "markdown", "id": "5986c32c-58ce-4e31-9a2d-e0ca018b07bf", "metadata": {}, "source": [ "# Accessing Data from NASA's CMR in R\n", "\n", "Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)\n", "\n", "Date: March 26, 2025\n", "\n", "Description: This notebook serves as a follow-up to [\"Searching for Data in NASA's CMR in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). In this guide, users will learn how to:\n", "- Access data from a NASA Distributed Active Archive Center (DAAC) directly.\n", "- Use `paws` to download data from a NASA DAAC locally." ] }, { "cell_type": "markdown", "id": "df352544-7428-421d-827a-510141080010", "metadata": {}, "source": [ "## Additional Resources\n", "- [Working with R in MAAP](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r.html) \n", " - Current R Documentation within the MAAP Docs.\n", "- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) \n", " - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.\n", "- [`ncdf4` Reference Manual](https://cran.r-project.org/web/packages/ncdf4/ncdf4.pdf)\n", " - Documentation for reading and writing netCDF files using the `ncdf4` package.\n", "- [GDAL Raster Drivers](https://gdal.org/en/latest/drivers/raster/index.html)\n", " - A list of drivers for raster data.\n", "- [`paws` Reference Manual](https://cran.r-project.org/web/packages/paws/paws.pdf)\n", " - Documentation for using the `paws` package." 
] }, { "cell_type": "markdown", "id": "9f1d15ad-170f-4286-84ec-55a1b45b3d2e", "metadata": {}, "source": [ "## Run This Notebook\n", "To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the [“Getting started with the MAAP”](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an \"R/Python\" workspace." ] }, { "cell_type": "markdown", "id": "201805c7-bd3b-42ce-a299-95acac3c7638", "metadata": {}, "source": [ "## Install and Load Required Libraries\n", "Let's load the packages needed for this notebook." ] }, { "cell_type": "code", "execution_count": 75, "id": "dc1b93e1-fb92-45e5-ade0-7b4c19a9c867", "metadata": {}, "outputs": [], "source": [ "library(\"reticulate\") # to use the Python-based maap-py library\n", "library(\"paws\") # to access S3 files\n", "library(\"ncdf4\") # to read HDF4/netCDF files locally\n", "library(\"terra\") # to open raster files" ] }, { "cell_type": "markdown", "id": "64131b26-1a47-4bb3-b105-d2d269afc06d", "metadata": {}, "source": [ "Additionally, we'll invoke the `MAAP` constructor. This will allow us to use the Python-based `maap-py` library from R." ] }, { "cell_type": "code", "execution_count": 76, "id": "02382469-7ded-4f13-a826-7a9728b0a86f", "metadata": {}, "outputs": [], "source": [ "maap_py <- import(\"maap.maap\")\n", "maap <- maap_py$MAAP()" ] }, { "cell_type": "markdown", "id": "7ea49dea-e3f6-4919-bb6e-e39815426cc5", "metadata": {}, "source": [ "## Searching for Data\n", "\n", "In the example below, we'll demonstrate searching and accessing data from ORNL DAAC. 
We'll search for a GEDI L4B dataset, extract the associated links to access the data, and then open a file.\n", "\n", "For more information on searching for NASA CMR collections and granules in R, see [\"Searching for Data in NASA's CMR in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). " ] }, { "cell_type": "code", "execution_count": 77, "id": "c7a649a4-62e0-43a7-9090-c9556cd09d63", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Collection ID: C2792577683-ORNL_CLOUD\"\n", "Granules:\n", "[1] \"GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif\"\n", "[2] \"GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_V2.tif\"\n", "[3] \"GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_MU.tif\"\n", "[4] \"GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_QF.tif\"\n", "[5] \"GEDI_L4B_Gridded_Biomass_V2_1.GEDI04_B_MW019MW223_02_002_02_R01000M_NS.tif\"\n" ] } ], "source": [ "# Search for a dataset in NASA's CMR\n", "gedi_collection <- maap$searchCollection(\n", " short_name = \"GEDI_L4B_Gridded_Biomass_V2_1_2299\", \n", " cmr_host = \"cmr.earthdata.nasa.gov\",\n", " cloud_hosted = \"true\"\n", ")\n", "\n", "# Extract the collection’s concept ID\n", "collection_id <- gedi_collection[[1]][\"concept-id\"]\n", "print(paste(\"Collection ID:\", collection_id))\n", "\n", "# Retrieve granules (up to 5 granules)\n", "gedi_granules <- maap$searchGranule(\n", " concept_id = collection_id,\n", " limit = as.integer(5),\n", " cmr_host = \"cmr.earthdata.nasa.gov\"\n", ")\n", "\n", "granule_names <- sapply(gedi_granules, function(names) names[\"Granule\"][\"GranuleUR\"])\n", "cat(\"Granules:\\n\")\n", "print(granule_names)" ] }, { "cell_type": "markdown", "id": "b9746a7a-37d8-46be-8559-c0875dc4d99f", "metadata": {}, "source": [ "Now that we have our granules, let's extract the URLs associated with the first granule. 
We'll extract two links: an S3 link and an HTTPS link." ] }, { "cell_type": "code", "execution_count": 78, "id": "cf10ad4a-2aa9-4126-bc85-6d74bdeb51ed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"https Link: https://data.ornldaac.earthdata.nasa.gov/protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif\"\n", "[1] \"S3 Link: s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif\"\n" ] } ], "source": [ "https_link <- gedi_granules[[1]][\"Granule\"][\"OnlineAccessURLs\"][[1]][0][\"URL\"]\n", "print(paste(\"https Link:\", https_link))\n", "s3_link <- gedi_granules[[1]][\"Granule\"][\"OnlineAccessURLs\"][[1]][2][\"URL\"]\n", "print(paste(\"S3 Link:\", s3_link))" ] }, { "cell_type": "markdown", "id": "f5d7be93-dbdf-4e24-8c79-60cbe625cea1", "metadata": {}, "source": [ "## Data Access" ] }, { "cell_type": "markdown", "id": "ccab6a1d-3404-45c9-b3e3-a02274d37ea9", "metadata": {}, "source": [ "### Direct Access\n", "\n", "Let's use the `sf` package to read the metadata associated with the TIFF file above. To read an object from S3 directly, GDAL expects the path to begin with its `/vsis3/` virtual file system prefix instead of `s3://`. To do this, we'll use the `sub` function to replace `s3://` with `/vsis3/`." 
] }, { "cell_type": "code", "execution_count": 79, "id": "d084dd3b-bcb0-4cbc-9982-257270d1156e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Driver: GTiff/GeoTIFF\n", "Files: /vsis3/ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif\n", "Size is 34704, 14616\n", "Coordinate System is:\n", "PROJCRS[\"WGS 84 / NSIDC EASE-Grid 2.0 Global\",\n", " BASEGEOGCRS[\"WGS 84\",\n", " ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n", " MEMBER[\"World Geodetic System 1984 (Transit)\"],\n", " MEMBER[\"World Geodetic System 1984 (G730)\"],\n", " MEMBER[\"World Geodetic System 1984 (G873)\"],\n", " MEMBER[\"World Geodetic System 1984 (G1150)\"],\n", " MEMBER[\"World Geodetic System 1984 (G1674)\"],\n", " MEMBER[\"World Geodetic System 1984 (G1762)\"],\n", " MEMBER[\"World Geodetic System 1984 (G2139)\"],\n", " ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n", " LENGTHUNIT[\"metre\",1]],\n", " ENSEMBLEACCURACY[2.0]],\n", " PRIMEM[\"Greenwich\",0,\n", " ANGLEUNIT[\"degree\",0.0174532925199433]],\n", " ID[\"EPSG\",4326]],\n", " CONVERSION[\"US NSIDC EASE-Grid 2.0 Global\",\n", " METHOD[\"Lambert Cylindrical Equal Area\",\n", " ID[\"EPSG\",9835]],\n", " PARAMETER[\"Latitude of 1st standard parallel\",30,\n", " ANGLEUNIT[\"degree\",0.0174532925199433],\n", " ID[\"EPSG\",8823]],\n", " PARAMETER[\"Longitude of natural origin\",0,\n", " ANGLEUNIT[\"degree\",0.0174532925199433],\n", " ID[\"EPSG\",8802]],\n", " PARAMETER[\"False easting\",0,\n", " LENGTHUNIT[\"metre\",1],\n", " ID[\"EPSG\",8806]],\n", " PARAMETER[\"False northing\",0,\n", " LENGTHUNIT[\"metre\",1],\n", " ID[\"EPSG\",8807]]],\n", " CS[Cartesian,2],\n", " AXIS[\"easting (X)\",east,\n", " ORDER[1],\n", " LENGTHUNIT[\"metre\",1]],\n", " AXIS[\"northing (Y)\",north,\n", " ORDER[2],\n", " LENGTHUNIT[\"metre\",1]],\n", " USAGE[\n", " SCOPE[\"Environmental science - used as basis for EASE grid.\"],\n", " AREA[\"World between 
86°S and 86°N.\"],\n", " BBOX[-86,-180,86,180]],\n", " ID[\"EPSG\",6933]]\n", "Data axis to CRS axis mapping: 1,2\n", "Origin = (-17367530.445161499083042,7314540.830638599582016)\n", "Pixel Size = (1000.895023349667440,-1000.895023349667440)\n", "Metadata:\n", " AREA_OR_POINT=Area\n", "Image Structure Metadata:\n", " COMPRESSION=LZW\n", " INTERLEAVE=BAND\n", "Corner Coordinates:\n", "Upper Left (-17367530.445, 7314540.831) (180d 0' 0.00\"W, 85d 2'40.44\"N)\n", "Lower Left (-17367530.445,-7314540.831) (180d 0' 0.00\"W, 85d 2'40.44\"S)\n", "Upper Right (17367530.445, 7314540.831) (180d 0' 0.00\"E, 85d 2'40.44\"N)\n", "Lower Right (17367530.445,-7314540.831) (180d 0' 0.00\"E, 85d 2'40.44\"S)\n", "Center ( 0.0000019, -0.0000008) ( 0d 0' 0.00\"E, 0d 0' 0.00\"S)\n", "Band 1 Block=256x256 Type=Float32, ColorInterp=Gray\n", " NoData Value=-9999\n", " Overviews: 17352x7308, 8676x3654, 4338x1827, 2169x914, 1085x457, 543x229\n" ] } ], "source": [ "# Swap the s3:// scheme for GDAL's /vsis3/ virtual file system prefix\n", "tiff_path <- sub(\"s3://\", \"/vsis3/\", s3_link)\n", "\n", "# Print the file's metadata (equivalent to running gdalinfo)\n", "tiff_read <- sf::gdal_utils(\"info\", tiff_path)" ] }, { "cell_type": "markdown", "id": "20f506ac-ab39-4502-bde2-3204f4c768bb", "metadata": {}, "source": [ "Since this is a GeoTIFF file, we can use the `terra` package to open it as a raster." ] }, { "cell_type": "code", "execution_count": 80, "id": "429e2868-a0fe-4ffa-b42e-bab80e6a5178", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "class : SpatRaster \n", "dimensions : 14616, 34704, 1 (nrow, ncol, nlyr)\n", "resolution : 1000.895, 1000.895 (x, y)\n", "extent : -17367530, 17367530, -7314541, 7314541 (xmin, xmax, ymin, ymax)\n", "coord. ref. 
: WGS 84 / NSIDC EASE-Grid 2.0 Global (EPSG:6933) \n", "source : GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif \n", "name : GEDI04_B_MW019MW223_02_002_02_R01000M_SE " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gedi_data <- terra::rast(tiff_path)\n", "gedi_data" ] }, { "cell_type": "markdown", "id": "e6af389b-5b6b-446b-b2a8-589e9d860c6a", "metadata": {}, "source": [ "### Download a File Locally" ] }, { "cell_type": "markdown", "id": "cc21b20c-5832-41a4-a344-b056687c84b5", "metadata": {}, "source": [ "When data cannot or should not be directly accessed, the file can also be downloaded locally. For more examples on when (or when not) to directly access the data, see [\"MAAP AWS Access in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/access_aws_maap.html). [\"When to 'Cloud'\"](https://nasa-openscapes.github.io/earthdata-cloud-cookbook/when-to-cloud.html) is a more general resource, but also provides good questions to ask yourself when using cloud access.\n", "\n", "For this example, let's search for a MODIS dataset provided by LP DAAC. Similar to above, we'll search for the collection and retrieve the associated granules, then extract the S3 link from the first granule." 
] }, { "cell_type": "code", "execution_count": 81, "id": "9b664986-b1ac-4fad-aef3-c4524ed20a12", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"S3 Link: s3://lp-prod-protected/MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf\"\n" ] } ], "source": [ "# Search for a dataset in NASA's CMR\n", "modis_collection <- maap$searchCollection(\n", " short_name = \"MOD13A1\", \n", " cmr_host = \"cmr.earthdata.nasa.gov\",\n", " cloud_hosted = \"true\"\n", ")\n", "\n", "# Extract the collection’s concept ID\n", "collection_id <- modis_collection[[1]][\"concept-id\"]\n", "\n", "# Retrieve granules (up to 5 granules)\n", "modis_granules <- maap$searchGranule(\n", " concept_id = collection_id,\n", " limit = as.integer(5),\n", " cmr_host = \"cmr.earthdata.nasa.gov\"\n", ")\n", "\n", "# Retrieve S3 link\n", "s3_link <- modis_granules[[1]][\"Granule\"][\"OnlineAccessURLs\"][[1]][1][\"URL\"]\n", "print(paste(\"S3 Link:\", s3_link))" ] }, { "cell_type": "markdown", "id": "3b9dc9bb-4eee-4659-9e85-0e562eb377c6", "metadata": {}, "source": [ "To download the data locally, temporary credentials for LP DAAC are needed." ] }, { "cell_type": "code", "execution_count": 82, "id": "87aa49fd-c3a0-47dc-b433-904a4283a9ae", "metadata": {}, "outputs": [], "source": [ "# Get AWS S3 credentials for LP DAAC\n", "credentials <- maap$aws$earthdata_s3_credentials(\n", " \"https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials\"\n", ")\n", "\n", "# Configure AWS S3 client using paws\n", "s3 <- paws::s3(\n", " credentials = list(\n", " creds = list(\n", " access_key_id = credentials$accessKeyId,\n", " secret_access_key = credentials$secretAccessKey,\n", " session_token = credentials$sessionToken\n", " )),\n", " region = \"us-west-2\")" ] }, { "cell_type": "markdown", "id": "ef2830d7-31b6-4dfc-83dc-f9eddd5cc5cc", "metadata": {}, "source": [ "Before downloading, let's do some final prepping. 
First, we'll create a directory to download our file to. Then, from our S3 link, we can get the bucket, key, and a filename." ] }, { "cell_type": "code", "execution_count": 83, "id": "f1920448-d661-4c55-99d3-d4f669b924d6", "metadata": {}, "outputs": [], "source": [ "# Create new directory\n", "dir_name = \"./data\"\n", "if(!dir.exists(dir_name)){dir.create(dir_name)}" ] }, { "cell_type": "code", "execution_count": 84, "id": "ee30ae3c-c1a6-43e1-bcec-8dca6e648307", "metadata": {}, "outputs": [ { "data": { "text/html": [ "'lp-prod-protected'" ], "text/latex": [ "'lp-prod-protected'" ], "text/markdown": [ "'lp-prod-protected'" ], "text/plain": [ "[1] \"lp-prod-protected\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'MOD13A1.A2000049.h02v06.061.2020041151125.hdf'" ], "text/latex": [ "'MOD13A1.A2000049.h02v06.061.2020041151125.hdf'" ], "text/markdown": [ "'MOD13A1.A2000049.h02v06.061.2020041151125.hdf'" ], "text/plain": [ "[1] \"MOD13A1.A2000049.h02v06.061.2020041151125.hdf\"" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "'MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf'" ], "text/latex": [ "'MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf'" ], "text/markdown": [ "'MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf'" ], "text/plain": [ "[1] \"MOD13A1.061/MOD13A1.A2000049.h02v06.061.2020041151125/MOD13A1.A2000049.h02v06.061.2020041151125.hdf\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get bucket from file path\n", "s3_parts <- strsplit(sub(\"s3://\",\"\", s3_link), \"/\", fixed = TRUE)[[1]] # drop the s3 prefix\n", "bucket <- s3_parts[1] # grab the 1st item which is the bucket name\n", "bucket\n", "\n", "# Create file name for download\n", "filename <- tail(s3_parts, n=1) # grab the last part of the path\n", 
"filename\n", "\n", "# Get key from file path\n", "key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item\n", "key" ] }, { "cell_type": "markdown", "id": "5570ef5d-69fd-4c81-9795-63fdba015532", "metadata": {}, "source": [ "Now we can download our file." ] }, { "cell_type": "code", "execution_count": 85, "id": "7c410f1f-3359-4f27-aef5-b3519cbdd6da", "metadata": {}, "outputs": [], "source": [ "# Build the local path with file.path(); paste() would insert a space after the slash\n", "modis_file <- s3$download_file(Bucket = bucket, Key = key, Filename = file.path(dir_name, filename))" ] }, { "cell_type": "markdown", "id": "8e9e076c-0cd0-489d-92ee-6553987ca154", "metadata": {}, "source": [ "### Access the Downloaded File" ] }, { "cell_type": "markdown", "id": "7f0fd644-dd63-4216-917c-da68e9930c87", "metadata": {}, "source": [ "The data has been downloaded and we can open the file. Since this is an HDF4 file, we can use the `ncdf4` package to open and work with it, provided the underlying netCDF library was built with HDF4 support." ] }, { "cell_type": "code", "execution_count": 86, "id": "c781de3c-e2ab-4e03-b2bb-ba6f200f4417", "metadata": {}, "outputs": [], "source": [ "modis_file <- nc_open(file.path(dir_name, filename))" ] }, { "cell_type": "markdown", "id": "0a657df0-4520-49b9-b607-45bc13721ace", "metadata": {}, "source": [ "The desired information can now be obtained from the opened file. For example, let's print the variable names." ] }, { "cell_type": "code", "execution_count": 87, "id": "b7cc0e93-81e3-47fe-a15d-d0bd82d5fa6d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
<ol><li>'500m 16 days NDVI'</li><li>'500m 16 days EVI'</li><li>'500m 16 days VI Quality'</li><li>'500m 16 days red reflectance'</li><li>'500m 16 days NIR reflectance'</li><li>'500m 16 days blue reflectance'</li><li>'500m 16 days MIR reflectance'</li><li>'500m 16 days view zenith angle'</li><li>'500m 16 days sun zenith angle'</li><li>'500m 16 days relative azimuth angle'</li><li>'500m 16 days composite day of the year'</li><li>'500m 16 days pixel reliability'</li></ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item '500m 16 days NDVI'\n", "\\item '500m 16 days EVI'\n", "\\item '500m 16 days VI Quality'\n", "\\item '500m 16 days red reflectance'\n", "\\item '500m 16 days NIR reflectance'\n", "\\item '500m 16 days blue reflectance'\n", "\\item '500m 16 days MIR reflectance'\n", "\\item '500m 16 days view zenith angle'\n", "\\item '500m 16 days sun zenith angle'\n", "\\item '500m 16 days relative azimuth angle'\n", "\\item '500m 16 days composite day of the year'\n", "\\item '500m 16 days pixel reliability'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. '500m 16 days NDVI'\n", "2. '500m 16 days EVI'\n", "3. '500m 16 days VI Quality'\n", "4. '500m 16 days red reflectance'\n", "5. '500m 16 days NIR reflectance'\n", "6. '500m 16 days blue reflectance'\n", "7. '500m 16 days MIR reflectance'\n", "8. '500m 16 days view zenith angle'\n", "9. '500m 16 days sun zenith angle'\n", "10. '500m 16 days relative azimuth angle'\n", "11. '500m 16 days composite day of the year'\n", "12. '500m 16 days pixel reliability'\n", "\n", "\n" ], "text/plain": [ " [1] \"500m 16 days NDVI\" \n", " [2] \"500m 16 days EVI\" \n", " [3] \"500m 16 days VI Quality\" \n", " [4] \"500m 16 days red reflectance\" \n", " [5] \"500m 16 days NIR reflectance\" \n", " [6] \"500m 16 days blue reflectance\" \n", " [7] \"500m 16 days MIR reflectance\" \n", " [8] \"500m 16 days view zenith angle\" \n", " [9] \"500m 16 days sun zenith angle\" \n", "[10] \"500m 16 days relative azimuth angle\" \n", "[11] \"500m 16 days composite day of the year\"\n", "[12] \"500m 16 days pixel reliability\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "names(modis_file$var)" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.3.3" } }, "nbformat": 4, "nbformat_minor": 5 }