{ "cells": [ { "cell_type": "markdown", "id": "7cd9f447-0ffa-4655-8e78-4059687388f5", "metadata": {}, "source": [ "# GEDI Data Access \n", "\n", "Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)\n", "\n", "Date: April 15, 2025\n", "\n", "Description: In this notebook, users will learn how to search for GEDI data using `maap-py`, download it, and then open it using `rhdf5`." ] }, { "cell_type": "markdown", "id": "b7c5913a-a549-49c0-96d7-64b3b5c57972", "metadata": {}, "source": [ "## Run This Notebook\n", "\n", "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an \"R/Python\" workspace." 
] }, { "cell_type": "markdown", "id": "b3029ee9-143e-4fb9-bccd-fb2ab8fd7c46", "metadata": {}, "source": [ "## Additional Resources\n", "- [rhdf5](https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)\n", " - The `rhdf5` package page, with installation instructions, documentation, and more.\n", " \n", "- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) \n", " - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.\n", "\n", "- [GEDI02_A v2 Dataset Landing Page](https://lpdaac.usgs.gov/products/gedi02_av002/)\n", " - Learn more about NASA's GEDI L2A dataset, the footprint-level product from which the GEDI L4A biomass estimates accessed in this notebook are derived.\n" ] }, { "cell_type": "markdown", "id": "481f3d0b-fa32-4830-8cfd-fa5749661dcb", "metadata": {}, "source": [ "## Install and Load Required Libraries\n", "Let's load the packages necessary for this tutorial. The `paws` package, which is called later as `paws::s3()`, also needs to be installed." ] }, { "cell_type": "code", "execution_count": 64, "id": "424804c9-0236-434e-adef-57b0067e0293", "metadata": {}, "outputs": [], "source": [ "library(\"rhdf5\") # to read HDF5 files \n", "library(\"reticulate\") # to call the Python-based maap-py library from R" ] }, { "cell_type": "markdown", "id": "f75c8e55-655b-4af4-8a32-e5f056451a1d", "metadata": {}, "source": [ "Let's also invoke the `MAAP` constructor. This will allow us to use the Python-based `maap-py` library from R, which will be used to get credentials and conduct a NASA CMR search." ] }, { "cell_type": "code", "execution_count": 65, "id": "49caadc2-6d06-4190-8a0e-0389ba10343a", "metadata": {}, "outputs": [], "source": [ "maap_py <- import(\"maap.maap\")\n", "maap <- maap_py$MAAP()" ] }, { "cell_type": "markdown", "id": "8b8999df-9875-4d26-8787-67fe10a7c3f4", "metadata": {}, "source": [ "## Collection and Granule Search\n", "\n", "Using `maap-py`, we can conduct a collection and granule search for data within NASA's CMR. 
For this example, we'll use data available within the GEDI L4A collection. For more information on CMR searching in R, see [\"Searching for Data in NASA's CMR in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). " ] }, { "cell_type": "code", "execution_count": 66, "id": "00960e01-9d92-4cd4-8243-5a2276a50a6e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collection Concept ID: C2237824918-ORNL_CLOUD \n", "Granules:\n", " [1] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5\"\n", " [2] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_02_T02638_02_002_02_V002.h5\"\n", " [3] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5\"\n", " [4] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_04_T02638_02_002_02_V002.h5\"\n", " [5] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_01_T03909_02_002_02_V002.h5\"\n", " [6] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_02_T03909_02_002_02_V002.h5\"\n", " [7] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_03_T03909_02_002_02_V002.h5\"\n", " [8] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_04_T03909_02_002_02_V002.h5\"\n", " [9] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_01_T03910_02_002_02_V002.h5\"\n", "[10] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_02_T03910_02_002_02_V002.h5\"\n" ] } ], "source": [ "# search for the GEDI L4A collection\n", "gedi_collections <- maap$searchCollection(\n", " short_name = \"GEDI_L4A_AGB_Density_V2_1_2056\",\n", " version = \"2.1\",\n", " cmr_host = \"cmr.earthdata.nasa.gov\",\n", " cloud_hosted = \"true\"\n", ")\n", "\n", "# get collection ID for granule search\n", "collection_concept_id <- gedi_collections[[1]][[\"concept-id\"]]\n", "cat(\"Collection Concept ID:\", collection_concept_id, \"\\n\")\n", "\n", "# search for the first 10 granules\n", 
"gedi_granules <- maap$searchGranule(\n", " collection_concept_id = collection_concept_id,\n", " limit = as.integer(10),\n", " cmr_host = \"cmr.earthdata.nasa.gov\"\n", ")\n", "\n", "# pull the granule name (GranuleUR) out of each search result\n", "granule_names <- sapply(gedi_granules, function(granule) granule[[\"Granule\"]][[\"GranuleUR\"]])\n", "cat(\"Granules:\\n\")\n", "print(granule_names)" ] }, { "cell_type": "markdown", "id": "46a5ee49-1aea-47db-b00e-84ae80d79fc8", "metadata": {}, "source": [ "Let's get the S3 URL of the first granule returned by our search." ] }, { "cell_type": "code", "execution_count": 67, "id": "2af4e627-e280-4db0-ae3f-615a371c579e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5\"\n" ] } ], "source": [ "s3_link <- gedi_granules[[1]][\"Granule\"][\"OnlineAccessURLs\"][[1]][1][\"URL\"]\n", "print(s3_link)" ] }, { "cell_type": "markdown", "id": "e030f8f1-9d8a-4efd-95a9-359a9015029b", "metadata": {}, "source": [ "## Get Credentials\n", "\n", "Since we will be downloading the GEDI data, we will need temporary S3 credentials from NASA's ORNL DAAC, which hosts the data." 
] }, { "cell_type": "code", "execution_count": 68, "id": "2e712519-fba8-4c14-8721-5836140e40a1", "metadata": {}, "outputs": [], "source": [ "# request temporary S3 credentials from the ORNL DAAC\n", "credentials <- maap$aws$earthdata_s3_credentials(\n", " \"https://data.ornldaac.earthdata.nasa.gov/s3credentials\"\n", ")\n", "\n", "# create an S3 client that uses the temporary credentials\n", "s3 <- paws::s3(\n", " credentials = list(\n", " creds = list(\n", " access_key_id = credentials[\"accessKeyId\"],\n", " secret_access_key = credentials[\"secretAccessKey\"],\n", " session_token = credentials[\"sessionToken\"]\n", " )),\n", " region = \"us-west-2\")" ] }, { "cell_type": "markdown", "id": "f530a65b-390e-4423-a299-adb09cda8665", "metadata": {}, "source": [ "## Download File" ] }, { "cell_type": "markdown", "id": "389b790c-196e-43a4-b850-341ab2557f79", "metadata": {}, "source": [ "Before downloading, let's do some preparation. First, we'll create a directory to download our file to. Then, from our S3 link, we can get the bucket, the key, and a filename." ] }, { "cell_type": "code", "execution_count": 69, "id": "d27a1cb9-8f1b-4a83-a7ae-1f1647b36196", "metadata": {}, "outputs": [], "source": [ "# create directory\n", "download_dir <- file.path(getwd(), \"data\")\n", "dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)" ] }, { "cell_type": "code", "execution_count": 70, "id": "672d1720-0eea-40dc-abbb-8fab3c9749b7", "metadata": {}, "outputs": [], "source": [ "# get bucket from file path\n", "s3_parts <- strsplit(sub(\"s3://\",\"\", s3_link), \"/\", fixed = TRUE)[[1]] # drop the s3 prefix\n", "bucket <- s3_parts[1] # grab the 1st item which is the bucket name\n", "\n", "# create file name for download\n", "filename <- tail(s3_parts, n=1) # grab the last part of the path\n", "download_file <- file.path(download_dir, filename)\n", "\n", "# get key from file path\n", "key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item" ] }, { "cell_type": "markdown", "id": "a2a9ebf2-d5e6-4c01-9570-851541a524ab", "metadata": {}, "source": [ "Now we can 
download our file." ] }, { "cell_type": "code", "execution_count": 71, "id": "176e33df-0b43-429d-b5d4-71344ab881d6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
    \n", "
\n" ], "text/latex": [ "\\begin{enumerate}\n", "\\end{enumerate}\n" ], "text/markdown": [ "\n", "\n" ], "text/plain": [ "list()" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s3$download_file(Bucket = bucket, Key = key, Filename = download_file)" ] }, { "cell_type": "markdown", "id": "350ad311-0d79-42e8-aac1-c43945762cd1", "metadata": {}, "source": [ "## Access Data\n", "\n", "Now that we have our downloaded data, we can use `rhdf5` to open our file for exploration." ] }, { "cell_type": "code", "execution_count": 84, "id": "1b1c9b3e-0250-4079-ac1d-4dbfe1fef3df", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
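<table>\n",
"<caption>A data.frame: 6 × 5</caption>\n",
"<thead>\n",
"\t<tr><th></th><th>group</th><th>name</th><th>otype</th><th>dclass</th><th>dim</th></tr>\n",
"\t<tr><th></th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><th>0</th><td>/</td><td>ANCILLARY</td><td>H5I_GROUP</td><td></td><td></td></tr>\n",
"\t<tr><th>1</th><td>/ANCILLARY</td><td>model_data</td><td>H5I_DATASET</td><td>COMPOUND</td><td>35</td></tr>\n",
"\t<tr><th>2</th><td>/ANCILLARY</td><td>pft_lut</td><td>H5I_DATASET</td><td>COMPOUND</td><td>7</td></tr>\n",
"\t<tr><th>3</th><td>/ANCILLARY</td><td>region_lut</td><td>H5I_DATASET</td><td>COMPOUND</td><td>7</td></tr>\n",
"\t<tr><th>4</th><td>/</td><td>BEAM0000</td><td>H5I_GROUP</td><td></td><td></td></tr>\n",
"\t<tr><th>5</th><td>/BEAM0000</td><td>agbd</td><td>H5I_DATASET</td><td>FLOAT</td><td>48675</td></tr>\n",
"</tbody>\n",
"</table>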
\n" ], "text/latex": [ "A data.frame: 6 × 5\n", "\\begin{tabular}{r|lllll}\n", " & group & name & otype & dclass & dim\\\\\n", " & & & & & \\\\\n", "\\hline\n", "\t0 & / & ANCILLARY & H5I\\_GROUP & & \\\\\n", "\t1 & /ANCILLARY & model\\_data & H5I\\_DATASET & COMPOUND & 35 \\\\\n", "\t2 & /ANCILLARY & pft\\_lut & H5I\\_DATASET & COMPOUND & 7 \\\\\n", "\t3 & /ANCILLARY & region\\_lut & H5I\\_DATASET & COMPOUND & 7 \\\\\n", "\t4 & / & BEAM0000 & H5I\\_GROUP & & \\\\\n", "\t5 & /BEAM0000 & agbd & H5I\\_DATASET & FLOAT & 48675\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 5\n", "\n", "| | group <chr> | name <chr> | otype <chr> | dclass <chr> | dim <chr> |\n", "|---|---|---|---|---|---|\n", "| 0 | / | ANCILLARY | H5I_GROUP | | |\n", "| 1 | /ANCILLARY | model_data | H5I_DATASET | COMPOUND | 35 |\n", "| 2 | /ANCILLARY | pft_lut | H5I_DATASET | COMPOUND | 7 |\n", "| 3 | /ANCILLARY | region_lut | H5I_DATASET | COMPOUND | 7 |\n", "| 4 | / | BEAM0000 | H5I_GROUP | | |\n", "| 5 | /BEAM0000 | agbd | H5I_DATASET | FLOAT | 48675 |\n", "\n" ], "text/plain": [ " group name otype dclass dim \n", "0 / ANCILLARY H5I_GROUP \n", "1 /ANCILLARY model_data H5I_DATASET COMPOUND 35 \n", "2 /ANCILLARY pft_lut H5I_DATASET COMPOUND 7 \n", "3 /ANCILLARY region_lut H5I_DATASET COMPOUND 7 \n", "4 / BEAM0000 H5I_GROUP \n", "5 /BEAM0000 agbd H5I_DATASET FLOAT 48675" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gedi_data <- h5ls(download_file)\n", "head(gedi_data)" ] }, { "cell_type": "markdown", "id": "fc3f5b0e-23ca-4706-b268-d1364bdee6e4", "metadata": {}, "source": [ "We can extract the different beams contained in this GEDI L4A granule." ] }, { "cell_type": "code", "execution_count": 85, "id": "6a1f6404-c41b-4707-8324-8794a418b738", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
  1. '/BEAM0000'
  2. '/BEAM0001'
  3. '/BEAM0010'
  4. '/BEAM0011'
  5. '/BEAM0101'
  6. '/BEAM0110'
  7. '/BEAM1000'
  8. '/BEAM1011'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item '/BEAM0000'\n", "\\item '/BEAM0001'\n", "\\item '/BEAM0010'\n", "\\item '/BEAM0011'\n", "\\item '/BEAM0101'\n", "\\item '/BEAM0110'\n", "\\item '/BEAM1000'\n", "\\item '/BEAM1011'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. '/BEAM0000'\n", "2. '/BEAM0001'\n", "3. '/BEAM0010'\n", "4. '/BEAM0011'\n", "5. '/BEAM0101'\n", "6. '/BEAM0110'\n", "7. '/BEAM1000'\n", "8. '/BEAM1011'\n", "\n", "\n" ], "text/plain": [ "[1] \"/BEAM0000\" \"/BEAM0001\" \"/BEAM0010\" \"/BEAM0011\" \"/BEAM0101\" \"/BEAM0110\"\n", "[7] \"/BEAM1000\" \"/BEAM1011\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "beams <- paste0(\"/\", gedi_data[grep(\"^BEAM\", gedi_data$name),]$name)\n", "beams" ] }, { "cell_type": "markdown", "id": "40c4ccda-20cc-4017-bc6b-6c99d327cfb7", "metadata": {}, "source": [ "Now that we have a list of beams, we can see what data is held within each beam. Let's create a dataframe with all variables associated with `/BEAM0001` and their dimensions (how many rows of data are available within each variable)." 
] }, { "cell_type": "code", "execution_count": 86, "id": "33143f08-f386-429e-a051-c415277e7225", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Available variables for /BEAM0001 and their dimensions:\n", " name dim\n", "192 agbd 47789\n", "193 agbd_pi_lower 47789\n", "194 agbd_pi_upper 47789\n", "195 agbd_prediction \n", "309 agbd_se 47789\n", "310 agbd_t 47789\n", "311 agbd_t_se 47789\n", "312 algorithm_run_flag 47789\n", "313 beam 47789\n", "314 channel 47789\n", "315 degrade_flag 47789\n", "316 delta_time 47789\n", "317 elev_lowestmode 47789\n", "318 geolocation \n", "349 l2_quality_flag 47789\n", "350 l4_quality_flag 47789\n", "351 land_cover_data \n", "363 lat_lowestmode 47789\n", "364 lon_lowestmode 47789\n", "365 master_frac 47789\n", "366 master_int 47789\n", "367 predict_stratum 47789\n", "368 predictor_limit_flag 47789\n", "369 response_limit_flag 47789\n", "370 selected_algorithm 47789\n", "371 selected_mode 47789\n", "372 selected_mode_flag 47789\n", "373 sensitivity 47789\n", "374 shot_number 47789\n", "375 solar_elevation 47789\n", "376 surface_flag 47789\n", "377 xvar 4 x 47789\n" ] } ], "source": [ "beam_variables <- gedi_data[gedi_data$group == beams[2],]\n", "\n", "cat(\"Available variables for /BEAM0001 and their dimensions:\\n\")\n", "print(beam_variables[, c(\"name\", \"dim\")])" ] }, { "cell_type": "markdown", "id": "a549e718-65ec-4406-be32-5a047652c810", "metadata": {}, "source": [ "Let's read some of the data associated with specific variables, and load them into a dataframe." ] }, { "cell_type": "code", "execution_count": 88, "id": "23e30a0f-dc69-41b3-9147-88f184b51da4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
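<table>\n",
"<caption>A data.frame: 6 × 5</caption>\n",
"<thead>\n",
"\t<tr><th></th><th>latitude</th><th>longitude</th><th>elevation</th><th>shot_number</th><th>agbd</th></tr>\n",
"\t<tr><th></th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;int64&gt;</th><th>&lt;dbl&gt;</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><th>36569</th><td>-4.637412</td><td>103.8779</td><td>3288.700</td><td>19580100100036569</td><td>398.62744</td></tr>\n",
"\t<tr><th>36580</th><td>-4.632800</td><td>103.8812</td><td>3391.723</td><td>19580100100036580</td><td>565.04077</td></tr>\n",
"\t<tr><th>36581</th><td>-4.632382</td><td>103.8815</td><td>3412.304</td><td>19580100100036581</td><td>378.42584</td></tr>\n",
"\t<tr><th>36585</th><td>-4.630685</td><td>103.8827</td><td>3344.158</td><td>19580100100036585</td><td>265.46426</td></tr>\n",
"\t<tr><th>36586</th><td>-4.630273</td><td>103.8830</td><td>3393.292</td><td>19580100100036586</td><td>323.67648</td></tr>\n",
"\t<tr><th>36588</th><td>-4.629430</td><td>103.8836</td><td>3388.073</td><td>19580100100036588</td><td>36.59831</td></tr>\n",
"</tbody>\n",
"</table>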
\n" ], "text/latex": [ "A data.frame: 6 × 5\n", "\\begin{tabular}{r|lllll}\n", " & latitude & longitude & elevation & shot\\_number & agbd\\\\\n", " & & & & & \\\\\n", "\\hline\n", "\t36569 & -4.637412 & 103.8779 & 3288.700 & 19580100100036569 & 398.62744\\\\\n", "\t36580 & -4.632800 & 103.8812 & 3391.723 & 19580100100036580 & 565.04077\\\\\n", "\t36581 & -4.632382 & 103.8815 & 3412.304 & 19580100100036581 & 378.42584\\\\\n", "\t36585 & -4.630685 & 103.8827 & 3344.158 & 19580100100036585 & 265.46426\\\\\n", "\t36586 & -4.630273 & 103.8830 & 3393.292 & 19580100100036586 & 323.67648\\\\\n", "\t36588 & -4.629430 & 103.8836 & 3388.073 & 19580100100036588 & 36.59831\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 5\n", "\n", "| | latitude <dbl> | longitude <dbl> | elevation <dbl> | shot_number <int64> | agbd <dbl> |\n", "|---|---|---|---|---|---|\n", "| 36569 | -4.637412 | 103.8779 | 3288.700 | 19580100100036569 | 398.62744 |\n", "| 36580 | -4.632800 | 103.8812 | 3391.723 | 19580100100036580 | 565.04077 |\n", "| 36581 | -4.632382 | 103.8815 | 3412.304 | 19580100100036581 | 378.42584 |\n", "| 36585 | -4.630685 | 103.8827 | 3344.158 | 19580100100036585 | 265.46426 |\n", "| 36586 | -4.630273 | 103.8830 | 3393.292 | 19580100100036586 | 323.67648 |\n", "| 36588 | -4.629430 | 103.8836 | 3388.073 | 19580100100036588 | 36.59831 |\n", "\n" ], "text/plain": [ " latitude longitude elevation shot_number agbd \n", "36569 -4.637412 103.8779 3288.700 19580100100036569 398.62744\n", "36580 -4.632800 103.8812 3391.723 19580100100036580 565.04077\n", "36581 -4.632382 103.8815 3412.304 19580100100036581 378.42584\n", "36585 -4.630685 103.8827 3344.158 19580100100036585 265.46426\n", "36586 -4.630273 103.8830 3393.292 19580100100036586 323.67648\n", "36588 -4.629430 103.8836 3388.073 19580100100036588 36.59831" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# set variables\n", "lats <- h5read(download_file, \"/BEAM0001/lat_lowestmode\")\n", 
"lons <- h5read(download_file, \"/BEAM0001/lon_lowestmode\")\n", "elev <- h5read(download_file, \"/BEAM0001/elev_lowestmode\")\n", "shot_num <- h5read(download_file, \"/BEAM0001/shot_number\", bit64conversion='bit64')\n", "agbd <- h5read(download_file, \"/BEAM0001/agbd\")\n", "\n", "# create dataframe\n", "gedi_df <- data.frame(latitude = lats, longitude = lons, elevation = elev, shot_number = shot_num, agbd = agbd)\n", "head(gedi_df[!(gedi_df$agbd %in% \"-9999\"),]) # drop missing values, load first few rows" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.3.3" } }, "nbformat": 4, "nbformat_minor": 5 }