{
"cells": [
{
"cell_type": "markdown",
"id": "7cd9f447-0ffa-4655-8e78-4059687388f5",
"metadata": {},
"source": [
"# GEDI Data Access \n",
"\n",
"Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)\n",
"\n",
"Date: April 15, 2025\n",
"\n",
"Description: In this notebook, users will learn how to search for GEDI data using `maap-py`, download it, and then open it using `rhdf5`."
]
},
{
"cell_type": "markdown",
"id": "b7c5913a-a549-49c0-96d7-64b3b5c57972",
"metadata": {},
"source": [
"## Run This Notebook\n",
"\n",
"To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n",
"\n",
"Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an \"R/Python\" workspace."
]
},
{
"cell_type": "markdown",
"id": "b3029ee9-143e-4fb9-bccd-fb2ab8fd7c46",
"metadata": {},
"source": [
"## Additional Resources\n",
"- [rhdf5](https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)\n",
" - The `rhdf5` package page, with installation instructions, documentation, and more.\n",
" \n",
"- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) \n",
" - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.\n",
"\n",
"- [GEDI02_A v2 Dataset Landing Page](https://lpdaac.usgs.gov/products/gedi02_av002/)\n",
" - Learn more about NASA's GEDI L2A dataset, which is accessed in this notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "481f3d0b-fa32-4830-8cfd-fa5749661dcb",
"metadata": {},
"source": [
"## Install and Load Required Libraries\n",
"Let’s install and load the packages necessary for this tutorial."
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "424804c9-0236-434e-adef-57b0067e0293",
"metadata": {},
"outputs": [],
"source": [
"library(\"rhdf5\") # to read HDF5 files \n",
"library(\"reticulate\") # to use maap-py python"
]
},
{
"cell_type": "markdown",
"id": "f75c8e55-655b-4af4-8a32-e5f056451a1d",
"metadata": {},
"source": [
"Let's also invoke the `MAAP` constructor. This will allow us to use the python-based `maap-py` library from R, which will be used to get credentials and conduct a NASA CMR search."
]
},
{
"cell_type": "code",
"execution_count": 65,
"id": "49caadc2-6d06-4190-8a0e-0389ba10343a",
"metadata": {},
"outputs": [],
"source": [
"maap_py <- import(\"maap.maap\")\n",
"maap <- maap_py$MAAP()"
]
},
{
"cell_type": "markdown",
"id": "8b8999df-9875-4d26-8787-67fe10a7c3f4",
"metadata": {},
"source": [
"## Collection and Granule Search\n",
"\n",
"Using `maap-py`, we can conduct a collection and granule search for data within NASA's CMR. For this example, we'll use data available within the GEDI L2A collection. For more information on CMR searching in R, see [\"Searching for Data in NASA's CMR in R\"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). "
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "00960e01-9d92-4cd4-8243-5a2276a50a6e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collection Concept ID: C2237824918-ORNL_CLOUD \n",
"Granules:\n",
" [1] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5\"\n",
" [2] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_02_T02638_02_002_02_V002.h5\"\n",
" [3] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5\"\n",
" [4] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_04_T02638_02_002_02_V002.h5\"\n",
" [5] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_01_T03909_02_002_02_V002.h5\"\n",
" [6] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_02_T03909_02_002_02_V002.h5\"\n",
" [7] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_03_T03909_02_002_02_V002.h5\"\n",
" [8] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_04_T03909_02_002_02_V002.h5\"\n",
" [9] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_01_T03910_02_002_02_V002.h5\"\n",
"[10] \"GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_02_T03910_02_002_02_V002.h5\"\n"
]
}
],
"source": [
"# search for a GEDI collection\n",
"gedi_collections <- maap$searchCollection(\n",
" short_name = \"GEDI_L4A_AGB_Density_V2_1_2056\",\n",
" version = \"2.1\",\n",
" cmr_host = \"cmr.earthdata.nasa.gov\",\n",
" cloud_hosted = \"true\"\n",
")\n",
"\n",
"# get collection ID for granule search\n",
"collection_concept_id <- gedi_collections[[1]][[\"concept-id\"]]\n",
"cat(\"Collection Concept ID:\", collection_concept_id, \"\\n\")\n",
"\n",
"# search for the first granules\n",
"gedi_granules <- maap$searchGranule(\n",
" collection_concept_id = collection_concept_id,\n",
" limit = as.integer(10),\n",
" cmr_host = \"cmr.earthdata.nasa.gov\"\n",
")\n",
"\n",
"granule_names <- sapply(gedi_granules, function(names) names[[\"Granule\"]][[\"GranuleUR\"]])\n",
"cat(\"Granules:\\n\")\n",
"print(granule_names)"
]
},
{
"cell_type": "markdown",
"id": "46a5ee49-1aea-47db-b00e-84ae80d79fc8",
"metadata": {},
"source": [
"Let's get the S3 URL from the first granule from our granule search."
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "2af4e627-e280-4db0-ae3f-615a371c579e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] \"s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5\"\n"
]
}
],
"source": [
"s3_link <- gedi_granules[[1]][\"Granule\"][\"OnlineAccessURLs\"][[1]][1][\"URL\"]\n",
"print(s3_link)"
]
},
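{
"cell_type": "markdown",
"id": "sketch-select-s3-url",
"metadata": {},
"source": [
"The indexing above takes the first entry of `OnlineAccessURLs` and relies on it being the S3 link. As a minimal sketch (an illustration, not part of the original workflow), the helper below picks the first URL that starts with `s3://` from a plain R list. The name `first_s3_url` and the mock `urls` list are hypothetical; the list only mimics the `URL` field seen in the granule metadata and would need adapting to the objects returned through `reticulate`.\n",
"\n",
"```r\n",
"# Hypothetical helper: return the first URL beginning with \"s3://\", or NA if none.\n",
"first_s3_url <- function(urls) {\n",
"    candidates <- vapply(urls, function(u) u[[\"URL\"]], character(1))\n",
"    s3_only <- candidates[startsWith(candidates, \"s3://\")]\n",
"    if (length(s3_only) == 0) NA_character_ else s3_only[[1]]\n",
"}\n",
"\n",
"# Mock metadata: a list of entries with a URL field (illustrative values only).\n",
"urls <- list(\n",
"    list(URL = \"https://example.com/data/granule.h5\"),\n",
"    list(URL = \"s3://example-bucket/path/granule.h5\")\n",
")\n",
"print(first_s3_url(urls))\n",
"```\n",
"\n",
"Returning `NA` rather than raising an error leaves it to the caller to decide how to handle a granule that has no S3 link."
]
},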
{
"cell_type": "markdown",
"id": "e030f8f1-9d8a-4efd-95a9-359a9015029b",
"metadata": {},
"source": [
"## Get Credentials\n",
"\n",
"Since we will be downloading the GEDI data, we will need temporary credentials for NASA ORNL DAAC."
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "2e712519-fba8-4c14-8721-5836140e40a1",
"metadata": {},
"outputs": [],
"source": [
"credentials <- maap$aws$earthdata_s3_credentials(\n",
" \"https://data.ornldaac.earthdata.nasa.gov/s3credentials\"\n",
")\n",
"\n",
"s3 <- paws::s3(\n",
" credentials = list(\n",
" creds = list(\n",
" access_key_id = credentials[\"accessKeyId\"],\n",
" secret_access_key = credentials[\"secretAccessKey\"],\n",
" session_token = credentials[\"sessionToken\"]\n",
" )),\n",
" region = \"us-west-2\")"
]
},
{
"cell_type": "markdown",
"id": "f530a65b-390e-4423-a299-adb09cda8665",
"metadata": {},
"source": [
"## Download File"
]
},
{
"cell_type": "markdown",
"id": "389b790c-196e-43a4-b850-341ab2557f79",
"metadata": {},
"source": [
"Before downloading, lets do some prepping. First we'll create a directory to download our file to. Then from our S3 link, we can get the bucket, key, and a filename."
]
},
{
"cell_type": "code",
"execution_count": 69,
"id": "d27a1cb9-8f1b-4a83-a7ae-1f1647b36196",
"metadata": {},
"outputs": [],
"source": [
"# create directory\n",
"download_dir = file.path(getwd(), \"data\")\n",
"dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "672d1720-0eea-40dc-abbb-8fab3c9749b7",
"metadata": {},
"outputs": [],
"source": [
"# get bucket from file path\n",
"s3_parts <- strsplit(sub(\"s3://\",\"\", s3_link), \"/\", fixed = TRUE)[[1]] # drop the s3 prefix\n",
"bucket <- s3_parts[1] # grab the 1st item which is the bucket name\n",
"\n",
"# create file name for download\n",
"filename <- tail(s3_parts, n=1) # grab the last part of the path\n",
"download_file <- file.path(download_dir, filename)\n",
"\n",
"# get key from file path\n",
"key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item"
]
},
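{
"cell_type": "markdown",
"id": "sketch-parse-s3-uri",
"metadata": {},
"source": [
"The three values derived above (bucket, key, and file name) can also be produced by one small function, which is convenient when looping over several granules. This is a minimal sketch in base R; `parse_s3_uri` is a hypothetical name and the URI in the example is made up.\n",
"\n",
"```r\n",
"# Hypothetical helper: split an S3 URI into bucket, key, and file name.\n",
"parse_s3_uri <- function(uri) {\n",
"    stopifnot(startsWith(uri, \"s3://\"))\n",
"    parts <- strsplit(sub(\"^s3://\", \"\", uri), \"/\", fixed = TRUE)[[1]]\n",
"    list(\n",
"        bucket = parts[1],                            # first path element\n",
"        key = paste(parts[-1], collapse = \"/\"),       # everything after the bucket\n",
"        filename = basename(uri)                      # last path element\n",
"    )\n",
"}\n",
"\n",
"# Illustrative URI, not a real object.\n",
"str(parse_s3_uri(\"s3://example-bucket/some/prefix/file.h5\"))\n",
"```\n",
"\n",
"Anchoring the pattern with `^` ensures that only a leading `s3://` is removed."
]
},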
{
"cell_type": "markdown",
"id": "a2a9ebf2-d5e6-4c01-9570-851541a524ab",
"metadata": {},
"source": [
"Now we can download our file."
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "176e33df-0b43-429d-b5d4-71344ab881d6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n"
],
"text/latex": [
"\\begin{enumerate}\n",
"\\end{enumerate}\n"
],
"text/markdown": [
"\n",
"\n"
],
"text/plain": [
"list()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"s3$download_file(Bucket = bucket, Key = key, Filename = download_file)"
]
},
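{
"cell_type": "markdown",
"id": "sketch-check-hdf5-signature",
"metadata": {},
"source": [
"After a download it can be worth confirming that the file on disk really is an HDF5 file before opening it. HDF5 files start with a fixed 8-byte signature, so one possible check, sketched here in base R, is to compare the first bytes of the file against it. `has_hdf5_signature` is a hypothetical helper, and the sketch assumes the signature sits at the very start of the file (that is, no user block precedes it). It is demonstrated on temporary files rather than the real download.\n",
"\n",
"```r\n",
"# Hypothetical helper: TRUE if the file starts with the 8-byte HDF5 signature.\n",
"has_hdf5_signature <- function(path) {\n",
"    signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))\n",
"    if (!file.exists(path) || file.size(path) < length(signature)) return(FALSE)\n",
"    identical(readBin(path, what = \"raw\", n = length(signature)), signature)\n",
"}\n",
"\n",
"# Temporary file that starts with the signature bytes.\n",
"good <- tempfile(fileext = \".h5\")\n",
"writeBin(as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a, 0x00)), good)\n",
"\n",
"# Temporary text file that does not.\n",
"bad <- tempfile(fileext = \".h5\")\n",
"writeLines(\"not an HDF5 file\", bad)\n",
"\n",
"print(c(good = has_hdf5_signature(good), bad = has_hdf5_signature(bad)))\n",
"```\n",
"\n",
"In the notebook itself, the same function could be called on `download_file` before passing it to `h5ls`."
]
},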
{
"cell_type": "markdown",
"id": "350ad311-0d79-42e8-aac1-c43945762cd1",
"metadata": {},
"source": [
"## Access Data\n",
"\n",
"Now that we have our downloaded data, we can use `rhdf5` to open our file for exploration."
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "1b1c9b3e-0250-4079-ac1d-4dbfe1fef3df",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"A data.frame: 6 × 5\n",
"\n",
"\t | group | name | otype | dclass | dim |
\n",
"\t | <chr> | <chr> | <chr> | <chr> | <chr> |
\n",
"\n",
"\n",
"\t| 0 | / | ANCILLARY | H5I_GROUP | | |
\n",
"\t| 1 | /ANCILLARY | model_data | H5I_DATASET | COMPOUND | 35 |
\n",
"\t| 2 | /ANCILLARY | pft_lut | H5I_DATASET | COMPOUND | 7 |
\n",
"\t| 3 | /ANCILLARY | region_lut | H5I_DATASET | COMPOUND | 7 |
\n",
"\t| 4 | / | BEAM0000 | H5I_GROUP | | |
\n",
"\t| 5 | /BEAM0000 | agbd | H5I_DATASET | FLOAT | 48675 |
\n",
"\n",
"
\n"
],
"text/latex": [
"A data.frame: 6 × 5\n",
"\\begin{tabular}{r|lllll}\n",
" & group & name & otype & dclass & dim\\\\\n",
" & & & & & \\\\\n",
"\\hline\n",
"\t0 & / & ANCILLARY & H5I\\_GROUP & & \\\\\n",
"\t1 & /ANCILLARY & model\\_data & H5I\\_DATASET & COMPOUND & 35 \\\\\n",
"\t2 & /ANCILLARY & pft\\_lut & H5I\\_DATASET & COMPOUND & 7 \\\\\n",
"\t3 & /ANCILLARY & region\\_lut & H5I\\_DATASET & COMPOUND & 7 \\\\\n",
"\t4 & / & BEAM0000 & H5I\\_GROUP & & \\\\\n",
"\t5 & /BEAM0000 & agbd & H5I\\_DATASET & FLOAT & 48675\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 5\n",
"\n",
"| | group <chr> | name <chr> | otype <chr> | dclass <chr> | dim <chr> |\n",
"|---|---|---|---|---|---|\n",
"| 0 | / | ANCILLARY | H5I_GROUP | | |\n",
"| 1 | /ANCILLARY | model_data | H5I_DATASET | COMPOUND | 35 |\n",
"| 2 | /ANCILLARY | pft_lut | H5I_DATASET | COMPOUND | 7 |\n",
"| 3 | /ANCILLARY | region_lut | H5I_DATASET | COMPOUND | 7 |\n",
"| 4 | / | BEAM0000 | H5I_GROUP | | |\n",
"| 5 | /BEAM0000 | agbd | H5I_DATASET | FLOAT | 48675 |\n",
"\n"
],
"text/plain": [
" group name otype dclass dim \n",
"0 / ANCILLARY H5I_GROUP \n",
"1 /ANCILLARY model_data H5I_DATASET COMPOUND 35 \n",
"2 /ANCILLARY pft_lut H5I_DATASET COMPOUND 7 \n",
"3 /ANCILLARY region_lut H5I_DATASET COMPOUND 7 \n",
"4 / BEAM0000 H5I_GROUP \n",
"5 /BEAM0000 agbd H5I_DATASET FLOAT 48675"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"gedi_data <- h5ls(download_file)\n",
"head(gedi_data)"
]
},
{
"cell_type": "markdown",
"id": "fc3f5b0e-23ca-4706-b268-d1364bdee6e4",
"metadata": {},
"source": [
"We can extract the different beams associated with GEDI L2A."
]
},
{
"cell_type": "code",
"execution_count": 85,
"id": "6a1f6404-c41b-4707-8324-8794a418b738",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"- '/BEAM0000'
- '/BEAM0001'
- '/BEAM0010'
- '/BEAM0011'
- '/BEAM0101'
- '/BEAM0110'
- '/BEAM1000'
- '/BEAM1011'
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item '/BEAM0000'\n",
"\\item '/BEAM0001'\n",
"\\item '/BEAM0010'\n",
"\\item '/BEAM0011'\n",
"\\item '/BEAM0101'\n",
"\\item '/BEAM0110'\n",
"\\item '/BEAM1000'\n",
"\\item '/BEAM1011'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. '/BEAM0000'\n",
"2. '/BEAM0001'\n",
"3. '/BEAM0010'\n",
"4. '/BEAM0011'\n",
"5. '/BEAM0101'\n",
"6. '/BEAM0110'\n",
"7. '/BEAM1000'\n",
"8. '/BEAM1011'\n",
"\n",
"\n"
],
"text/plain": [
"[1] \"/BEAM0000\" \"/BEAM0001\" \"/BEAM0010\" \"/BEAM0011\" \"/BEAM0101\" \"/BEAM0110\"\n",
"[7] \"/BEAM1000\" \"/BEAM1011\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"beams <- paste0(\"/\", gedi_data[grep(\"^BEAM\", gedi_data$name),]$name)\n",
"beams"
]
},
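{
"cell_type": "markdown",
"id": "sketch-filter-beam-groups",
"metadata": {},
"source": [
"The `grep` call above matches on the `name` column alone. A slightly stricter variant, sketched below, also requires the entry to be a top-level group, using the `group` and `otype` columns shown in the `h5ls` output. The `listing` data frame is a small mock with illustrative rows, not the real file listing.\n",
"\n",
"```r\n",
"# Mock listing with the same columns that h5ls() returns (illustrative rows only).\n",
"listing <- data.frame(\n",
"    group = c(\"/\", \"/ANCILLARY\", \"/\", \"/BEAM0000\", \"/\"),\n",
"    name = c(\"ANCILLARY\", \"model_data\", \"BEAM0000\", \"agbd\", \"BEAM0001\"),\n",
"    otype = c(\"H5I_GROUP\", \"H5I_DATASET\", \"H5I_GROUP\", \"H5I_DATASET\", \"H5I_GROUP\"),\n",
"    stringsAsFactors = FALSE\n",
")\n",
"\n",
"# Keep only top-level groups whose name starts with \"BEAM\".\n",
"is_beam <- listing$group == \"/\" &\n",
"    listing$otype == \"H5I_GROUP\" &\n",
"    grepl(\"^BEAM\", listing$name)\n",
"beam_paths <- paste0(\"/\", listing$name[is_beam])\n",
"print(beam_paths)\n",
"```\n",
"\n",
"For this file both approaches give the same eight beams; the extra conditions only matter if a dataset elsewhere in the file happens to have a name starting with `BEAM`."
]
},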
{
"cell_type": "markdown",
"id": "40c4ccda-20cc-4017-bc6b-6c99d327cfb7",
"metadata": {},
"source": [
"Now that we have a list of beams, we can see what data is held within each beam. Let's create a dataframe with all variables associated with `/BEAM0001` and their dimensions (how many rows of data are available within each variable)."
]
},
{
"cell_type": "code",
"execution_count": 86,
"id": "33143f08-f386-429e-a051-c415277e7225",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available variables for /BEAM0001 and their dimensions:\n",
" name dim\n",
"192 agbd 47789\n",
"193 agbd_pi_lower 47789\n",
"194 agbd_pi_upper 47789\n",
"195 agbd_prediction \n",
"309 agbd_se 47789\n",
"310 agbd_t 47789\n",
"311 agbd_t_se 47789\n",
"312 algorithm_run_flag 47789\n",
"313 beam 47789\n",
"314 channel 47789\n",
"315 degrade_flag 47789\n",
"316 delta_time 47789\n",
"317 elev_lowestmode 47789\n",
"318 geolocation \n",
"349 l2_quality_flag 47789\n",
"350 l4_quality_flag 47789\n",
"351 land_cover_data \n",
"363 lat_lowestmode 47789\n",
"364 lon_lowestmode 47789\n",
"365 master_frac 47789\n",
"366 master_int 47789\n",
"367 predict_stratum 47789\n",
"368 predictor_limit_flag 47789\n",
"369 response_limit_flag 47789\n",
"370 selected_algorithm 47789\n",
"371 selected_mode 47789\n",
"372 selected_mode_flag 47789\n",
"373 sensitivity 47789\n",
"374 shot_number 47789\n",
"375 solar_elevation 47789\n",
"376 surface_flag 47789\n",
"377 xvar 4 x 47789\n"
]
}
],
"source": [
"beam_variables <- gedi_data[gedi_data$group == beams[2],]\n",
"\n",
"cat(\"Available variables for /BEAM0001 and their dimensions:\\n\")\n",
"print(beam_variables[, c(\"name\", \"dim\")])"
]
},
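{
"cell_type": "markdown",
"id": "sketch-parse-h5ls-dims",
"metadata": {},
"source": [
"The `dim` column returned by `h5ls` is character, with multi-dimensional datasets written like `4 x 47789` and groups left blank. If numeric dimensions are needed (for example, to check sizes before reading), the strings can be split and converted. The sketch below uses a hypothetical helper, `parse_dims`, and assumes the `\" x \"` separator seen in the output above.\n",
"\n",
"```r\n",
"# Hypothetical helper: turn an h5ls() dim string into an integer vector.\n",
"parse_dims <- function(dim_string) {\n",
"    # groups have an empty dim string, so return a zero-length vector for them\n",
"    if (is.na(dim_string) || !nzchar(dim_string)) return(integer(0))\n",
"    as.integer(strsplit(dim_string, \" x \", fixed = TRUE)[[1]])\n",
"}\n",
"\n",
"print(parse_dims(\"47789\"))      # one-dimensional dataset\n",
"print(parse_dims(\"4 x 47789\"))  # two-dimensional dataset\n",
"print(parse_dims(\"\"))           # group (no dimensions)\n",
"```\n",
"\n",
"Applied to the notebook's data, this could be mapped over `beam_variables$dim` with `lapply`."
]
},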
{
"cell_type": "markdown",
"id": "a549e718-65ec-4406-be32-5a047652c810",
"metadata": {},
"source": [
"Let's read some of the data associated with specific variables, and load them into a dataframe."
]
},
{
"cell_type": "code",
"execution_count": 88,
"id": "23e30a0f-dc69-41b3-9147-88f184b51da4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"A data.frame: 6 × 5\n",
"\n",
"\t | latitude | longitude | elevation | shot_number | agbd |
\n",
"\t | <dbl> | <dbl> | <dbl> | <int64> | <dbl> |
\n",
"\n",
"\n",
"\t| 36569 | -4.637412 | 103.8779 | 3288.700 | 19580100100036569 | 398.62744 |
\n",
"\t| 36580 | -4.632800 | 103.8812 | 3391.723 | 19580100100036580 | 565.04077 |
\n",
"\t| 36581 | -4.632382 | 103.8815 | 3412.304 | 19580100100036581 | 378.42584 |
\n",
"\t| 36585 | -4.630685 | 103.8827 | 3344.158 | 19580100100036585 | 265.46426 |
\n",
"\t| 36586 | -4.630273 | 103.8830 | 3393.292 | 19580100100036586 | 323.67648 |
\n",
"\t| 36588 | -4.629430 | 103.8836 | 3388.073 | 19580100100036588 | 36.59831 |
\n",
"\n",
"
\n"
],
"text/latex": [
"A data.frame: 6 × 5\n",
"\\begin{tabular}{r|lllll}\n",
" & latitude & longitude & elevation & shot\\_number & agbd\\\\\n",
" & & & & & \\\\\n",
"\\hline\n",
"\t36569 & -4.637412 & 103.8779 & 3288.700 & 19580100100036569 & 398.62744\\\\\n",
"\t36580 & -4.632800 & 103.8812 & 3391.723 & 19580100100036580 & 565.04077\\\\\n",
"\t36581 & -4.632382 & 103.8815 & 3412.304 & 19580100100036581 & 378.42584\\\\\n",
"\t36585 & -4.630685 & 103.8827 & 3344.158 & 19580100100036585 & 265.46426\\\\\n",
"\t36586 & -4.630273 & 103.8830 & 3393.292 & 19580100100036586 & 323.67648\\\\\n",
"\t36588 & -4.629430 & 103.8836 & 3388.073 & 19580100100036588 & 36.59831\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A data.frame: 6 × 5\n",
"\n",
"| | latitude <dbl> | longitude <dbl> | elevation <dbl> | shot_number <int64> | agbd <dbl> |\n",
"|---|---|---|---|---|---|\n",
"| 36569 | -4.637412 | 103.8779 | 3288.700 | 19580100100036569 | 398.62744 |\n",
"| 36580 | -4.632800 | 103.8812 | 3391.723 | 19580100100036580 | 565.04077 |\n",
"| 36581 | -4.632382 | 103.8815 | 3412.304 | 19580100100036581 | 378.42584 |\n",
"| 36585 | -4.630685 | 103.8827 | 3344.158 | 19580100100036585 | 265.46426 |\n",
"| 36586 | -4.630273 | 103.8830 | 3393.292 | 19580100100036586 | 323.67648 |\n",
"| 36588 | -4.629430 | 103.8836 | 3388.073 | 19580100100036588 | 36.59831 |\n",
"\n"
],
"text/plain": [
" latitude longitude elevation shot_number agbd \n",
"36569 -4.637412 103.8779 3288.700 19580100100036569 398.62744\n",
"36580 -4.632800 103.8812 3391.723 19580100100036580 565.04077\n",
"36581 -4.632382 103.8815 3412.304 19580100100036581 378.42584\n",
"36585 -4.630685 103.8827 3344.158 19580100100036585 265.46426\n",
"36586 -4.630273 103.8830 3393.292 19580100100036586 323.67648\n",
"36588 -4.629430 103.8836 3388.073 19580100100036588 36.59831"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# set variables\n",
"lats <- h5read(download_file, \"/BEAM0001/lat_lowestmode\")\n",
"lons <- h5read(download_file, \"/BEAM0001/lon_lowestmode\")\n",
"elev <- h5read(download_file, \"/BEAM0001/elev_lowestmode\")\n",
"shot_num <- h5read(download_file, \"/BEAM0001/shot_number\", bit64conversion='bit64')\n",
"agbd <- h5read(download_file, \"/BEAM0001/agbd\")\n",
"\n",
"# create dataframe\n",
"gedi_df <- data.frame(latitude = lats, longitude = lons, elevation = elev, shot_number = shot_num, agbd = agbd)\n",
"head(gedi_df[!(gedi_df$agbd %in% \"-9999\"),]) # drop missing values, load first few rows"
]
}
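,
{
"cell_type": "markdown",
"id": "sketch-shot-number-precision",
"metadata": {},
"source": [
"The `shot_number` values are 17-digit identifiers, which is larger than 2^53, the point beyond which R's double-precision numbers can no longer represent every integer. That is a likely reason for reading them with `bit64conversion='bit64'` above. The base R sketch below illustrates the problem using a shot number from the table, kept as text so that no digits are lost; it does not touch the GEDI file.\n",
"\n",
"```r\n",
"# A shot number from the table above, stored as text.\n",
"shot_text <- \"19580100100036569\"\n",
"\n",
"# Doubles cannot represent every integer above 2^53.\n",
"print(format(2^53, scientific = FALSE))\n",
"\n",
"# Converting to double silently rounds to a nearby representable value.\n",
"as_double <- as.numeric(shot_text)\n",
"print(format(as_double, scientific = FALSE))\n",
"\n",
"# Two different shot numbers become indistinguishable as doubles.\n",
"print(as.numeric(\"19580100100036569\") == as.numeric(\"19580100100036568\"))\n",
"```\n",
"\n",
"Keeping the identifiers as 64-bit integers (or as character strings) avoids this kind of collision when joining or filtering on `shot_number`."
]
}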
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.3.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}