GEDI Data Access
Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)
Date: April 15, 2025
Description: In this notebook, users will learn how to search for GEDI data using maap-py, download it, and then open it using rhdf5.
Run This Notebook
To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an “R/Python” workspace.
Additional Resources
The rhdf5 package page, with installation instructions, documentation, and more.
NASA’s Operational CMR (MAAP Docs)
A section in the MAAP Docs offering an overview of resources to search and access NASA’s CMR.
GEDI02_A v2 Dataset Landing Page
Learn more about NASA’s GEDI L2A dataset, a related GEDI product. The examples in this notebook use the GEDI L4A collection.
Install and Load Required Libraries
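Let’s load the packages necessary for this tutorial. The paws package, used later for S3 access, is called with the paws:: prefix, so it must be installed but does not need to be attached.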
[64]:
library("rhdf5") # to read HDF5 files
library("reticulate") # to call the Python-based maap-py library from R
Let’s also invoke the MAAP constructor. This will allow us to use the Python-based maap-py library from R, which will be used to get credentials and conduct a NASA CMR search.
[65]:
maap_py <- import("maap.maap")
maap <- maap_py$MAAP()
Collection and Granule Search
Using maap-py, we can conduct a collection and granule search for data within NASA’s CMR. For this example, we’ll use data available within the GEDI L4A collection. For more information on CMR searching in R, see “Searching for Data in NASA’s CMR in R”.
[66]:
# search for a GEDI collection
gedi_collections <- maap$searchCollection(
short_name = "GEDI_L4A_AGB_Density_V2_1_2056",
version = "2.1",
cmr_host = "cmr.earthdata.nasa.gov",
cloud_hosted = "true"
)
# get collection ID for granule search
collection_concept_id <- gedi_collections[[1]][["concept-id"]]
cat("Collection Concept ID:", collection_concept_id, "\n")
# search for the first 10 granules in the collection
gedi_granules <- maap$searchGranule(
collection_concept_id = collection_concept_id,
limit = as.integer(10),
cmr_host = "cmr.earthdata.nasa.gov"
)
granule_names <- sapply(gedi_granules, function(names) names[["Granule"]][["GranuleUR"]])
cat("Granules:\n")
print(granule_names)
Collection Concept ID: C2237824918-ORNL_CLOUD
Granules:
[1] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
[2] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_02_T02638_02_002_02_V002.h5"
[3] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5"
[4] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_04_T02638_02_002_02_V002.h5"
[5] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_01_T03909_02_002_02_V002.h5"
[6] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_02_T03909_02_002_02_V002.h5"
[7] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_03_T03909_02_002_02_V002.h5"
[8] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_04_T03909_02_002_02_V002.h5"
[9] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_01_T03910_02_002_02_V002.h5"
[10] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_02_T03910_02_002_02_V002.h5"
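The granule names above encode acquisition details. As a minimal sketch, assuming the commonly documented GEDI naming pattern (a product code, a `YYYYDDDHHMMSS` start timestamp, an orbit number prefixed with `O`, a sub-orbit granule number, and a track number prefixed with `T`), a few of those fields can be pulled out with base R. The helper name `parse_gedi_granule` is hypothetical and is not part of maap-py or rhdf5.

```r
# Hypothetical helper: split a GEDI granule name into a few of its fields.
# Assumed pattern: ...GEDI04_A_<YYYYDDDHHMMSS>_O<orbit>_<granule>_T<track>_...
parse_gedi_granule <- function(granule_ur) {
  pattern <- "(GEDI[0-9]{2}_[AB])_([0-9]{13})_O([0-9]{5})_([0-9]{2})_T([0-9]{5})"
  m <- regmatches(granule_ur, regexec(pattern, granule_ur))[[1]]
  if (length(m) == 0) stop("Name does not match the assumed GEDI pattern")
  data.frame(
    product = m[2],
    # %j parses the day of year (001-366)
    start_time = as.POSIXct(strptime(m[3], "%Y%j%H%M%S", tz = "UTC")),
    orbit = as.integer(m[4]),
    sub_orbit_granule = as.integer(m[5]),
    track = as.integer(m[6]),
    stringsAsFactors = FALSE
  )
}

info <- parse_gedi_granule(
  "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
)
print(info)
```

For the first granule in the listing, this gives orbit 1958, sub-orbit granule 1, and track 2638, which is a quick way to group or sort granules without opening the files.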
Let’s get the S3 URL from the first granule from our granule search.
[67]:
s3_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][1]["URL"]
print(s3_link)
[1] "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
Get Credentials
Since we will be downloading the GEDI data directly from S3, we will need temporary credentials from NASA’s ORNL DAAC.
[68]:
credentials <- maap$aws$earthdata_s3_credentials(
"https://data.ornldaac.earthdata.nasa.gov/s3credentials"
)
s3 <- paws::s3(
credentials = list(
creds = list(
access_key_id = credentials["accessKeyId"],
secret_access_key = credentials["secretAccessKey"],
session_token = credentials["sessionToken"]
)),
region = "us-west-2")
Download File
Before downloading, let’s do some preparation. First, we’ll create a directory to download our file to. Then, from our S3 link, we can get the bucket, key, and filename.
[69]:
# create directory
download_dir <- file.path(getwd(), "data")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)
[70]:
# get bucket from file path
s3_parts <- strsplit(sub("s3://","", s3_link), "/", fixed = TRUE)[[1]] # drop the s3 prefix
bucket <- s3_parts[1] # grab the 1st item which is the bucket name
# create file name for download
filename <- tail(s3_parts, n=1) # grab the last part of the path
download_file <- file.path(download_dir, filename)
# get key from file path
key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item
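If several granules need to be downloaded, the same splitting steps can be wrapped in a small function. The following is a sketch only: `parse_s3_url` is a hypothetical helper name, and it assumes the link is a plain character string of the form `s3://bucket/key`.

```r
# Hypothetical helper: split an "s3://bucket/key" URL into bucket, key, and filename.
parse_s3_url <- function(s3_url) {
  if (!startsWith(s3_url, "s3://")) stop("Expected a URL starting with s3://")
  parts <- strsplit(sub("^s3://", "", s3_url), "/", fixed = TRUE)[[1]]
  list(
    bucket = parts[1],                        # first path element
    key = paste(parts[-1], collapse = "/"),   # everything after the bucket
    filename = parts[length(parts)]           # last path element
  )
}

parsed <- parse_s3_url(
  "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
)
print(parsed$bucket)
print(parsed$key)
```

The returned `bucket` and `key` map directly onto the `Bucket` and `Key` arguments of the download call, and `filename` can be joined to the download directory with `file.path()`.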
Now we can download our file.
[71]:
s3$download_file(Bucket = bucket, Key = key, Filename = download_file)
Access Data
Now that we have our downloaded data, we can use rhdf5 to open our file for exploration.
[84]:
gedi_data <- h5ls(download_file)
head(gedi_data)
| | group | name | otype | dclass | dim |
|---|---|---|---|---|---|
| | <chr> | <chr> | <chr> | <chr> | <chr> |
| 0 | / | ANCILLARY | H5I_GROUP | | |
| 1 | /ANCILLARY | model_data | H5I_DATASET | COMPOUND | 35 |
| 2 | /ANCILLARY | pft_lut | H5I_DATASET | COMPOUND | 7 |
| 3 | /ANCILLARY | region_lut | H5I_DATASET | COMPOUND | 7 |
| 4 | / | BEAM0000 | H5I_GROUP | | |
| 5 | /BEAM0000 | agbd | H5I_DATASET | FLOAT | 48675 |
We can extract the different beams associated with GEDI L4A. Each beam is stored as a top-level group whose name starts with BEAM.
[85]:
beams <- paste0("/", gedi_data[grep("^BEAM", gedi_data$name),]$name)
beams
- '/BEAM0000'
- '/BEAM0001'
- '/BEAM0010'
- '/BEAM0011'
- '/BEAM0101'
- '/BEAM0110'
- '/BEAM1000'
- '/BEAM1011'
Now that we have a list of beams, we can see what data is held within each beam. Let’s create a dataframe with all variables associated with /BEAM0001 and their dimensions (how many rows of data are available within each variable).
[86]:
beam_variables <- gedi_data[gedi_data$group == beams[2],]
cat("Available variables for /BEAM0001 and their dimensions:\n")
print(beam_variables[, c("name", "dim")])
Available variables for /BEAM0001 and their dimensions:
name dim
192 agbd 47789
193 agbd_pi_lower 47789
194 agbd_pi_upper 47789
195 agbd_prediction
309 agbd_se 47789
310 agbd_t 47789
311 agbd_t_se 47789
312 algorithm_run_flag 47789
313 beam 47789
314 channel 47789
315 degrade_flag 47789
316 delta_time 47789
317 elev_lowestmode 47789
318 geolocation
349 l2_quality_flag 47789
350 l4_quality_flag 47789
351 land_cover_data
363 lat_lowestmode 47789
364 lon_lowestmode 47789
365 master_frac 47789
366 master_int 47789
367 predict_stratum 47789
368 predictor_limit_flag 47789
369 response_limit_flag 47789
370 selected_algorithm 47789
371 selected_mode 47789
372 selected_mode_flag 47789
373 sensitivity 47789
374 shot_number 47789
375 solar_elevation 47789
376 surface_flag 47789
377 xvar 4 x 47789
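In the listing above, entries with an empty `dim` (such as `geolocation`) are nested groups rather than datasets, and `dim` is stored as text (for example `4 x 47789`). The sketch below shows one way to separate the two and convert the dimension strings to integers. It runs on a small mock data frame that mimics a few `h5ls()` rows so that it is self-contained; in the notebook, `beam_variables` would take the place of `listing`.

```r
# Mock of a few h5ls() rows (illustrative only); use beam_variables in the notebook.
listing <- data.frame(
  name = c("agbd", "agbd_prediction", "geolocation", "xvar"),
  otype = c("H5I_DATASET", "H5I_GROUP", "H5I_GROUP", "H5I_DATASET"),
  dim = c("47789", "", "", "4 x 47789"),
  stringsAsFactors = FALSE
)

# Nested groups have no dimensions of their own.
subgroups <- listing$name[listing$otype == "H5I_GROUP"]

# Convert "4 x 47789" style strings into integer vectors, one per dataset.
datasets <- listing[listing$otype == "H5I_DATASET", ]
dims <- lapply(strsplit(datasets$dim, " x ", fixed = TRUE), as.integer)
names(dims) <- datasets$name

print(subgroups)
print(dims)
```

Having the dimensions as integers makes it straightforward to check that the variables you plan to combine into one data frame all have the same length.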
Let’s read some of the data associated with specific variables, and load them into a dataframe.
[88]:
# set variables
lats <- h5read(download_file, "/BEAM0001/lat_lowestmode")
lons <- h5read(download_file, "/BEAM0001/lon_lowestmode")
elev <- h5read(download_file, "/BEAM0001/elev_lowestmode")
shot_num <- h5read(download_file, "/BEAM0001/shot_number", bit64conversion='bit64')
agbd <- h5read(download_file, "/BEAM0001/agbd")
# create dataframe
gedi_df <- data.frame(latitude = lats, longitude = lons, elevation = elev, shot_number = shot_num, agbd = agbd)
head(gedi_df[gedi_df$agbd != -9999, ]) # drop rows holding the -9999 fill value, then show the first few rows
| | latitude | longitude | elevation | shot_number | agbd |
|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <int64> | <dbl> |
| 36569 | -4.637412 | 103.8779 | 3288.700 | 19580100100036569 | 398.62744 |
| 36580 | -4.632800 | 103.8812 | 3391.723 | 19580100100036580 | 565.04077 |
| 36581 | -4.632382 | 103.8815 | 3412.304 | 19580100100036581 | 378.42584 |
| 36585 | -4.630685 | 103.8827 | 3344.158 | 19580100100036585 | 265.46426 |
| 36586 | -4.630273 | 103.8830 | 3393.292 | 19580100100036586 | 323.67648 |
| 36588 | -4.629430 | 103.8836 | 3388.073 | 19580100100036588 | 36.59831 |
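Dropping the -9999 fill value removes shots with no estimate, but the beam also carries an `l4_quality_flag` variable (listed earlier) that can be used for stricter filtering. The sketch below assumes that a flag value of 1 marks shots that passed the quality checks; confirm this against the dataset documentation before relying on it. The data frame is a mock with illustrative flag values so the example runs on its own. In the notebook, the flag could be read with `h5read(download_file, "/BEAM0001/l4_quality_flag")` and added as a column of `gedi_df`.

```r
# Mock shots (flag values are illustrative, not taken from the file).
shots <- data.frame(
  agbd = c(-9999, 398.62744, 565.04077, 36.59831),
  l4_quality_flag = c(0L, 1L, 0L, 1L)
)

fill_value <- -9999  # value used in the file for missing estimates

# Keep shots that have an estimate AND are flagged as good quality
# (assumption: 1 means the shot passed quality checks).
valid <- shots[shots$agbd != fill_value & shots$l4_quality_flag == 1L, ]

print(valid)
print(mean(valid$agbd))
```

Filtering on both conditions before computing summaries keeps fill values and low-quality shots from skewing the results.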