GEDI Data Access

Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)

Date: April 15, 2025

Description: In this notebook, users will learn how to search for GEDI data using maap-py, download it, and then open it using rhdf5.

Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.

Disclaimer: We highly recommend running this tutorial within MAAP’s ADE, which already includes MAAP-specific packages such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an “R/Python” workspace.

Install and Load Required Libraries

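Let’s load the packages necessary for this tutorial. The paws package is also used below through `paws::s3()`, so it must be installed even though it is not attached here.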

[64]:
library("rhdf5") # to read HDF5 files
library("reticulate") # to use maap-py python

Let’s also invoke the MAAP constructor. This lets us call the Python-based maap-py library from R; we will use it to get credentials and conduct a NASA CMR search.

[65]:
maap_py <- import("maap.maap")
maap <- maap_py$MAAP()

Get Credentials

Because we will download the GEDI data from S3, we need temporary S3 credentials from the NASA ORNL DAAC.

[68]:
# request temporary S3 credentials from the ORNL DAAC
credentials <- maap$aws$earthdata_s3_credentials(
    "https://data.ornldaac.earthdata.nasa.gov/s3credentials"
)

# create an S3 client; [[ ]] extracts each value itself rather than a one-element list
s3 <- paws::s3(
    credentials = list(
        creds = list(
            access_key_id = credentials[["accessKeyId"]],
            secret_access_key = credentials[["secretAccessKey"]],
            session_token = credentials[["sessionToken"]]
        )
    ),
    region = "us-west-2"
)

Download File

Before downloading, let’s do some preparation. First we’ll create a directory to download our file to. Then, from the granule’s S3 link (stored in `s3_link` as an `s3://` URL), we can get the bucket, key, and filename.

[69]:
# create directory
download_dir <- file.path(getwd(), "data")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)
[70]:
# get bucket from file path
s3_parts <- strsplit(sub("s3://","", s3_link), "/", fixed = TRUE)[[1]] # drop the s3 prefix
bucket <- s3_parts[1] # grab the 1st item which is the bucket name

# create file name for download
filename <- tail(s3_parts, n=1) # grab the last part of the path
download_file <- file.path(download_dir, filename)

# get key from file path
key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item
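
The same splitting logic can be wrapped in a small helper, which makes it easier to reuse for several granules. This is only a sketch: `parse_s3_url` is a hypothetical name, and the URL below is a made-up placeholder rather than a real GEDI granule.

```r
# Split an s3:// URL into bucket, object key, and file name (base R only).
# parse_s3_url is a hypothetical helper, not part of maap-py or paws.
parse_s3_url <- function(s3_url) {
  stopifnot(is.character(s3_url), length(s3_url) == 1, startsWith(s3_url, "s3://"))
  parts <- strsplit(sub("^s3://", "", s3_url), "/", fixed = TRUE)[[1]]
  list(
    bucket   = parts[1],                            # first path element
    key      = paste(parts[-1], collapse = "/"),    # everything after the bucket
    filename = tail(parts, n = 1)                   # last path element
  )
}

# Placeholder URL, used only to illustrate the split
example <- parse_s3_url("s3://example-bucket/some/prefix/example_granule.h5")
example$bucket    # "example-bucket"
example$key       # "some/prefix/example_granule.h5"
example$filename  # "example_granule.h5"
```

The returned `bucket` and `key` correspond to the `Bucket` and `Key` arguments of the download call.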

Now we can download our file.

[71]:
s3$download_file(Bucket = bucket, Key = key, Filename = download_file)

Access Data

Now that the file is downloaded, we can use rhdf5 to explore it. `h5ls()` lists every group and dataset in the file, along with its type and dimensions.

[84]:
gedi_data <- h5ls(download_file)
head(gedi_data)
A data.frame: 6 × 5
  group      name       otype       dclass   dim
  <chr>      <chr>      <chr>       <chr>    <chr>
0 /          ANCILLARY  H5I_GROUP
1 /ANCILLARY model_data H5I_DATASET COMPOUND 35
2 /ANCILLARY pft_lut    H5I_DATASET COMPOUND 7
3 /ANCILLARY region_lut H5I_DATASET COMPOUND 7
4 /          BEAM0000   H5I_GROUP
5 /BEAM0000  agbd       H5I_DATASET FLOAT    48675

We can extract the different beams contained in this GEDI L4A file.

[85]:
beams <- paste0("/", gedi_data[grep("^BEAM", gedi_data$name),]$name)
beams
  1. '/BEAM0000'
  2. '/BEAM0001'
  3. '/BEAM0010'
  4. '/BEAM0011'
  5. '/BEAM0101'
  6. '/BEAM0110'
  7. '/BEAM1000'
  8. '/BEAM1011'

Now that we have a list of beams, we can see what data is held within each beam. Let’s create a data frame with all variables associated with /BEAM0001 (the second entry in `beams`) and their dimensions (how many values are available within each variable).

[86]:
beam_variables <- gedi_data[gedi_data$group == beams[2],]

cat("Available variables for /BEAM0001 and their dimensions:\n")
print(beam_variables[, c("name", "dim")])
Available variables for /BEAM0001 and their dimensions:
                    name       dim
192                 agbd     47789
193        agbd_pi_lower     47789
194        agbd_pi_upper     47789
195      agbd_prediction
309              agbd_se     47789
310               agbd_t     47789
311            agbd_t_se     47789
312   algorithm_run_flag     47789
313                 beam     47789
314              channel     47789
315         degrade_flag     47789
316           delta_time     47789
317      elev_lowestmode     47789
318          geolocation
349      l2_quality_flag     47789
350      l4_quality_flag     47789
351      land_cover_data
363       lat_lowestmode     47789
364       lon_lowestmode     47789
365          master_frac     47789
366           master_int     47789
367      predict_stratum     47789
368 predictor_limit_flag     47789
369  response_limit_flag     47789
370   selected_algorithm     47789
371        selected_mode     47789
372   selected_mode_flag     47789
373          sensitivity     47789
374          shot_number     47789
375      solar_elevation     47789
376         surface_flag     47789
377                 xvar 4 x 47789
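
In the listing above, the entries with an empty `dim` (`agbd_prediction`, `geolocation`, `land_cover_data`) appear to be subgroups rather than datasets, which would also explain the jumps in row numbers. A minimal sketch of how datasets could be separated from groups and turned into full HDF5 paths follows. It runs on a small mock listing with the same `group`, `name`, `otype`, and `dim` columns that `h5ls()` returns; the mock rows and the helper name `dataset_paths` are illustrative assumptions, not output from the real file.

```r
# Mock listing shaped like h5ls() output (illustrative values only)
listing <- data.frame(
  group = c("/BEAM0001", "/BEAM0001", "/BEAM0001", "/BEAM0001/geolocation"),
  name  = c("agbd", "geolocation", "xvar", "example_var"),
  otype = c("H5I_DATASET", "H5I_GROUP", "H5I_DATASET", "H5I_DATASET"),
  dim   = c("47789", "", "4 x 47789", "47789"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep datasets directly under one beam, return full paths
dataset_paths <- function(listing, beam) {
  rows <- listing$group == beam & listing$otype == "H5I_DATASET"
  file.path(beam, listing$name[rows])
}

dataset_paths(listing, "/BEAM0001")
# "/BEAM0001/agbd" "/BEAM0001/xvar"
```

With the real listing, the same call would be `dataset_paths(gedi_data, beams[2])`, and each returned path could be passed to `h5read()`.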

Let’s read some of the data associated with specific variables, and load them into a dataframe.

[88]:
# set variables
lats <- h5read(download_file, "/BEAM0001/lat_lowestmode")
lons <- h5read(download_file, "/BEAM0001/lon_lowestmode")
elev <- h5read(download_file, "/BEAM0001/elev_lowestmode")
shot_num <- h5read(download_file, "/BEAM0001/shot_number", bit64conversion='bit64')
agbd <- h5read(download_file, "/BEAM0001/agbd")

# create data frame
gedi_df <- data.frame(latitude = lats, longitude = lons, elevation = elev, shot_number = shot_num, agbd = agbd)
# drop rows where agbd is the fill value (-9999), then show the first few rows
head(gedi_df[gedi_df$agbd != -9999, ])
A data.frame: 6 × 5
       latitude longitude elevation       shot_number      agbd
          <dbl>     <dbl>     <dbl>           <int64>     <dbl>
36569 -4.637412  103.8779  3288.700 19580100100036569 398.62744
36580 -4.632800  103.8812  3391.723 19580100100036580 565.04077
36581 -4.632382  103.8815  3412.304 19580100100036581 378.42584
36585 -4.630685  103.8827  3344.158 19580100100036585 265.46426
36586 -4.630273  103.8830  3393.292 19580100100036586 323.67648
36588 -4.629430  103.8836  3388.073 19580100100036588  36.59831
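
Beyond dropping the fill value, the `l4_quality_flag` variable listed for /BEAM0001 could be used to keep only higher-quality shots. The sketch below assumes that a flag value of 1 marks shots meeting the product's quality criteria; confirm this against the product documentation before relying on it. It runs on a small mock data frame, and `filter_shots` is a hypothetical helper name.

```r
# Mock data standing in for gedi_df plus a quality flag column (made-up values)
gedi_mock <- data.frame(
  agbd            = c(-9999, 120.5, 310.2, 45.0),
  l4_quality_flag = c(0L, 1L, 0L, 1L)
)

# Hypothetical helper: drop fill values and keep shots flagged as good quality.
# Assumption: l4_quality_flag == 1 means the shot passed quality screening.
filter_shots <- function(df, fill_value = -9999) {
  keep <- df$agbd != fill_value & df$l4_quality_flag == 1
  df[keep, , drop = FALSE]
}

filtered <- filter_shots(gedi_mock)
nrow(filtered)  # 2
```

With the real file, the flag could be read the same way as the other variables, for example `h5read(download_file, "/BEAM0001/l4_quality_flag")`, and added as a column of `gedi_df` before filtering.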