GEDI Data Access

Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)

Date: April 15, 2025

Description: In this notebook, users will learn how to search for GEDI data using maap-py, download it, and then open it using rhdf5.

Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.

Disclaimer: it is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an “R/Python” workspace.

Additional Resources

rhdf5
- The rhdf5 package page, with installation instructions, documentation, and more.
NASA’s Operational CMR (MAAP Docs)
- A section in the MAAP Docs offering an overview of resources to search and access NASA’s CMR.
GEDI02_A v2 Dataset Landing Page
- Learn more about NASA’s GEDI L2A dataset. Note that the code in this notebook accesses the GEDI L4A collection.

Install and Load Required Libraries

Let’s load the packages necessary for this tutorial. The paws package, called later as paws::s3(), must also be installed.

[64]:

library("rhdf5") # to read HDF5 files
library("reticulate") # to call the Python-based maap-py library from R

Let’s also invoke the MAAP constructor. This allows us to use the Python-based maap-py library from R, which we will use to get credentials and to search NASA’s CMR.

[65]:

maap_py <- import("maap.maap")
maap <- maap_py$MAAP()

Collection and Granule Search

Using maap-py, we can conduct a collection and granule search for data within NASA’s CMR. For this example, we’ll use data available within the GEDI L4A collection. For more information on CMR searching in R, see “Searching for Data in NASA’s CMR in R”.

[66]:

# search for a GEDI collection
gedi_collections <- maap$searchCollection(
    short_name = "GEDI_L4A_AGB_Density_V2_1_2056",
    version = "2.1",
    cmr_host = "cmr.earthdata.nasa.gov",
    cloud_hosted = "true"
)

# get collection ID for granule search
collection_concept_id <- gedi_collections[[1]][["concept-id"]]
cat("Collection Concept ID:", collection_concept_id, "\n")

# search for the first 10 granules in the collection
gedi_granules <- maap$searchGranule(
    collection_concept_id = collection_concept_id,
    limit = 10L,
    cmr_host = "cmr.earthdata.nasa.gov"
)

# extract the granule name (GranuleUR) from each search result
granule_names <- sapply(gedi_granules, function(granule) granule[["Granule"]][["GranuleUR"]])
cat("Granules:\n")
print(granule_names)

Collection Concept ID: C2237824918-ORNL_CLOUD
Granules:
 [1] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
 [2] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_02_T02638_02_002_02_V002.h5"
 [3] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5"
 [4] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_04_T02638_02_002_02_V002.h5"
 [5] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_01_T03909_02_002_02_V002.h5"
 [6] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_02_T03909_02_002_02_V002.h5"
 [7] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_03_T03909_02_002_02_V002.h5"
 [8] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_04_T03909_02_002_02_V002.h5"
 [9] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_01_T03910_02_002_02_V002.h5"
[10] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_02_T03910_02_002_02_V002.h5"
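
The granule names above appear to encode acquisition metadata. As a minimal sketch, assuming the 13-digit field after `GEDI04_A_` is year, day of year, and UTC time (`YYYYDDDHHMMSS`) and the `O` field is the orbit number, the base R helper below pulls those two fields out of a name. `parse_gedi_name` is a hypothetical helper written for illustration; it is not part of maap-py or rhdf5.

```r
# Hypothetical helper: extract acquisition time and orbit from a GEDI granule name.
# Assumes the name contains "_YYYYDDDHHMMSS_O<orbit>_", an assumption about the
# naming convention inferred from the granule names listed above.
parse_gedi_name <- function(granule_name) {
  m <- regmatches(
    granule_name,
    regexec("_([0-9]{13})_O([0-9]{5})_", granule_name)
  )[[1]]
  if (length(m) != 3) {
    stop("Unexpected granule name format: ", granule_name)
  }
  list(
    # %j parses the day of year (001-366)
    acquired = as.POSIXct(strptime(m[2], "%Y%j%H%M%S", tz = "UTC")),
    orbit = as.integer(m[3])
  )
}

info <- parse_gedi_name(
  "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
)
print(format(info$acquired, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
print(info$orbit)
```

Applied with `lapply(granule_names, parse_gedi_name)`, this could be used to pick granules by date or orbit before downloading anything.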

Let’s get the S3 URL of the first granule returned by our search.

[67]:

s3_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][1]["URL"]
print(s3_link)

[1] "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"

Get Credentials

Since we will be downloading the GEDI data from a protected S3 bucket, we need temporary S3 credentials from NASA’s ORNL DAAC.

[68]:

credentials <- maap$aws$earthdata_s3_credentials(
    "https://data.ornldaac.earthdata.nasa.gov/s3credentials"
)

# build an S3 client from the temporary credentials
s3 <- paws::s3(
    credentials = list(
        creds = list(
            access_key_id = credentials[["accessKeyId"]],
            secret_access_key = credentials[["secretAccessKey"]],
            session_token = credentials[["sessionToken"]]
        )
    ),
    region = "us-west-2"
)

Download File

Before downloading, let’s do some preparation. First, we’ll create a directory to download our file to. Then, from our S3 link, we can get the bucket, the key, and a filename.

[69]:

# create directory
download_dir <- file.path(getwd(), "data")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)

[70]:

# get bucket from file path
s3_parts <- strsplit(sub("s3://","", s3_link), "/", fixed = TRUE)[[1]] # drop the s3 prefix
bucket <- s3_parts[1] # grab the 1st item which is the bucket name

# create file name for download
filename <- tail(s3_parts, n=1) # grab the last part of the path
download_file <- file.path(download_dir, filename)

# get key from file path
key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item
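
The same splitting logic can be wrapped in a small reusable function. The following is a sketch in base R, assuming the URL is a single character string that starts with `s3://`; `parse_s3_url` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: split an S3 URL into bucket, key, and file name.
parse_s3_url <- function(url) {
  if (!startsWith(url, "s3://")) {
    stop("Not an S3 URL: ", url)
  }
  path <- substring(url, nchar("s3://") + 1)        # drop the scheme
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  list(
    bucket = parts[1],                               # first path component
    key = paste(parts[-1], collapse = "/"),          # everything after the bucket
    filename = basename(path)                        # last path component
  )
}

loc <- parse_s3_url(
  "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
)
str(loc)
```

Returning a named list keeps the three pieces together, so they can be passed on as `loc$bucket`, `loc$key`, and `file.path(download_dir, loc$filename)`.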

Now we can download our file.

[71]:

s3$download_file(Bucket = bucket, Key = key, Filename = download_file)
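
GEDI granules can be large, so when re-running the notebook it may be worth skipping files that are already on disk. The sketch below shows one way to do that. `download_if_missing` and `fake_download` are hypothetical names, and the stand-in downloader only writes a placeholder file so that the example runs without S3 access.

```r
# Hypothetical helper: call `download_fn(path)` only when `path` does not exist yet.
# Returns TRUE (invisibly) if a download happened, FALSE if it was skipped.
download_if_missing <- function(path, download_fn) {
  if (file.exists(path)) {
    message("Already downloaded: ", path)
    return(invisible(FALSE))
  }
  download_fn(path)
  if (!file.exists(path)) {
    stop("Download did not create ", path)
  }
  invisible(TRUE)
}

# Stand-in downloader for demonstration only. In the notebook this would wrap
# s3$download_file(Bucket = bucket, Key = key, Filename = path).
fake_download <- function(path) writeLines("placeholder", path)

demo_path <- tempfile(fileext = ".h5")
first <- download_if_missing(demo_path, fake_download)   # file is created
second <- download_if_missing(demo_path, fake_download)  # skipped, file exists
```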

Access Data

Now that we have downloaded the data, we can use rhdf5 to list the contents of the file and explore them.

[84]:

gedi_data <- h5ls(download_file)
head(gedi_data)

A data.frame: 6 × 5
	group	name	otype	dclass	dim
	<chr>	<chr>	<chr>	<chr>	<chr>
0	/	ANCILLARY	H5I_GROUP
1	/ANCILLARY	model_data	H5I_DATASET	COMPOUND	35
2	/ANCILLARY	pft_lut	H5I_DATASET	COMPOUND	7
3	/ANCILLARY	region_lut	H5I_DATASET	COMPOUND	7
4	/	BEAM0000	H5I_GROUP
5	/BEAM0000	agbd	H5I_DATASET	FLOAT	48675

We can extract the different beams associated with GEDI L4A.

[85]:

beams <- paste0("/", gedi_data[grep("^BEAM", gedi_data$name),]$name)
beams

'/BEAM0000'
'/BEAM0001'
'/BEAM0010'
'/BEAM0011'
'/BEAM0101'
'/BEAM0110'
'/BEAM1000'
'/BEAM1011'

Now that we have a list of beams, we can see what data is held within each beam. Let’s create a dataframe with all variables associated with /BEAM0001 and their dimensions (how many rows of data are available within each variable).

[86]:

beam_variables <- gedi_data[gedi_data$group == beams[2],]

cat("Available variables for /BEAM0001 and their dimensions:\n")
print(beam_variables[, c("name", "dim")])

Available variables for /BEAM0001 and their dimensions:
                    name       dim
192                 agbd     47789
193        agbd_pi_lower     47789
194        agbd_pi_upper     47789
195      agbd_prediction
309              agbd_se     47789
310               agbd_t     47789
311            agbd_t_se     47789
312   algorithm_run_flag     47789
313                 beam     47789
314              channel     47789
315         degrade_flag     47789
316           delta_time     47789
317      elev_lowestmode     47789
318          geolocation
349      l2_quality_flag     47789
350      l4_quality_flag     47789
351      land_cover_data
363       lat_lowestmode     47789
364       lon_lowestmode     47789
365          master_frac     47789
366           master_int     47789
367      predict_stratum     47789
368 predictor_limit_flag     47789
369  response_limit_flag     47789
370   selected_algorithm     47789
371        selected_mode     47789
372   selected_mode_flag     47789
373          sensitivity     47789
374          shot_number     47789
375      solar_elevation     47789
376         surface_flag     47789
377                 xvar 4 x 47789
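
In the listing above, entries with an empty `dim`, such as `agbd_prediction`, appear to be nested groups rather than variables. The sketch below shows one way to keep only the datasets, and then only the per-shot variables, using base R subsetting. The `listing` data frame is a small hand-made stand-in whose rows are copied from the outputs above; the `otype` value given for `agbd_prediction` is an assumption.

```r
# Hand-made stand-in for the data frame returned by h5ls(); values are copied
# from the outputs shown above, not read from a file.
listing <- data.frame(
  group = c("/", "/BEAM0001", "/BEAM0001", "/BEAM0001", "/BEAM0001"),
  name  = c("BEAM0001", "agbd", "agbd_prediction", "shot_number", "xvar"),
  otype = c("H5I_GROUP", "H5I_DATASET", "H5I_GROUP", "H5I_DATASET", "H5I_DATASET"),
  dim   = c("", "47789", "", "47789", "4 x 47789"),
  stringsAsFactors = FALSE
)

# Keep only datasets directly under the beam, dropping nested groups.
beam_datasets <- listing[
  listing$group == "/BEAM0001" & listing$otype == "H5I_DATASET",
  c("name", "dim")
]

# Per-shot variables are those whose dim is a single number.
per_shot <- beam_datasets[grepl("^[0-9]+$", beam_datasets$dim), ]
print(per_shot)
```

The same two filters could be applied to `gedi_data` directly to find which variables can be combined into a single data frame with one row per shot.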

Let’s read some of the data associated with specific variables, and load them into a dataframe.

[88]:

# set variables
lats <- h5read(download_file, "/BEAM0001/lat_lowestmode")
lons <- h5read(download_file, "/BEAM0001/lon_lowestmode")
elev <- h5read(download_file, "/BEAM0001/elev_lowestmode")
shot_num <- h5read(download_file, "/BEAM0001/shot_number", bit64conversion='bit64')
agbd <- h5read(download_file, "/BEAM0001/agbd")

# create dataframe
gedi_df <- data.frame(latitude = lats, longitude = lons, elevation = elev, shot_number = shot_num, agbd = agbd)
head(gedi_df[!(gedi_df$agbd %in% -9999),]) # drop rows with the -9999 fill value, show the first few rows

A data.frame: 6 × 5
	latitude	longitude	elevation	shot_number	agbd
	<dbl>	<dbl>	<dbl>	<int64>	<dbl>
36569	-4.637412	103.8779	3288.700	19580100100036569	398.62744
36580	-4.632800	103.8812	3391.723	19580100100036580	565.04077
36581	-4.632382	103.8815	3412.304	19580100100036581	378.42584
36585	-4.630685	103.8827	3344.158	19580100100036585	265.46426
36586	-4.630273	103.8830	3393.292	19580100100036586	323.67648
36588	-4.629430	103.8836	3388.073	19580100100036588	36.59831
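
The cell above reads a single beam. As a sketch of how the same pattern might be extended to every beam, the functions below read a set of variables per beam and stack the results. `read_beam` and `read_beams` are hypothetical helpers. The `reader` argument defaults to `rhdf5::h5read` but can be swapped out, which is how the example runs here with a fake reader instead of a real file. The sketch covers plain numeric variables only; `shot_number` would additionally need `bit64conversion` passed through.

```r
# Hypothetical helper: read `vars` from one beam into a data frame.
# `reader` is any function(file, name) that returns the values of a dataset.
read_beam <- function(file, beam, vars, reader = rhdf5::h5read) {
  cols <- lapply(vars, function(v) reader(file, paste0(beam, "/", v)))
  names(cols) <- vars
  df <- as.data.frame(cols, stringsAsFactors = FALSE)
  df$beam <- beam  # record which beam each row came from
  df
}

# Stack the per-beam data frames into a single table.
read_beams <- function(file, beams, vars, reader = rhdf5::h5read) {
  do.call(rbind, lapply(beams, function(b) read_beam(file, b, vars, reader)))
}

# Fake reader for demonstration only: two shots per beam, the first one
# carrying the -9999 fill value for agbd.
fake_reader <- function(file, name) {
  if (endsWith(name, "/agbd")) c(-9999, 120.5) else c(1, 2)
}

demo <- read_beams(
  "unused.h5",
  beams = c("/BEAM0000", "/BEAM0001"),
  vars = c("lat_lowestmode", "agbd"),
  reader = fake_reader
)
demo <- demo[!(demo$agbd %in% -9999), ]  # drop fill values
print(demo)
```

With a real file, the call would look like `read_beams(download_file, beams, c("lat_lowestmode", "lon_lowestmode", "agbd"))`, using the `beams` vector built earlier.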