Global Ecosystem Dynamics Investigation (GEDI) Level 2A Product Tutorial

This tutorial aims to provide information and code to help users get started working with the GEDI Level 2A (GEDI02_A) product using the MAAP. Information about the GEDI02_A product may be found at the Data Set Landing Page. We will start by importing the packages which will allow us to search for, access, explore, and visualize GEDI02_A product data.

Note: This Jupyter notebook utilizes the folium and pystac_client packages. If you do not have these packages installed, uncomment the lines and run the following code block.

[1]:
# !pip install folium
# !pip install pystac_client

For this tutorial, we will import boto3, folium, h5py, os, and pandas, along with MAAP from maap.maap, exists from os.path, and Client from pystac_client, as shown in the following code block.

[2]:
# Import packages
import boto3
import folium
import h5py
import os
import pandas as pd
from maap.maap import MAAP
from os.path import exists
from pystac_client import Client

Searching for and accessing GEDI02_A data

As of this writing (2/10/23), the two recommended ways to search for and access GEDI02_A data on the MAAP ADE are maap-stac and NASA’s Common Metadata Repository (CMR). The two methods differ and are documented in the following two subsections.

Via maap-stac

To search for data from the GEDI02_A product, we will use the Client class from pystac_client to open the maap-stac URL (https://stac.maap-project.org/) and assign the result to a variable (client in this case).

[3]:
# Open the maap-stac URL with the Client package
URL = 'https://stac.maap-project.org/'
client = Client.open(URL)

Now we can use the client specified above to search for data within the GEDI02_A product. Let’s search for the first item that is found in the GEDI02_A collection and assign this to a variable (search in this case).

[4]:
collection = 'GEDI02_A' # assign collection name
# Search for 1st item found in the collection
search = client.search(
    max_items = 1,
    collections = collection,
)

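Let’s inspect this item using the search’s get_all_items() method. Note that newer pystac_client releases deprecate get_all_items() in favor of item_collection(), so a deprecation warning may appear.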

[5]:
# Inspect first item
item = search.get_all_items()[0]
item
After running the code block above, the notebook displays the item. Click the arrow next to the item to see information such as the ID, bounding box coordinates, datetime, and more.

To access the data, we will use the item variable to extract the bucket, key, and filename, and then display them. These values will serve as arguments when downloading the file.

[6]:
# Use the item variable to extract information about the bucket, key, and file name
href = item.assets['data'].href
path_parts = href.split('/')
bucket = path_parts[2]
key = href.split(bucket)[1][1:]
filename = path_parts[-1]
# Display arguments
bucket, key, filename
[6]:
('nasa-maap-data-store',
 'file-staging/nasa-map/GEDI02_A___002/2021.09.29/GEDI02_A_2021272190541_O15849_04_T03030_02_003_02_V002.h5',
 'GEDI02_A_2021272190541_O15849_04_T03030_02_003_02_V002.h5')
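The string splitting above relies on the asset href having the form s3://bucket/key. As a minimal sketch under that assumption, the same three values can also be derived with the standard library's URL parser. The helper name split_s3_href is hypothetical and is not part of pystac_client or boto3, and the example href is reconstructed from the output above.

```python
import posixpath
from urllib.parse import urlparse


def split_s3_href(href):
    """Split an s3:// URL into (bucket, key, filename)."""
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        raise ValueError(f"Expected an s3:// URL, got: {href}")
    bucket = parsed.netloc              # text between 's3://' and the next '/'
    key = parsed.path.lstrip("/")       # object key without the leading slash
    filename = posixpath.basename(key)  # last component of the key
    return bucket, key, filename


# Example href, assumed to match the item shown above
href = (
    "s3://nasa-maap-data-store/file-staging/nasa-map/GEDI02_A___002/"
    "2021.09.29/GEDI02_A_2021272190541_O15849_04_T03030_02_003_02_V002.h5"
)
bucket, key, filename = split_s3_href(href)
```

Unlike index-based splitting, this version fails with a clear error if the href uses a different scheme (for example https).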

Now let’s set an s3 variable using the boto3.client function. We can use the function download_file along with the arguments we set in the previous block to download the GEDI02_A data we need.

[7]:
# Set s3 variable
s3 = boto3.client('s3')
# If file already exists, do not download file
if exists(filename):
    print("File already downloaded")
# Otherwise, download the file
else:
    s3.download_file(bucket, key, filename)
    print("Finished downloading")
Finished downloading

After the previous block has finished running, we should see the message Finished downloading and the file should appear in the same directory that the Jupyter notebook is in.

Via NASA’s CMR

To search for data from the GEDI02_A product using NASA’s CMR, we invoke the MAAP constructor, setting the maap_host argument to 'api.ops.maap-project.org'.

[8]:
# Invoke the MAAP using the MAAP host argument
maap = MAAP(maap_host='api.ops.maap-project.org')

Now we can use the searchGranule function to find granule data within the collection, using the collection short name “GEDI02_A”. Note that searchGranule’s cmr_host argument specifies a CMR instance external to MAAP, and the readable_granule_name argument finds granules matching either the granule UR or the producer granule ID (please see the API documentation for more information). To download data from NASA’s CMR, we call getData() on the first of the results we obtained and assign the returned file path to a variable.

[9]:
# Search for granule data using the cmr_host, short_name, and readable_granule_name arguments
results = maap.searchGranule(
    cmr_host='cmr.earthdata.nasa.gov',
    short_name='GEDI02_A',
    readable_granule_name = "GEDI02_A_2021272190541_O15849_04_T03030_02_003_02_V002.h5")
# Download first result
filename = results[0].getData()
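Indexing results[0] raises an IndexError when the search returns nothing, which can be confusing if the granule name contains a typo. The following is a minimal sketch of a guard; first_result is a hypothetical helper name and is not part of the MAAP API, and it assumes only that the search results behave like a list.

```python
def first_result(results, description="granule search"):
    """Return the first search result, or raise a descriptive error if there is none."""
    if not results:
        raise LookupError(f"The {description} returned no results")
    return results[0]


# Stand-in list used for illustration; in the notebook this would be
# the list returned by maap.searchGranule(...)
example_results = ["granule-a", "granule-b"]
first = first_result(example_results)
```

In the notebook, this could be used as filename = first_result(results).getData().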

If desired, the print function can be utilized to see the file name and directory.

[10]:
# Print file directory
print(filename)
./GEDI02_A_2021272190541_O15849_04_T03030_02_003_02_V002.h5
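The file name itself carries metadata. The sketch below assumes the commonly documented GEDI naming convention, in which the 13-digit field is the acquisition start time as year, day of year, and HHMMSS, followed by the orbit number, sub-orbit granule number, and track number; these field meanings are assumptions not stated in this tutorial. The helper name parse_gedi_filename is hypothetical. The decoded date does agree with the 2021.09.29 directory in the S3 key shown in the maap-stac subsection.

```python
import re
from datetime import datetime

# Assumed layout: PRODUCT_YYYYDDDHHMMSS_O<orbit>_<granule>_T<track>_...
GEDI_NAME = re.compile(
    r"(?P<product>GEDI\d{2}_[AB])_(?P<stamp>\d{13})"
    r"_O(?P<orbit>\d{5})_(?P<granule>\d{2})_T(?P<track>\d{5})"
)


def parse_gedi_filename(name):
    """Extract product, start time, orbit, granule, and track from a GEDI file name."""
    match = GEDI_NAME.search(name)
    if match is None:
        raise ValueError(f"Unrecognized GEDI file name: {name}")
    info = match.groupdict()
    # %j parses the three-digit day of year
    info["start_time"] = datetime.strptime(info.pop("stamp"), "%Y%j%H%M%S")
    return info


info = parse_gedi_filename("./GEDI02_A_2021272190541_O15849_04_T03030_02_003_02_V002.h5")
```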

Explore

Now that we have downloaded the data, let’s look into what it contains.

[11]:
# Create variable containing info from the file we downloaded
gediL2A = h5py.File(filename, 'r')

A GEDI02_A file contains data for 8 beams, each stored in its own group. Let’s create a list of beam names to help explore the data.

[12]:
# Create list of beam names
beamNames = [g for g in gediL2A.keys() if g.startswith('BEAM')]
beamNames
[12]:
['BEAM0000',
 'BEAM0001',
 'BEAM0010',
 'BEAM0011',
 'BEAM0101',
 'BEAM0110',
 'BEAM1000',
 'BEAM1011']
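The startswith('BEAM') filter matters because the file can also contain top-level groups that are not beams (such as a METADATA group). The sketch below repeats that filter on a plain list of names and decodes each suffix as a binary number. The split into coverage and full-power beams is an assumption taken from commonly cited GEDI documentation, not from this tutorial, so verify it against the product documentation before relying on it. The helper names are hypothetical.

```python
# Assumed set of coverage beams; all other beams are treated as full power
COVERAGE_BEAMS = {"BEAM0000", "BEAM0001", "BEAM0010", "BEAM0011"}


def select_beams(keys):
    """Keep only the top-level names that look like beam groups."""
    return [k for k in keys if k.startswith("BEAM")]


def describe_beam(name):
    """Return (beam number decoded from the binary suffix, assumed beam type)."""
    beam_id = int(name[len("BEAM"):], 2)
    kind = "coverage" if name in COVERAGE_BEAMS else "full power"
    return beam_id, kind


# Stand-in for gediL2A.keys(), used for illustration
keys = ["BEAM0000", "BEAM0101", "BEAM1011", "METADATA"]
beams = select_beams(keys)
summary = {name: describe_beam(name) for name in beams}
```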

Now let’s explore the information available for one of the beams (in this case ‘BEAM0000’).

[13]:
# Get list of objects in the data pertaining to 'BEAM0000'
beam = beamNames[0]
gediL2A_objs = []
gediL2A.visit(gediL2A_objs.append)
gediSDS = [o for o in gediL2A_objs if isinstance(gediL2A[o], h5py.Dataset)]
[i for i in gediSDS if beam in i][0:20]
[13]:
['BEAM0000/ancillary/l2a_alg_count',
 'BEAM0000/beam',
 'BEAM0000/channel',
 'BEAM0000/degrade_flag',
 'BEAM0000/delta_time',
 'BEAM0000/digital_elevation_model',
 'BEAM0000/digital_elevation_model_srtm',
 'BEAM0000/elev_highestreturn',
 'BEAM0000/elev_lowestmode',
 'BEAM0000/elevation_bias_flag',
 'BEAM0000/elevation_bin0_error',
 'BEAM0000/energy_total',
 'BEAM0000/geolocation/elev_highestreturn_a1',
 'BEAM0000/geolocation/elev_highestreturn_a2',
 'BEAM0000/geolocation/elev_highestreturn_a3',
 'BEAM0000/geolocation/elev_highestreturn_a4',
 'BEAM0000/geolocation/elev_highestreturn_a5',
 'BEAM0000/geolocation/elev_highestreturn_a6',
 'BEAM0000/geolocation/elev_lowestmode_a1',
 'BEAM0000/geolocation/elev_lowestmode_a2']
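A long list of dataset paths is easier to digest when summarized by subgroup. The following sketch counts paths per subgroup for one beam using only the standard library. It works on plain path strings like those listed above; count_by_subgroup is a hypothetical helper name, and the sample list is a short excerpt chosen for illustration. It matches on the beam prefix, which is stricter than the substring test (beam in i) used above.

```python
from collections import Counter


def count_by_subgroup(paths, beam):
    """Count dataset paths under `beam`, grouped by their first subgroup."""
    counts = Counter()
    prefix = beam + "/"
    for path in paths:
        if not path.startswith(prefix):
            continue  # belongs to another beam or to a non-beam group
        parts = path[len(prefix):].split("/")
        # Datasets directly under the beam have no subgroup component
        group = parts[0] if len(parts) > 1 else "(top level)"
        counts[group] += 1
    return counts


sample_paths = [
    "BEAM0000/ancillary/l2a_alg_count",
    "BEAM0000/beam",
    "BEAM0000/channel",
    "BEAM0000/geolocation/elev_highestreturn_a1",
    "BEAM0000/geolocation/elev_lowestmode_a1",
    "BEAM0001/beam",
]
counts = count_by_subgroup(sample_paths, "BEAM0000")
```

In the notebook, the same function could be called as count_by_subgroup(gediSDS, beam).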

Visualize

Now that we’ve seen the various datasets within the /BEAM0000 group, let’s use this information to visualize the GEDI orbit path for our granule. To start, we take every 100th shot from the beam and record its shot number, beam name, longitude, latitude, and quality flag. We can use these samples to create and display a pandas dataframe.

[14]:
# Set variables for shot, beam number, longitude, latitude, and quality flag samples
lonSample, latSample, shotSample, qualitySample, beamSample = [], [], [], [], []
lats = gediL2A[f'{beamNames[0]}/lat_lowestmode'][()]
lons = gediL2A[f'{beamNames[0]}/lon_lowestmode'][()]
shots = gediL2A[f'{beamNames[0]}/shot_number'][()]
quality = gediL2A[f'{beamNames[0]}/quality_flag'][()]
for i in range(len(shots)):
    if i % 100 == 0:
        shotSample.append(str(shots[i]))
        lonSample.append(lons[i])
        latSample.append(lats[i])
        qualitySample.append(quality[i])
        beamSample.append(beamNames[0])
# Create a pandas dataframe containing the sample information
latslons = pd.DataFrame({'Beam': beamSample, 'Shot Number': shotSample, 'Longitude': lonSample,
                         'Latitude': latSample, 'Quality Flag': qualitySample})
# Display the dataframe
latslons
[14]:
Beam Shot Number Longitude Latitude Quality Flag
0 BEAM0000 158490000400502383 52.337698 0.599011 0
1 BEAM0000 158490000400502483 52.367554 0.556129 0
2 BEAM0000 158490000400502583 52.396932 0.514800 0
3 BEAM0000 158490000400502683 52.426548 0.472746 0
4 BEAM0000 158490000400502783 52.456248 0.430695 0
... ... ... ... ... ...
1685 BEAM0000 158490000400670883 135.837810 -51.605016 0
1686 BEAM0000 158490000400670983 135.921291 -51.605653 0
1687 BEAM0000 158490000400671083 136.003206 -51.606299 0
1688 BEAM0000 158490000400671183 136.086629 -51.606741 0
1689 BEAM0000 158490000400671283 136.168523 -51.607159 0

1690 rows × 5 columns
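The loop above keeps every 100th shot and stores shot numbers as strings. As a sketch, the same subsampling can be written with slice notation, which also works on the NumPy arrays returned by h5py. The second half illustrates why keeping shot numbers as strings (or integers) is a sensible choice: they exceed 2**53, so converting them to floating point loses digits. The shot values below are synthetic stand-ins modeled on the table above, and every_nth is a hypothetical helper name.

```python
def every_nth(values, step=100):
    """Return every `step`-th element, starting with the first."""
    return list(values[::step])


# Synthetic stand-in for the shot_number dataset (consecutive integers)
first_shot = 158490000400502383
shots = list(range(first_shot, first_shot + 1000))

# Slice-based sampling and the index-based loop select the same shots
sliced = every_nth(shots)
looped = [shots[i] for i in range(len(shots)) if i % 100 == 0]

# Shot numbers are too large to be represented exactly as 64-bit floats
round_trip = int(float(first_shot))
precision_lost = round_trip != first_shot
```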

We can now create a map of the orbit path from the dataframe using the folium.Map and folium.Circle functions. Ending the code block with map displays the resulting map inline in the Jupyter notebook.

[15]:
# Create a map
map = folium.Map(
    location=[
        latslons.Latitude.mean(),
        latslons.Longitude.mean()
    ], zoom_start=3, control_scale=True
)
# Add variables to the map
for index, location_info in latslons.iterrows():
    folium.Circle(
        [location_info["Latitude"], location_info["Longitude"]],
        popup=f"Shot Number: {location_info['Shot Number']}"
    ).add_to(map)
# Display map
map
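Centering on the mean latitude and longitude with a fixed zoom level may leave part of a long orbit track outside the view. One option is to compute the bounding box of the sampled points and pass it to the map's fit_bounds method. The sketch below computes only the bounds, in the [[south, west], [north, east]] order folium expects; bounding_box is a hypothetical helper name, the coordinates are the first and last rows of the table above, and the approach assumes the track does not cross the antimeridian.

```python
def bounding_box(latitudes, longitudes):
    """Return [[south, west], [north, east]] for the given points."""
    return [
        [min(latitudes), min(longitudes)],
        [max(latitudes), max(longitudes)],
    ]


# First and last sampled shots from the dataframe shown earlier
lat_values = [0.599011, -51.607159]
lon_values = [52.337698, 136.168523]
bounds = bounding_box(lat_values, lon_values)

# In the notebook, the bounds could then be applied to the folium map:
# map.fit_bounds(bounds)
```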

References: