Search GEDI L2B and plot canopy heights

In this tutorial, we will use the integrated Earthdata search function in MAAP ADE (https://ade.ops.maap-project.org/) to search for some GEDI L2B data for an area of interest. We will then download some of this data and read its attributes in preparation for some analysis. We will perform a spatial subset on the data to reduce data volumes and we will finally make some basic plots of our data. We will start by importing some of the modules we will need.

Prerequisites

  • geopandas
  • folium
[1]:
# Uncomment the following lines to install these packages if you haven't already.
# !pip install geopandas
# !pip install folium
[2]:
from maap.maap import MAAP
maap = MAAP(maap_host='api.ops.maap-project.org')
import geopandas as gpd
import folium
import h5py
import pandas
import matplotlib
import matplotlib.pyplot as plt
import shapely
/opt/conda/lib/python3.7/site-packages/geopandas/_compat.py:115: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.8.1-CAPI-1.13.3). Conversions between both will be slow.
  shapely_geos_version, geos_capi_version_string

We will search and download GEDI L2B data using the bounding box of a vector AOI. Firstly, an AOI over the Shenandoah National Park will be created and then we will plot its location on a map.

[3]:
# Using bounding coordinates to create a polygon around Shenadoah National Park
lon_coords = [-78.32129105072025, -78.04618813890727, -78.72985973163064, -79.0158578082679, -78.32129105072025]
lat_coords = [38.88703610703791, 38.74909216350823, 37.88789051477522, 38.03177640342157, 38.88703610703791]

polygon_geom = shapely.geometry.polygon.Polygon(zip(lon_coords, lat_coords))
crs = 'epsg:4326'
AOI = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])

We can get the bounding box of the AOI so we can use it as a spatial limit on our data search. GeoPandas has a function for returning the spatial coordinates of a bounding box:

[4]:
# Get the bounding box of the shp
bbox = AOI.bounds
# print the bounding box coords
print('minx = ', bbox['minx'][0])
print('miny = ', bbox['miny'][0])
print('maxx = ', bbox['maxx'][0])
print('maxy = ', bbox['maxy'][0])
minx =  -79.0158578082679
miny =  37.88789051477522
maxx =  -78.04618813890727
maxy =  38.88703610703791

Let’s look at our AOI on an interactive map using folium.

[5]:
m = folium.Map(location=[38.5, -78], zoom_start=9, tiles='CartoDB positron')
geo_j = folium.GeoJson(data=AOI, style_function=lambda x: {'fillColor': 'orange'})
geo_j.add_to(m)
m
[5]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Search Data: To search GEDI data we can use the EarthData search integration in the MAAP ADE. Open the Earthdata search toolbar by clicking on the following:

EarthData

This will open up the EarthData search interface in a new tab. We can use the search bar to search GEDI L2B data. By entering “L2B” in the search bar, we can see the relevant GEDI data is filtered. Click on the dataset to get more details.

EarthDataInterface

While we have been searching for data, the MAAP ADE has been keeping a track of our search parameters. This means that we can easily insert our search for GEDI data straight into our notebook.

PasteResults

This gives us our search parameters in our notebook using the GEDI dataset id. The default limit for the number of returned results is 1000. Running this will produce 1000 results, but we can look at the first one by indexing the list of returned results. This is what the data entry looks like. You can see a large amount of metadata for the file along with the URL for where this specific file is stored.

[6]:
# generated from this EDSC search: https://ade.ops.maap-project.org:30052/search/granules?p=C1201460047-NASA_MAAP&q=L2B&tl=1647961605!4!!
data = maap.searchGranule(concept_id="C1201460047-NASA_MAAP", limit=1000)[0]
data
[6]:
{'concept-id': 'G1201636833-NASA_MAAP',
 'collection-concept-id': 'C1201460047-NASA_MAAP',
 'revision-id': '11',
 'format': 'application/echo10+xml',
 'Granule': {'GranuleUR': 'SC:GEDI02_B.002:2433465280',
  'InsertTime': '2021-02-22T19:16:30.675Z',
  'LastUpdate': '2021-09-16T13:36:07.965Z',
  'Collection': {'ShortName': 'GEDI02_B', 'VersionId': '002'},
  'DataGranule': {'SizeMBDataGranule': '16.5413',
   'ProducerGranuleId': 'GEDI02_B_2019108002012_O01959_01_T03909_02_003_01_V002.h5',
   'DayNightFlag': 'UNSPECIFIED',
   'ProductionDateTime': '2021-02-21T14:45:08.000Z'},
  'Temporal': {'RangeDateTime': {'BeginningDateTime': '2019-04-18T00:20:12.000Z',
    'EndingDateTime': '2019-04-18T01:52:53.000Z'}},
  'Spatial': {'HorizontalSpatialDomain': {'Geometry': {'GPolygon': {'Boundary': {'Point': [{'PointLongitude': '80.2890335089',
         'PointLatitude': '-4.6168623465'},
        {'PointLongitude': '82.4542052313', 'PointLatitude': '-1.5568021223'},
        {'PointLongitude': '83.6402279084', 'PointLatitude': '0.1277767863'},
        {'PointLongitude': '83.6814118126', 'PointLatitude': '0.0987901632'},
        {'PointLongitude': '82.4953794073', 'PointLatitude': '-1.5858244175'},
        {'PointLongitude': '80.3302671214',
         'PointLatitude': '-4.6459691487'}]}}}}},
  'AdditionalAttributes': {'AdditionalAttribute': [{'Name': 'Dataset Status',
     'Values': {'Value': 'MAAP Standard Data Product'}},
    {'Name': 'Geolocated', 'Values': {'Value': 'true'}},
    {'Name': 'Spatial Resolution', 'Values': {'Value': '60'}},
    {'Name': 'Data Format', 'Values': {'Value': 'HDF5'}},
    {'Name': 'Band Center Frequency', 'Values': {'Value': '281759.8'}},
    {'Name': 'Swath Width', 'Values': {'Value': '4.2'}},
    {'Name': 'Field of View', 'Values': {'Value': '12'}},
    {'Name': 'Laser Footprint Diameter', 'Values': {'Value': '25'}},
    {'Name': 'Acquisition Type', 'Values': {'Value': 'Satellite Lidar'}},
    {'Name': 'Band Center Wavelength', 'Values': {'Value': '1064'}},
    {'Name': 'Reference_Ground_Track', 'Values': {'Value': '0'}},
    {'Name': 'SEGMENT_NUMBER', 'Values': {'Value': '01'}}]},
  'OnlineAccessURLs': {'OnlineAccessURL': {'URL': 's3://nasa-maap-data-store/file-staging/nasa-map/GEDI02_B___002/2019.04.18/GEDI02_B_2019108002012_O01959_01_T03909_02_003_01_V002.h5',
    'URLDescription': 'File to download'}},
  'Orderable': 'true',
  'DataFormat': 'HDF5',
  'Visible': 'true'}}

So far, this search function requests all of the GEDI data but we can add a spatial subset filter using our AOI from above to limit the results. Adding a spatial filter returns 176 GEDI L2B files that intersect with our AOI.

[7]:
data_aoi = maap.searchGranule(concept_id="C1201460047-NASA_MAAP", bounding_box="-79.0158578082679,37.88789051477522,-78.04618813890727,38.887036107037915", limit=1000)
print(len(data_aoi))
176

This is more data than we need, so let’s look at the contents of a single GEDI file. Firstly we need to bring it from the server side (S3) to our local side. This can be done using the MAAP function getData. Here we will pull the 7th file in our search results.

[8]:
gedi_data = data_aoi[6].getData('./')
print(gedi_data)
.//GEDI02_B_2019145051352_O02537_03_T04809_02_003_01_V002.h5

GEDI data has 8 beams. So, we will check that all beams are in our file and print a list of the available beams.

[9]:
gedi_h5_file = h5py.File(gedi_data, 'r')
gedi_keys = list(gedi_h5_file.keys())
gedi_beams = ['BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011']
gedi_beams_lst = []
for gedi_beam_name in gedi_keys:
    if gedi_beam_name in gedi_beams:
        gedi_beams_lst.append(gedi_beam_name)
print(gedi_beams_lst)
['BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011']

For each beam, we need to get all of the data and metrics associated with it. For this, we have a function that will gather all of the metrics we want and put them into a geopandas dataframe:

[10]:
def get_gedi_df(gedi_h5_file, gedi_beam_name):
    gedi_beam = gedi_h5_file[gedi_beam_name]

    # Get location info.
    gedi_beam_geoloc = gedi_beam['geolocation']
    # Get land cover data.
    gedi_beam_landcover = gedi_beam['land_cover_data']

    gedi_beam_df = pandas.DataFrame(
            {'elevation_bin0'         : gedi_beam_geoloc['elevation_bin0'],
             'elevation_lastbin'      : gedi_beam_geoloc['elevation_lastbin'],
             'height_bin0'            : gedi_beam_geoloc['height_bin0'],
             'height_lastbin'         : gedi_beam_geoloc['height_lastbin'],
             'shot_number'            : gedi_beam_geoloc['shot_number'],
             'solar_azimuth'          : gedi_beam_geoloc['solar_azimuth'],
             'solar_elevation'        : gedi_beam_geoloc['solar_elevation'],
             'latitude_bin0'          : gedi_beam_geoloc['latitude_bin0'],
             'latitude_lastbin'       : gedi_beam_geoloc['latitude_lastbin'],
             'longitude_bin0'         : gedi_beam_geoloc['longitude_bin0'],
             'longitude_lastbin'      : gedi_beam_geoloc['longitude_lastbin'],
             'degrade_flag'           : gedi_beam_geoloc['degrade_flag'],
             'digital_elevation_model': gedi_beam_geoloc['digital_elevation_model'],
             'landsat_treecover'      : gedi_beam_landcover['landsat_treecover'],
             'modis_nonvegetated'     : gedi_beam_landcover['modis_nonvegetated'],
             'modis_nonvegetated_sd'  : gedi_beam_landcover['modis_nonvegetated_sd'],
             'modis_treecover'        : gedi_beam_landcover['modis_treecover'],
             'modis_treecover_sd'     : gedi_beam_landcover['modis_treecover_sd'],
             'beam'                   : gedi_beam['beam'],
             'cover'                  : gedi_beam['cover'],
             'master_frac'            : gedi_beam['master_frac'],
             'master_int'             : gedi_beam['master_int'],
             'num_detectedmodes'      : gedi_beam['num_detectedmodes'],
             'omega'                  : gedi_beam['omega'],
             'pai'                    : gedi_beam['pai'],
             'pgap_theta'             : gedi_beam['pgap_theta'],
             'pgap_theta_error'       : gedi_beam['pgap_theta_error'],
             'rg'                     : gedi_beam['rg'],
             'rh100'                  : gedi_beam['rh100'],
             'rhog'                   : gedi_beam['rhog'],
             'rhog_error'             : gedi_beam['rhog_error'],
             'rhov'                   : gedi_beam['rhov'],
             'rhov_error'             : gedi_beam['rhov_error'],
             'rossg'                  : gedi_beam['rossg'],
             'rv'                     : gedi_beam['rv'],
             'sensitivity'            : gedi_beam['sensitivity'],
             'stale_return_flag'      : gedi_beam['stale_return_flag'],
             'surface_flag'           : gedi_beam['surface_flag'],
             'l2a_quality_flag'       : gedi_beam['l2a_quality_flag'],
             'l2b_quality_flag'       : gedi_beam['l2b_quality_flag']})

    gedi_beam_gdf = gpd.GeoDataFrame(gedi_beam_df, crs='EPSG:4326',
                                           geometry=gpd.points_from_xy(gedi_beam_df.longitude_lastbin,
                                                                             gedi_beam_df.latitude_lastbin))
    return gedi_beam_gdf

To access the data with this function, we can call the function for each beam id that we have:

[11]:
BEAM0000 = get_gedi_df(gedi_h5_file, 'BEAM0000')
BEAM0001 = get_gedi_df(gedi_h5_file, 'BEAM0001')
BEAM0010 = get_gedi_df(gedi_h5_file, 'BEAM0010')
BEAM0011 = get_gedi_df(gedi_h5_file, 'BEAM0011')
BEAM0101 = get_gedi_df(gedi_h5_file, 'BEAM0101')
BEAM0110 = get_gedi_df(gedi_h5_file, 'BEAM0110')
BEAM1000 = get_gedi_df(gedi_h5_file, 'BEAM1000')
BEAM1011 = get_gedi_df(gedi_h5_file, 'BEAM1011')

Now we can look at the data in one of the dataframes (beams). We can see that there are 108583 GEDI shots (108583 entries in each column).

[12]:
print(BEAM0000.head())
print('number of rows = ', len(BEAM0000))
   elevation_bin0  elevation_lastbin  height_bin0  height_lastbin  \
0      533.320205         532.870899 -9999.000000    -9999.000000
1      533.610735         533.161429 -9999.000000    -9999.000000
2      835.586133         790.954687    42.284134       -2.373195
3      822.361914         780.875651    38.280968       -3.229356
4      788.459662         760.602475    25.957445       -1.915897

         shot_number  solar_azimuth  solar_elevation  latitude_bin0  \
0  25370000300165164     -34.511044       -10.500773      51.821288
1  25370000300165165     -34.510292       -10.501078      51.821282
2  25370000300165166     -34.509689       -10.501341      51.821258
3  25370000300165167     -34.508926       -10.501648      51.821253
4  25370000300165168     -34.508163       -10.501955      51.821249

   latitude_lastbin  longitude_bin0  ...  rhov  rhov_error  rossg  \
0         51.821288     -127.096372  ...   0.6     -9999.0    0.5
1         51.821282     -127.095550  ...   0.6     -9999.0    0.5
2         51.821260     -127.094876  ...   0.6     -9999.0    0.5
3         51.821255     -127.094048  ...   0.6     -9999.0    0.5
4         51.821250     -127.093210  ...   0.6     -9999.0    0.5

            rv  sensitivity  stale_return_flag  surface_flag  \
0 -9999.000000   -14.502214                  1             1
1 -9999.000000    -8.689309                  1             1
2  3615.732178     0.965246                  0             1
3  4290.229980     0.969620                  0             1
4  2557.771240     0.943713                  0             1

   l2a_quality_flag  l2b_quality_flag                     geometry
0                 0                 0  POINT (-127.09637 51.82129)
1                 0                 0  POINT (-127.09555 51.82128)
2                 1                 1  POINT (-127.09485 51.82126)
3                 1                 1  POINT (-127.09403 51.82126)
4                 1                 1  POINT (-127.09320 51.82125)

[5 rows x 41 columns]
number of rows =  108583

Before displaying this data we can do a spatial subset to remove the data outside of our AOI by doing a spatial subset. We use the Geopandas clip function to clip out the GEDI data based on the extent of our AOI. This reduces the size of the dataframe from 108583 rows to 508.

[13]:
BEAM0000_sub = gpd.clip(BEAM0000, AOI)
len(BEAM0000_sub)

[13]:
508

We can display this subset of data over our AOI extent.

[14]:
m = folium.Map(location=[38.5, -78], zoom_start=9, tiles='CartoDB positron')

geo_j = folium.GeoJson(data=AOI, style_function=lambda x: {'fillColor': 'orange'})
geo_j.add_to(m)

geo_g = folium.GeoJson(data=BEAM0000_sub,marker = folium.CircleMarker(radius = 1, # Radius in metres
                                           weight = 0, #outline weight
                                           fill_color = '#FF0000',
                                           fill_opacity = 1))
geo_g.add_to(m)
m
[14]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Now we have this subset of data, we can look at some of the GEDI metrics over our area of interest. The ‘rh100’ metric should give us the top of the canopy heights. What does this look like over Shenandoah National Park? We will look at the ground height and the ‘rh100’ above ground. The DEM metric is in meters (m) and the ‘rh100’ metric is in centimeters (cm) above the ground height. Therefore we must add the ‘rh100’ value to the ground height to get the total height of the tree for display purposes. The ‘rh100’ metric is also converted to meters (m) to normalize the units

[15]:
latitude = BEAM0000_sub['latitude_lastbin']
rh100 = BEAM0000_sub['rh100']
ground = BEAM0000_sub['digital_elevation_model']

TreeHeight = ground + rh100/100

Finally, we can make a plot of the ground surface and the canopy heights above the ground surface. We can see the forest canopies in green above the topographically complex ground in brown.

[16]:
plt.figure(figsize=(20, 5))
plt.scatter(latitude, TreeHeight, c='green')
plt.plot(latitude, ground, c='brown')
[16]:
[<matplotlib.lines.Line2D at 0x7f1becec0590>]
../../_images/tutorials_GEDI_SearchGEDI_37_1.png

Now you have this basic structure you can investigate some of the other metrics and GEDI beams to understand more about the data.