Welcome to the MAAP User Documentation!¶
Introduction¶
Welcome¶
Welcome to the user documentation for the Multi-Mission Algorithm and Analysis Platform (MAAP), a collaborative NASA/ESA project for the support of aboveground biomass research.
About the MAAP¶
The MAAP platform is designed to combine data, algorithms, and computational abilities for the processing and sharing of data related to NASA’s GEDI, ESA’s BIOMASS, and NASA/ISRO’s NISAR missions. These missions generate vastly greater amounts of data than previous Earth observation missions. There are unique challenges to processing, storing, and sharing the relevant data due to the high data volume as well as with the data being collected from varied satellites, aircraft, and ground stations with different resolutions, coverages, and processing levels.
MAAP aims to address unique challenges by making it easier to discover and use biomass relevant data, integrating the data for comparison, analysis, evaluation, and generation. An algorithm development environment (ADE) is used to create repeatable and sharable science tools for the research community. The software is open source and adheres to ESA’s and NASA’s commitment to open data.
NASA and ESA are collaborating to further the interoperability of biomass relevant data and metadata. Tools have been developed to support a new approach to data stewardship and there is a data publication workflow for organizing and storing data and generating metadata to be discoverable in a cloud-based centralized location. The platform and data stewardship approaches are designed to ease barriers and promote collaboration between researchers, providers, curators, and experts across NASA and ESA.
This guide aims to help users get started with using the platform for searching, visualizing, accessing, processing, querying, and sharing biomass relevant data to the MAAP. These data, collected from satellites, aircraft, and ground stations, are organized into collections and granules. Collections are a grouping of files that share the same product specification. Granules are the individual files which are independently described, inventoried, and retrieved within a collection. Granules inherit additional attributes from their containing collection. Explanations of the various functions available in MAAP to use in the ADE will also be explored.
Planned Topics¶
The following is a list of existing and planned concepts to be covered in the completed version of this guide:
User Guides
- How To Git Clone a Repository Into the Algorithm Development Environment (ADE)
- Searching
- How to Use the MAAP Earthdata Search Client Graphical User Interface
- How to Search Collections in MAAP
- How to Search Granules in MAAP
- How to Search for and Compile a List of Granule IDs for Batch Processing
- Available CMR Collections in MAAP
- Visualizing
- How to Visualize Web Map Tile Service (WMTS) Layers
- How to Visualize 3D Tiles Layers
- Accessing
- How to Access Data from the MAAP
- How to Access EDAV Data via Web Coverage Service
- Querying
- How to Query Data from the MAAP via Python Client
- Fields Within the Gedi Cal/Val Data
- How to Query ATL08 Entwine Point Tiles
- User Data
- How to Share Data Products to the MAAP Data Store
Registering an Algorithm
- How Do I Register My Algorithm to MAS?
- How Do I Delete My Algorithm from MAS?
- How Do I View Algorithms Already Registered in MAS?
- Running a Data Processing System (DPS) Job
How Do I Run a DPS Job?
- How Do I Check the Status of My DPS Job?
- How Do I Get the Results from My DPS Job?
Frequently Asked Questions (FAQs)
- How Do I Name My Workspace Something Useful?
- How Do I Share My Workspace with Another MAAP User?
- How Do I Copy My EarthData Search into My Jupyter Notebook?
- How Do I Import Granules over from My EarthData Search into My Jupyter Notebook?
- Which Files in My Workspace Are Persistent, and Which Ones Are Only Available in My Workspace?
Platform Technical Documentation
- ADE Workspace
- How to Create Workspaces
- How to Share Workspaces
- ADE Projects
- How to Create a Project
- How to Add a Project to Your Workspace
- How to Update Your Project from inside Your Workspace
- ADE EarthData Search
- How to Find Data
- How to Add Data to a Notebook
- How to Visualize Data in a Notebook
- ADE Common Mapping Client
- How to Get Started
- API Documentation
- ADE Jupyter Notebook Magics
- Load the Inline Magics in Notebook
- Available Magics
- ADE Workspace and Data Collaboration
- Organizations
- How to Share Data
- ADE SSH
- SSH Access
- ADE User Interface Do’s/Don’t
- Side Panel
- Workspace Administration
- How to Manage a Workspace
- DPS Algorithms
- List Algorithms
- Describe an Algorithm
- Register an Algorithm
- Delete an Algorithm
- DPS Job Submission
- List Previous Jobs
- Execute a Job
- Check Job Status
- Check Job Results
- Delete Job
- Dismiss Job
API Documentation
cmr
Operations Related to CMRmas
Operations to Register an Algorithmdps
Operations to Interface with HySDS Mozartwmts
Retrieve Tileswms
WMS GetMapmembers
Operations for MAAP Membersquery
Operations for Query Service
Getting started with the MAAP¶
Learn how to sign up for an account and access all the services of the MAAP.
Signing up for an Earthdata Login account¶
The MAAP offers accounts for NASA users through Earthdata Login. Before accessing the MAAP as a NASA user, you will need to create an Earthdata Login account. Anyone can register for an Earthdata Login profile here: https://urs.earthdata.nasa.gov/users/new.
Signing up for a new MAAP account¶
Once registered, you can register for a MAAP account by navigating to the MAAP ADE at http://ade.ops.maap-project.org. On your first visit, select the “URS” login button shown here:
If this is your first visit to the MAAP, you will be asked to agree to the MAAP Terms of Use:
Once registered, you should be redirected back to the MAAP ADE showing a disabled account message similar to this:
At this point, a MAAP administrator will approve your account, which will grant you access to the MAAP ADE. This process typically takes 5-10 minutes. To check on the status of your pending account, contact the MAAP team at support@maap-project.org.
Once your MAAP account is approved, you will receive an email notification using the address of your Earthdata Login account to let you know that your access is enabled.
Code repository access¶
After creating your MAAP account, you can create a code repository by navigating to the MAAP GitLab account at https://repo.ops.maap-project.org. This GitLab account is connected to your ADE workspaces automatically when signing into the ADE.
Uploading your public SSH key¶
In order to access your ADE workspaces using SSH, you’ll need to upload your public SSH key to your MAAP profile using the MAAP portal at https://ops.maap-project.org.
Click on your profile name (or the “Login” button) in the top right corner shown here:
On your profile page, click the “Choose File” button to upload your key.
After uploading your key, the SSH connection will be enabled after your next login into the ADE.
Note on future URL changes¶
The “.ops.maap-project.org” URLs referenced in this guide will be converted to “.maap-project.org” once the version one of MAAP is released in the near future. At that time, these URLs will be updated accordingly.
How To Git Clone a Repository Into the Algorithm Development Environment (ADE)¶
This example walks through cloning a repository into the ADE. Cloning a repository allows you to open, edit, and run files contained within the cloned repository. In this example, we look at cloning the “MAAP-Project/maap-documentation” Git repository, so that you are able to experiment with the code examples contained within this user documentation.
When inside of a workspace, navigate to Git tab at the top of the Jupyter window. Click it to see the option to Clone.
We can also access the “Clone a repo” dialogue box by selecting the File Browser tab on the JupyterLab sidebar and using the Git Clone
button located at the top of the sidebar. The “Clone a repo” dialogue box prompts you to enter the URI of the repository you wish to clone. For this example we enter ” https://github.com/MAAP-Project/maap-documentation.git “. This can be found by visiting the GitHub site for the “MAAP-Project/maap-documentation” Git
repository and clicking the Code button.
Then press the CLONE button to clone the repository into the ADE.
With the File Browser tab on the JupyterLab sidebar selected, a folder named “maap-documentation” should now appear under the “projects” directory. Folders for the various sections of the guide can be found in the “docs/source/” directory.
To open the IPython Notebook for an example, go to a section directory and double-click on appropriate “.ipynb” file. For more information about the using Git in Jupyterlab, see https://github.com/jupyterlab/jupyterlab-git .
Search¶
Using the MAAP Earthdata Search Client Graphical User Interface¶
The Earthdata Search Client (EDSC) allows users to search, preview, download, and access EOSDIS Earth observation data. The MAAP has its own separate instance of NASA’s EDSC which is located at search.maap-project.org. This MAAP ESDC provides a graphical user interface (GUI) for the MAAP’s Common Metadata Repository (CMR) in order to make the process of discovering data easy for MAAP users.
(Image of the MAAP EDSC GUI)
Using the Earthdata Search Client¶
We can use the searchbar to search for a term (example: GEDI) to narrow the resulting collections. The search can be further refined by picking a temporal range from the calendar using the Temporal button and setting spatial boundaries using the Spatial
button. Additionally, using the Advanced search
lets you search by the HUC ID or region (more information about hydrological units may be found here).
You can use the Clear Filters
button to undo any filters that have been set. The sidebar to the left allows you to further refine your search by selecting one of the available facets. These include Features, Keywords, Platforms, Instruments, Organizations, Projects, and Processing levels.
We can also use the tools with the background map to refine our search. We can search spatially using Search by spatial polygon , Search by spatial rectangle
, and Search by spatial coordinate
. Layers may be edited using the Edit layers
button and deleted using the Delete layers
buttons. There are also options for North Polar Stereographic
, Geographic (Equirectangular)
, and South Polar Stereographic
projections. There are options to Zoom in
, Zoom home
, and Zoom out
. Finally, we can change the basemap by selecting the Map layers
button.
The results of the search are displayed in the Matching Collections section. Collection names and summaries for each result are shown here. The View collection details button may be used to view related URLs and additional information about the selected collection. Also, collections may be added to a project using the Add collection to the current project
button. Clicking anywhere else on a result allows you to see the
granules within the collection available for download.
Using the Earthdata Search Client in the MAAP Algorithm Development Environment (ADE)¶
The MAAP EDSC can also be accessed within the MAAP ADE when using the MAAP JupyterLab IDE. When inside of a workspace, we can use the Data Search tab at the top of the Jupyter window to see the option to Open EarthData Search. We can also access the MAAP EDSC by selecting the Commands tab on the JupyterLab sidebar and scrolling down to the Search section where the Open EarthData Search option is located. Alternatively, we can use the search bar at the top of the JupyterLab sidebar to find this option, or press Ctrl+Shift+C to be taken directly to the search bar.
The JupyterLab Search extension is integrated with the embedded Earthdata Search Client allowing users to paste queries and query results into a notebook. This is accomplished by using the EDSC and navigating to the desired JupyterLab notebook. Next, use the Data Search tab at the top of the Jupyter window to see the Paste Search Query and Paste Search Results options can be selected (these can also be accessed by the Commands tab on the JupyterLab sidebar).
Searching for Collections in MAAP¶
These examples walk through the MAAP API functionality of searching for collections based on specific parameters. Collections are groupings of files that share the same product specification. Searching for collections can be useful for finding individual files, known as granules, which are used for processing.
We begin by importing the MAAP
package and creating a new MAAP class.
[1]:
# import the MAAP package to handle queries
from maap.maap import MAAP
# import printing package to help display outputs
from pprint import pprint
# invoke the MAAP search client
maap = MAAP()
We can use the maap.searchCollection
function to return a list of desired collections. Before using this function, let’s use the help
function to view the specific arguments and keywords for maap.searchCollection
.
[2]:
# view help for the searchCollection function
help(maap.searchCollection)
Help on method searchCollection in module maap.maap:
searchCollection(limit=100, **kwargs) method of maap.maap.MAAP instance
Search the CMR collections
:param limit: limit of the number of results
:param kwargs: search parameters
:return: list of results (<Instance of Result>)
The help text is showing that maap.searchCollection
accepts a limit and search parameters. The limit parameter limits the number of resulting collections returned by maap.searchCollection
. Note that limit=100
means that the default limit for results from the MAAP API is 100. maap.searchCollection
accepts any additional search parameters that are included in the CMR. For a list of accepted parameters, please refer to the CMR Search Collections API
reference.
In this example we will explore search options including:
- Finding all Collections
- Searching by temporal filter
- Searching by spatial filter
- Using the results from one search as inputs into another
- Searching by additional attributes
Finding all Collections¶
Here we will demonstrate how to create a list containing all of the collections contained within the CMR. To do this, we will use the maap.searchCollection
function without any additional search parameters.
[3]:
# search all collections
results = maap.searchCollection()
# print the number of collections
pprint(f'Got {len(results)} results')
'Got 61 results'
We were able to get 61 results. The result from the MAAP API is a list of collections where each element in the list is the metadata for that particular collection. Note that as more collections are added to the CMR, it is important to remember that the default limit on results is 100. To change the limit, type limit=
and then a value within the parentheses after maap.searchCollection()
.
Let’s look at the metadata for the first collection in our list of results (results[0]
) using pprint
. For formatting purposes, we can use the depth
parameter to control the number of levels of metadata detail to display. By default, there is no constraint on the depth. By setting a depth
parameter (in this case depth=2
), we can ensure that the next contained level is replaced by an ellipsis.
[4]:
# print the metadata for the first collection
# we use the depth parameter to set the layer of metadata detail in the results, with (1) having the least detail
# (1) displays the concept ID, format, and revision ID
# adjust the depth to a larger value (6) if you would like to view all of the metadata
pprint(results[0],depth=2)
{'Collection': {'AdditionalAttributes': {...},
'ArchiveCenter': 'NASA/NSIDC_DAAC',
'Campaigns': {...},
'CollectionState': 'COMPLETE',
'Contacts': {...},
'DOI': {...},
'DataSetId': 'ABoVE LVIS L1B Geolocated Return Energy '
'Waveforms V001',
'Description': 'This data set contains laser altimetry return '
'energy waveform measurements over Alaska and '
'Western Canada taken from the NASA Land, '
'Vegetation, and Ice Sensor (LVIS). The data '
"were collected as part of NASA's Terrestrial "
'Ecology Program campaign, the Arctic-Boreal '
'Vulnerability Experiment (ABoVE).',
'InsertTime': '2020-10-17T20:32:38.639Z',
'LastUpdate': '2020-10-17T20:32:38.639Z',
'LongName': 'Not provided',
'OnlineAccessURLs': {...},
'OnlineResources': {...},
'Orderable': 'false',
'Platforms': {...},
'ProcessingLevelId': '1B',
'RevisionDate': '2019-09-06T19:27:00.000Z',
'ScienceKeywords': {...},
'ShortName': 'ABLVIS1B',
'Spatial': {...},
'SpatialKeywords': {...},
'Temporal': {...},
'VersionId': '001',
'Visible': 'true'},
'concept-id': 'C1200110748-NASA_MAAP',
'format': 'application/echo10+xml',
'revision-id': '11'}
The Collection
key has all of the collection information including attributes, the archive center, spatial, and temporal information. The concept-id
is a unique identifier for this collection. It can be used to further refine search results from the CMR, such as when searching for granule information.
Searching by Temporal Filter¶
Here we use a temporal filter to narrow down our results using the temporal
keyword in our search. The temporal keyword takes datetime information in a specific format. The date format used is YYYY-MM-DDThh:mm:ssZ
and temporal search criteria may be either a single date or a date range. If one date is provided then it can be inferred as the start or end date. To define a start date and return all collections
after the date, put a comma after the date (YYYY-MM-DDThh:mm:ssZ,
). To define a end date and return all granules prior to the data, put a comma before the date (,YYYY-MM-DDThh:mm:ssZ
). Lastly, to get a date range, provide the start date and end date separated by a comma (YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ
).
In this example we will search for one month of data.
[5]:
datetimeRange = '2000-01-01T00:00:00Z,2000-01-31T23:59:59Z' # specify datetime range to search for data in January 2000
results = maap.searchCollection(temporal=datetimeRange)
pprint(f'Got {len(results)} results')
'Got 3 results'
[6]:
collectionName = results[0]['Collection']['ShortName'] # get the collection short name
collectionDate = results[0]['Collection']['Temporal']['RangeDateTime']['BeginningDateTime'] # get the collection start time
pprint(
f'Collection {collectionName} was acquired starting at {collectionDate}', width=100)
'Collection Global_Forest_Change_2000-2017 was acquired starting at 2000-01-01T00:00:00.000Z'
It appears the first result correctly matches with the beginning and ending temporal search parameters. Keep in mind that the results are limited to 100 so the final collection returned may not match the end date that was searched for.
Searching by Spatial Filter¶
Here we will illustrate how to search for collections by a spatial filter. There are a couple of spatial filters available to search by in the CMR including point, line, polygon, and bounding box. In this example, we will explore filtering with a bounding box which is a sequence of four latitude and longitude values in the order of [W,S,E,N]
.
[7]:
collectionDomain = '-42,10,42,20' # specify bounding box to search by
results = maap.searchCollection(bounding_box=collectionDomain)
pprint(f'Got {len(results)} results')
'Got 17 results'
[8]:
collectionName = results[3]['Collection']['ShortName'] # get a collection short name
collectionGeometry = results[3]['Collection']['Spatial']['HorizontalSpatialDomain']['Geometry'] # grab the spatial information from collection
pprint(f'Collection {collectionName} was acquired within the following geometry: ', width=100)
pprint(collectionGeometry)
'Collection GEDI01_B was acquired within the following geometry: '
{'BoundingRectangle': {'EastBoundingCoordinate': '180.0',
'NorthBoundingCoordinate': '55.0',
'SouthBoundingCoordinate': '-55.0',
'WestBoundingCoordinate': '-180.0'},
'CoordinateSystem': 'CARTESIAN'}
We can see from the first collection that the spatial coordinates of the collection intersect our search box. Also note that spatial filtering yields more refined search results with only 16 collections returned.
Searching by Additional Attributes¶
The MAAP has provided additional metadata, known as additional attributes, to the collection metadata to support the unique search needs of the aboveground terrestrial carbon research community. There are many additional attributes available:
variable name | additional attribute name | data type |
---|---|---|
site_name | Site Name | string |
data_format | Data Format | string |
track_number | Track Number | float |
polarization | Polarization | string |
dataset_status | Dataset Status | string |
spat_res | Spatial Resolution | float |
samp_freq | Sampling Frequency | float |
band_ctr_freq | Band Center Frequency | float |
freq_band_name | Frequency Band Name | string |
swath_width | Swath Width | float |
field_view | Field of View | float |
laser_foot_diam | Laser Footprint Diameter | float |
pass_number | Pass Number | int |
revisit_time | Revisit Time | float |
flt_number | Flight Number | int |
number_plots | Number of Plots | int |
plot_area | Plot Area | float |
br_ht | Breast Height | float |
ret_per_pulse | Returns Per Pulse | string |
min_diam_meas | Minimum Diameter Measured | float |
flt_alt | Flight Altitude | float |
hdg | Heading | float |
swath_slant_rg_st_ang | Swath Slant Range Start Angle | float |
azm_rg_px_spacing | Azimuth Range Pixel Spacing | float |
slant_rg_px_spacing | Slant Range Pixel Spacing | float |
acq_type | Acquisition Type | string |
orbit_dir | Orbit Direction | string |
band_ctr_wavelength | Band Center Wavelength | float |
swath_slant_rg_end_ang | Swath Slant Range End Angle | float |
For example, if a user is only interested in using data from the Lope National Park Gabon site, we can use the following query:
[9]:
results = maap.searchCollection(site_name="Lope National Park Gabon")
pprint(f'Got {len(results)} results')
'Got 5 results'
[10]:
pprint(results[0],depth=2)
{'Collection': {'AdditionalAttributes': {...},
'ArchiveCenter': 'MAAP Data Management Team',
'Campaigns': {...},
'CollectionDataType': 'SCIENCE_QUALITY',
'CollectionState': 'COMPLETE',
'Contacts': {...},
'DataSetId': 'AfriSAR UAVSAR Geocoded Covariance Matrix '
'product Generated Using NISAR Tools',
'Description': 'The Geocoded Covariance Matrix dataset is the '
'4x4 Native Covariance Matrix geocoded to a '
'spatial resolution of 25m using cubic '
'interpolation methods. These covariance '
'matrix datasets provides the variability '
'between retrievals for each co-registered '
'single-look-complex (SLC) polarization and '
'provide inferences on the scattering '
'characteristics of a target. The SLC data was '
'collected from repeat-pass flights of the '
'Uninhabited Aerial Vehicle Synthetic Aperture '
'Radar (UAVSAR) instrument during the AfriSAR '
'field campaign over the Gabonese forest in '
'February-March 2016. The AfriSAR campaign was '
'a collaborative airborne mission between the '
'National Aeronautics and Space Administration, '
'the European Space Agency and the Gabonese '
'Space Agency collecting radar, lidar and field '
'measurements in support of future space borne '
'missions for biomass research.',
'InsertTime': '2020-10-17T20:32:38.676Z',
'LastUpdate': '2020-10-17T20:32:38.676Z',
'LongName': 'Not provided',
'OnlineAccessURLs': {...},
'OnlineResources': {...},
'Orderable': 'false',
'Platforms': {...},
'ProcessingLevelDescription': 'Geocoded and mapped to uniform '
'spatial grid scales',
'ProcessingLevelId': '3',
'RevisionDate': '2019-04-08T21:02:00.000Z',
'ScienceKeywords': {...},
'ShortName': 'AfriSAR_UAVSAR_Geocoded_Covariance',
'Spatial': {...},
'Temporal': {...},
'VersionId': '1',
'Visible': 'true'},
'concept-id': 'C1200109238-NASA_MAAP',
'format': 'application/echo10+xml',
'revision-id': '5'}
The returned results will give you only datasets that have been tagged as part of the Lope National Park Gabon research site.
Searching for Granules in MAAP¶
These examples will walk through the MAAP API functionality of searching granules within a collection based on specific parameters. Granules are individual files from a sensor where a group of granules make a collection within CMR. The granules are the raw data that will be used for processing.
We begin by importing the MAAP
and pprint
packages. Then invoke the MAAP
constructor, setting the maap_host
argument to 'api.ops.maap-project.org'
.
[1]:
# import the MAAP package
from maap.maap import MAAP
# import printing package to help display outputs
from pprint import pprint
# invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.ops.maap-project.org')
Here we view the specific arguments and keywords for the maap.searchGranule
function.
[2]:
help(maap.searchGranule)
Help on method searchGranule in module maap.maap:
searchGranule(limit=20, **kwargs) method of maap.maap.MAAP instance
Search the CMR granules
:param limit: limit of the number of results
:param kwargs: search parameters
:return: list of results (<Instance of Result>)
As we can see from the result, maap.searchGranule
accepts a limit keyword which limits the number of results from CMR. maap.searchGranule()
also accepts any additional search parameters that are included in CMR. For a list of accepted parameters, please refer to the CMR Search Granules API reference
It is important to note that the default limit on results from the MAAP API is 20. To increase the number of results we will specify a variable and use it in later queries.
[3]:
# get at max 500 results from CMR
MAXRESULTS = 500
In this example we will explore search options including:
- Searching by Collection Concept ID
- Searching by temporal filter
- Searching by spatial filter
- Using the results from one search as inputs into another
- Searching by additional attributes
For the next couple of examples, we will focus on the IceSat-2/ATLAS Land and Vegetation Height dataset.
Searching by Collection Concept ID¶
Here we will search by a unique ID that is given to CMR collections. You can find the collection IDs for all of the collections in MAAP in a table within the documentation: https://maap-project.readthedocs.io/en/latest/search/cmr_collection_table.html
It is recommended to begin the search with the Collection Concept ID as this is a specific unique identifier for a collection and will avoid ambiguity when searching by a long name or short name.
[4]:
COLLECTIONID = 'C1201746153-NASA_MAAP' # specify the collection id for the ATLAS dataset
results = maap.searchGranule(concept_id=COLLECTIONID,limit=MAXRESULTS)
pprint(f'Got {len(results)} results')
'Got 500 results'
We were able to get 500 results! There are most likely more than 500 granules in search results, but remember we limited the results to 500 granules. The result from the MAAP API is a list of granules where each element in the list is the metadata for that particular granule.
Now let’s look at the metadata for the first result.
[5]:
# print the first granule's metadata
# we use the depth parameter to set the layer of metadata detail in the results, with (1) having the least detail
# (1) displays the collection concept ID, concept ID, format, and revision ID
# adjust the depth to a larger value (6) if you would like to view all of the metadata
pprint(results[0],depth=2)
{'Granule': {'AdditionalAttributes': {...},
'Collection': {...},
'DataFormat': 'HDF5',
'DataGranule': {...},
'GranuleUR': 'SC:ATL08.005:228968197',
'InsertTime': '2021-11-05T13:31:43.040Z',
'LastUpdate': '2021-11-22T20:24:55.537Z',
'OnlineAccessURLs': {...},
'OrbitCalculatedSpatialDomains': {...},
'Orderable': 'true',
'Spatial': {...},
'Temporal': {...},
'Visible': 'true'},
'collection-concept-id': 'C1201746153-NASA_MAAP',
'concept-id': 'G1201746154-NASA_MAAP',
'format': 'application/echo10+xml',
'revision-id': '9'}
There is a lot of information in the metadata so let’s break it down…
The Granule
key has all of the granule information including attributes, browse imagery URLs, spatial, and temporal information. The collection-concept-id
should match what you searched by and be the same for each granule. Lastly the granule specific concept-id
is a unique identifier for this granule. This information can be used to further refine search results from CMR, specifically the granule information.
Searching by Temporal Filter¶
Here we will combine a search from earlier using the Collection Concept ID with a temporal filter to fine tune our results using the temporal
keyword in our search.
The temporal keyword takes datetime information in a specific format. The date format used is YYYY-MM-DDThh:mm:ssZ
and temporal search criteria may be either a single date or a date range. If one date is provided then it can be inferred as start or end date. To define a start date and return all granules after the date, put a comma after the date (YYYY-MM-DDThh:mm:ssZ,
). To define an end date and return all
granules prior to the data, put a comma before the date (,YYYY-MM-DDThh:mm:ssZ
). Lastly, to get a date range, provide the start date and end date separated by a comma (YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ
).
In this example we will search for one month of data.
[6]:
dateRange = '2018-12-01T00:00:00Z,2018-12-31T23:59:59Z' # specify a date range to search for data for Dec. 2018
results = maap.searchGranule(temporal=dateRange,concept_id=COLLECTIONID,limit=MAXRESULTS,)
pprint(f'Got {len(results)} results')
'Got 500 results'
[7]:
granuleFilename = results[0]['Granule']['DataGranule']['ProducerGranuleId'] # get the granule file name
granuleDate = results[0]['Granule']['Temporal']['RangeDateTime']['BeginningDateTime'] # get the granule start time
pprint(f'Granule {granuleFilename} was acquired starting at {granuleDate}',width=100)
'Granule ATL08_20181201001339_09680103_005_01.h5 was acquired starting at 2018-12-01T00:13:48.477Z'
It looks like the first result correctly matches with the beginning temporal search parameter. Keep in mind that the results are limited to 500 so the final granule returned may not match the end date that was searched for.
Searching by Spatial Filter¶
Here we will illustrate how to search for granules by a spatial filter. There are a couple of spatial filters available to search by in CMR including point, line, polygon, and bounding box. The most simple to use is the bounding box which is a sequence of four latitude and longitude values in the order of [W,S,E,N]
. In this example, we are going to search data from the NASA Shuttle Radar Topography Mission
Global 1 arc second V003 dataset over Gabon using the bounding_box
keyword.
[8]:
granuleDomain = '8.79799563969,-3.97882659263,14.4254557634,2.32675751384' # specify bounding box to search by
results = maap.searchGranule(bounding_box=granuleDomain,concept_id='C1200000522-NASA_MAAP',limit=MAXRESULTS)
pprint(f'Got {len(results)} results')
'Got 43 results'
[9]:
granuleFilename = results[0]['Granule']['DataGranule']['ProducerGranuleId'] # get the granule file name
granuleDate = results[0]['Granule']['Spatial']['HorizontalSpatialDomain']['Geometry'] # grab the spatial information from granule
pprint(f'Granule {granuleFilename} was acquired within the following geometry: ',width=100)
pprint(granuleDate)
'Granule N00E014.SRTMGL1.hgt.zip was acquired within the following geometry: '
{'BoundingRectangle': {'EastBoundingCoordinate': '15.00027778',
'NorthBoundingCoordinate': '1.00027778',
'SouthBoundingCoordinate': '-0.00027778',
'WestBoundingCoordinate': '13.99972222'}}
We can see from the first granule that the spatial coordinates of the granule intersect our search box. Also note that spatial filtering yields more refined search results with only 43 granules returned.
Searching by Additional Attributes¶
The MAAP has provided additional metadata, also called additional attributes, to the granule metadata to support the unique search needs of the aboveground terrestrial carbon research community. There are many additional attributes available. To get started users can search by the following keywords:
site_name
data_format
track_number
polarization
For example, if a user is only interested in using data from the Mondah Forest Gabon site with a data format of ASCII we can use the following query:
[10]:
results = maap.searchGranule(site_name="Mondah Forest Gabon",data_format='ASCII',concept_id='C1200110729-NASA_MAAP')
pprint(f'Got {len(results)} results')
'Got 17 results'
[11]:
pprint(results[0],depth=2)
{'Granule': {'AdditionalAttributes': {...},
'Collection': {...},
'DataFormat': 'ASCII',
'DataGranule': {...},
'GranuleUR': 'SC:AFLVIS2.001:138348895',
'InsertTime': '2018-09-24T11:00:23.343Z',
'LastUpdate': '2018-09-24T11:00:54.336Z',
'OnlineAccessURLs': {...},
'OnlineResources': {...},
'Orderable': 'true',
'Platforms': {...},
'Spatial': {...},
'Temporal': {...},
'Visible': 'true'},
'collection-concept-id': 'C1200110729-NASA_MAAP',
'concept-id': 'G1200115690-NASA_MAAP',
'format': 'application/echo10+xml',
'revision-id': '6'}
The returned results will give you only ASCII datasets that have been tagged as part of the Mondah Forest Gabon research site.
DISCLAIMER: The MAAP data team is working to update the additional attributes within the MAAP platform so these are subject to change. Furthermore, the accepted parameters for the additional attributes are changing and further documentation will be provided as updates come.
The MAAP API provides rich functionality to interact with the CMR instance within the MAAP platform. Users can search datasets programmatically by many parameters and even combine parameters such as spatial and temporal filters to refine results.
Searching for and Compiling a List of Granule IDs for Batch Processing¶
While using the MAAP ADE, you may wish to create a list of granule IDs to be used for batch processing, granules being the individual files from a sensor that are used for processing. For this example, we will imagine a scenario that we wish to produce a biomass estimate for a single Landsat tile, and then expand that estimate over a larger area. In order to produce this expanded estimate, we will create a list of the Landsat files which fall within a certain area.
We start by importing the MAAP
and pprint
packages and creating a new MAAP class.
[1]:
# import the MAAP package
from maap.maap import MAAP
# import printing package to help display outputs
from pprint import pprint
# create MAAP class
maap = MAAP()
We can use the searchGranule
function to search for Landsat 8 granules. Click here for more information about searching for granules with the searchGranule
function. Since the default limit on results from the MAAP API is 20, we specify a variable to use in our search query.
[2]:
# get at max 1000 results from CMR
MAXRESULTS = 1000
To filter our search to Landsat 8 data, we create a variable with the collection ID of the Landsat 8 OLI Surface Reflectance Analysis Ready Data collection. You can find this collection ID as well as the collection IDs for all of the collections in MAAP in the following table within the documentation: https://maap-project.readthedocs.io/en/latest/search/cmr_collection_table.html. Using the Collection Concept ID is the preferred method to filter by a collection, as it is a unique identifier which avoids ambiguity.
[3]:
COLLECTIONID = 'C1200110769-NASA_MAAP' # specifying the collection ID for the Landsat 8 dataset
Next, we search for granules using the collection ID and a spatial filter. We can use a bounding box as our spatial filter. The bounding box is a sequence of four latitude and longitude values in the order of [W,S,E,N]. For this example, let’s search for granules from the Landsat 8 data using a bounding box for the country Peru.
[4]:
collectionDomain = '-81.4109425524,-18.3479753557,-68.6650797187,-0.0572054988649' # specify bounding box to search by
# getting results from granule search using the bounding box, collection ID, and results limit
results = maap.searchGranule(bounding_box=collectionDomain,concept_id=COLLECTIONID,limit=MAXRESULTS)
pprint(f'Got {len(results)} results') # print the number of results
'Got 805 results'
We were able to get 805 results. Each element in the list results
contains the metadata for one of the granules returned by the search. Within this metadata is the key concept-id
, which is the unique identifier for each granule. To create a list of granule IDs, we create a new list and use a for loop to add the concept-id
from each element of results
into the new list.
[5]:
granuleID_list = [] # create list for granule IDs
for result in results: # loop through each element (granule) in the list `results`
granuleID_list.append(result['concept-id']) # add the concept id for each granule to `granuleID_list`
# Tip: you can use `print(granuleID_list)` to see the result
Now we have a list of all the granule IDs for granules in the Landsat 8 collection that fall within the bounding box for the country Peru within granuleID_list
.
Available CMR Collections in MAAP¶
Last updated: 2020-02-21
Common Name | CMR Long Name | Concept ID | Revision ID |
---|---|---|---|
ABoVE LVIS L1B Geolocated Waveforms | ABoVE LVIS L1B Geolocated Return Energy Waveforms V001 | C1200110748-NASA_MAAP | 11 |
ABoVE LVIS L2 Geolocated Surface Elevation | ABoVE LVIS L2 Geolocated Surface Elevation Product V001 | C1200125288-NASA_MAAP | 10 |
ABoVE UAV SAR PolSAR | Arctic-Boreal Vulnerability Experiment Uninhabited Aerial Vehicle Synthetic Aperture Radar Polarimetric SAR | C1200120114-NASA_MAAP | 3 |
AfriSAR: Aboveground Biomass, Gabon Sites | AfriSAR: Aboveground Biomass for Lope, Mabounie, Mondah, and Rabi Sites, Gabon | C1200115768-NASA_MAAP | 4 |
AfriSAR: Canopy Structure from Pol InSAR and Coherence | AfriSAR: Canopy Structure Derived from PolInSAR and Coherence TomoSAR NISAR tools - Cloned | C1200116817-NASA_MAAP | 1 |
AfriSAR: Canopy Structure from Pol InSAR and Coherence | AfriSAR: Canopy Structure Derived from PolInSAR and Coherence TomoSAR NISAR tools | C1200000521-NASA_MAAP | 8 |
AfriSAR DLR | AFRISAR_DLR | C1200109552-ESA_MAAP | 1 |
AfriSAR DLR 2 | AFRISAR_DLR2 | C1200109550-ESA_MAAP | 3 |
AfriSAR: LVIS Canopy Cover and Vertical Profile Metrics | AfriSAR: Canopy Cover and Vertical Profile Metrics Derived from LVIS, Gabon, 2016 | C1200097474-NASA_MAAP | 4 |
AfriSAR LVIS L1B Geolocated Waveforms | AfriSAR LVIS L1B Geolocated Return Energy Waveforms V001 | C1200110728-NASA_MAAP | 10 |
AfriSAR LVIS L2 Geolocated Surface Elevation | AfriSAR LVIS L2 Geolocated Surface Elevation Product V001 | C1200110729-NASA_MAAP | 10 |
AfriSAR UAVSAR Coregistered SLCs | AfriSAR UAVSAR Coregistered SLCs Generated Using NISAR Tools | C1200000308-NASA_MAAP | 15 |
AfriSAR UAVSAR Geocoded Covariance Matrix product | AfriSAR UAVSAR Geocoded Covariance Matrix product Generated Using NISAR Tools | C1200109238-NASA_MAAP | 5 |
AfriSAR UAVSAR Geocoded SLCs | AfriSAR UAVSAR Geocoded SLCs Generated Using NISAR Tools | C1200109237-NASA_MAAP | 4 |
AfriSAR: Mondah Forest Tree Species, Biophysical, and Biomass Data | AfriSAR: Mondah Forest Tree Species, Biophysical, and Biomass Data, Gabon, 2016 | C1200000492-NASA_MAAP | 13 |
AfriSAR UAV SAR MLC Product | AfriSAR Uninhabited Aerial Vehicle Synthetic Aperture Radar Multi Look Cross Product | C1200125308-NASA_MAAP | 5 |
AfriSAR UAVSAR Normalization Area | AfriSAR UAVSAR Normalization Area Generated Using NISAR Tools | C1200109244-NASA_MAAP | 8 |
AfriSAR ONERA | AFRISAR_ONERA | C1200015069-ESA_MAAP | 3 |
AfriSAR: Polarimetric Height Profiles | AfriSAR: Polarimetric Height Profiles by TomoSAR, Lope and Rabi Forests, Gabon, 2016 | C1200015148-NASA_MAAP | 5 |
AfriSAR: Rainforest Canopy Height from PolInSAR and Lidar Data | AfriSAR: Rainforest Canopy Height Derived from PolInSAR and Lidar Data, Gabon | C1200015068-NASA_MAAP | 5 |
AfriSAR UAV SAR SLC Product | AfriSAR Uninhabited Aerial Vehicle Synthetic Aperture Radar Single Look Complex | C1200125309-NASA_MAAP | 4 |
AfriSAR UAVSAR Ungeocoded Covariance Matrix product | AfriSAR UAVSAR Ungeocoded Covariance Matrix product Generated Using NISAR Tools | C1200109245-NASA_MAAP | 5 |
AfriSAR UAVSAR Vertical Wavenumber (KZ) | AfriSAR UAVSAR Vertical Wavenumber (KZ) Generated Using NISAR Tools | C1200109243-NASA_MAAP | 3 |
ALOS PALSAR | Advanced Land Observation Satellite (ALOS) Phased Array L-band Synthetic Aperture Radar | C1200116820-NASA_MAAP | 14 |
ALOS PALSAR RTC High Resolution Products | Advance Land Observing Satellite Phased Array type L-band Synthetic Aperture Radar Radiometric Terrain-Corrected High Resolution products | C1200120106-NASA_MAAP | 5 |
ATLAS/ICESat-2 Geolocated Photon Data | ATLAS/ICESat-2 L2A Global Geolocated Photon Data V002 | C1200166513-NASA_MAAP | 7 |
ATLAS/ICESat-2 Land and Vegetation Height | ATLAS/ICESat-2 L3A Land and Vegetation Height V002 | C1200116818-NASA_MAAP | 3 |
BIOSAR 1 | BIOSAR1 | C1200015072-ESA_MAAP | 11 |
BIOSAR 2 | BIOSAR2 | C1200116534-ESA_MAAP | 1 |
BIOSAR 3 | BIOSAR3 | C1200116533-ESA_MAAP | 1 |
Environmental Stress Factor for Diameter-Height Tree Allometry | Environmental Stress Factor for Diameter-Height Tree Allometry | C1200109548-NASA_MAAP | 5 |
GEDI Calibration/Validation Airborne Lidar Dataset | Global Ecosystem Dynamics Investigation (GEDI) Calibration/Validation Airborne Lidar Dataset | C1200117888-NASA_MAAP | 5 |
GEDI Calibration/Validation Field Survey Dataset | Global Ecosystem Dynamics Investigation (GEDI) Calibration/Validation Field Survey Dataset | C1200116827-NASA_MAAP | 12 |
GEDI L1B Geolocated Waveform Data | GEDI L1B Geolocated Waveform Data Global Footprint Level V001 | C1200231031-NASA_MAAP | 4 |
GEDI L2A Elevation and Height Metrics Data | GEDI L2A Elevation and Height Metrics Data Global Footprint Level V001 | C1200231029-NASA_MAAP | 5 |
GEDI L2B Canopy Cover and Vertical Profile Metrics Data | GEDI L2B Canopy Cover and Vertical Profile Metrics Data Global Footprint Level V001 | C1200231030-NASA_MAAP | 1 |
Global Forest Change 2000-2017 | Global Forest Change 2000-2017 | C1200090741-NASA_MAAP | 3 |
Global PALSAR-2/PALSAR Forest/Non-Forest Map | Global 25m Resolution PALSAR-2/PALSAR Forest/Non-Forest Map | C1200090708-NASA_MAAP | 8 |
Global PALSAR-2/PALSAR Mosaic | Global 25m Resolution PALSAR-2/PALSAR Mosaic | C1200097475-NASA_MAAP | 7 |
GlobCover Global Land Cover Product (2005-06) | GlobCover Global Land Cover Product (2005-06) | C1200015150-NASA_MAAP | 4 |
GlobCover Global Land Cover Product (2009) | GlobCover Global Land Cover Product (2009) | C1200015149-NASA_MAAP | 8 |
GPM IMERG Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V05 | GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V05 (GPM_3IMERGHH) at GES DISC | C1200015188-NASA_MAAP | 5 |
INDREX2 | INDREX2 | C1200015073-ESA_MAAP | 1 |
Landsat 7 ETM+ Surface Reflectance Analysis Ready Data | Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Surface Reflectance Analysis Ready Data (ARD) V1 | C1200110768-NASA_MAAP | 10 |
Landsat 8 OLI Surface Reflectance Analysis Ready Data | Landsat 8 Operational Land Imager (OLI) Surface Reflectance Analysis Ready Data (ARD) V1 | C1200110769-NASA_MAAP | 8 |
NASA Langley KingAir B-200 Flight Tracks for Gabon | NASA Langley KingAir B-200 Flight Tracks for Gabon, Africa 2016 | C1200090707-NASA_MAAP | 4 |
NASA Shuttle Radar Topography Mission 1 arc second V003 | NASA Shuttle Radar Topography Mission Global 1 arc second V003 | C1200000522-NASA_MAAP | 8 |
Sentinel-1A Dual-pol ground projected images | Sentinel-1A Dual-pol ground projected high and full resolution images | C1200231010-NASA_MAAP | 11 |
Sentinel-1A slant-range product | Sentinel-1A slant-range product | C1200166514-NASA_MAAP | 1 |
Sentinel-1B Dual-pol ground projected images | Sentinel-1B Dual-pol ground projected high and full resolution images | C1200231011-NASA_MAAP | 7 |
TROPISAR | TROPISAR | C1200116477-ESA_MAAP | 1 |
Deprecated Collections in MAAP¶
Last updated: 2022-02-08
As of January 19, 2022, the following CMR collection versions have been deprecated and are no longer available in the MAAP CMR:
Collection Name | Version | Concept ID |
---|---|---|
GPM IMERG | v5 | C1200015188-NASA_MAAP |
ATL08 | v002 | C1200116818-NASA_MAAP |
ATL08 | v003 | C1200235747-NASA_MAAP |
ATL03 | v002 | C1200166513-NASA_MAAP |
GEDI L2B | v001 | C1200231030-NASA_MAAP |
Newer versions of each of these collections are listed as follows:
Collection Name | Newest Available Version | Concept ID |
---|---|---|
GPM IMERG | N/A | N/A |
ATL08 | v004 | C1201309827-NASA_MAAP |
ATL03 | v004 | C1201300747-NASA_MAAP |
GEDI L2B | v002 | C1201460047-NASA_MAAP |
Searching the STAC Catalog¶
This tutorial provides a basic introduction to searching the MAAP STAC catalog (https://stac.maap-project.org/) using pystac-client
.
Another method of searching the STAC catalog is via the STAC browser.
About the STAC Catalog¶
At this time, the STAC catalog provides discovery of a subset of MAAP datasets. These datasets were selected because MAAP CMR analytics indicated selected datasets were being searched for the most. The data files have not been moved at all in the process of publishing datasets to STAC.
Data will continue to be added to the STAC catalog with priority given to datasets which are known to be in-use by MAAP UWG members through CMR metrics, S3 metrics, direct collaboration with data team members and by request.
Prerequisites
- pystac-client
- rioxarray (for opening a raster as an xarray dataset)
Authorship
- Author: Aimee Barciauskas
- Date: December 13, 2022
- Resources used: https://pystac-client.readthedocs.io/en/stable/tutorials/pystac-client-introduction.html
[1]:
# Uncomment the next line to install pystac-client if you haven't already.
# !pip install pystac-client
[2]:
from pystac_client import Client
STAC Client¶
We first connect to an API by retrieving the root catalog, or landing page, of the API with the Client.open function.
[3]:
# STAC API root URL
URL = 'https://stac.maap-project.org/'
# custom headers
headers = []
cat = Client.open(URL, headers=headers)
cat
[3]:
<Client id=stac-fastapi>
CollectionClient
As with a static catalog the get_collections function will iterate through the Collections in the Catalog. Notice that because this is an API it can get all the Collections through a single call, rather than having to fetch each one individually.
[4]:
for collection in cat.get_all_collections():
print(collection)
<CollectionClient id=AfriSAR_UAVSAR_Coreg_SLC>
<CollectionClient id=Landsat8_SurfaceReflectance>
<CollectionClient id=AfriSAR_AGB_Maps_1681>
<CollectionClient id=ABLVIS1B>
<CollectionClient id=GEDI02_B>
<CollectionClient id=AFLVIS2>
<CollectionClient id=BIOSAR1>
<CollectionClient id=GEDI02_A>
<CollectionClient id=AFRISAR_DLR>
[5]:
collection = cat.get_collection('AFRISAR_DLR')
collection
[5]:
<CollectionClient id=AFRISAR_DLR>
STAC Items
The main functions for getting items return iterators, where pystac-client will handle retrieval of additional pages when needed. Note that one request is made for the first ten items, then a second request for the next ten.
[6]:
items = collection.get_items()
# flush stdout so we can see the exact order that things happen
def get_ten_items(items):
for i, item in enumerate(items):
print(f"{i}: {item}", flush=True)
if i == 9:
return
print('First page', flush=True)
get_ten_items(items)
print('Second page', flush=True)
get_ten_items(items)
First page
0: <Item id=afrisar_dlr_roi_RAB100q>
1: <Item id=afrisar_dlr_roi_RAB099q>
2: <Item id=afrisar_dlr_roi_RAB098q>
3: <Item id=afrisar_dlr_roi_RAB097q>
4: <Item id=afrisar_dlr_roi_RAB096q>
5: <Item id=afrisar_dlr_roi_RAB095q>
6: <Item id=afrisar_dlr_roi_RAB094q>
7: <Item id=afrisar_dlr_roi_RAB093q>
8: <Item id=afrisar_dlr_roi_RAB092q>
9: <Item id=afrisar_dlr_roi_RAB091q>
Second page
0: <Item id=afrisar_dlr_roi_RAB090q>
1: <Item id=afrisar_dlr_roi_RAB089q>
2: <Item id=afrisar_dlr_roi_RAB088q>
3: <Item id=afrisar_dlr_roi_RAB087q>
4: <Item id=afrisar_dlr_roi_RAB086q>
5: <Item id=afrisar_dlr_roi_RAB085q>
6: <Item id=afrisar_dlr_roi_RAB084q>
7: <Item id=afrisar_dlr_roi_RAB083q>
8: <Item id=afrisar_dlr_roi_RAB082q>
9: <Item id=afrisar_dlr_roi_RAB081q>
Discover the URL of one item using xarray
[7]:
item = collection.get_item('afrisar_dlr_H4-2_SLC_VV')
item.assets['data'].href
[7]:
'https://bmap-catalogue-data.oss.eu-west-0.prod-cloud-ocb.orange-business.com/Campaign_data/afrisar_dlr/afrisar_dlr_H4-2_SLC_VV.tiff'
Visualize¶
Visualizing Web Map Tile Service (WMTS) Layers¶
At this time, there are two collections (AFLVIS2 and AfriSAR_UAVSAR_Coreg_SLC) for which we can visualize WMTS layers. WMTS layers can be visualized using the Common Mapping Client (CMC). The CMC is a starter-kit for web-based mapping applications which utilizes several common mapping funtionalities. This example demonstrates how to visualize WMTS layers using ipyCMC
, a Jupyter Lab widget for the CMC.
First, we import the ipycmc
package.
[1]:
# Import the ipycmc module
import ipycmc
We utilize the CMC widget to visualize data in the MAAP Algorithm Development Environment (ADE).
[2]:
# utilize the CMC widget
w = ipycmc.MapCMC()
w
The CMC user interface provides a variety of tools for analyzing data. In the top left corner, the Map Layers drop-down menu allows us to turn map layers on and off and provides tools for managing layer positioning, setting layer opacity, and zooming to a layer. At the lower right end of the display are tools for displaying the data, adjusting the zoom level, and selecting data. The ‘Switch to 3D map’ button allows us to switch between 2D and 3D maps. The ‘Home’ button zooms to the global extent and the ‘Zoom In’ and ‘Zoom Out’ buttons zoom the data to appear nearer or farther away. The ‘Select Area’ button allows us to select an area using a variety of geometrical methods, the ‘Select Basemap’ button allows us to select from a list of available basemaps, and the ‘Fullscreen’ button toggles full screen mode. At the bottom right, the cursor location on the map is displayed in Latitude/Longitude decimal degrees. At the bottom left of the user interface, there is a text box allowing us to change the date and time and arrows to change the day.
Visualizing Collections¶
To visualize the available collections, we utilize the load_layer_config()
function, handling the url as a WMTS xml file.
[3]:
# visualize the available set of layers
w.load_layer_config("https://api.ops.maap-project.org/api/wmts/GetCapabilities", "wmts/xml")
Once the layers load, the number displayed in the Map Layers drop-down menu should increase. Pressing the downward arrow on the drop-down menu displays the loaded layers. Turn the layers on and off by pressing the toggle to the left of the layer names and zoom to a layer by pressing the crosshair button.
Visualizing Single Granules¶
We can also visualize a single granule which is within one of the available collections by using the granule UR with the load_layer_config()
function and handling the url as a WMTS file.
[4]:
# visualize a single granule using the granule UR
w.load_layer_config("https://api.ops.maap-project.org/api/wmts/GetCapabilities?short_name=AFLVIS2&granule_ur=SC:AFLVIS2.001:138348905", "wmts/xml")
This example here shows how we can add a layer by providing a granule UR. These granule URs can be extracted from the metadata by searching using CMR and the MAAP API. Please see the search granule example for specifics on how to search for granules and extract the UR - https://maap-project.readthedocs.io/en/latest/search/granules.html.
Visualizing SRTM dataset from MAAP CMR STAC using MosaicJSON¶
In this notebook, we discover SRTM COGs from `MAAP’s CMR STAC API Proxy <https://cmr-stac.dit.maap-project.org/stac/NASA_MAAP/collections/SRTMGL1_COD.v001>`__ and generate a mosaic.
MosaicJSON¶
A common challenge in visualizing spatial data is data is stored across many files representing small spatial extents.
MosaicJSON is a specification created by DevelopmentSeed which aims to be an open standard for representing metadata about a spatial mosaic of many COG files.
MosaicJSON can be seen as a cloud friendly virtual raster (see GDAL’s VRT) enabling spatial and temporal indexing for a list of Cloud-Optimized GeoTIFF.
Data¶
The NASA Shuttle Radar Topographic Mission (SRTM) has provided digital elevation data (DEMs) for over 80% of the globe. This data is currently distributed free of charge by USGS and is available for download from the National Map Seamless Data Distribution System, or the USGS FTP site.
At MAAP, we’ve converted this elevation data into Cloud-Optimized GeoTIFFs (COGs) so they can be efficiently queried and visualized. These COGs are available in the MAAP CMR.
Titiler Endpoint¶
By default, TiTiler has a complete mosaicjson
endpoint. For this demo we are going to use an instance of TiTiler deployed by MAAP at https://titiler.maap-project.org
Requirements¶
To be able to run this notebook you’ll need the following requirements:
- rasterio
- folium
- requests
- tqdm
- rio-tiler (2.0b8) (Optional)
- cogeo-mosaic (Optional)
pip install rasterio folium requests tqdm
pip install rio-tiler cogeo-mosaic --pre
Get the Data¶
[1]:
import requests
from pprint import pprint
from rio_tiler.io import COGReader
from rasterio.features import bounds as featureBounds
from folium import Map, TileLayer, GeoJson
Fetch SRTM COG STAC items¶
[2]:
stac_endpoint = "https://cmr-stac.dit.maap-project.org/stac/NASA_MAAP/search"
stac_json = {
"collections": ["SRTMGL1_COD.v001"],
"bbox": "4,42,16,48",
"limit": 120
}
headers = {
"Content-Type": "application/json",
"Accept": "application/geo+json, application/json",
}
# Read Items
r_stac = requests.post(stac_endpoint, headers=headers, json=stac_json)
items = r_stac.json()
Map the data bounds¶
[3]:
geojson = {'type': 'FeatureCollection', 'features': items['features']}
bounds = featureBounds(geojson)
zoom_start = 5
m = Map(
tiles="OpenStreetMap",
location=((bounds[1] + bounds[3]) / 2,(bounds[0] + bounds[2]) / 2),
zoom_start=zoom_start
)
geo_json = GeoJson(
data=geojson,
style_function=lambda x: {
'opacity': 1, 'dashArray': '1', 'fillOpacity': 0, 'weight': 1
},
)
geo_json.add_to(m)
m
[3]:
2. Create Mosaic¶
We’re using the TiTiler deployed by MAAP
[4]:
titiler_endpoint = "https://titiler.maap-project.org" # MAAP titiler endpoint
[5]:
%time
from rio_tiler.io import COGReader
first_cog = geojson['features'][0]['assets']['browse']['href']
with COGReader(first_cog) as cog:
info = cog.info()
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 7.63 µs
[6]:
from cogeo_mosaic.mosaic import MosaicJSON
# We are creating the mosaicJSON using the geojson
# SRTMGL1 CODs have a "browse" asset type, so we can create a mosaic of these type="image/tiff" assets
# accesor function provide access to those
mosaicdata = MosaicJSON.from_features(geojson.get('features'), minzoom=6, maxzoom=info.maxzoom, accessor=lambda feature : feature['assets']['browse']['href'])
[7]:
# Uploading the mosaicjson to the TiTiler
r = requests.post(
url=f"{titiler_endpoint}/mosaics",
headers={
"Content-Type": "application/vnd.titiler.mosaicjson+json",
},
json=mosaicdata.dict(exclude_none=True)).json()
pprint(r)
{'id': '7e9554c7-a41d-407d-a835-d6e6add92c40',
'links': [{'href': 'https://titiler.maap-project.org/mosaics/7e9554c7-a41d-407d-a835-d6e6add92c40',
'rel': 'self',
'title': 'Self',
'type': 'application/json'},
{'href': 'https://titiler.maap-project.org/mosaics/7e9554c7-a41d-407d-a835-d6e6add92c40/mosaicjson',
'rel': 'mosaicjson',
'title': 'MosaicJSON',
'type': 'application/json'},
{'href': 'https://titiler.maap-project.org/mosaics/7e9554c7-a41d-407d-a835-d6e6add92c40/tilejson.json',
'rel': 'tilejson',
'title': 'TileJSON',
'type': 'application/json'},
{'href': 'https://titiler.maap-project.org/mosaics/7e9554c7-a41d-407d-a835-d6e6add92c40/tiles/{z}/{x}/{y}',
'rel': 'tiles',
'title': 'Tiles',
'type': 'application/json'},
{'href': 'https://titiler.maap-project.org/mosaics/7e9554c7-a41d-407d-a835-d6e6add92c40/WMTSCapabilities.xml',
'rel': 'wmts',
'title': 'WMTS',
'type': 'application/json'}]}
The response from the post request gives endpoints for different services (eg. mosaicjson, tilejson, tiles, wmts, etc). We’re fetching the tilejson
endpoint.
[8]:
tilejson_endpoint = list(filter(lambda x: x.get("rel") == "tilejson", dict(r)["links"]))
NOTE: You have to zoom to “minzoom” to load the data.
SECOND NOTE: The important bit is the “tiles” endpoint returned from f"{titiler_endpoint}/mosaics
. This endpoint (e.g. https://titiler.maap-project.org/mosaics/4199126b-9313-435a-b4b5-17802716b7b1/tiles/{z}/{x}/{y}
) could be used in any map client which can tile x,y,z
layers.
[9]:
r_te = requests.get(
tilejson_endpoint[0].get('href')
).json()
pprint(r_te)
tiles = TileLayer(
tiles=f"{r_te['tiles'][0]}?rescale=0,4000",
min_zoom=r_te["minzoom"],
max_zoom=r_te["maxzoom"],
opacity=1,
attr="USGS"
)
tiles.add_to(m)
m
{'bounds': [2.9997222, 40.9997222, 17.0002778, 49.0002778],
'center': [10.0, 45.0, 6],
'maxzoom': 12,
'minzoom': 6,
'name': '7e9554c7-a41d-407d-a835-d6e6add92c40',
'scheme': 'xyz',
'tilejson': '2.2.0',
'tiles': ['https://titiler.maap-project.org/mosaics/7e9554c7-a41d-407d-a835-d6e6add92c40/tiles/{z}/{x}/{y}@1x'],
'version': '1.0.0'}
[9]:
Algorithm Development Environment (ADE) Visualization Example¶
MosaicJSON¶
A common challenge in visualizing spatial data is data is stored across many files representing small spatial extents.
MosaicJSON is a specification created by DevelopmentSeed which aims to be an open standard for representing metadata about a spatial mosaic of many COG files.
MosaicJSON can be seen as a cloud friendly virtual raster (see GDAL’s VRT) enabling spatial and temporal indexing for a list of Cloud-Optimized GeoTIFF.
Visualizing COGs generated in the MAAP Algorithm Development Environment (ADE)¶
This notebook visualizes COGs generated in the ADE using the python `Common Mapping Client <https://github.com/nasa/common-mapping-client>`__, an open source project of NASA and JPL.
Any Cloud Optimized GeoTIFF (COG), or group of COGs, in an ADE workspace can be visualized in a dynamic map by using a tiling service hosted in MAAP.
Steps: 1. Make a list of TIFFs in your workspace to use as a single layer 2. Generate a MosaicJSON file from this list of files (or a GeoJSON index) 3. Combine the MosaicJSON with other tiler visualization parameters to register a layer with your visualization tool.
[1]:
import glob
import ipycmc
import os
from pprint import pprint
import requests
import urllib
#!pip install cogeo_mosaic
from cogeo_mosaic.mosaic import MosaicJSON
from cogeo_mosaic.backends import MosaicBackend
Build a list of files¶
You can either make a list of file paths, or create a GeoJSON layer with a column containing the file paths. The paths need to be S3 paths currently.
[2]:
# Local Path to your COGs
dps_output = "/projects/shared-buckets/alexdevseed/landsat8/viz/"
# titiler endpoint
titiler_endpoint = f"https://titiler.maap-project.org"
# Search for files to include, use recursive if nested folders (common in DPS output)
files = glob.glob(os.path.join(dps_output, "Landsat*_dps.tif"), recursive=False)
def local_to_s3(url):
''' A Function to convert local paths to s3 urls'''
return url.replace("/projects/shared-buckets", "s3://maap-ops-workspace/shared")
tiles = [local_to_s3(file) for file in files]
print(tiles)
['s3://maap-ops-workspace/shared/alexdevseed/landsat8/viz/Landsat8_30542_comp_cog_2015-2020_dps.tif', 's3://maap-ops-workspace/shared/alexdevseed/landsat8/viz/Landsat8_30543_comp_cog_2015-2020_dps.tif', 's3://maap-ops-workspace/shared/alexdevseed/landsat8/viz/Landsat8_30822_comp_cog_2015-2020_dps.tif', 's3://maap-ops-workspace/shared/alexdevseed/landsat8/viz/Landsat8_30823_comp_cog_2015-2020_dps.tif']
How to find the S3 path¶
You might be wondering how to find the S3 path to begin with. Right now the easiest way is to right click on a file in the file explorer on the left panel, and Get Presigned S3 Url.
It will look something like https://maap-ops-workspace.s3.amazonaws.com/shared/alexdevseed/landsat8/viz/Copernicus_30438_covars_cog_topo_stack.tif?AWSAccessKeyId...
The first part of the url is the bucket name: maap-ops-workspace
. After the next /
, it then matches to the local path.
Future versions of MAAP will include functions to do this part for you…
Make a mosaic¶
[3]:
# Now take the list of S3 paths and generate a mosaic
# TODO: if you have a lot of files (more than 100), creating a GeoJSON index and using from_features will be more efficient.
mosaicdata = MosaicJSON.from_urls(tiles, minzoom=9, maxzoom=16)
[4]:
r = requests.post(
url=f"{titiler_endpoint}/mosaics",
headers={
"Content-Type": "application/vnd.titiler.mosaicjson+json",
},
json=mosaicdata.dict(exclude_none=True)).json()
mosaicid = r['id']
Make a Map¶
[5]:
w = ipycmc.MapCMC()
w
[6]:
# Build a WMTS call
"""
All of this is subject to change in a future version
The important parameters for users:
url : the S3 path to the MosaicJSON file,
bidx (band number),
rescale (required if using non Byte data type),
colormap_name or colormap
Other parameters are possible, see https://titiler.maap-project.org/docs#/MosaicJSON/wmts_mosaicjson_WMTSCapabilities_xml_get
"""
wmts_url = f"https://titiler.ops.maap-project.org/mosaics/{mosaicid}/WMTSCapabilities.xml"
params = {
"tile_format":"png",
"tile_scale":"1",
"pixel_selection":"first",
"TileMatrixSetId":"WebMercatorQuad",
"bidx":"6", # Select which band to use
"resampling_method":"nearest",
"rescale":"0,1", # Values in data are from 0 to 1
"return_mask":"true",
"colormap_name":"viridis" # Any colormap from matplotlib will work
}
wmts_call = "?".join([wmts_url, urllib.parse.urlencode(params)])
# Note Jupyter bug add amp; incorrectly when printing the url
wmts_call
[6]:
'https://titiler.ops.maap-project.org/mosaics/e613b570-7318-4d88-92c4-1c5373c6b513/WMTSCapabilities.xml?tile_format=png&tile_scale=1&pixel_selection=first&TileMatrixSetId=WebMercatorQuad&bidx=6&resampling_method=nearest&rescale=0%2C1&return_mask=true&colormap_name=viridis'
[7]:
# This adds a new layer to the map above, call Cloud Optimized GeoTIFF
w.load_layer_config(wmts_call, "wmts/xml")
Access¶
Accessing Data from the MAAP¶
In this example, we demonstrate how to access data from the MAAP using the getLocalPath()
function. At this time, this procedure is the same for user-contributed data added to the store.
We import the os
module, import the MAAP
package, and create a new MAAP class.
[1]:
# import os module
import os
# import the MAAP package
from maap.maap import MAAP
# create MAAP class
maap = MAAP()
For this example, the site_name
additional attribute is used to search for granules that have been tagged as part of the Mondah Forest Gabon research site. For more information about searching for granules in MAAP, please see https://maap-project.readthedocs.io/en/latest/search/granules.html.
[2]:
# assign Mondah Forest Gabon site name
SHORTNAME = "AFLVIS2"
SITENAME = 'Mondah Forest Gabon'
# search for granules with site name
results = maap.searchGranule(short_name=SHORTNAME,site_name=SITENAME)
We assign a variable (in this case, data_file
) to the first result of our search from the cell above.
[3]:
# grab first result
data_file = results[0]
A data directory is then set, and if the directory does not already exist, it is created. The file from our search is then downloaded into the file system in this directory. Here, the function getLocalPath()
is accessing the data stored on the MAAP’s Simple Storage Service (S3) bucket and downloading it directly to the path provided.
[4]:
# set data directory
dataDir = './data'
# check if directory exists -> if directory doesn't exist, directory is created
if not os.path.exists(dataDir):
os.mkdir(dataDir)
# download file from search into data directory
data = data_file.getLocalPath(dataDir)
data
[4]:
'./data/LVIS2_Gabon2016_0308_R1808_038002.TXT'
Now we can see that the data directory has been created and the ‘LVIS2_Gabon2016_0308_R1808_038002.TXT’ file is downloaded into the directory. The downloaded file remains in the data directory until the user deletes it.
Accessing Data from AWS Requester Pays Buckets¶
Some data is cloud available but in requester pays buckets. In this example, we use Rasterio, Boto3, and MAAP’s aws.requester_pays_credentials()
function to retrieve data within the usgs-landsat
requester pays bucket.
[5]:
import boto3
import rasterio as rio
from maap.maap import MAAP
from rasterio.plot import show
from rasterio.session import AWSSession
maap = MAAP(maap_host='api.ops.maap-project.org')
credentials = maap.aws.requester_pays_credentials()
boto3_session = boto3.Session(
aws_access_key_id=credentials['aws_access_key_id'],
aws_secret_access_key=credentials['aws_secret_access_key'],
aws_session_token=credentials['aws_session_token']
)
aws_session = AWSSession(boto3_session, requester_pays=True)
file_s3 = 's3://usgs-landsat/collection02/level-2/standard/oli-tirs/2015/044/025/LC08_L2SP_044025_20150812_20200908_02_T1/LC08_L2SP_044025_20150812_20200908_02_T1_SR_B2.TIF'
with rio.Env(aws_session):
with rio.open(file_s3, 'r') as src:
# list of overviews
oviews = src.overviews(1)
# get second item from list to retrieve a thumbnail
oview = oviews[1]
# read first band of file and set shape of new output array
thumbnail = src.read(1, out_shape=(1, int(src.height // oview), int(src.width // oview)))
# now display the thumbnail
show(thumbnail)

[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff88c062650>
You may adjust the expiration time of the AWS credentials generated by maap.aws.requester_pays_credentials()
:
[ ]:
# Credential expiration time in seconds (defaults to 12 hours)
maap.aws.requester_pays_credentials(expiration=3600)
Accessing data provided by NASA’s Distributed Active Archive Centers (DAACs)¶
It is possible to download data provided by DAACs, including data which is not cataloged by the MAAP’s CMR, using the NASA MAAP ADE. This data is hosted externally from the MAAP but can be accessed using the NASA MAAP ADE’s authentication systems.
In order to do this, we start by creating a Jupyter workspace within the NASA MAAP ADE. Using the left-hand navigation, select “+ Get Started” and then select the “Jupyter - MAAP Basic Stable” workspace.
Alternatively, you can create a workspace using the “Workspaces” interface. See Create Workspace for more information.
Accessing data from Jupyter Notebooks in your workspace¶
Within your Jupyter Notebook, start by importing the maap
package. Then invoke the MAAP
constructor, setting the maap_host
argument to 'api.ops.maap-project.org'
.
[1]:
# import the maap package
from maap.maap import MAAP
# invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.ops.maap-project.org')
Granting Earthdata Login access to your target DAAC application¶
In order to access external DAAC data from the NASA MAAP ADE, MAAP uses your Earthdata Login profile to send a data request to the desired DAAC application.
Some DAAC applications (such as ‘Alaska Satellite Facility Data Access’) must be authorized before you can use them. Login or register at https://urs.earthdata.nasa.gov/ in order to see the applications that you have authorized. From the profile page, click on the ‘Applications’ tab and select ‘Authorized Apps’ from the drop-down menu.
This takes you to the Approved Applications page which lists the applications you have authorized. To add more applications, scroll down to the bottom of the page and click the ‘APPROVE MORE APPLICATIONS’ button which takes you to the Application search page.
Enter the desired application name within the search box and click the ‘SEARCH’ button. After this, a list of search results appears.
Once you find the desired application, click the ‘AUTHORIZE’ button next to the name.
You are then presented with its End User License Agreement. In order to have authorization, you need to select the ‘I agree to the terms of End User License Agreement’ checkbox and then click the ‘AGREE’ button.
After this is done, you are then shown the Approved Applications page again and the desired application should now be listed.
Note that if Earthdata Login access is not granted to your target DAAC application, the following example will result in a 401-permission error.
Accessing Sentinel-1 Granule Data from the Alaska Satellite Facility (ASF)¶
Search for a granule using the searchGranule
function (for more information on searching for granules, see Searching for Granules in MAAP). Then utilize the getData
function, which downloads granule data if it doesn’t already exist locally. We can use getData
to download the first result from our granule search into the file system and assign it to a variable (in this case download
). Note that you will need to
authorize the ‘Alaska Satellite Facility Data Access’ application before downloading any results from our search (see the above section for more information concerning authorizing applications).
[2]:
# search for granule data using the short_name argument
results = maap.searchGranule(short_name='SENTINEL-1A_DP_GRD_HIGH')
# download first result
download = results[0].getData()
Note that we can then use the print
function to see the file name and directory.
[3]:
# print file directory
print(download)
./S1A_S3_GRDH_1SDH_20140615T034444_20140615T034512_001055_00107C_8977.zip
Accessing Harmonized Landsat Sentinel-2 (HLS) Level 3 Granule Data from the Land Processes Distributed Active Archive Center (LP DAAC)¶
We use a similar approach in order to access HLS Level 3 granule data. Note that this data is not cataloged by the MAAP’s CMR but we can use searchGranule
’s cmr_host
argument to specify a CMR instance external to MAAP.
[4]:
# search for granule data using CMR host name and short name arguments
results = maap.searchGranule(
cmr_host='cmr.earthdata.nasa.gov',
short_name='HLSL30')
# download first result
download = results[0].getData()
As in the previous example, we can use the print
function to see the file name and directory.
[5]:
# print file directory
print(download)
./HLS.L30.T59WPT.2013101T001445.v2.0.B09.tif
Accessing Global Ecosystem Dynamics Investigation (GEDI) Level 3 Granule Data¶
In this example, we demonstrate how to access GEDI Level 3 granule data on the MAAP ADE.
Within your Jupyter Notebook, start by importing the maap
package. Then invoke the MAAP
constructor, setting the maap_host
argument to 'api.ops.maap-project.org'
.
[1]:
# import os module
import os
# import the maap package to handle queries
from maap.maap import MAAP
# invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.ops.maap-project.org')
Search for a granule using the searchGranule
function (for more information on searching for granules, see Searching for Granules in MAAP). Note that we can use searchGranule
’s cmr_host
argument to specify cmr.maap-project.org
as the CMR instance.
[2]:
# search for granule data using CMR host name and collection concept ID arguments
results = maap.searchGranule(
cmr_host='cmr.maap-project.org',
collection_concept_id='C1201702030-NASA_MAAP'
)
Let’s view the list of GranuleUR
s within our results
:
[3]:
# show all of the items
[item['Granule']['GranuleUR'] for item in results]
[3]:
['GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_stddev_2019108_2020287_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_counts_2019108_2021104_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_stddev_2019108_2021104_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_mean_2019108_2020287_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_mean_2019108_2021104_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_rh100_stddev_2019108_2021104_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_counts_2019108_2020287_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_rh100_mean_2019108_2021104_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_rh100_mean_2019108_2020287_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_rh100_stddev_2019108_2020287_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_counts_2019108_2021216_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_rh100_mean_2019108_2021216_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_stddev_2019108_2021216_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_mean_2019108_2021216_002_02.tif',
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_rh100_stddev_2019108_2021216_002_02.tif']
For this example, we are interested in downloading GEDI03_elev_lowestmode_stddev_2019108_2021104_002_02.tif
.
[4]:
# select item
results[2]['Granule']['GranuleUR']
[4]:
'GEDI_L3_LandSurface_Metrics_V2.GEDI03_elev_lowestmode_stddev_2019108_2021104_002_02.tif'
Now utilize the getData
function, which downloads granule data if it doesn’t already exist locally. We can use getData
to download the third result from our granule search into the file system and assign its local path to a variable (in this case download
).
[5]:
# download granule item
local_dir = '/projects/local_data' # download directory (absolute path or relative to current directory)
os.makedirs(local_dir, exist_ok=True) # create directories, as necessary
download = results[2].getData(local_dir) # default download directory is current directory, if no directory is given
We can then use the print
function to see the file name and directory.
[6]:
# print path to downloaded file
print(download)
/projects/local_data/GEDI03_elev_lowestmode_stddev_2019108_2021104_002_02.tif
Accessing Global Ecosystem Dynamics Investigation (GEDI) Level 4A Granule Data¶
In this example, we demonstrate how to access GEDI Level 4A granule data on the MAAP ADE.
Within your Jupyter Notebook, start by importing the maap
package. Then invoke the MAAP
constructor, setting the maap_host
argument to 'api.ops.maap-project.org'
.
[1]:
# import os module
import os
# import the maap package to handle queries
from maap.maap import MAAP
# invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.ops.maap-project.org')
Search for a granule using the searchGranule
function (for more information on searching for granules, see Searching for Granules in MAAP). Note that we can use searchGranule
’s cmr_host
argument to specify cmr.maap-project.org
as the CMR instance. Then utilize the getData
function, which downloads granule data if it doesn’t already exist locally. We can use getData
to download the first result from our
granule search into the file system and assign its local path to a variable (in this case download
).
[2]:
# search for granule data using CMR host name, collection concept ID, and Granule UR arguments
results = maap.searchGranule(
cmr_host='cmr.maap-project.org',
collection_concept_id='C1202028193-NASA_MAAP',
granule_ur='GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5')
# download first result
local_dir = '/projects/local_data' # Download directory (absolute path or relative to current directory)
os.makedirs(local_dir, exist_ok=True) # Create directories, as necessary
download = results[0].getData(local_dir) # Default download directory is current directory, if no directory is given
We can then use the print
function to see the file name and directory.
[3]:
# print path to downloaded file
print(download)
/projects/local_data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5
Accessing Cloud Optimized Data¶
The following is an example that uses Shuttle Radar Topography Mission (SRTM) Cloud Optimized GeoTIFF data from the MAAP data store, via MAAP CMR API search. In this example, we read in elevation data using a bounding box tile.
First we install any necessary packages. Please note that the following block of code only needs to be run if needed within a new workspace and that you may need to restart the kernel to use updated packages.
[1]:
# only run this block if needed in a new workspace
!pip install -U folium geopandas rasterio>=1.2.3 rio-cogeo
WARNING: You are using pip version 21.0; however, version 21.1.3 is available.
You should consider upgrading via the '/opt/conda/bin/python3.7 -m pip install --upgrade pip' command.
Let’s import the maap
and pprint
packages and invoke the MAAP.
[2]:
# import the maap package to handle queries
from maap.maap import MAAP
# import printing package to help display outputs
from pprint import pprint
# invoke the MAAP
maap = MAAP()
We can use the maap.searchCollection
function to search for the desired collection, in this case the SRTMGL1_COD collection and set the results of the function to a variable. More information about searching for collections may be found here.
[3]:
# search for SRTMGL1_COD collection
collection_info = maap.searchCollection(short_name="SRTMGL1_COD", limit=1000)
Perhaps we are only interested in granules (files) from the collection which are within a certain area. We create a string with bounding box values and use the maap.searchGranule
function with the bounding box values. More information about searching for granules may be found here.
[4]:
# set bounding box
bbox = '-101.5,45.5,-100.5,46.5'
# retreive granules from the collection that are within the bounding box
granule_results = maap.searchGranule(short_name="SRTMGL1_COD", bounding_box=bbox , limit=20)
Let’s check how many granules we are working with.
[5]:
# show number of granules in results
len(granule_results)
[5]:
4
Inspecting the Results¶
Now we can work on inspecting our results. In order to do this, we import the geopandas
,shapely
, and folium
packages.
[6]:
# import geopandas to work with geospatial data
import geopandas as gpd
# import shapely for manipulation and analysis of geometric objects
import shapely as shp
# import folium to visualize data in an interactive leaflet map
import folium
/opt/conda/lib/python3.7/site-packages/geopandas/_compat.py:110: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.8.1-CAPI-1.13.3). Conversions between both will be slow.
shapely_geos_version, geos_capi_version_string
With the above packages imported, we create a function to make polygons from granule items passed to it.
[7]:
def make_polygons(item):
"""
Returns shapely.geometry.polygon.Polygon
Parameters:
-----------
item : dictionary
A result from granule search results returned by maap.searchGranules()
"""
# get boundary information from granule result
bounds = item['Granule']['Spatial']['HorizontalSpatialDomain']['Geometry']['BoundingRectangle']
# get boundary values from `bounds` and convert to floating point number
item_bbox = [float(value) for value in bounds.values()]
# use boundary values to create a shapely polygon for the function to return
bbox_polygon = shp.geometry.box(
item_bbox[0],
item_bbox[1],
item_bbox[2],
item_bbox[3]
)
return bbox_polygon
With our make_polygons
function, we can use the gpd.GeoSeries
function to create a GeoSeries of all the polygons created from our granule results. According to the GeoPandas documentation, a GeoSeries is a “Series [a type of one-dimensional array] object designed to store shapely geometry objects”. We use ‘EPSG:4326’ (WGS 84) for the coordinate reference system. Then we can check the GeoSeries.
[8]:
# create GeoSeries of all polygons from granule results with WGS 84 coordinate reference system
geometries = gpd.GeoSeries([make_polygons(item) for item in granule_results], crs='EPSG:4326')
# check GeoSeries
geometries
[8]:
0 POLYGON ((-100.99972 46.00028, -100.99972 44.9...
1 POLYGON ((-100.99972 47.00028, -100.99972 45.9...
2 POLYGON ((-99.99972 46.00028, -99.99972 44.999...
3 POLYGON ((-99.99972 47.00028, -99.99972 45.999...
dtype: geometry
Now we create a list from our bounding box values. Then we use the centroid
function to get the centroid of our bounding box and set to a point. Next we use folium.Map
to create a map. For the map’s parameters, let’s set the centroid point coordinates as the location, “cartodbpositron” for the map tileset, and 7 as the starting zoom level. With our map created, we can create a dictionary containing style information for our bounding box. Then we can use folium.GeoJson
to create
GeoJsons from our geometries
GeoSeries and add them to the map. We also use the folium.GeoJson
function to create a GeoJson of a polygon created from our bounding box list, name it, and add our style information. Finally, we check the map by displaying it.
[9]:
# create list of bounding box values
bbox_list = [float(value) for value in bbox.split(',')]
# get centroid point from bounding box values
center = shp.geometry.box(*bbox_list).centroid
# create map with folium with arguments for lat/lon of map, map tileset, and starting zoom level
m = folium.Map(
location=[center.y,center.x],
tiles="cartodbpositron",
zoom_start=7,
)
# create style information for bounding box
bbox_style = {'fillColor': '#ff0000', 'color': '#ff0000'}
# create GeoJson of `geometries` and add to map
folium.GeoJson(geometries, name="tiles").add_to(m)
# create GeoJson of `bbox_list` polygon and add to map with specified style
folium.GeoJson(shp.geometry.box(*bbox_list),
name="bbox",
style_function=lambda x:bbox_style).add_to(m)
# display map
m
[9]:
Creating a Mosaic¶
Let’s check the raster information contained in our granule results. In order to do this, we import some more packages. To read and write files in raster format, import rasterio
. From rasterio
we import merge
to copy valid pixels from an input into an output file, AWSSession
to set up an Amazon Web Services (AWS) session, and show
to display images and label axes. Also import boto3
in order to work with AWS. From matplotlib
, we want to import imshow
which allows
us to display images from data. Import numpy
to work with multi-dimensional arrays and numpy.ma
to work with masked arrays. From pyproj
, import Proj
for converting between geographic and projected coordinate reference systems and Transformer
to make transformations.
[10]:
# import rasterio for reading and writing in raster format
import rasterio as rio
# copy valid pixels from input files to an output file.
from rasterio.merge import merge
# set up AWS session
from rasterio.session import AWSSession
# display images, label axes
from rasterio.plot import show
# import boto3 to work with Amazon Web Services
import boto3
# display images from data
from matplotlib.pyplot import imshow
# import numpy to work with multi-dimensional arrays
import numpy as np
# import numpy.ma to work with masked arrays
import numpy.ma as ma
# convert between geographic and projected coordinates and make transformations
from pyproj import Proj, Transformer
Finally, we import os
and run some code in order to speed up Geospatial Data Abstraction Library (GDAL) reads from Amazon Simple Storage Service (S3) buckets by skipping sidecar (connected) files.
[11]:
# speed up GDAL reads from S3 buckets by skipping sidecar files
import os
os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'EMPTY_DIR'
Now that we have the necessary packages, let’s get a list of S3 urls to the granules. To start, set up an AWS session. The S3 urls are contained within the granule_results
list, so we loop through the granules in our results to get the S3 urls and add them to a new list. We can then use the sort
function to sort the S3 urls in an acending order. Then we can check our S3 url list.
[12]:
# set up AWS session
aws_session = AWSSession(boto3.Session())
# get the S3 urls to the granules
file_S3 = [item['Granule']['OnlineAccessURLs']['OnlineAccessURL']['URL'] for item in granule_results]
# sort list in ascending order
file_S3.sort()
# check list
file_S3
[12]:
['s3://nasa-maap-data-store/file-staging/nasa-map/SRTMGL1_COD___001/N45W101.SRTMGL1.tif',
's3://nasa-maap-data-store/file-staging/nasa-map/SRTMGL1_COD___001/N45W102.SRTMGL1.tif',
's3://nasa-maap-data-store/file-staging/nasa-map/SRTMGL1_COD___001/N46W101.SRTMGL1.tif',
's3://nasa-maap-data-store/file-staging/nasa-map/SRTMGL1_COD___001/N46W102.SRTMGL1.tif']
Looks good. We can now check to see that we can read the AWS files and display a thumbnail. Pass the boto3 session to rio.Env
, which is the GDAL/AWS environment for rasterio
. Use the rio.open
function to read in one of the Cloud Optimized GeoTIFFs. Now let’s use the overviews
command to get a list of overviews. Overviews are versions of the data with lower resolution, and can thus increase performance in applications. Let’s get the second overview from our list for retrieving a
thumbnail. Retrieve a thumbnail by reading the first band of our file and setting the shape of the new output array. The shape can be set with a tuple of integers containing the number of datasets as well as the height
and width
of the file divided by our integer from the overview list. Now use the show
function to display the thumbnail.
[13]:
# prove that we can read the AWS files
# for more information - https://automating-gis-processes.github.io/CSC/notebooks/L5/read-cogs.html
with rio.Env(aws_session):
with rio.open(file_S3[1], 'r') as src:
# list of overviews
oviews = src.overviews(1)
# get second item from list to retrieve a thumbnail
oview = oviews[1]
# read first band of file and set shape of new output array
thumbnail = src.read(1, out_shape=(1, int(src.height // oview), int(src.width // oview)))
# now display the thumbnail
show(thumbnail)

[13]:
<AxesSubplot:>
Since we verified that we can read the AWS files and display a thumbnail, we can create a mosaic from all of the rasters in our file_S3
list. To do this, again pass the boto3 session to rio.Env
. Then create a list which contains all of the read in Cloud Optimized GeoTIFFs (this may take a while).
[14]:
# create a mosaic from all the images
with rio.Env(aws_session):
sources = [rio.open(raster) for raster in file_S3]
Now we can use the merge
function to merge these source files together using our list of bounding box values as the bounds. merge
copies the valid pixels from input rasters and outputs them to a new raster.
[15]:
# merge the source files
mosaic, out_trans = merge(sources, bounds = bbox_list)
Lastly, we use ma.masked_values
for masking all of the NoData values, allowing the mosaic to be plotted correctly. ma.masked_values
returns a MaskedArray in which a data array is masked by an approximate value. For the parameters, let’s use our mosaic as the data array and the integer of the “nodata” value as the value to mask by. Now we can use show
to display our masked raster using matplotlib
with a “terrain” colormap.
[16]:
# mask the NoData values so it can be plotted correctly
masked_mosaic = ma.masked_values(mosaic, int(sources[0].nodatavals[0]))
# display the masked mosaic
show(masked_mosaic, cmap = 'terrain')

[16]:
<AxesSubplot:>
Accessing EDAV Data via Web Coverage Service¶
This example demonstrates how to retrieve raster data from EDAV using a Web Coverage Service (WCS). A WCS lets you access coverage data with multiple dimensions online. The downloaded data is subsetted from data hosted on the MAAP and is available with the full resolution and values.
Setting up the Environment¶
We start by installing the libraries that are used to query the WCS connection point, and then to load, explore, and plot the raster data. We use rasterio
for reading and writing raster formats, rio-cogeo
for creating and validating Cloud Optimized GEOTIFF (COG) data, and owslib
for interacting with Open Geospatial Consortium (OGC) services.
[1]:
# install libraries
# %pip is a magic command that installs into the current kernel
# -q means quiet (to give less output)
%pip install -q rasterio
%pip install -q rio-cogeo
%pip install -q owslib
After installing the libraries, note that you may see multiple messages that you need to restart the kernel. Then import rasterio
, show
from rasterio.plot
to display images with labeled axes, and WebCoverageService
from owslib.wcs
to program with an OGC web service.
[2]:
# import rasterio
import rasterio as rio
# import show
from rasterio.plot import show
# import WebCoverageService
from owslib.wcs import WebCoverageService
Querying the WCS¶
Now we can configure the WCS source, use the getCoverage
function to request a file in GeoTIFF format, and save what is returned to our workspace.
[3]:
# configure the WCS source
EDAV_WCS_Base = "https://edav-wcs.adamplatform.eu/wcs"
wcs = WebCoverageService(f'{EDAV_WCS_Base}?service=WCS', version='2.0.0')
[4]:
# request imagery to download
response = wcs.getCoverage(
identifier=['test_afrisar_onera_ClopeTB10_biomass_COG'], # coverage ID
format='image/tiff', # format what the coverage response will be returned as
filter='false', # define constraints on query
scale=1, # resampling factor (1 full resolution, 0.1 resolution degraded of a factor of 10)
subsets=[('Long',11.54,11.8),('Lat',-0.3,0.0)] # subset the image by lat / lon
)
# save the results to file as a tiff
results = "EDAV_example.tif"
with open(results, 'wb') as file:
file.write(response.read())
We can use gdalinfo
to provide information about our raster dataset to make sure the data is valid and contains spatial metadata.
[5]:
# gives information about the dataset
!gdalinfo {results}
Driver: GTiff/GeoTIFF
Files: EDAV_example.tif
Size is 2836, 3403
Coordinate System is:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (11.539968877500000,-0.115643000000000)
Pixel Size = (0.000035932500000,-0.000035932500000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 11.5399689, -0.1156430) ( 11d32'23.89"E, 0d 6'56.31"S)
Lower Left ( 11.5399689, -0.2379213) ( 11d32'23.89"E, 0d14'16.52"S)
Upper Right ( 11.6418734, -0.1156430) ( 11d38'30.74"E, 0d 6'56.31"S)
Lower Right ( 11.6418734, -0.2379213) ( 11d38'30.74"E, 0d14'16.52"S)
Center ( 11.5909212, -0.1767821) ( 11d35'27.32"E, 0d10'36.42"S)
Band 1 Block=2836x1 Type=Float32, ColorInterp=Gray
NoData Value=0
Reading the Data¶
We can now use rio.open
with our results
path string and return an opened dataset object. We can set a variable (rast
) to what is read from this dataset object. Then, we utilize the function show
to display the raster using Matplotlib.
[6]:
# take path and return opened dataset object, set variable to read dataset object
with rio.open(results, 'r') as src:
rast = src.read()
[7]:
# make a plot
show(rast, transform=src.transform, cmap='pink')

[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f27bc568c90>
We now have a visual of our raster. Let’s import and employ the show_hist
function from rasterio.plot
to generate a histogram of the raster.
[8]:
# import show_hist
from rasterio.plot import show_hist
# create histogram
show_hist(rast,
bins=50,# number of bins to compute histogram across
alpha=.3,# transparancy
title="Histogram"# figure title
)

We can also generate a plot using Matplotlib. Let’s import matplotlib.pyplot
and numpy
and make a new plot. To do this, use the plt.subplots
function to return a figure and a single “Axes” instance. Then remove single-dimensional entries from the shape of our array using np.squeeze
and display the data as an image using imshow
. Now, we can set the norm limits for image scaling using the set_clim
function.
[9]:
# import matplotlib.pyplot
import matplotlib.pyplot as plt
# import numpy
import numpy as np
[10]:
# set figure and single "Axes" instance
fig, ax = plt.subplots(1, figsize=(8,8))
# remove single-dimensional entries from the shape of the variable rast
# and display the image
edavplot = ax.imshow(np.squeeze(rast), cmap='pink')

Newer Method rioxarray
¶
Another way to work with raster data is with the rasterio
“xarray” extension. Let’s install and import rioxarray
and create a plot using the open_rasterio
and plot
functions.
[11]:
# install rasterio xarray extension
%pip install -q rioxarray
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.3; however, version 23.0.1 is available.
You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.
[12]:
# import rasterio xarray extension
import rioxarray
[13]:
# opens results with rasterio to set dataarray
edav_x = rioxarray.open_rasterio(results)
[14]:
# plot dataarray
edav_x.plot(cmap="gist_earth", figsize=(10,8))
[14]:
<matplotlib.collections.QuadMesh at 0x7f275e7e5310>

References¶
- WCS Adapted from: Jan Verbesselt, Jorge Mendes de Jesus, Aldo Bergsma, Dainius Masiliūnas, David Swinkels, Corné Vreugdenhil. Handling Raster data with Python - 2020-01-20 https://geoscripting-wur.github.io/PythonRaster/
- OWSLib https://github.com/geopython/OWSLib
- rioxarray https://corteva.github.io/rioxarray
Query¶
How to Query Data from the MAAP via Python Client¶
Supported collections can be subsetted through the MAAP Query Service. At the time of writing (03/22/2021), the GEDI Calibration/Validation Field Survey Dataset is the only valid dataset for this service. However, more data will be made available for querying as the MAAP team continues to develop expanded services for the platform. Users can interact with the service through the MAAP Python client.
First, we import the json
module, import the MAAP
package, and create a new MAAP class.
[18]:
# import the json module
import json
# import the MAAP package
from maap.maap import MAAP
# create MAAP class
maap = MAAP()
How to Use maap.executequery()
¶
We use the executeQuery()
function to return a response object, containing the server’s response to our HTTP request. This object can be used to view the response headers, access the raw data of the response, or parse the response as a JavaScript Object Notation (JSON). JSON is a data-interchange format, designed to be easy for humans to read and write.
executeQuery
Parameters¶
src
- a dictionary-like object specifying the dataset to query. Currently, the only option is as follows:{ "Collection": { "ShortName": "GEDI Cal/Val Field Data_1", "VersionId": "2" } }
query
- a dictionary-like object specifying the parameters for query. A dictionary can includebbox
,where
,fields
, andtable
:bbox
- a list of floating point numbers identifying a bounding box of geographic coordinates.where
- a dictionary-like object which maps fields to required values within a query.fields
- a list of fields to return, a subset of all fields available for the corresponding dataset.table
- the name of the table to query. At this time, the only valid options are “tree” or “plot” corresponding to tables in the GEDI Cal/Val database.
poll_results
- a parameter which must beTrue
to use thetimeout
parameter.timeout
- the waiting period for a response. This indicates the maximum number of seconds to wait for a response. Note that timeout has a default value of ‘180’ and requires that thepoll_results
parameter beTrue
. Depending on the request, it may be necessary to modify the timeout to make sure the server has enough time to process the request.wait_interval
- number of seconds to wait between each poll for results.wait_interval
is only used ifpoll_results
=True
(default 0.5). -max_redirects
- the maximum number of redirects to follow when scheduling an execution (default 5).
Query Searching for a Project Name¶
In this example, we create a dictionary containing a Collection
key, which contains entries for ShortName
and VersionId
. This is used later in the executeQuery()
function. For this example, we use the GEDI Calibration/Validation Field Survey Dataset collection. More information about dictionaries in Python can be found here.
[19]:
# create dictionary with a "Collection" key containing short name and version ID entries:
collection = {
"Collection": {
"ShortName": "GEDI Cal/Val Field Data_1",
"VersionId": "2"
}
}
We also create a function (in this example named fetch_results
) which utilizes the executeQuery
function to return results of queries. Within this function, we use the executeQuery
function to get a response object. Within the executeQuery
function, our collection
dictionary is assigned to src
, a dictionary-like object specifying the dataset to query. We also have the query used in the argument assigned to query
, a dictionary-like object which specifies the parameters
for the query. We set timeout
to the timeout used in the argument (default is 180 seconds) and set poll_results
to True
in order to set the maximum waiting period for a response. We can check the ‘Content-Type’ header of our response to see the content type of the response. In the following code, the ‘Content-Type’ header is checked to determine if it is JSON or not, in order to set an appropriate variable to return.
[20]:
def fetch_results(query={}, timeout=180):
"""
Function which utilizes the `executeQuery` function to return the results of queries.
"""
# use the executeQuery() function to get a response object
response = maap.executeQuery(
# dictionary-like object specifying the dataset to query
src = collection,
# dictionary-like object specifying the parameters for query
query = query,
# must be True to use the timeout parameter
poll_results = True,
# max waiting period for a response in seconds
timeout = timeout
)
# if the 'Content-Type' is json, creates variable with json version of the response
if (response.headers.get("Content-Type") == "application/json"):
data = response.json()
# if the 'Content-Type' is not json, creates variable with unicode content of the response
else:
data = response.text
# returns `data` as json string
return json.loads(data)
Now that we have our collection dictionary and our function to return the results of queries, let’s use a print statement to display the first project name from a query which utilizes the bbox
and fields
parameters within query
. The bbox
parameter is a GeoJSON-compliant bounding box ([minX, minY, maxX, maxY]) which is used to filter data spatially. GeoJSON is a format for encoding geographic data structures. More information about the bounding box can be found in the standard
specification of the GeoJSON format, located here - https://tools.ietf.org/html/rfc7946#section-5. The fields
parameter is a list of fields to return in the query response. In this case, we assign ‘project’ to fields
.
[21]:
# prints the first project name in the results of the query as a json string
print(json.dumps(fetch_results({"bbox": [9.31, 0.53, 9.32, 0.54], "fields": ["project"]})[0], indent=2))
{
"project": "gabon_mondah"
}
Inspecting a Single Observation¶
In the previous example, we displayed the project name for a result from our query, but let’s say we wished to see all of the fields and their associated values for a result. In this example, we make another query, this time only specifying the bounding box. The print statement displays the variables for a single observation. A list of the variables and their units and descriptions can be found here.
[22]:
# prints the fields with values for the first result in the results of the query as a json string
print(json.dumps(fetch_results({"bbox": [9.315, 0.535, 9.32, 0.54]})[0], indent=2))
{
"project": "gabon_mondah",
"plot": "NASA11",
"subplot": "1",
"survey": "AfriSAR_ESA_2016",
"private": 0,
"date": "2016-02-01",
"region": "Af",
"vegetation": "TropRF",
"map": 3083.93471636915,
"mat": 25.6671529098763,
"pft.modis": "Evergreen Broadleaf trees",
"pft.name": null,
"latitude": 0.538705025207016,
"longitude": 9.31982893597376,
"p.sample": 0,
"p.stemmap": 0,
"p.origin": "C",
"p.orientation": -2.18195751718555,
"p.shape": "R",
"p.majoraxis": 100,
"p.minoraxis": 100,
"p.geom": "POLYGON ((535537.75 59601.25, 535627.75 59590.5, 535642.25 59489.25, 535543 59498.5, 535537.75 59601.25, 535537.75 59601.25))",
"p.epsg": 32632,
"p.area": 10000,
"p.mindiam": 0.01,
"sp.geom": "POLYGON((535537.510093 59494.109258, 535537.526710 59519.109253, 535562.526704 59519.092636, 535562.510088 59494.092642, 535537.510093 59494.109258))",
"sp.ix": 1,
"sp.iy": 4,
"sp.shape": "R",
"sp.area": 625,
"sp.mindiam": 0.01,
"pai": null,
"lai": null,
"cover": null,
"dft": "EA",
"agb": null,
"agb.valid": null,
"agb.lower": null,
"agb.upper": null,
"agbd.ha": null,
"agbd.ha.lower": null,
"agbd.ha.upper": null,
"sn": null,
"snd.ha": null,
"sba": null,
"sba.ha": null,
"swsg.ba": null,
"h.t.max": null,
"sp.agb": null,
"sp.agb.valid": null,
"sp.agbd.ha": null,
"sp.agbd.ha.lower": null,
"sp.agbd.ha.upper": null,
"sp.sba.ha": null,
"sp.swsg.ba": null,
"sp.h.t.max": null,
"l.project": "jpl_mondah",
"l.instr": null,
"l.epsg": 32632,
"l.date": null,
"tree.date": "2016-02-10",
"family": "Myristicaceae",
"species": "Coelocaryon sp.",
"pft": null,
"wsg": 0.49353216374269,
"wsg.sd": 0.0941339098036177,
"tree": "5501",
"stem": "1",
"x": 535539.544644182,
"y": 59496.0746845487,
"z": null,
"status": 1,
"allom.key": 2,
"a.stem": 0.0437435361085843,
"h.t": 9.46667,
"h.t.mod": 17.0934257579035,
"d.stem": 0.236,
"d.stem.valid": 1,
"d.ht": 1.3,
"c.w": null,
"m.agb": 145.005237378992,
"id": 2077,
"geom_obj": "0103000020E6100000010000000500000032A8C5A385A32240A1FCE9F75E39E13F32892DA985A32240A012DE4C393BE13FF2CC041CA3A32240D4DDCAF5383BE13F85ED9C16A3A3224083B5D8A05E39E13F32A8C5A385A32240A1FCE9F75E39E13F",
"wwf.ecoregion": "Central African mangroves"
}
Query Using Multiple Parameters With where
¶
In the output of the previous example, we can see the field "species"
. Let’s say we are interested in finding observations for the "gabon_mondah"
project within the same bounding box as the previous example which have the same species. We can do this using where
, a dictionary-like object which maps fields to required values within a query. To help demonstrate how to use where
, we can create a new function (in this example named species_query
) to print the number of results as
well as the first result, adding on to our previous fetch_results
function which utilized the executeQuery
function.
[23]:
def species_query(query = {}, timeout = 180):
"""
Function which utilizes the `fetch_results` (and thereby `executeQuery`) function and prints the number of results
as well as the first result.
"""
# set `data` to results of query
data = fetch_results(query = query, timeout = timeout)
# get the number of results within `data`
num_results = len(data)
# if `data` is not null and contains at least one result, the number of results and the first result are printed
if((data is not None) and (num_results > 0)):
first_result = data[0]
print(f"Number of results: {num_results}")
print(f"First result: {json.dumps(first_result, indent=2)}")
# else prints "No result"
else:
print(num_results)
print("No result")
Let’s call the species_query
function. We enter the same bounding box values as in the previous example. This time around, we enter where
in our query and set the project
as gabon_mondah
and the species
as Coelocaryon sp.
. We can set a list of fields to return in query response using fields
. For this example, we can choose to return only the project, family, species, latitude, and longitude values. After completing our query, we can manually set the timeout value (in
this example ‘200’).
[24]:
# call `species_query` function with bounding box values, where the project is "gabon_mondah",
# the species is "Aucoumea klaineana", fields include "project", "family", "species", "latitude", and "longitude",
# and the timeout value is 200 (use the scrollbar to see the entire function call)
species_query({"bbox": [9.315, 0.535, 9.32, 0.54], "where": {"project": "gabon_mondah", "species": "Coelocaryon sp."}, "fields": ["project", "family", "species", "latitude", "longitude"]}, 200)
Number of results: 10
First result: {
"project": "gabon_mondah",
"family": "Myristicaceae",
"species": "Coelocaryon sp.",
"latitude": 0.538705025207016,
"longitude": 9.31982893597376
}
We now see that there are many results and the latitude and longitude coordinates for the first result. To see this information for Aucoumea klaineana, we can copy the code from the above cell, changing the species
to Aucoumea klaineana
within the function argument.
[25]:
# call `species_query` function with bounding box values, where the project is "gabon_mondah",
# the species is "Coelocaryon sp.", fields include "project", "family", "species", "latitude", and "longitude",
# and the timeout value is 200 (use the scrollbar to see the entire function call)
species_query({"bbox": [9.315, 0.535, 9.32, 0.54], "where": {"project": "gabon_mondah", "species": "Aucoumea klaineana"}, "fields": ["project", "family", "species", "latitude", "longitude"]}, 200)
Number of results: 49
First result: {
"project": "gabon_mondah",
"family": "Burseraceae",
"species": "Aucoumea klaineana",
"latitude": 0.538705025207016,
"longitude": 9.31982893597376
}
Using executeQuery
to Vizualize Plots and Trees¶
Now that we know how to print the query results of the collection, let’s look at examples of vizualizing the information from query results. The first time you run this, you will need to install folium
. folium
allows us to visualize data on an interactive map. With folium
installed, you will then import the folium
library. Also import matplotlib.pyplot
, which allows us to generate interactive plots
[26]:
# installs folium
!pip install folium
Requirement already satisfied: folium in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (0.12.1.post1)
Requirement already satisfied: numpy in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from folium) (1.21.5)
Requirement already satisfied: jinja2>=2.9 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from folium) (3.0.3)
Requirement already satisfied: requests in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from folium) (2.27.1)
Requirement already satisfied: branca>=0.3.0 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from folium) (0.4.2)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from jinja2>=2.9->folium) (2.0.1)
Requirement already satisfied: certifi>=2017.4.17 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from requests->folium) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from requests->folium) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from requests->folium) (1.26.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /Users/philvarner/code/devseed/maap-documentation/.venv/lib/python3.7/site-packages (from requests->folium) (2.0.11)
[27]:
# import folium library and matplotlib.pyplot
import folium
import matplotlib.pyplot as plt
Plot Project Plots¶
Now we can begin plotting all the plots for a given project. For this example, we’ll check out the australia_ausplotsforests
project. We can query the plot table by using our fetch_results
helper function. This time, we set the table
parameter. table
is the name of the table to query. The GEDI Cal/Val database has tables for “tree” and “plot” so let’s set table
to “plot”. We can then preview our results.
[28]:
# set project and query for all the plots in that project
project = 'australia_ausplotsforests'
results = fetch_results({
"bbox": [-180, 90, 180, -90],
"where": {
"project": project
},
"fields": ["latitude", "longitude", "plot"],
"table":"plot"
}, 1000)
# print first 10 results
results[0:10]
[28]:
[{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'},
{'latitude': -31.2421, 'longitude': 152.4609, 'plot': 'NSFNNC002'}]
Create Project Plots Dictionary¶
Now we can create a dictionary for the project plots from our results.
[29]:
# create dictionary for project plots, where each plot has lat/lon info
project_plots = {}
keys = [ 'latitude', 'longitude' ]
for result in results:
project_plots[result['plot']] = { key: result[key] for key in keys }
To center our map more easily and plot trees later on, we can set a variable to the first project plot to be used for centering the map.
[30]:
# Select the first plot, just to center the map easily
first_plot = list(project_plots.keys())[0]
first_plot
[30]:
'NSFNNC002'
Map the Plots¶
Now we can create a map and place the plot points onto it. First we will set variables for the map’s location center and starting zoom level. Then we will use the folium.Map
function to create a map which displays “Stamen Terrain” tiles with our location center and starting zoom level. Next, we will create a loop using the folium.Marker
function to add markers to the map at the latitude and longitude for each plot and have popup text displaying each plot’s name. Lastly, we display the
map in order to interact with it.
[31]:
# set location center for map
center = [ project_plots[first_plot]['latitude'], project_plots[first_plot]['longitude'] ]
# set zoom level, note that depending on the project,
# you may wish to increase the zoom level
zoom = 3
# create a map with Folium and Leaflet.js
m = folium.Map(location=center, tiles="Stamen Terrain", zoom_start = zoom)
# Add markers to map for each plot in `project_plots`
for plot in project_plots.items():
folium.Marker(
[plot[1]['latitude'], plot[1]['longitude']],
popup = f"plot: {plot[0]}"
).add_to(m)
# display map
m
[31]:
The map has zoom in and zoom out buttons, can be dragged with a mouse, and displays markers which can be clicked on to display their popup text.
Plot Trees¶
Now let’s look at plotting trees for the first plot of the designated project. We construct another query with our fetch_results
helper function. This time, we query for fields containing plant height, UTM coordinates, and elevation as well as set the table
parameter to “tree”.
[32]:
# query for trees for the first plot of the project
results = fetch_results({
"bbox": [-180, 90, 180, -90],
"where": {
"project": project,
"plot": first_plot
},
# h.t -> total height of plant from ground to highest leaf
# x -> easting UTM coordinate
# y -> northing UTM coordinate
# z -> elevation relative to geoid coordinate
"fields": [ "h.t", "x", "y", "z"],
"table": "tree"
}, 1000)
Determine the Number and Height of Trees¶
To see the number of trees in the list of results, we use the len
function within a formatted print statement. Since not all of the results contain heights, lets check how many trees with height data exist by creating a new variable set to the results with heights and use the len
function within a formatted print statement again. Also, we can set variables to contain the heights, x and y values contained within our results to be used when plotting the trees.
[33]:
# print number of trees in results
print(f"Number of trees: {len(results)}")
# print number of trees with heights
heights = [r["h.t"] for r in results if r["h.t"] is not None]
print(f"Number of trees with heights: {len(heights)}")
# set variables to the heights, x and y values contained in the results
hts = [r['h.t'] for r in results]
xs = [r['x'] for r in results]
ys = [r['y'] for r in results]
Number of trees: 520
Number of trees with heights: 117
Plot the Trees¶
With our variables for tree heights, x and y coordinate values, let’s use the plt.scatter
function to create a scatter plot with a title and labels for the UTM coordinates of the trees. Lastly, let’s also create a histogram using plt.hist
to visualize the distribution of tree heights. Here we also include some code to adjust the display colors and add labels and a title.
[34]:
# create scatter plot with x and y values
plt.scatter(xs, ys, c=hts, cmap="Greens")
# add labels and title
plt.xlabel('Easting UTM Coordinate')
plt.ylabel('Northing UTM Coordinate')
plt.title(f"UTM Coordinates for Plot {first_plot} in Project {project}\n", fontsize=22)
plt.show()
# adjust display colors
cm = plt.cm.Greens
nbins = 8
# create histogram for tree heights
n, bins, patches = plt.hist(x=heights, bins=nbins, rwidth=0.85)
for i, p in enumerate(patches):
plt.setp(p, 'facecolor', cm(i/nbins))
# add labels and title
plt.xlabel('Height')
plt.ylabel('Frequency')
plt.title(f"Tree Heights for Plot {first_plot} in Project {project}\n", fontsize=22)
plt.show()


Fields Within the Gedi Cal/Val Data¶
Last updated: 2020-06-30
The Global Ecosystem Dynamics Investigation (GEDI) Forest Structure and Biomass Database (FSBD) is a collection of field and LiDAR datasets developed to serve a central repository for calibration and validation of the GEDI mission data. The GEDI Cal/Val Field Data collection contains the field data contributions to the FSBD. The database has contributions across the globe for a variety of vegetation types from projects dating back to 2003.
The table below describes the variables found in the files:
variable | units | description |
---|---|---|
plot | NA | name of plot |
subplot | NA | subplot identifier |
survey | NA | name of survey event |
private | NA | data privacy (1=private, 0=public) |
date | NA | date of survey event |
region | NA | region of plot location: Eu=Europe; As=Asia; Au=Australia/New Zealand; Af=Africa; SEAsia=South East Asia; SA=South/Central America; US=United States) |
vegetation | NA | vegetation type where sampled: Sav = Savannah; TropRF = Tropical rainforest; TempRF = Temperate rainforest; TropSF = Tropical seasonal forest; TempF = Temperate forest; BorF = Boreal forest; Wo = Woodland; Gr = Grassland; Sh = Shrubland; De = Desert |
map | mm | mean annual rainfall |
mat | deg | mean annual temperature |
pft.modis | NA | plant functional type from the nearest MCD12Q1 product date |
pft.name | NA | plant functional type from field descriptions or other ancillary data |
wwf.ecoregion | NA | Terrestrial ecoregion from the WWF Olson et al. (2004) terrestrial ecoregions of the globe |
latitude | deg | latitude of location where sampled (-90 to 90 deg South to North) |
longitude | deg | longitude of location where sampled (-180 to 180 deg West to East) |
p.sample | NA | subplot type: 0=independent fixed area plots; 1=variable area plots; 2=multiple fixed area plots for aggregation; 3=variable radius sampling |
p.stemmap | NA | stem mapped plot (1=TRUE,0=FALSE) |
p.origin | NA | origin of the plot, SW has to be interpreted as the lower left corner relative to the orientation |
p.orientation | deg | orientation of plot degrees clockwise from UTM map grid north |
p.shape | NA | plot shape (E=ellipse, R=rectangle) |
p.majoraxis | m | major axis of plot |
p.minoraxis | m | minor axis of plot |
p.geom | m | plot geometry WKT string |
p.epsg | NA | epsg code of the UTM zone used for the tree coordinates |
p.area | m2 | plot polygon area |
p.mindiam | m | minimum tree diameter measured across all subplots |
sp.geom | m | subplot geometry WKT string |
sp.ix | NA | subplot track index |
sp.iy | NA | subplot pulse index |
sp.shape | NA | subplot shape (E=ellipse, R=rectangle) |
sp.area | m2 | subplot area |
sp.mindiam | m | minimum tree diameter measured for the subplot |
pai | m2/m2 | plant area index of vegetation |
lai | m2/m2 | leaf area index of vegetation |
cover | NA | vertically projected canopy cover (1-Pgap) |
dft | NA | dominant plant functional type: EA = evergreen angiosperm; DA = deciduous angiosperm; EG = evergreen gymnosperm; DG = deciduous gymnosperm |
agb | kg | mass of above-ground standing trees and shrubs |
agb.valid | NA | valid above ground mass prediction (1=TRUE,0=FALSE) |
agb.lower | kg | lower limit of standard error of mass of above-ground standing trees and shrubs |
agb.upper | kg | upper limit of standard error of mass of above-ground standing trees and shrubs |
agbd.ha | Mg/ha | mass density of above-ground standing trees and shrubs |
agbd.ha.lower | Mg/ha | lower limit of standard error of mass density of above-ground standing trees and shrubs |
agbd.ha.upper | Mg/ha | upper limit of standard error of mass density of above-ground standing trees and shrubs |
sn | n | plot stem number |
snd.ha | n/ha | plot stem number density |
sba | m2 | plot stand basal area |
sba.ha | m2/ha | plot stand basal area per hectare |
swsg.ba | g/cm3/m2 | basal-area-weighted wood specific gravity |
h.t.max | m | maximum total height of plants from ground to highest leaf |
sp.agb | kg | subplot mass of above-ground standing trees and shrubs |
sp.agb.valid | NA | valid subplot above ground mass prediction (1=TRUE,0=FALSE) |
sp.agbd.ha | Mg/ha | subplot mass density of above-ground standing trees and shrubs |
sp.agbd.ha.lower | Mg/ha | subplot lower limit of standard error of mass density of above-ground standing trees and shrubs |
sp.agbd.ha.upper | Mg/ha | subplot upper limit of standard error of mass density of above-ground standing trees and shrubs |
sp.sba.ha | m2/ha | subplot stand basal area per hectare |
sp.swsg.ba | g/cm3/m2 | subplot basal-area-weighted wood specific gravity |
sp.h.t.max | m | subplot maximum total height of plants from ground to highest leaf |
l.project | NA | name of the lidar project dataset on UMD servers |
l.instr | NA | lidar instrument manufacturer and model |
l.epsg | NA | epsg code of the corresponding lidar data |
l.date | NA | date of the corresponding lidar acquisition |
g.fp | NA | geometry collection WKT string of GEDI lidar footprint center locations |
tree.date | NA | date of tree measurement |
family | NA | latin name of botanical family |
species | NA | latin name of species (genus species) |
pft | NA | plant functional type: EA = evergreen angiosperm; DA = deciduous angiosperm; EG = evergreen gymnosperm; DG = deciduous gymnosperm |
wsg | g/cm3 | the ratio of wood density to water |
wsg.sd | g/cm3 | standard deviation of WSG sample values |
tree | NA | tree identifier |
stem | NA | stem identifier |
x | m | easting UTM coordinate |
y | m | northing UTM coordinate |
z | m | elevation relative to geoid coordinate |
status | NA | live tree (1=TRUE,0=FALSE) |
allom.key | NA | key to allometric LUT |
a.stem | m2 | area of total stem cross-section at measurement height |
h.t | m | total height of plant from ground to highest leaf |
h.t.mod | NA | tree heights modelled using local or regional DBH-Ht relationship |
d.stem | m | diameter of stem at measurement height |
d.stem.valid | NA | valid tree diameter to use in AGB calculation (1=TRUE,0=FALSE) |
d.ht | m | height at which stem diameter was measured |
c.w | m | diameter or width of crown |
m.agb | kg | mass of above-ground components of tree |
Tutorials¶
Harmonized Landsat Sentinel (HLS) Search and Composite¶
In this tutorial, we will search the LPDAAC for HLS 30m optical imagery that intersects an AOI. We will filter the catalog based on a cloud cover % and build a maximum-NDVI composite image, including a suite of popular indices. We will begin by installing any packages we need and importing the packages that we will use.
If needed the following packages should be installed:
[1]:
# cleanup data: removes data files that should be replaced
!rm -rf ./local-s3.json
!rm -rf ./sample.json
if False:
!conda install -c conda-forge pystac-client -y
!conda install -c conda-forge rio-tiler -y
Prerequisites
- geopandas
- folium
- pystac-client
- rio_tiler
[2]:
# Uncomment the following lines to install these packages if you haven't already.
# !pip install geopandas
# !pip install folium
# !pip install pystac-client
# !pip install rio_tiler
We will now import a suite of packages that we will need:
[3]:
from maap.maap import MAAP
maap = MAAP(maap_host='api.ops.maap-project.org')
import geopandas as gpd
import folium
import h5py
import pandas
import matplotlib
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
from pystac_client import Client
import datetime
import os
import rasterio as rio
import boto3
import json
import botocore
from rasterio.session import AWSSession
from rio_tiler.io import COGReader
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
/opt/conda/lib/python3.7/site-packages/geopandas/_compat.py:115: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.8.1-CAPI-1.13.3). Conversions between both will be slow.
shapely_geos_version, geos_capi_version_string
Creating an AOI¶
To begin we will create a polygon in a forested area in Virginia, USA. We will do this by providing a number of lat/lon coordinates and creating an AOI.
[4]:
lon_coords = [-80, -80, -79., -79, -80]
lat_coords = [39, 38.2, 38.2, 39, 39]
polygon_geom = Polygon(zip(lon_coords, lat_coords))
crs = 'epsg:4326'
AOI = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])
AOI_bbox = AOI.bounds.iloc[0].to_list()
We can visualize this polygon using a folium interactive map.
[5]:
m = folium.Map([38.5,-79.3], zoom_start=9, tiles='OpenStreetMap')
folium.GeoJson(AOI).add_to(m)
folium.LatLngPopup().add_to(m)
m
[5]:
Accessing the HLS Data¶
To be able to access the HLS imagery we need to activate a ‘rasterio’ AWS session. This will give us the required access keys that we need to search and read data from the LPDAAC S3 bucket.
[6]:
def get_aws_session_DAAC():
"""Create a Rasterio AWS Session with Credentials"""
creds = maap.aws.earthdata_s3_credentials('https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials')
boto3_session = boto3.Session(
aws_access_key_id=creds['accessKeyId'],
aws_secret_access_key=creds['secretAccessKey'],
aws_session_token=creds['sessionToken'],
region_name='us-west-2'
)
return AWSSession(boto3_session)
[7]:
print('Getting AWS credentials...')
aws_session = get_aws_session_DAAC()
print('Finished')
Getting AWS credentials...
Finished
Now that we have our session credentials set up, we can search the HLS catalog using the following function, filtering by spatial extent (our AOI) and a time window:
[8]:
def query_stac(year, bbox, max_cloud, api, start_month_day, end_month_day):
print('opening client')
catalog = Client.open(api)
date_min = str(year) + '-' + start_month_day
print('start_month_day:\t\t', start_month_day)
print('end_month_day:\t\t', end_month_day)
date_max = str(year) + '-' + end_month_day
start_date = datetime.datetime.strptime(date_min, "%Y-%m-%d")
end_date = datetime.datetime.strptime(date_max, "%Y-%m-%d")
start = start_date.strftime("%Y-%m-%dT00:00:00Z")
end = end_date.strftime("%Y-%m-%dT23:59:59Z")
print('start date, end date:\t\t', start, end)
print('\nConducting HLS search now...')
search = catalog.search(
collections=["HLSL30.v2.0"],
datetime=[start,end],
bbox=bbox,
max_items=500, # for testing, and keep it from hanging
# query={"eo:cloud_cover":{"lt":20}} #doesn't work
)
print(f"Search query parameters:\n{search}\n")
results = search.get_all_items_as_dict()
return results
Here we run our STAC search and write the results out to a JSON file so we can access it later.
[9]:
search = query_stac(2020, AOI_bbox, 20, 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD', '06-01', '09-30')
with open("./sample.json", "w") as outfile:
json.dump(search, outfile)
opening client
start_month_day: 06-01
end_month_day: 09-30
start date, end date: 2020-06-01T00:00:00Z 2020-09-30T23:59:59Z
Conducting HLS search now...
Search query parameters:
<pystac_client.item_search.ItemSearch object at 0x7ff7aba2b6d0>
So far, we have not filtered by cloud cover, which is a common filtering parameter for optical imagery. We can use the metadata files included in our STAC search to find the amount of cloud cover in each file and decide if it meets our threshold or not. We will set a cloud cover threshold of 50%. While we are doing this, we will also change the URLs to access the optical imagery from “https://” to AWS S3 URLs (“S3://”).
[10]:
def write_local_data_and_catalog_s3(catalog, bands, save_path, cloud_cover, s3_path="s3://"):
'''Given path to a response json from a sat-api query, make a copy changing urls to local paths'''
creds = maap.aws.earthdata_s3_credentials('https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials')
aws_session = boto3.session.Session(
aws_access_key_id=creds['accessKeyId'],
aws_secret_access_key=creds['secretAccessKey'],
aws_session_token=creds['sessionToken'],
region_name='us-west-2')
s3 = aws_session.client('s3')
with open(catalog) as f:
clean_features = []
asset_catalog = json.load(f)
# Remove duplicate scenes
features = asset_catalog['features']
for feature in features:
umm_json = feature['links'][6]['href']
umm_data = !curl -i {umm_json}
for line in umm_data:
if "CLOUD_COVERAGE" in line:
cc_perc = (int(line.split("CLOUD_COVERAGE")[-1].split('"')[4]))
if cc_perc > cloud_cover:
pass
else:
try:
for band in bands:
output_file = feature['assets'][band]['href'].replace('https://data.lpdaac.earthdatacloud.nasa.gov/', s3_path)
bucket_name = output_file.split("/")[2]
s3_key = "/".join(output_file.split("/")[3:])
head = s3.head_object(Bucket = bucket_name, Key = s3_key, RequestPayer='requester')
if head['ResponseMetadata']['HTTPStatusCode'] == 200:
feature['assets'][band]['href'] = output_file
clean_features.append(feature)
except botocore.exceptions.ClientError as e:
if e.response['Error']['Code'] == "404":
print(f"The object does not exist. {output_file}")
else:
raise
# save and updated catalog with local paths
asset_catalog['features'] = clean_features
local_catalog = catalog.replace('sample', 'local-s3')
with open(local_catalog,'w') as jsonfile:
json.dump(asset_catalog, jsonfile)
return local_catalog
When run, this will create a new JSON file that will only include files that meet our cloud cover threshold and have S3 URLs.
[11]:
local_cat = write_local_data_and_catalog_s3('./sample.json', ['B02','B03','B04','B05','B06','B07','Fmask'], './', 60, s3_path="s3://")
Now that we have images that meet our requirements, we will composite them into a multiband image for our AOI. We will composite the image on a band-by-band basis, so we first have to get a list of all the file paths for each band.
[12]:
def GetBandLists(inJSON, bandnum, tiles=['']):
bands = dict({2:'B02', 3:'B03', 4:'B04', 5:'B05', 6:'B06', 7:'B07',8:'Fmask'})
BandList = []
with open(inJSON) as f:
response = json.load(f)
for i in range(len(response['features'])):
try:
getBand = response['features'][i]['assets'][bands[bandnum]]['href']
# check 's3' is at position [:2]
if getBand.startswith('s3', 0, 2):
BandList.append(getBand)
except Exception as e:
print(e)
BandList.sort()
return BandList
We will build a band list of file paths for each image band. We will also access the ‘fmask’ band to mask out clouds.
[13]:
blue_bands = GetBandLists('./local-s3.json', 2)
green_bands = GetBandLists('./local-s3.json', 3)
red_bands = GetBandLists('./local-s3.json', 4)
nir_bands = GetBandLists('./local-s3.json', 5)
swir_bands = GetBandLists('./local-s3.json', 6)
swir2_bands = GetBandLists('./local-s3.json', 7)
fmask_bands = GetBandLists('./local-s3.json', 8)
print('number of each images in each band = ', len(blue_bands))
number of each images in each band = 21
Reading in HLS data and creating composite¶
We will not read all of the HLS data, as we only need what’s included in our AOI. To do this “windowed read” we need to know the dimensions, in pixels, of the window. To do this we need to convert our AOI to a projected coordinate system (UTM) and calculate the number of columns and rows we will need, using a pixel resolution of 30m.
[14]:
def get_shape(bbox, res=30):
left, bottom, right, top = bbox
width = int((right-left)/res)
height = int((top-bottom)/res)
return height,width
# convert to m
AOI_utm = AOI.to_crs('epsg:32617')
height, width = get_shape(AOI_utm.bounds.iloc[0].to_list())
When building a maximum-NDVI composite, the first data we need is the red and NIR bands to make an NDVI band for each image. We will use ‘riotiler’ to perform a windowed read of our data. We will also read the ‘fmask’ layer as we can use this to mask out unwanted pixels.
[15]:
def ReadData(file, in_bbox, height, width, epsg="epsg:4326", dst_crs="epsg:4326"):
'''Read a window of data from the raster matching the tile bbox'''
with COGReader(file) as cog:
img = cog.part(in_bbox, bounds_crs=epsg, max_size=None, dst_crs=dst_crs, height=height, width=width)
return (np.squeeze(img.as_masked().astype(np.float32)) * 0.0001)
Our AWS session has an expiry time. Now would be a good time to renew our access key to ensure we do not encounter any timeout issues.
[16]:
print('Getting AWS credentials...')
aws_session = get_aws_session_DAAC()
print('Finished')
Getting AWS credentials...
Finished
Using our renewed session, for each band we will read in the relevant band in each of our images, creating an array of image bands.
[17]:
with rio.Env(aws_session):
print('reading red bands...')
red_stack = np.array([ReadData(red_bands[i], AOI_bbox, height, width, epsg="epsg:4326", dst_crs="epsg:4326") for i in range(len(red_bands))])
print('reading nir bands...')
nir_stack = np.array([ReadData(nir_bands[i], AOI_bbox, height, width, epsg="epsg:4326", dst_crs="epsg:4326") for i in range(len(nir_bands))])
print('reading fmask bands...')
fmask_stack = np.array([ReadData(fmask_bands[i], AOI_bbox, height, width, epsg="epsg:4326", dst_crs="epsg:4326") for i in range(len(fmask_bands))])
print('finished')
print("number of red_bands = ", np.shape(red_stack)[0])
reading red bands...
reading nir bands...
reading fmask bands...
finished
number of red_bands = 21
Now that we have our red band array and NIR band array we can make an NDVI image. While we do this, we will also apply our ‘fmask’ to remove any pixels that contain clouds.
[18]:
ndvi_stack = np.ma.array(np.where(((fmask_stack==1)), -9999, (nir_stack-red_stack)/(nir_stack+red_stack)))
ndvi_stack = np.where((~np.isfinite(ndvi_stack)) | (ndvi_stack>1), -9999, ndvi_stack)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
At this point, we can plot our images and see which ones contain cloud coverage. These images have gaps where there is no data or cloud coverage.
[19]:
fig, axes = plt.subplots(3,7, figsize=(33,30))
for i, ax in enumerate(axes.flat):
ndvi_stack[i] = np.where((ndvi_stack[i]>1) | (ndvi_stack[i]<-1), 0, ndvi_stack[i])
ax.axes.xaxis.set_visible(False)
ax.axes.yaxis.set_visible(False)
ax.imshow(ndvi_stack[i], cmap='viridis', clim=(0.1, 0.5))

Now that we have a stack of NDVI bands, we can create an index band that maps the pixel position from each band where the NDVI value is greatest. We can use this to locate the pixels we want to use in our composite.
[20]:
# Create Bool mask where there is no value in any of the NDVI layers
print("Make NDVI valid mask")
print("shape:\t\t", np.ma.array(ndvi_stack).shape)
MaxNDVI = np.ma.max(np.ma.array(ndvi_stack),axis=0)
BoolMask = np.ma.getmask(MaxNDVI)
del MaxNDVI
## Get the argmax index positions from the stack of NDVI images
print("Get stack nan mask")
ndvi_stack = np.ma.array(ndvi_stack)
print("Calculate Stack max NDVI image")
NDVImax = np.nanargmax(ndvi_stack,axis=0)
## create a tmp array (binary mask) of the same input shape
NDVItmp = np.ma.zeros(ndvi_stack.shape, dtype=bool)
## for each dimension assign the index position (flattens the array to a LUT)
print("Create LUT of max NDVI positions")
for i in range(np.shape(ndvi_stack)[0]):
NDVItmp[i,:,:]=NDVImax==i
Make NDVI valid mask
shape: (21, 3006, 2951)
Get stack nan mask
Calculate Stack max NDVI image
Create LUT of max NDVI positions
Now that we have our NDVI lookup table and a list of all the images for each band, we can make composites based on the maximum NDVI value. For this, we will use the following two functions.
[21]:
def CollapseBands(inArr, NDVItmp, BoolMask):
inArr = np.ma.masked_equal(inArr, 0)
inArr[np.logical_not(NDVItmp)]=0
compImg = np.ma.masked_array(inArr.sum(0), BoolMask)
#print(compImg)
return compImg
def CreateComposite(file_list, NDVItmp, BoolMask, in_bbox, height, width, epsg, dst_crs):
MaskedFile = [ReadData(file_list[i], in_bbox, height, width, epsg, dst_crs) for i in range(len(file_list))]
Composite=CollapseBands(MaskedFile, NDVItmp, BoolMask)
return Composite
For each band, we will read all the images (for the required band) and create a composite based on the maximum NDVI value.
[22]:
aws_session = get_aws_session_DAAC()
with rio.Env(aws_session):
print('Creating Blue Composite')
blue_comp = CreateComposite(blue_bands, NDVItmp, BoolMask, AOI_bbox, height, width, "epsg:4326", "epsg:4326")
print('Creating Green Composite')
green_comp = CreateComposite(green_bands, NDVItmp, BoolMask, AOI_bbox, height, width, "epsg:4326", "epsg:4326")
print('Creating Red Composite')
red_comp = CreateComposite(red_bands, NDVItmp, BoolMask, AOI_bbox, height, width, "epsg:4326", "epsg:4326")
print('Creating NIR Composite')
nir_comp = CreateComposite(nir_bands, NDVItmp, BoolMask, AOI_bbox, height, width, "epsg:4326", "epsg:4326")
print('Creating SWIR Composite')
swir_comp = CreateComposite(swir_bands, NDVItmp, BoolMask, AOI_bbox, height, width, "epsg:4326", "epsg:4326")
print('Creating SWIR2 Composite')
swir2_comp = CreateComposite(swir2_bands, NDVItmp, BoolMask, AOI_bbox, height, width, "epsg:4326", "epsg:4326")
print('Creating NDVI Composite')
ndvi_comp = CollapseBands(ndvi_stack, NDVItmp, BoolMask)
print('Creating fmask Composite')
fmask_comp = CollapseBands(fmask_stack, NDVItmp, BoolMask)
Creating Blue Composite
Creating Green Composite
Creating Red Composite
Creating NIR Composite
Creating SWIR Composite
Creating SWIR2 Composite
Creating NDVI Composite
Creating fmask Composite
We can look at our NDVI composite image and see we now have a complete image for our AOI.
[23]:
fig, axes = plt.subplots(1,1, figsize=(20,20))
ax.axes.xaxis.set_visible(False)
ax.axes.yaxis.set_visible(False)
plt.imshow(np.where(fmask_comp==1, -9999, ndvi_comp))
[23]:
<matplotlib.image.AxesImage at 0x7ff8d3fdd5d0>

Now that we have a 7-band composite image, we can use these bands to calculate a suite of common vegetation indices using the following functions.
[24]:
# SAVI
def calcSAVI(red, nir):
savi = ((nir - red)/(nir + red + 0.5))*(1.5)
print('\tSAVI Created')
return savi
# NDMI
def calcNDMI(nir, swir):
ndmi = (nir - swir)/(nir + swir)
print('\tNDMI Created')
return ndmi
# EVI
def calcEVI(blue, red, nir):
evi = 2.5 * ((nir - red) / (nir + 6 * red - 7.5 * blue + 1))
print('\tEVI Created')
return evi
# NBR
def calcNBR(nir, swir2):
nbr = (nir - swir2)/(nir + swir2)
print('\tNBR Created')
return nbr
# MSAVI
def calcMSAVI(red, nir):
msavi = (2 * nir + 1 - np.sqrt((2 * nir + 1)**2 - 8 * (nir - red))) / 2
print('\tMSAVI Created')
return msavi
We can call these functions and make our additional indices.
[25]:
# calculate covars
print("Generating covariates")
SAVI = calcSAVI(red_comp, nir_comp)
#print("NDMI")
NDMI = calcNDMI(nir_comp, swir_comp)
#print("EVI")
EVI = calcEVI(blue_comp, red_comp, nir_comp)
#print("NBR")
NBR = calcNBR(nir_comp, swir2_comp)
MSAVI = calcMSAVI(red_comp, nir_comp)
Generating covariates
SAVI Created
NDMI Created
EVI Created
NBR Created
MSAVI Created
We have a suite of 12 bands now, and we can merge them together into a single 12-band image stack.
[26]:
print("\nCreating raster stack...\n")
stack = np.transpose([blue_comp, green_comp, red_comp, nir_comp, swir_comp, swir2_comp, ndvi_comp, SAVI, MSAVI, NDMI, EVI, NBR], [0, 1, 2])
stack = np.where(fmask_comp==1, -9999, stack)
print(np.shape(stack))
Creating raster stack...
(12, 3006, 2951)
Display Results¶
We can look at each of these bands by using ‘matplotlib’ to plot each one individually.
[27]:
n: int = len(stack)
#topo_cmaps = ["bone","Spectral", "magma", "RdBu", "coolwarm"]
topo_cmaps = ['Blues','Greens','Reds','PuRd','OrRd','BuPu','YlGn','GnBu','YlOrBr','inferno','plasma', 'viridis']
print(stack.shape)
bandnames = ['blue_comp', 'green_comp', 'red_comp', 'nir_comp', 'swir_comp', 'swir2_comp', 'ndvi_comp', 'SAVI', 'MSAVI', 'NDMI', 'EVI', 'NBR']
font = {'size' : 30}
matplotlib.rc('font', **font)
#axs = [[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[1,0],[1,1],[1,2],[1,3],[1,4],[1,5],[2,0],[2,1],[2,2],[2,3],[2,4],[2,5]]
fig, axes = plt.subplots(3,4, figsize=(33,30))
print(axes.flat)
for i, ax in enumerate(axes.flat):
ax.imshow(stack[i], cmap=topo_cmaps[i], clim=(np.percentile(stack[i], 10), np.percentile(stack[i], 90)))
ax.set_title(bandnames[i])
(12, 3006, 2951)
<numpy.flatiter object at 0x5609d86b7ce0>

We can also visualize our composite NDVI band on our interactive ‘folium’ map. You can see that even though we found multiple images, by using a windowed read we were able to just read and process the data we needed.
[28]:
m = folium.Map(location=[38.6, -78.5], zoom_start=9, tiles='CartoDB positron')
AOI_bx = AOI.bounds
#folium.GeoJson(AOI, style_function=lambda x: {'fillColor': 'orange','opacity':0}).add_to(m)
geo_r = folium.raster_layers.ImageOverlay(np.ma.getdata(ndvi_comp), opacity=1, bounds=[[AOI_bx['miny'][0],AOI_bx['minx'][0]],[AOI_bx['maxy'][0],AOI_bx['maxx'][0]]])
geo_r.add_to(m)
m
[28]: