Direct DAAC S3 Bucket Access (BETA)
Authors: Alex Mandel (Development Seed), Brian Freitag (NASA MSFC), Jamison French (Development Seed)
Date: September 27, 2023
Description: In this tutorial, we demonstrate how to assume the MAAP data reader role to access specific DAAC buckets.
This tutorial demonstrates an experimental feature to allow access to DAACs without using EarthDataLogin.
This method currently works for a select number of DAACs and their EarthDataCloud datasets which are stored in AWS S3: - NSIDC - ORNL - GES DISC - LPDAAC - PODAAC
Run This Notebook
To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: this tutorial must be run within MAAP’s ADE to assume the necessary permissions. This tutorial was tested using the vanilla workspace image. If you encounter issues with the installs, ensure you have the latest version of pip installed.
Additional Resources
Importing Packages
If the packages below are not installed already, uncomment the following cell.
[ ]:
%pip install h5netcdf fsspec s3fs xarray rioxarray --quiet
[1]:
import boto3
import fsspec
import xarray
import rioxarray
import matplotlib.pyplot as plt
import rasterio
from rasterio.session import AWSSession
Access The Data
We’ll create a couple helper functions to setup the assumed role session and view the data.
[2]:
def assume_role_credentials(ssm_parameter_name):
# Create a session using your current credentials
session = boto3.Session()
# Retrieve the SSM parameter
ssm = session.client('ssm', "us-west-2")
parameter = ssm.get_parameter(
Name=ssm_parameter_name,
WithDecryption=True
)
parameter_value = parameter['Parameter']['Value']
# Assume the DAAC access role
sts = session.client('sts')
assumed_role_object = sts.assume_role(
RoleArn=parameter_value,
RoleSessionName='TutorialSession'
)
# From the response that contains the assumed role, get the temporary
# credentials that can be used to make subsequent API calls
credentials = assumed_role_object['Credentials']
return credentials
# We can pass assumed role credentials into fsspec
def fsspec_access(credentials):
return fsspec.filesystem(
"s3",
key=credentials['AccessKeyId'],
secret=credentials['SecretAccessKey'],
token=credentials['SessionToken']
)
# We can also pass assumed role credentials into rasterio AWSSession
def rasterio_access(credentials):
aws_session = AWSSession(
aws_access_key_id=credentials['AccessKeyId'],
aws_secret_access_key=credentials['SecretAccessKey'],
aws_session_token=credentials['SessionToken']
)
return rasterio.Env(aws_session)
Initialize the assumed role sessions
[3]:
s3_fsspec = fsspec_access(assume_role_credentials("/iam/maap-data-reader"))
s3_rasterio = rasterio_access(assume_role_credentials("/iam/maap-data-reader"))
NSIDC DAAC Access
We can use xarray
to open a specific GROUP within the HDF5 such as the “gt1l” track.
[4]:
nsidc_object = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/006/2023/06/21/ATL08_20230621235543_00272011_006_01.h5"
with s3_fsspec.open(nsidc_object) as f:
ds = xarray.open_dataset(f, group='gt1l/land_segments', engine="h5netcdf", phony_dims='sort')
ds
[4]:
<xarray.Dataset> Dimensions: (delta_time: 21470, ds_geosegments: 5, ds_surf_type: 5) Coordinates: * delta_time (delta_time) datetime64[ns] 2023-06-21T23:55:51.147657... latitude (delta_time) float32 ... longitude (delta_time) float32 ... Dimensions without coordinates: ds_geosegments, ds_surf_type Data variables: (12/41) asr (delta_time) float32 ... atlas_pa (delta_time) float32 ... beam_azimuth (delta_time) float32 ... beam_coelev (delta_time) float32 ... brightness_flag (delta_time) float32 ... cloud_flag_atm (delta_time) float32 ... ... ... snr (delta_time) float32 ... solar_azimuth (delta_time) float32 ... solar_elevation (delta_time) float32 ... surf_type (delta_time, ds_surf_type) int8 ... terrain_flg (delta_time) float64 ... urban_flag (delta_time) float64 ... Attributes: Description: Contains data categorized as land at 100 meter intervals. data_rate: Data are stored as aggregates of 100 meters.
ORNL DAAC Access
We can also use rioxarray
to inspect our TIF objects.
[5]:
ornl_object = "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"
with s3_fsspec.open(ornl_object) as obj:
data_array = rioxarray.open_rasterio(obj)
data_array
[5]:
<xarray.DataArray (band: 1, y: 14616, x: 34704)> [507233664 values with dtype=float32] Coordinates: * band (band) int64 1 * x (x) float64 -1.737e+07 -1.737e+07 ... 1.737e+07 1.737e+07 * y (y) float64 7.314e+06 7.313e+06 ... -7.313e+06 -7.314e+06 spatial_ref int64 0 Attributes: AREA_OR_POINT: Area _FillValue: -9999.0 scale_factor: 1.0 add_offset: 0.0
PO DAAC Access
[7]:
po_object = "s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T55GEM_20230813T235239Z_20230815T154108Z_S2B_30_v1.0_B01_WTR.tif"
with s3_fsspec.open(po_object) as obj:
data_array = rioxarray.open_rasterio(obj)
data_array
[7]:
<xarray.DataArray (band: 1, y: 3660, x: 3660)> [13395600 values with dtype=uint8] Coordinates: * band (band) int64 1 * x (x) float64 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.098e+05 * y (y) float64 -4.8e+06 -4.8e+06 -4.8e+06 ... -4.91e+06 -4.91e+06 spatial_ref int64 0 Attributes: (12/48) ACCODE: ... AEROSOL_CLASS_REMAPPING_ENABLED: ... AEROSOL_NOT_WATER_TO_HIGH_CONF_WATER_FMASK_VALUES: ... AEROSOL_PARTIAL_SURFACE_AGGRESSIVE_TO_HIGH_CONF_WATER_FMASK_VALUES: ... AEROSOL_PARTIAL_SURFACE_WATER_CONSERVATIVE_TO_HIGH_CONF_WATER_FMASK_VALUE... AEROSOL_WATER_MODERATE_CONF_TO_HIGH_CONF_WATER_FMASK_VALUES: ... ... ... WORLDCOVER_COVERAGE: ... WORLDCOVER_SOURCE: ... _FillValue: ... scale_factor: ... add_offset: ... long_name: ...
GES DISC Access
[8]:
ges_disc_object = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"
with s3_fsspec.open(ges_disc_object) as obj:
data_array = rioxarray.open_rasterio(obj)
data_array
[8]:
<xarray.DataArray (band: 1, y: 14400, x: 43200)> [622080000 values with dtype=uint8] Coordinates: * band (band) int64 1 * x (x) float64 -180.0 -180.0 -180.0 -180.0 ... 180.0 180.0 180.0 * y (y) float64 60.0 59.99 59.98 59.97 ... -59.98 -59.99 -60.0 spatial_ref int64 0 Attributes: AREA_OR_POINT: Area STATISTICS_MAXIMUM: 2 STATISTICS_MEAN: nan STATISTICS_MINIMUM: 0 STATISTICS_STDDEV: nan _FillValue: 255 scale_factor: 1.0 add_offset: 0.0
LP DAAC Access
We can also use rasterio to directly inspect our TIF objects.
[9]:
lp_object = "s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/HLS.L30.T56JMN.2023225T234225.v2.0.B11.tif"
with s3_rasterio:
with rasterio.open(lp_object) as src:
print(f'Width: {src.width}')
print(f'Height: {src.height}')
print(f'Bounds: {src.bounds}')
print(f'CRS: {src.crs}')
print(f'Count: {src.count}')
print(f'Data type: {src.dtypes}')
Width: 3660
Height: 3660
Bounds: BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)
CRS: EPSG:32656
Count: 1
Data type: ('int16',)
[ ]: