Direct DAAC S3 Bucket Access (BETA)

Authors: Alex Mandel (Development Seed), Brian Freitag (NASA MSFC), Jamison French (Development Seed)

Updated: October 16, 2025

Description: In this tutorial, we demonstrate how to assume the MAAP data reader role to access specific DAAC buckets.

This tutorial demonstrates an experimental feature to allow access to DAACs without using EarthDataLogin.

This method currently works for a select number of DAACs and their EarthDataCloud datasets which are stored in AWS S3:

Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.

Disclaimer: this tutorial must be run within MAAP’s ADE to assume the necessary permissions. This tutorial was tested using the Pangeo workspace image. If you encounter issues with the installs, ensure you have the latest version of pip installed.

Additional Resources

Importing Packages

If the packages below are not installed already, uncomment the following cell.

[1]:
import boto3
import fsspec
import xarray
import rioxarray
import matplotlib.pyplot as plt
import rasterio
from rasterio.session import AWSSession

Access The Data

We’ll create a couple helper functions to setup the assumed role session and view the data.

[2]:
def assume_role_credentials(ssm_parameter_name):
    # Create a session using your current credentials
    session = boto3.Session()

    # Retrieve the SSM parameter
    ssm = session.client('ssm', "us-west-2")
    parameter = ssm.get_parameter(
        Name=ssm_parameter_name,
        WithDecryption=True
    )
    parameter_value = parameter['Parameter']['Value']

    # Assume the DAAC access role
    sts = session.client('sts')
    assumed_role_object = sts.assume_role(
        RoleArn=parameter_value,
        RoleSessionName='TutorialSession'
    )

    # From the response that contains the assumed role, get the temporary
    # credentials that can be used to make subsequent API calls
    credentials = assumed_role_object['Credentials']

    return credentials

# We can pass assumed role credentials into fsspec
def fsspec_access(credentials, requester_pays=False):
    return fsspec.filesystem(
        "s3",
        key=credentials['AccessKeyId'],
        secret=credentials['SecretAccessKey'],
        token=credentials['SessionToken'],
        requester_pays=requester_pays
    )

# We can also pass assumed role credentials into rasterio AWSSession
def rasterio_access(credentials, requester_pays=False):
    aws_session = AWSSession(
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
        requester_pays=requester_pays

    )
    return rasterio.Env(aws_session)

Accessing GES DISC, LP DAAC and NSIDC Requester Pays Buckets

All DAACs support Temporary S3 Credentials through Earthdata Login (EDL) authentication.

Some NASA DAACs, such as GES DISC, LP DAAC and NSIDC also store protected data in S3 buckets that use the Requester Pays model. To access these datasets, you need temporary AWS credentials and must explicitly declare that you agree to pay for data access. Below is an example of how to authenticate using aws sts assume-role and access a file using gdalinfo.

export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \ $(aws sts assume-role \ --role-arn arn:aws:iam::884094767067:role/maap-data-reader \ --role-session-name GESDISC \ --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \ --output text)) export AWS_REQUEST_PAYER=requester gdalinfo /vsis3/gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif

Initialize the assumed role sessions

[3]:
s3_fsspec_requesterpays = fsspec_access(assume_role_credentials("/iam/maap-data-reader"), True)
s3_rasterio_requesterpays = rasterio_access(assume_role_credentials("/iam/maap-data-reader"), True)

LP DAAC Access

We can use rasterio to directly inspect our TIF objects.

[4]:
lp_object = "s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/HLS.L30.T56JMN.2023225T234225.v2.0.B11.tif"

with s3_rasterio_requesterpays:
    with rasterio.open(lp_object) as src:
        print(f'Width: {src.width}')
        print(f'Height: {src.height}')
        print(f'Bounds: {src.bounds}')
        print(f'CRS: {src.crs}')
        print(f'Count: {src.count}')
        print(f'Data type: {src.dtypes}')
Width: 3660
Height: 3660
Bounds: BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)
CRS: EPSG:32656
Count: 1
Data type: ('int16',)

GES DISC Access

[5]:
ges_disc_object = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"

with s3_fsspec_requesterpays.open(ges_disc_object) as obj:
    data_array = rioxarray.open_rasterio(obj)
data_array
[5]:
<xarray.DataArray (band: 1, y: 14400, x: 43200)> Size: 622MB
[622080000 values with dtype=uint8]
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
  * y            (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0
    spatial_ref  int64 8B 0
Attributes:
    AREA_OR_POINT:       Area
    STATISTICS_MAXIMUM:  2
    STATISTICS_MEAN:     nan
    STATISTICS_MINIMUM:  0
    STATISTICS_STDDEV:   nan
    _FillValue:          255
    scale_factor:        1.0
    add_offset:          0.0

NSIDC DAAC Access

Initialize the assumed role session.

[6]:
nsidc_object = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/006/2023/06/21/ATL08_20230621235543_00272011_006_02.h5"

fsspec_caching = {
    "cache_type": "blockcache",  # block cache stores blocks of fixed size and uses eviction using a LRU strategy.
    "block_size": 8
    * 1024
    * 1024,  # size in bytes per block, adjust depends on the file size but the recommended size is in the MB
}
ds = xarray.open_dataset(
    s3_fsspec_requesterpays.open(nsidc_object, "rb", **fsspec_caching),
    group="gt1l/land_segments",
    engine="h5netcdf",
    decode_times=True
)
ds
[6]:
<xarray.Dataset> Size: 5MB
Dimensions:            (delta_time: 21468, ds_geosegments: 5, ds_surf_type: 5)
Coordinates:
  * delta_time         (delta_time) datetime64[ns] 172kB 2023-06-21T23:55:51....
    latitude           (delta_time) float32 86kB ...
    longitude          (delta_time) float32 86kB ...
Dimensions without coordinates: ds_geosegments, ds_surf_type
Data variables: (12/41)
    asr                (delta_time) float32 86kB ...
    atlas_pa           (delta_time) float32 86kB ...
    beam_azimuth       (delta_time) float32 86kB ...
    beam_coelev        (delta_time) float32 86kB ...
    brightness_flag    (delta_time) float32 86kB ...
    cloud_flag_atm     (delta_time) float32 86kB ...
    ...                 ...
    snr                (delta_time) float32 86kB ...
    solar_azimuth      (delta_time) float32 86kB ...
    solar_elevation    (delta_time) float32 86kB ...
    surf_type          (delta_time, ds_surf_type) int8 107kB ...
    terrain_flg        (delta_time) float64 172kB ...
    urban_flag         (delta_time) float64 172kB ...
Attributes:
    Description:  Contains data categorized as land at 100 meter intervals.
    data_rate:    Data are stored as aggregates of 100 meters.

Accessing Standard DAAC Buckets

Initialize the assumed role sessions

[10]:
s3_fsspec = fsspec_access(assume_role_credentials("/iam/maap-data-reader"))

ORNL DAAC Access

We can also use rioxarray to inspect our TIF objects.

[12]:
ornl_object = "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"

with s3_fsspec.open(ornl_object) as obj:
    data_array = rioxarray.open_rasterio(obj)
data_array
[12]:
<xarray.DataArray (band: 1, y: 14616, x: 34704)> Size: 2GB
[507233664 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 278kB -1.737e+07 -1.737e+07 ... 1.737e+07 1.737e+07
  * y            (y) float64 117kB 7.314e+06 7.313e+06 ... -7.313e+06 -7.314e+06
    spatial_ref  int64 8B 0
Attributes:
    AREA_OR_POINT:  Area
    _FillValue:     -9999.0
    scale_factor:   1.0
    add_offset:     0.0

PO DAAC Access

[13]:
po_object = "s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T55GEM_20230813T235239Z_20230815T154108Z_S2B_30_v1.0_B01_WTR.tif"
with s3_fsspec.open(po_object) as obj:
    data_array = rioxarray.open_rasterio(obj)
data_array
[13]:
<xarray.DataArray (band: 1, y: 3660, x: 3660)> Size: 13MB
[13395600 values with dtype=uint8]
Coordinates:
  * band         (band) int64 8B 1
  * x            (x) float64 29kB 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.098e+05
  * y            (y) float64 29kB -4.8e+06 -4.8e+06 ... -4.91e+06 -4.91e+06
    spatial_ref  int64 8B 0
Attributes: (12/48)
    ACCODE:                                                                  ...
    AEROSOL_CLASS_REMAPPING_ENABLED:                                         ...
    AEROSOL_NOT_WATER_TO_HIGH_CONF_WATER_FMASK_VALUES:                       ...
    AEROSOL_PARTIAL_SURFACE_AGGRESSIVE_TO_HIGH_CONF_WATER_FMASK_VALUES:      ...
    AEROSOL_PARTIAL_SURFACE_WATER_CONSERVATIVE_TO_HIGH_CONF_WATER_FMASK_VALUE...
    AEROSOL_WATER_MODERATE_CONF_TO_HIGH_CONF_WATER_FMASK_VALUES:             ...
    ...                                                                          ...
    WORLDCOVER_SOURCE:                                                       ...
    AREA_OR_POINT:                                                           ...
    _FillValue:                                                              ...
    scale_factor:                                                            ...
    add_offset:                                                              ...
    long_name:                                                               ...