Direct DAAC S3 Bucket Access from HUB
Authors: Harshini Girish (UAH), Alex Mandel (Development Seed), Brian Freitag (NASA MSFC), Jamison French (Development Seed)
Updated: Dec 8, 2025
Description: In this tutorial, we demonstrate how to assume the MAAP data reader role to access specific DAAC buckets.
This tutorial demonstrates an experimental feature that allows access to DAAC data without using Earthdata Login.
This method currently works for a select number of DAACs and their Earthdata Cloud datasets that are stored in AWS S3.
Run This Notebook
To access and run this tutorial within MAAP hub, please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: this tutorial must be run within MAAP’s hub to assume the necessary permissions. This tutorial was tested using the vanilla workspace image. If you encounter issues with the installs, ensure you have the latest version of pip installed.
Additional Resources
Importing Packages
If the packages below are not installed already, uncomment the following cell.
[1]:
import os
import boto3
import fsspec
import matplotlib.pyplot as plt
import rasterio
import xarray as xr
import rioxarray as rxr
from rasterio.session import AWSSession
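Before installing anything, it can help to see which of the imports above are actually missing from the current environment. The sketch below uses only the standard library. The `missing_packages` helper and the `REQUIRED` list are illustrative names, not part of MAAP or of any library used here. `s3fs` and `h5netcdf` are included because fsspec's "s3" filesystem and xarray's "h5netcdf" engine rely on them even though they are never imported directly.

```python
from importlib.util import find_spec

# Top-level modules imported in the cell above, plus two backends that are
# used indirectly (s3fs by fsspec's "s3" protocol, h5netcdf by xarray).
REQUIRED = [
    "boto3", "fsspec", "matplotlib", "rasterio",
    "xarray", "rioxarray", "s3fs", "h5netcdf",
]


def missing_packages(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed with pip in the workspace.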
Access The Data
We’ll create a couple of helper functions to set up the session and view the data.
[2]:
def hub_boto3_session(region_name: str | None = None) -> boto3.Session:
    return boto3.Session(region_name=region_name)

def hub_boto3_client(service_name: str, region_name: str | None = None):
    return hub_boto3_session(region_name).client(service_name)

def fsspec_access_hub(requester_pays: bool = False):
    return fsspec.filesystem("s3", requester_pays=requester_pays, anon=False)

def rasterio_access_hub(requester_pays: bool = False, region_name: str | None = None):
    boto_sess = hub_boto3_session(region_name=region_name)
    aws = AWSSession(session=boto_sess, requester_pays=requester_pays)
    if requester_pays:
        os.environ["AWS_REQUEST_PAYER"] = "requester"
    return rasterio.Env(aws)

def assume_role_credentials(ssm_parameter_name: str | None = None):
    return None

def fsspec_access(credentials=None, requester_pays: bool = False):
    return fsspec_access_hub(requester_pays=requester_pays)

def rasterio_access(credentials=None, requester_pays: bool = False, region_name: str | None = None):
    return rasterio_access_hub(requester_pays=requester_pays, region_name=region_name)
Accessing GES DISC, LP DAAC and NSIDC Requester Pays Buckets
Some NASA DAACs, such as GES DISC, LP DAAC and NSIDC, expose protected data in S3 buckets that use the Requester Pays model. On the MAAP Hub, your AWS credentials are already provided by the environment, so you do not need to call aws sts assume-role. To read from these buckets you only need to indicate that you accept requester-pays charges by setting AWS_REQUEST_PAYER="requester" and creating your fsspec / rasterio S3 clients with requester_pays=True, as shown in the example below.
[3]:
os.environ["AWS_REQUEST_PAYER"] = "requester"
fsspec_requesterpays = fsspec.filesystem("s3", requester_pays=True, anon=False)
hub_session = boto3.Session()
s3_rasterio_requesterpays = rasterio.Env(
    AWSSession(session=hub_session, requester_pays=True)
)
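The same requester-pays acknowledgement applies when calling the S3 API directly through boto3, where it is passed per request as `RequestPayer="requester"`. As a minimal sketch, the hypothetical helper `requester_pays_kwargs` (not part of boto3 or MAAP) splits an `s3://` URI into the keyword arguments such a call expects. It runs without network access. The actual request is left as a comment because it only succeeds from within the Hub.

```python
from urllib.parse import urlparse


def requester_pays_kwargs(s3_uri: str) -> dict:
    """Split an s3:// URI into Bucket/Key and add the requester-pays flag."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    return {
        "Bucket": parsed.netloc,
        "Key": parsed.path.lstrip("/"),
        # Accept requester-pays charges for this single request.
        "RequestPayer": "requester",
    }


kwargs = requester_pays_kwargs(
    "s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/"
    "HLS.L30.T56JMN.2023225T234225.v2.0.B02.tif"
)
print(kwargs["Bucket"])

# On the Hub, the dictionary can be passed straight to an S3 client call:
#   import boto3
#   boto3.client("s3").head_object(**kwargs)
```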
LP DAAC Access
We can use rasterio to directly inspect our TIF objects.
[4]:
# LP DAAC Access
lp_object = "s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/HLS.L30.T56JMN.2023225T234225.v2.0.B02.tif"
with s3_rasterio_requesterpays:
    with rasterio.open(lp_object) as src:
        print(f'Width: {src.width}')
        print(f'Height: {src.height}')
        print(f'Bounds: {src.bounds}')
        print(f'CRS: {src.crs}')
        print(f'Count: {src.count}')
        print(f'Data type: {src.dtypes}')
Width: 3660
Height: 3660
Bounds: BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)
CRS: EPSG:32656
Count: 1
Data type: ('int16',)
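The printed metadata is enough to recover the pixel size of the tile. The sketch below repeats that arithmetic with the values copied from the output above, so it runs without opening the file. The variable names are illustrative. Inside the `with rasterio.open(...)` block the same information is available directly as `src.res`.

```python
# Values copied from the printed output above.
width, height = 3660, 3660
left, bottom, right, top = 399960.0, -3309780.0, 509760.0, -3199980.0

# Pixel size = extent of the tile divided by the number of pixels,
# expressed in the units of the CRS (metres for EPSG:32656, a UTM zone).
x_res = (right - left) / width
y_res = (top - bottom) / height

print(f"Pixel size: {x_res} x {y_res}")
```

Both values come out to 30.0, which matches the 30 m grid of the HLS L30 product.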
GES DISC Access
[5]:
fs = fsspec.filesystem("s3", requester_pays=True, anon=False)
ges_disc_object = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"
with fs.open(ges_disc_object, "rb") as obj:
    data_array = rxr.open_rasterio(obj, masked=True)
data_array
[5]:
<xarray.DataArray (band: 1, y: 14400, x: 43200)> Size: 2GB
[622080000 values with dtype=float32]
Coordinates:
* band (band) int64 8B 1
* x (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
* y (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0
spatial_ref int64 8B 0
Attributes:
AREA_OR_POINT: Area
STATISTICS_MAXIMUM: 2
STATISTICS_MEAN: nan
STATISTICS_MINIMUM: 0
STATISTICS_STDDEV: nan
scale_factor: 1.0
add_offset: 0.0
NSIDC DAAC Access
Additional resource: Accessing Data in Cloud-Optimized GeoTIFFs
[6]:
# Tip: Tune fsspec caching for performance
os.environ["AWS_REQUEST_PAYER"] = "requester"
nsidc_object = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/006/2023/06/21/ATL08_20230621235543_00272011_006_02.h5"
fsspec_caching = {
    "cache_type": "blockcache",
    "block_size": 8 * 1024 * 1024,
}
s3_fsspec_requesterpays = fsspec.filesystem("s3", requester_pays=True, anon=False)
ds = xr.open_dataset(
    s3_fsspec_requesterpays.open(nsidc_object, "rb", **fsspec_caching),
    group="gt1l/land_segments",
    engine="h5netcdf",
    phony_dims="sort",
    decode_times=True,
)
ds
ds
[6]:
<xarray.Dataset> Size: 5MB
Dimensions: (delta_time: 21468, ds_geosegments: 5, ds_surf_type: 5)
Coordinates:
* delta_time (delta_time) datetime64[ns] 172kB 2023-06-21T23:55:51....
latitude (delta_time) float32 86kB ...
longitude (delta_time) float32 86kB ...
Dimensions without coordinates: ds_geosegments, ds_surf_type
Data variables: (12/41)
asr (delta_time) float32 86kB ...
atlas_pa (delta_time) float32 86kB ...
beam_azimuth (delta_time) float32 86kB ...
beam_coelev (delta_time) float32 86kB ...
brightness_flag (delta_time) float32 86kB ...
cloud_flag_atm (delta_time) float32 86kB ...
... ...
snr (delta_time) float32 86kB ...
solar_azimuth (delta_time) float32 86kB ...
solar_elevation (delta_time) float32 86kB ...
surf_type (delta_time, ds_surf_type) int8 107kB ...
terrain_flg (delta_time) float64 172kB ...
urban_flag (delta_time) float64 172kB ...
Attributes:
Description: Contains data categorized as land at 100 meter intervals.
data_rate: Data are stored as aggregates of 100 meters.
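The cell above reads only the `gt1l` ground track. ICESat-2 records six beams, named `gt1l`, `gt1r`, `gt2l`, `gt2r`, `gt3l` and `gt3r`, so the same call can be repeated for each of them. As a minimal sketch, the hypothetical helper `land_segment_groups` below builds the group paths. It assumes the other beams use the same `<beam>/land_segments` layout as `gt1l`, and a given granule may not contain every beam, so the open call shown in the comment may need error handling.

```python
# The six ICESat-2 ground-track (beam) names.
BEAMS = ["gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r"]


def land_segment_groups(beams=BEAMS):
    """Build the HDF5 group path of the land segments for each beam."""
    return [f"{beam}/land_segments" for beam in beams]


groups = land_segment_groups()
print(groups)

# On the Hub, each group can be opened the same way as in the cell above
# (assumption: the beam is present in the granule):
#   for group in groups:
#       ds = xr.open_dataset(
#           s3_fsspec_requesterpays.open(nsidc_object, "rb", **fsspec_caching),
#           group=group, engine="h5netcdf", phony_dims="sort",
#       )
```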