{ "cells": [ { "cell_type": "markdown", "id": "4eda0aba-e51c-4373-9148-412f56ddd7ba", "metadata": {}, "source": [ "# Direct DAAC S3 Bucket Access (BETA)\n", "Authors: Alex Mandel (Development Seed), Brian Freitag (NASA MSFC), Jamison French (Development Seed)\n", "\n", "Updated: October 16, 2025\n", "\n", "Description: In this tutorial, we demonstrate how to assume the MAAP data reader role to access specific DAAC buckets.\n", "\n", "***This tutorial demonstrates an experimental feature to allow access to DAACs without using EarthDataLogin***.\n", "\n", "This method currently works for a select number of DAACs and their EarthDataCloud datasets which are stored in AWS S3:\n", "- [GES DISC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Goddard%2BEarth%2BSciences%2BData%2Band%2BInformation%2BServices%2BCenter%2B%2528GES%2BDISC%2529)\n", "- [LPDAAC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Land%2BProcess%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528LPDAAC%2529)\n", "- [NSIDC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=National%2BSnow%2Band%2BIce%2BData%2BCenter%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528NSIDC%2BDAAC%2529)\n", "- [ORNL](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Oak%2BRidge%2BNational%2BLaboratory%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528ORNL%2BDAAC%2529)\n", "- [PODAAC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Physical%2BOceanography%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528PO.DAAC%2529)" ] }, { "cell_type": "markdown", "id": "da9c14ba-0b74-4a38-b108-80da160ed0bc", "metadata": {}, "source": [ "## Run This Notebook\n", "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the 
MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: this tutorial **must** be run within MAAP's ADE to assume the necessary permissions. This tutorial was tested using the **Pangeo** workspace image. If you encounter issues installing packages, ensure you have the latest version of pip installed." ] }, { "cell_type": "markdown", "id": "e6b04063-7cee-409a-a30a-879a3b837017", "metadata": {}, "source": [ "## Additional Resources\n", "- [Searching Granules in CMR](../search/granules.ipynb)\n", "- [Searching Collections in CMR](../search/granules.ipynb)\n", "- [Package: fsspec s3fs](https://s3fs.readthedocs.io/en/latest/)" ] }, { "cell_type": "markdown", "id": "3aef8bdf-7876-48d8-8a75-61e7f9aadc92", "metadata": {}, "source": [ "## Importing Packages\n", "If any of the packages below are not already installed, install them before running the following cell." ] }, { "cell_type": "code", "execution_count": 1, "id": "e36fdec3-6d4d-4f8c-8d66-b65e875e1808", "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3\n", "import fsspec\n", "import xarray\n", "import rioxarray\n", "import matplotlib.pyplot as plt\n", "import rasterio\n", "from rasterio.session import AWSSession" ] }, { "cell_type": "markdown", "id": "b30f64c7-93f8-4d6f-bafe-4d359eff58b5", "metadata": {}, "source": [ "## Access The Data\n", "We'll create a couple of helper functions to set up the assumed role session and view the data." 
] }, { "cell_type": "code", "execution_count": 2, "id": "f9211edc-62c8-411b-a80f-72b29d0500a9", "metadata": { "tags": [] }, "outputs": [], "source": [ "def assume_role_credentials(ssm_parameter_name):\n", "    # Create a session using your current credentials\n", "    session = boto3.Session()\n", "\n", "    # Retrieve the SSM parameter (it holds the ARN of the role to assume)\n", "    ssm = session.client('ssm', region_name=\"us-west-2\")\n", "    parameter = ssm.get_parameter(\n", "        Name=ssm_parameter_name,\n", "        WithDecryption=True\n", "    )\n", "    parameter_value = parameter['Parameter']['Value']\n", "\n", "    # Assume the DAAC access role\n", "    sts = session.client('sts')\n", "    assumed_role_object = sts.assume_role(\n", "        RoleArn=parameter_value,\n", "        RoleSessionName='TutorialSession'\n", "    )\n", "\n", "    # From the response that contains the assumed role, get the temporary\n", "    # credentials that can be used to make subsequent API calls\n", "    credentials = assumed_role_object['Credentials']\n", "\n", "    return credentials\n", "\n", "# We can pass assumed role credentials into fsspec\n", "def fsspec_access(credentials, requester_pays=False):\n", "    return fsspec.filesystem(\n", "        \"s3\",\n", "        key=credentials['AccessKeyId'],\n", "        secret=credentials['SecretAccessKey'],\n", "        token=credentials['SessionToken'],\n", "        requester_pays=requester_pays\n", "    )\n", "\n", "# We can also pass assumed role credentials into rasterio AWSSession\n", "def rasterio_access(credentials, requester_pays=False):\n", "    aws_session = AWSSession(\n", "        aws_access_key_id=credentials['AccessKeyId'],\n", "        aws_secret_access_key=credentials['SecretAccessKey'],\n", "        aws_session_token=credentials['SessionToken'],\n", "        requester_pays=requester_pays\n", "    )\n", "    return rasterio.Env(aws_session)" ] }, { "cell_type": "markdown", "id": "d61d82c6-afcd-475c-89b5-e750496d8d73", "metadata": {}, "source": [ "## Accessing GES DISC, LP DAAC and NSIDC Requester Pays Buckets\n", "\n", "All DAACs support Temporary S3 Credentials through Earthdata Login (EDL) 
authentication.\n", "\n", "Some NASA DAACs, such as GES DISC, LP DAAC and NSIDC, also store protected data in S3 buckets that use the `Requester Pays` model. To access these datasets, you need temporary AWS credentials and must explicitly declare that you agree to pay for data access. Below is an example of how to authenticate using `aws sts assume-role` and access a file using `gdalinfo`.\n" ] }, { "cell_type": "raw", "id": "4c8318f9-0184-415e-bc2b-9062d8f2639a", "metadata": {}, "source": [ "export $(printf \"AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s\" \\\n", "    $(aws sts assume-role \\\n", "        --role-arn arn:aws:iam::884094767067:role/maap-data-reader \\\n", "        --role-session-name GESDISC \\\n", "        --query \"Credentials.[AccessKeyId,SecretAccessKey,SessionToken]\" \\\n", "        --output text))\n", "\n", "export AWS_REQUEST_PAYER=requester\n", "\n", "gdalinfo /vsis3/gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif" ] }, { "cell_type": "markdown", "id": "0fc5dd4d-2913-4810-a026-97e8be98ee7f", "metadata": {}, "source": [ "Initialize the assumed role sessions, setting `requester_pays=True` for both." ] }, { "cell_type": "code", "execution_count": 3, "id": "423ba777-7bb1-49e2-8ced-3cb490430601", "metadata": { "tags": [] }, "outputs": [], "source": [ "s3_fsspec_requesterpays = fsspec_access(assume_role_credentials(\"/iam/maap-data-reader\"), requester_pays=True)\n", "s3_rasterio_requesterpays = rasterio_access(assume_role_credentials(\"/iam/maap-data-reader\"), requester_pays=True)" ] }, { "cell_type": "markdown", "id": "2605f8ab-8912-480a-83df-12b206146580", "metadata": {}, "source": [ "### LP DAAC Access\n", "We can use `rasterio` to directly inspect our TIF objects." 
] }, { "cell_type": "code", "execution_count": 4, "id": "5fe56fb8-943b-425f-a93f-10db4fb4dcd5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Width: 3660\n", "Height: 3660\n", "Bounds: BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)\n", "CRS: EPSG:32656\n", "Count: 1\n", "Data type: ('int16',)\n" ] } ], "source": [ "lp_object = \"s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/HLS.L30.T56JMN.2023225T234225.v2.0.B11.tif\"\n", "\n", "# Open the object inside the rasterio environment that carries the assumed role credentials\n", "with s3_rasterio_requesterpays:\n", "    with rasterio.open(lp_object) as src:\n", "        print(f'Width: {src.width}')\n", "        print(f'Height: {src.height}')\n", "        print(f'Bounds: {src.bounds}')\n", "        print(f'CRS: {src.crs}')\n", "        print(f'Count: {src.count}')\n", "        print(f'Data type: {src.dtypes}')" ] }, { "cell_type": "markdown", "id": "ed2c9930-f319-4706-a2b5-29e7f9f1df43", "metadata": {}, "source": [ "### GES DISC Access" ] }, { "cell_type": "code", "execution_count": 5, "id": "1da22ead-0bf4-49a6-a620-ac7f78d588e8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray (band: 1, y: 14400, x: 43200)> Size: 622MB\n",
       "[622080000 values with dtype=uint8]\n",
       "Coordinates:\n",
       "  * band         (band) int64 8B 1\n",
       "  * x            (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0\n",
       "  * y            (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0\n",
       "    spatial_ref  int64 8B 0\n",
       "Attributes:\n",
       "    AREA_OR_POINT:       Area\n",
       "    STATISTICS_MAXIMUM:  2\n",
       "    STATISTICS_MEAN:     nan\n",
       "    STATISTICS_MINIMUM:  0\n",
       "    STATISTICS_STDDEV:   nan\n",
       "    _FillValue:          255\n",
       "    scale_factor:        1.0\n",
       "    add_offset:          0.0
" ], "text/plain": [ " Size: 622MB\n", "[622080000 values with dtype=uint8]\n", "Coordinates:\n", " * band (band) int64 8B 1\n", " * x (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0\n", " * y (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0\n", " spatial_ref int64 8B 0\n", "Attributes:\n", " AREA_OR_POINT: Area\n", " STATISTICS_MAXIMUM: 2\n", " STATISTICS_MEAN: nan\n", " STATISTICS_MINIMUM: 0\n", " STATISTICS_STDDEV: nan\n", " _FillValue: 255\n", " scale_factor: 1.0\n", " add_offset: 0.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ges_disc_object = \"s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif\"\n", "\n", "with s3_fsspec_requesterpays.open(ges_disc_object) as obj:\n", " data_array = rioxarray.open_rasterio(obj)\n", "data_array" ] }, { "cell_type": "markdown", "id": "35cd1296-fa27-46ef-a6bc-8474a0046eb4", "metadata": {}, "source": [ "### NSIDC DAAC Access" ] }, { "cell_type": "markdown", "id": "28d2744a-a09e-482e-90fc-e5debcacb3c7", "metadata": {}, "source": [ "Initialize the assumed role session." ] }, { "cell_type": "code", "execution_count": 6, "id": "f2bb73b3-9cb0-4472-bac6-1f05f83027d7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 5MB\n",
       "Dimensions:            (delta_time: 21468, ds_geosegments: 5, ds_surf_type: 5)\n",
       "Coordinates:\n",
       "  * delta_time         (delta_time) datetime64[ns] 172kB 2023-06-21T23:55:51....\n",
       "    latitude           (delta_time) float32 86kB ...\n",
       "    longitude          (delta_time) float32 86kB ...\n",
       "Dimensions without coordinates: ds_geosegments, ds_surf_type\n",
       "Data variables: (12/41)\n",
       "    asr                (delta_time) float32 86kB ...\n",
       "    atlas_pa           (delta_time) float32 86kB ...\n",
       "    beam_azimuth       (delta_time) float32 86kB ...\n",
       "    beam_coelev        (delta_time) float32 86kB ...\n",
       "    brightness_flag    (delta_time) float32 86kB ...\n",
       "    cloud_flag_atm     (delta_time) float32 86kB ...\n",
       "    ...                 ...\n",
       "    snr                (delta_time) float32 86kB ...\n",
       "    solar_azimuth      (delta_time) float32 86kB ...\n",
       "    solar_elevation    (delta_time) float32 86kB ...\n",
       "    surf_type          (delta_time, ds_surf_type) int8 107kB ...\n",
       "    terrain_flg        (delta_time) float64 172kB ...\n",
       "    urban_flag         (delta_time) float64 172kB ...\n",
       "Attributes:\n",
       "    Description:  Contains data categorized as land at 100 meter intervals.\n",
       "    data_rate:    Data are stored as aggregates of 100 meters.
" ], "text/plain": [ " Size: 5MB\n", "Dimensions: (delta_time: 21468, ds_geosegments: 5, ds_surf_type: 5)\n", "Coordinates:\n", " * delta_time (delta_time) datetime64[ns] 172kB 2023-06-21T23:55:51....\n", " latitude (delta_time) float32 86kB ...\n", " longitude (delta_time) float32 86kB ...\n", "Dimensions without coordinates: ds_geosegments, ds_surf_type\n", "Data variables: (12/41)\n", " asr (delta_time) float32 86kB ...\n", " atlas_pa (delta_time) float32 86kB ...\n", " beam_azimuth (delta_time) float32 86kB ...\n", " beam_coelev (delta_time) float32 86kB ...\n", " brightness_flag (delta_time) float32 86kB ...\n", " cloud_flag_atm (delta_time) float32 86kB ...\n", " ... ...\n", " snr (delta_time) float32 86kB ...\n", " solar_azimuth (delta_time) float32 86kB ...\n", " solar_elevation (delta_time) float32 86kB ...\n", " surf_type (delta_time, ds_surf_type) int8 107kB ...\n", " terrain_flg (delta_time) float64 172kB ...\n", " urban_flag (delta_time) float64 172kB ...\n", "Attributes:\n", " Description: Contains data categorized as land at 100 meter intervals.\n", " data_rate: Data are stored as aggregates of 100 meters." 
] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nsidc_object = \"s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/006/2023/06/21/ATL08_20230621235543_00272011_006_02.h5\"\n", "\n", "# The block cache stores fixed-size blocks and evicts them with an LRU strategy\n", "fsspec_caching = {\n", "    \"cache_type\": \"blockcache\",\n", "    # Block size in bytes (8 MiB here); adjust to the file size, typically a few MB\n", "    \"block_size\": 8 * 1024 * 1024,\n", "}\n", "ds = xarray.open_dataset(\n", "    s3_fsspec_requesterpays.open(nsidc_object, \"rb\", **fsspec_caching),\n", "    group=\"gt1l/land_segments\",\n", "    engine=\"h5netcdf\",\n", "    decode_times=True\n", ")\n", "ds" ] }, { "cell_type": "markdown", "id": "77b8e97d-0eb0-47c8-a1f6-9c7a2ea4e282", "metadata": {}, "source": [ "## Accessing Standard DAAC Buckets" ] }, { "cell_type": "markdown", "id": "87d7a738-d341-4239-9618-235eb8452c42", "metadata": {}, "source": [ "Initialize the assumed role session. These buckets do not use `Requester Pays`, so the default `requester_pays=False` applies." ] }, { "cell_type": "code", "execution_count": 10, "id": "de28e852-1cbf-435c-8b71-dcc6c9926070", "metadata": {}, "outputs": [], "source": [ "s3_fsspec = fsspec_access(assume_role_credentials(\"/iam/maap-data-reader\"))" ] }, { "cell_type": "markdown", "id": "435fc7d9-64d1-470b-bf3f-5adbe2242b99", "metadata": {}, "source": [ "### ORNL DAAC Access\n", "We can also use `rioxarray` to inspect our TIF objects." ] }, { "cell_type": "code", "execution_count": 12, "id": "228f1363-ff38-41d5-9945-1ad6c402ea3a", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray (band: 1, y: 14616, x: 34704)> Size: 2GB\n",
       "[507233664 values with dtype=float32]\n",
       "Coordinates:\n",
       "  * band         (band) int64 8B 1\n",
       "  * x            (x) float64 278kB -1.737e+07 -1.737e+07 ... 1.737e+07 1.737e+07\n",
       "  * y            (y) float64 117kB 7.314e+06 7.313e+06 ... -7.313e+06 -7.314e+06\n",
       "    spatial_ref  int64 8B 0\n",
       "Attributes:\n",
       "    AREA_OR_POINT:  Area\n",
       "    _FillValue:     -9999.0\n",
       "    scale_factor:   1.0\n",
       "    add_offset:     0.0
" ], "text/plain": [ " Size: 2GB\n", "[507233664 values with dtype=float32]\n", "Coordinates:\n", " * band (band) int64 8B 1\n", " * x (x) float64 278kB -1.737e+07 -1.737e+07 ... 1.737e+07 1.737e+07\n", " * y (y) float64 117kB 7.314e+06 7.313e+06 ... -7.313e+06 -7.314e+06\n", " spatial_ref int64 8B 0\n", "Attributes:\n", " AREA_OR_POINT: Area\n", " _FillValue: -9999.0\n", " scale_factor: 1.0\n", " add_offset: 0.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ornl_object = \"s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif\"\n", "\n", "with s3_fsspec.open(ornl_object) as obj:\n", " data_array = rioxarray.open_rasterio(obj)\n", "data_array" ] }, { "cell_type": "markdown", "id": "b528fa92-714a-4577-b613-7b641326c5fa", "metadata": {}, "source": [ "### PO DAAC Access" ] }, { "cell_type": "code", "execution_count": 13, "id": "3bcc788f-6e0b-4c8f-bd94-e1e232109ad3", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray (band: 1, y: 3660, x: 3660)> Size: 13MB\n",
       "[13395600 values with dtype=uint8]\n",
       "Coordinates:\n",
       "  * band         (band) int64 8B 1\n",
       "  * x            (x) float64 29kB 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.098e+05\n",
       "  * y            (y) float64 29kB -4.8e+06 -4.8e+06 ... -4.91e+06 -4.91e+06\n",
       "    spatial_ref  int64 8B 0\n",
       "Attributes: (12/48)\n",
       "    ACCODE:                                                                  ...\n",
       "    AEROSOL_CLASS_REMAPPING_ENABLED:                                         ...\n",
       "    AEROSOL_NOT_WATER_TO_HIGH_CONF_WATER_FMASK_VALUES:                       ...\n",
       "    AEROSOL_PARTIAL_SURFACE_AGGRESSIVE_TO_HIGH_CONF_WATER_FMASK_VALUES:      ...\n",
       "    AEROSOL_PARTIAL_SURFACE_WATER_CONSERVATIVE_TO_HIGH_CONF_WATER_FMASK_VALUE...\n",
       "    AEROSOL_WATER_MODERATE_CONF_TO_HIGH_CONF_WATER_FMASK_VALUES:             ...\n",
       "    ...                                                                          ...\n",
       "    WORLDCOVER_SOURCE:                                                       ...\n",
       "    AREA_OR_POINT:                                                           ...\n",
       "    _FillValue:                                                              ...\n",
       "    scale_factor:                                                            ...\n",
       "    add_offset:                                                              ...\n",
       "    long_name:                                                               ...
" ], "text/plain": [ " Size: 13MB\n", "[13395600 values with dtype=uint8]\n", "Coordinates:\n", " * band (band) int64 8B 1\n", " * x (x) float64 29kB 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.098e+05\n", " * y (y) float64 29kB -4.8e+06 -4.8e+06 ... -4.91e+06 -4.91e+06\n", " spatial_ref int64 8B 0\n", "Attributes: (12/48)\n", " ACCODE: ...\n", " AEROSOL_CLASS_REMAPPING_ENABLED: ...\n", " AEROSOL_NOT_WATER_TO_HIGH_CONF_WATER_FMASK_VALUES: ...\n", " AEROSOL_PARTIAL_SURFACE_AGGRESSIVE_TO_HIGH_CONF_WATER_FMASK_VALUES: ...\n", " AEROSOL_PARTIAL_SURFACE_WATER_CONSERVATIVE_TO_HIGH_CONF_WATER_FMASK_VALUE...\n", " AEROSOL_WATER_MODERATE_CONF_TO_HIGH_CONF_WATER_FMASK_VALUES: ...\n", " ... ...\n", " WORLDCOVER_SOURCE: ...\n", " AREA_OR_POINT: ...\n", " _FillValue: ...\n", " scale_factor: ...\n", " add_offset: ...\n", " long_name: ..." ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "po_object = \"s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T55GEM_20230813T235239Z_20230815T154108Z_S2B_30_v1.0_B01_WTR.tif\"\n", "with s3_fsspec.open(po_object) as obj:\n", " data_array = rioxarray.open_rasterio(obj)\n", "data_array" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.10" } }, "nbformat": 4, "nbformat_minor": 5 }