{ "cells": [ { "cell_type": "markdown", "id": "73d8bb73-1c20-4ead-8fbd-3f732d016f55", "metadata": {}, "source": [ "# Direct DAAC S3 Bucket Access from the MAAP Hub" ] }, { "cell_type": "markdown", "id": "59449000-06bc-4641-87ce-527ff2dcb2a2", "metadata": {}, "source": [ "\n", "Authors: Harshini Girish (UAH), Alex Mandel (Development Seed), Brian Freitag (NASA MSFC), Jamison French (Development Seed)\n", "\n", "Updated: Dec 8, 2025\n", "\n", "Description: In this tutorial, we demonstrate how to assume the MAAP data reader role to access specific DAAC buckets.\n", "\n", "***This tutorial demonstrates an experimental feature to allow access to DAACs without using Earthdata Login***.\n", "\n", "This method currently works for a select number of DAACs and their Earthdata Cloud datasets, which are stored in AWS S3:\n", "- [GES DISC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Goddard%2BEarth%2BSciences%2BData%2Band%2BInformation%2BServices%2BCenter%2B%2528GES%2BDISC%2529)\n", "- [LP DAAC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Land%2BProcess%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528LPDAAC%2529)\n", "- [NSIDC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=National%2BSnow%2Band%2BIce%2BData%2BCenter%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528NSIDC%2BDAAC%2529)\n" ] }, { "cell_type": "markdown", "id": "75b72e12-7e76-4e14-9443-cd0d5fd0c63b", "metadata": {}, "source": [ "## Run This Notebook\n", "To access and run this tutorial within the MAAP Hub, please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: this tutorial **must** be run within the MAAP Hub to assume the necessary permissions. This tutorial was tested using the **vanilla** workspace image. 
If you encounter issues with the installs, ensure you have the latest version of pip installed." ] }, { "cell_type": "markdown", "id": "c9d69df1-0490-45a7-83e3-49055e338156", "metadata": {}, "source": [ "## Additional Resources\n", "- [Searching Granules in CMR](https://docs.maap-project.org/en/latest/technical_tutorials/search/granules.html)\n", "- [Searching Collections in CMR](https://docs.maap-project.org/en/latest/technical_tutorials/search/collections.html)\n" ] }, { "cell_type": "markdown", "id": "428c1849-f4ee-460b-958b-498d85c6b834", "metadata": {}, "source": [ "## Importing Packages\n", "If the packages below are not installed already, install them before running the following cell." ] }, { "cell_type": "code", "execution_count": 1, "id": "d3b84a21-ff24-408b-ac25-31a05b42c578", "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "import fsspec\n", "import matplotlib.pyplot as plt\n", "import rasterio\n", "import xarray as xr\n", "import rioxarray as rxr\n", "from rasterio.session import AWSSession" ] }, { "cell_type": "markdown", "id": "4cfee179-4f95-401c-a2d8-2ed28a8b4b2e", "metadata": {}, "source": [ "## Access The Data\n", "We'll create a couple of helper functions to set up the Hub-provided AWS session and view the data." 
] }, { "cell_type": "code", "execution_count": 2, "id": "9a60acef-a850-44cc-9879-aa210602b106", "metadata": {}, "outputs": [], "source": [ "# Build a boto3 session from the AWS credentials that the Hub environment already provides.\n", "def hub_boto3_session(region_name: str | None = None) -> boto3.Session:\n", "    return boto3.Session(region_name=region_name)\n", "\n", "def hub_boto3_client(service_name: str, region_name: str | None = None):\n", "    return hub_boto3_session(region_name).client(service_name)\n", "\n", "# Authenticated (non-anonymous) S3 filesystem for fsspec-based readers.\n", "def fsspec_access_hub(requester_pays: bool = False):\n", "    return fsspec.filesystem(\"s3\", requester_pays=requester_pays, anon=False)\n", "\n", "# rasterio environment backed by the Hub session; use it as a context manager.\n", "def rasterio_access_hub(requester_pays: bool = False, region_name: str | None = None):\n", "    boto_sess = hub_boto3_session(region_name=region_name)\n", "    aws = AWSSession(session=boto_sess, requester_pays=requester_pays)\n", "    if requester_pays:\n", "        os.environ[\"AWS_REQUEST_PAYER\"] = \"requester\"\n", "    return rasterio.Env(aws)\n", "\n", "# No role assumption is needed on the Hub, so there are no credentials to return.\n", "def assume_role_credentials(ssm_parameter_name: str | None = None):\n", "    return None\n", "\n", "# The credentials argument is unused below; both functions delegate to the Hub helpers.\n", "def fsspec_access(credentials=None, requester_pays: bool = False):\n", "    return fsspec_access_hub(requester_pays=requester_pays)\n", "\n", "def rasterio_access(credentials=None, requester_pays: bool = False, region_name: str | None = None):\n", "    return rasterio_access_hub(requester_pays=requester_pays, region_name=region_name)\n" ] }, { "cell_type": "markdown", "id": "a294de63-7a25-4789-ab34-0b153a5c6248", "metadata": {}, "source": [ "## Accessing GES DISC, LP DAAC and NSIDC Requester Pays Buckets\n", "\n", "Some NASA DAACs, such as GES DISC, LP DAAC and NSIDC, expose protected data in S3 buckets that use the *Requester Pays* model. On the MAAP Hub, your AWS credentials are already provided by the environment, so you do **not** need to call `aws sts assume-role`. 
To read from these buckets, you only need to indicate that you accept requester-pays charges by setting `AWS_REQUEST_PAYER=\"requester\"` and creating your `fsspec` / `rasterio` S3 clients with `requester_pays=True`, as shown in the example below.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "df10887d-d209-45ae-a7c8-80a0ff977d46", "metadata": {}, "outputs": [], "source": [ "# Accept requester-pays charges for S3 reads made from this process.\n", "os.environ[\"AWS_REQUEST_PAYER\"] = \"requester\"\n", "fspec_requesterpays = fsspec.filesystem(\"s3\", requester_pays=True, anon=False)\n", "hub_session = boto3.Session()\n", "s3_rasterio_requesterpays = rasterio.Env(\n", "    AWSSession(session=hub_session, requester_pays=True)\n", ")" ] }, { "cell_type": "markdown", "id": "f06d989d-01af-4ae1-973c-15c36ae185da", "metadata": {}, "source": [ "### LP DAAC Access\n", "We can use rasterio to directly inspect our TIF objects." ] }, { "cell_type": "code", "execution_count": 4, "id": "80af8d0b-2eab-4aa5-846e-68c996ab67d2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Width: 3660\n", "Height: 3660\n", "Bounds: BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)\n", "CRS: EPSG:32656\n", "Count: 1\n", "Data type: ('int16',)\n" ] } ], "source": [ "# LP DAAC Access\n", "lp_object = \"s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/HLS.L30.T56JMN.2023225T234225.v2.0.B02.tif\"\n", "\n", "# Open the object inside the requester-pays rasterio environment and print its metadata.\n", "with s3_rasterio_requesterpays:\n", "    with rasterio.open(lp_object) as src:\n", "        print(f'Width: {src.width}')\n", "        print(f'Height: {src.height}')\n", "        print(f'Bounds: {src.bounds}')\n", "        print(f'CRS: {src.crs}')\n", "        print(f'Count: {src.count}')\n", "        print(f'Data type: {src.dtypes}')\n" ] }, { "cell_type": "markdown", "id": "d214198e-4045-4dcf-95b8-a13a68d74255", "metadata": {}, "source": [ "### GES DISC Access" ] }, { "cell_type": "code", "execution_count": 5, "id": "253806fe-eba2-4194-a09e-e3d043ad63f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray (band: 1, y: 14400, x: 43200)> Size: 2GB\n",
"[622080000 values with dtype=float32]\n",
"Coordinates:\n",
" * band (band) int64 8B 1\n",
" * x (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0\n",
" * y (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0\n",
" spatial_ref int64 8B 0\n",
"Attributes:\n",
" AREA_OR_POINT: Area\n",
" STATISTICS_MAXIMUM: 2\n",
" STATISTICS_MEAN: nan\n",
" STATISTICS_MINIMUM: 0\n",
" STATISTICS_STDDEV: nan\n",
" scale_factor: 1.0\n",
" add_offset: 0.0<xarray.Dataset> Size: 5MB\n",
"Dimensions: (delta_time: 21468, ds_geosegments: 5, ds_surf_type: 5)\n",
"Coordinates:\n",
" * delta_time (delta_time) datetime64[ns] 172kB 2023-06-21T23:55:51....\n",
" latitude (delta_time) float32 86kB ...\n",
" longitude (delta_time) float32 86kB ...\n",
"Dimensions without coordinates: ds_geosegments, ds_surf_type\n",
"Data variables: (12/41)\n",
" asr (delta_time) float32 86kB ...\n",
" atlas_pa (delta_time) float32 86kB ...\n",
" beam_azimuth (delta_time) float32 86kB ...\n",
" beam_coelev (delta_time) float32 86kB ...\n",
" brightness_flag (delta_time) float32 86kB ...\n",
" cloud_flag_atm (delta_time) float32 86kB ...\n",
" ... ...\n",
" snr (delta_time) float32 86kB ...\n",
" solar_azimuth (delta_time) float32 86kB ...\n",
" solar_elevation (delta_time) float32 86kB ...\n",
" surf_type (delta_time, ds_surf_type) int8 107kB ...\n",
" terrain_flg (delta_time) float64 172kB ...\n",
" urban_flag (delta_time) float64 172kB ...\n",
"Attributes:\n",
" Description: Contains data categorized as land at 100 meter intervals.\n",
" data_rate: Data are stored as aggregates of 100 meters.