{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Searching for Collections in NASA's Operational CMR using maap-py\n", "\n", "Authors: Samuel Ayers (UAH), Aimee Barciauskas (Development Seed), Alex Mandel (Development Seed)\n", "\n", "Date: November 2, 2020\n", "\n", "Description: These examples walk through the MAAP API functionality of searching for collections within NASA's Common Metadata Repository (CMR) based on specific parameters. Collections are groupings of files that share the same product specification. Searching for collections can be useful for finding individual files, known as granules, which are used for processing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run This Notebook\n", "To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the [\"Getting started with the MAAP\"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.\n", "\n", "Disclaimer: it is highly recommended to run this tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Resources\n", "- [NASA's CMR API Documentation](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing and Installing Packages\n", "\n", "We begin by importing the `maap` package and creating a new instance of the `MAAP` class." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# import the MAAP package to handle queries\n", "from maap.maap import MAAP\n", "\n", "# import printing package to help display outputs\n", "from pprint import pprint\n", "\n", "# invoke the MAAP search client\n", "maap = MAAP()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## About searchCollection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `maap.searchCollection` function to return a list of desired collections. Before using this function, let's use the `help` function to view the specific arguments and keywords for `maap.searchCollection`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method searchCollection in module maap.maap:\n", "\n", "searchCollection(limit=100, **kwargs) method of maap.maap.MAAP instance\n", " Search the CMR collections\n", " :param limit: limit of the number of results\n", " :param kwargs: search parameters\n", " :return: list of results ()\n", "\n" ] } ], "source": [ "# view help for the searchCollection function\n", "help(maap.searchCollection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The help text shows that `maap.searchCollection` accepts a limit and search parameters. The `limit` parameter caps the number of collections returned by `maap.searchCollection`. Note that `limit=100` means that the *default limit* for results from the MAAP API is 100. `maap.searchCollection` also accepts any additional search parameters supported by the CMR. For a list of accepted parameters, please refer to the [CMR Search Collections API reference](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#collection-search-by-parameters)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we will explore search options including:\n", "\n", "1. 
Finding all Collections\n", "2. Searching by temporal filter\n", "3. Searching by spatial filter\n", "4. Using the results from one search as inputs into another\n", "5. Searching by additional attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding all Collections\n", "\n", "Here we will demonstrate how to list collections from the CMR without applying any filters. To do this, we will call the `maap.searchCollection` function without any additional search parameters. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'Got 100 results'\n" ] } ], "source": [ "# search all collections\n", "results = maap.searchCollection(cmr_host=\"cmr.earthdata.nasa.gov\")\n", "\n", "# print the number of collections\n", "pprint(f'Got {len(results)} results')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get 100 results because of the default `limit` of 100. The result from the MAAP API is a list of collections, where each element in the list is the metadata for that particular collection. To change the limit, pass `limit=` followed by a value inside the parentheses of `maap.searchCollection()`.\n", "\n", "Let's look at the metadata for the first collection in our list of results (`results[0]`) using `pprint`. For formatting purposes, we can use the `depth` parameter to control the number of levels of metadata detail to display. By default, there is no constraint on the depth. By setting the `depth` parameter (in this case `depth=2`), we ensure that anything nested more deeply than that level is replaced by an ellipsis." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Collection': {'AssociatedBrowseImageUrls': {...},\n", " 'CollectionState': 'COMPLETE',\n", " 'Contacts': {...},\n", " 'DOI': {...},\n", " 'DataSetId': \"'Latent reserves' within the Swiss NFI\",\n", " 'Description': 'The files refer to the data used in Portier et '\n", " 'al. \"‘Latent reserves’: a hidden '\n", " 'treasure in National Forest Inventories\" '\n", " '(2020) *Journal of Ecology*. '\n", " \"**'Latent reserves'** are defined as plots in \"\n", " 'National Forest Inventories (NFI) that have '\n", " 'been free of human influence for >40 to >70 '\n", " 'years. They can be used to investigate and '\n", " 'acquire a deeper understanding of attributes '\n", " 'and processes of near-natural forests using '\n", " 'existing long-term data. To determine which '\n", " 'NFI sample plots could be considered '\n", " '‘latent reserves’, criteria were '\n", " 'defined based on the information available in '\n", " 'the Swiss NFI database: * Shrub '\n", " 'forests were excluded. * Plots must have been '\n", " 'free of any kind of management, including '\n", " 'salvage logging or sanitary cuts, for a '\n", " 'minimum amount of time. Thresholds of 40, 50, '\n", " '60 and 70 years without intervention were '\n", " 'tested. * To ensure that species composition '\n", " 'was not influenced by past management, plots '\n", " 'where potential vegetation was classified as '\n", " 'deciduous by Ellenberg & Klötzli (1972) '\n", " 'had to have an observed proportion of '\n", " 'deciduous trees matching the theoretical '\n", " 'proportion expected in a natural deciduous '\n", " 'forest, as defined by Kienast, Brzeziecki, & '\n", " 'Wildi (1994). * Plots had to originate from '\n", " 'natural regeneration. * Intensive livestock '\n", " 'grazing must never have occurred on the '\n", " 'plots. 
The tables stored here were '\n", " 'derived from the first, second and third '\n", " 'campaigns of the Swiss NFI. The raw data from '\n", " 'the Swiss NFI can be provided free of charge '\n", " 'within the scope of a contractual agreement '\n", " '(http://www.lfi.ch/dienstleist/daten-en.php). '\n", " \"**** The files 'Data figure 2' to 'Data \"\n", " \"figure 8' are publicly available and contain \"\n", " 'the data used to produce the figures published '\n", " \"in the paper. The files 'Plot-level data \"\n", " \"for characterisation of 'latent reserves' and \"\n", " \"'Tree-level data for characterisation of \"\n", " \"'latent reserves' contain all the data \"\n", " 'required to reproduce the section of the '\n", " 'article concerning the characterisation of '\n", " \"'latent reserves' and the comparison to \"\n", " \"managed forests. The file 'Data for mortality \"\n", " \"analyses' contains the data required to \"\n", " 'reproduce the section of the article '\n", " \"concerning tree mortality in 'latent \"\n", " \"reserves'. The access to these three files is \"\n", " 'restricted as they contain some raw data from '\n", " 'the Swiss NFI, submitted to the Swiss law and '\n", " 'only accessible upon contractual agreement. 
',\n", " 'InsertTime': '2020-07-29T14:18:59.791Z',\n", " 'LastUpdate': '2021-02-04T04:39:30.512Z',\n", " 'LongName': 'Not provided',\n", " 'OnlineResources': {...},\n", " 'Platforms': {...},\n", " 'ProcessingLevelId': 'Not provided',\n", " 'RestrictionComment': 'Access to the data upon request',\n", " 'RevisionDate': '2021-02-04T04:39:30.512Z',\n", " 'ScienceKeywords': {...},\n", " 'ShortName': 'latent-reserves-in-the-swiss-nfi',\n", " 'Spatial': {...},\n", " 'Temporal': {...},\n", " 'UseConstraints': {...},\n", " 'VersionId': '1.0'},\n", " 'concept-id': 'C1931110427-SCIOPS',\n", " 'format': 'application/echo10+xml',\n", " 'revision-id': '6'}\n" ] } ], "source": [ "# print the metadata for the first collection\n", "# we use the depth parameter to set the layer of metadata detail in the results, with (1) having the least detail\n", "# (1) displays the concept ID, format, and revision ID\n", "# adjust the depth to a larger value (6) if you would like to view all of the metadata\n", "pprint(results[0], depth=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Collection` key has all of the collection information including attributes, the archive center, spatial, and temporal information. The `concept-id` is a unique identifier for this collection. It can be used to further refine search results from the CMR, such as when searching for granule information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching by Temporal Filter\n", "\n", "Here we use a temporal filter to narrow down our results using the `temporal` keyword in our search. The temporal keyword takes datetime information in a [specific format](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-temporal). The date format used is `YYYY-MM-DDThh:mm:ssZ` and temporal search criteria may be either a single date or a date range. If one date is provided then it can be inferred as the start or end date. 
To define a start date and return all collections after the date, put a comma after the date (`YYYY-MM-DDThh:mm:ssZ,`). To define an end date and return all collections prior to the date, put a comma before the date (`,YYYY-MM-DDThh:mm:ssZ`). Lastly, to get a date range, provide the start date and end date separated by a comma (`YYYY-MM-DDThh:mm:ssZ,YYYY-MM-DDThh:mm:ssZ`).\n", "\n", "In this example we will search for one month of data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'Got 100 results'\n" ] } ], "source": [ "datetimeRange = '2000-01-01T00:00:00Z,2000-01-31T23:59:59Z' # specify datetime range to search for data in January 2000\n", "\n", "results = maap.searchCollection(\n", " cmr_host = \"cmr.earthdata.nasa.gov\",\n", " temporal = datetimeRange\n", ")\n", "pprint(f'Got {len(results)} results')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'Collection ERS-2_L0 was acquired starting at 1995-10-01T03:13:03.000Z'\n" ] } ], "source": [ "collectionName = results[0]['Collection']['ShortName'] # get the collection short name\n", "collectionDate = results[0]['Collection']['Temporal']['RangeDateTime']['BeginningDateTime'] # get the collection start time\n", "\n", "pprint(\n", " f'Collection {collectionName} was acquired starting at {collectionDate}', width=100)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first result begins in October 1995, before our search window, which is consistent with a collection whose temporal extent overlaps January 2000. Keep in mind that the results are limited to 100 by default, so this list may not include every collection that matches the search." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching by Spatial Filter\n", "\n", "Here we will illustrate how to search for collections by a spatial filter. 
There are several [spatial filters available to search by](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#c-spatial) in the CMR, including point, line, polygon, and bounding box. In this example, we will explore filtering with a bounding box, which is a sequence of four longitude and latitude values in the order `[W,S,E,N]`. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'Got 100 results'\n" ] } ], "source": [ "collectionDomain = '-42,10,42,20' # specify bounding box to search by\n", "\n", "results = maap.searchCollection(\n", " cmr_host = \"cmr.earthdata.nasa.gov\", \n", " bounding_box = collectionDomain\n", ")\n", "pprint(f'Got {len(results)} results')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'Collection gov.noaa.nodc:0000029 was acquired within the following geometry: '\n", "{'BoundingRectangle': {'EastBoundingCoordinate': '-16.25',\n", " 'NorthBoundingCoordinate': '46.263167',\n", " 'SouthBoundingCoordinate': '0.766667',\n", " 'WestBoundingCoordinate': '-124.041667'},\n", " 'CoordinateSystem': 'CARTESIAN'}\n" ] } ], "source": [ "collectionName = results[3]['Collection']['ShortName'] # get a collection short name\n", "collectionGeometry = results[3]['Collection']['Spatial']['HorizontalSpatialDomain']['Geometry'] # grab the spatial information from collection\n", "\n", "pprint(f'Collection {collectionName} was acquired within the following geometry: ', width=100)\n", "pprint(collectionGeometry)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see from this collection (the fourth result, `results[3]`) that its bounding rectangle intersects our search box." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 4 }