{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### An overview of the MAAP platform\n", "\n", "The MAAP is a cloud-based system for writing science-analysis code and then running it at scale. This lets you keep all of the input and output data “in the cloud”.\n", "\n", "A scientist typically starts with interactive analysis of some data using a Jupyter notebook in a MAAP \"workspace\". This might be with Python or with R, and may be in combination with other tools you also use.\n", "\n", "If your analysis needs to be scaled up to run over larger data sets, for example going from a single city or country to a global analysis, then the notebook code needs to be converted into a standalone script and \"registered\" as a MAAP Algorithm.\n", "\n", "MAAP Algorithms can be executed as a batch using the Data Processing System (DPS). A single execution is called a Job. Batches of Jobs are often run from a separate Jupyter notebook (commonly called a \"control notebook\" or \"wrapper notebook\") using a Python library called maap.py, which has helper functions to execute and monitor Jobs. A graphical interface is also provided as a Jupyter extension (a tab in Jupyter).\n", "\n", "The tools used typically look like:\n", "\n", "1. Interactive Jupyter notebook to do analysis, code-editing and testing\n", "2. Conversion of the interactive notebook to a non-interactive script\n", "3. Some additional scripts to help register your code as an Algorithm\n", "4. Jupyter GUI tools to help register your Algorithm\n", "5. 
An interactive control notebook, also in Jupyter, to execute and monitor a batch of Jobs using maap.py\n", "\n", "This is represented here:\n", "\n", "![MAAP Overview Diagram](_static/maap_overview_diagram.png) \n", "\n", " - The **Algorithm Development Environment (ADE)** is a consistent, standardized environment for developing and testing algorithms, and it facilitates large-scale data processing. MAAP's primary user interface is JupyterLab, where code is written and tested before being pushed to the large-scale data processing system. Code is stored in and checked out from Git-based repositories, including GitHub and MAAP's own code repository subsystem.\n", " - The **Data Processing System (DPS)** is where registered algorithms (see Algorithm Catalog) can be run at scale in the cloud. The MAAP system provides a Jupyter GUI to run Jobs, or the maap.py library can be used to run a batch of Jobs in a loop using Python. The DPS also has monitoring capabilities: the MAAP system provides a Jupyter GUI to help monitor Jobs, and this can also be done using maap.py in Python.\n", " - The **Algorithm Catalog** is where your algorithms from the ADE can be registered and compiled for use by the DPS. The MAAP system provides API and GUI tools to help you register and view your algorithms.\n", " - The **Code Repository** is a Git-based repository to store user code. It is also used to store the configuration files necessary for building algorithms to store in the Algorithm Catalog and for execution in the DPS.\n", " - Input data comes from a few **Data Catalogs**. Currently there is a MAAP [STAC Catalog](https://stacspec.org/en/about/) and the [NASA CMR Catalog](https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/cmr). 
More information can be found in the [search tutorials section](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html).\n", " " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }