{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Running Algorithms at Scale\n", "\n", "To run algorithms in the scaled-up cloud compute environment, they must first be \"registered\" in the Algorithm Catalog. Registration makes them available to other MAAP users, clearly defines their inputs and outputs, and prepares them to be run easily in the Data Processing System (DPS).\n", "\n", "A single execution of a registered Algorithm is called a Job. A single Job is easy to run using the Submit Jobs UI.\n", "\n", "Batches of Jobs are run using a Jupyter notebook (often called a “control notebook” or “wrapper notebook”) and a Python library called maap-py, which provides helper functions to submit and monitor Jobs.\n", "\n", "Running Jobs can be monitored via the View Jobs UI and/or the maap-py helper functions.\n", "![Running Algorithms overview in context diagram](_static/running_algorithms_overview.png)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Register an Algorithm\n", "To register an Algorithm that can be run in the DPS, the code should be placed in a public Git repo (either GitHub or GitLab).\n", "\n", "1. Open the Register Algorithm tool in the MAAP Extensions section of the Launcher. To open the Launcher, choose File -> New Launcher, or press the blue \"+\" button above the Jupyter file browser. Then select the Algorithm Catalog under MAAP plugins.\n", "![Register Algorithm tool in Launcher](_static/launcher-register-algorithm.png)\n", "\n", "2. Click the blue Register New Algorithm button in the top right corner.\n", "\n", "3. Now fill in the information for this form:\n", "![Register Algorithm form](_static/register-algorithm-form.png)\n", "You can also select an existing algorithm configuration to autofill this form.\n", "\n", "First, fill in the algorithm name and version. Your algorithm is identified by the algorithm name, the version, and who deployed it. 
\n", "\n", "- The Repository URL is the .git URL. For example:\n", "```\n", "https://github.com/MAAP-Project/dps-unit-test.git\n", "```\n", "- The Run and Build Commands must be the paths of the scripts that the DPS will use to build and execute the algorithm. Typically these take the form repository_name/script_name.sh:\n", "```\n", "dps-unit-test/run-test.sh\n", "```\n", "For this algorithm, there is no build script, so the Build Command may be left empty. In other examples, you can use a build command to add packages on top of an existing Docker image.\n", "\n", "- The **Base Container URL** is the URL of the Stack (workspace image environment) you are using as a base for the algorithm. This is a dropdown whose default is a standard minimal container image called `maap_base`, such as `mas.maap-project.org/root/maap-workspaces/custom_images/maap_base:v5.0.0`. The other option is the container of your current workspace (e.g. R or pangeo). These containers have numerous conda packages installed, which may or may not be useful to you. Note that if you successfully ran the Algorithm in a Terminal without adding any packages, you should be able to use your current workspace container as the **Base Container URL** for your algorithm.\n", "We recommend using `maap_base` because it makes algorithm registration faster, although using it means you need to manage your own conda packages. More information on how to make a custom conda environment is available [here](../system_reference_guide/custom-environments.html#Custom-environments). See the Algorithm Registration documentation for [more information on Containers](../system_reference_guide/algorithm_registration.ipynb#Container-URLs).\n", "You can also click \"Use pre-built algorithm container\" to bring your own already-built image.\n", "\n", "4. 
Once that is complete, enter the \"Resource Requirements\" and \"Metadata\".\n", "\n", "5. Fill in the Input section. For each input you can add a Name, Label, Description, Type, and Default Value.\n", "\n", ".. note:: **Understanding How the Algorithm Registration Form Relates to Job Execution in DPS**:\n", "When you run a Job in the DPS, the MAAP system will start up a \"worker\" computer in the cloud based on the Resource Allocation parameter. It will then run the build script to make sure that your runtime environment is set up properly, and then the run script indicated during the registration process to handle the input parameters and run the algorithm code.\n", "\n", ".. note:: As part of execution, the DPS will create a directory called `/inputs`. Copies of the Inputs are placed into `/inputs` in the working directory of your job. A directory called `/outputs` is also created to store any file outputs.\n", "\n", "\n", "6. When the form looks correct, press Register Algorithm at the bottom of the page. A few seconds later you should see a notification with a link to your \"My Builds and Deployments\" page.\n", "![Register Algorithm submitted](_static/register-4-modal.png)\n", "\n", "\n", "7. You can monitor the progress of registration and see any error messages on your \"My Builds and Deployments\" page. Click the link in the Build Status column to see the GitLab pipeline that is building your algorithm.\n", "\n", "Here is an example of an algorithm build in progress:\n", "![Register Algorithm Building](_static/register-5-status.png)\n", "\n", "Once the build succeeds, the algorithm is ready to be run in the DPS.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Run the Algorithm as a Job and Monitor it\n", "\n", "#### Using The Jobs UI\n", "\n", "MAAP is configured to run up to 4,000 concurrent jobs. There are two ways to run a Job: via the Submit Jobs UI in the Launcher, or via a call to the maap-py Python library.\n", "\n", "1. 
The Submit Jobs UI lets you run and monitor jobs easily. Open it from the Launcher. You can find full documentation for [the Jobs UI](../system_reference_guide/jobsui.ipynb) in the System Reference Guide.\n", "\n", "![Jobs UI in Launcher](_static/run-1-launcher.png)\n", "\n", "2. You can run your newly registered Algorithm here. You will see it in the Submit Jobs tool in the Algorithm drop-down menu. If you open the drop-down menu, you can type a few letters to filter the list. Your Algorithm will be labeled with the name you put into the Algorithm Name field in the registration form you just submitted. Select the desired Algorithm version and Deployed By fields.\n", "\n", "3. Next, select a resource queue. The queues you see are the ones you have access to. Job tag is an identifier that you can use as a label for your job. It doesn't need to be unique.\n", "\n", "4. Now configure the inputs, which are the ones defined when the algorithm was registered.\n", "\n", "Note that an input file can be any file that is publicly accessible to MAAP, for example any file on the web.\n", "\n", "![Filled Submit Job form](_static/run-2-filledform.png)\n", "\n", "5. Press **Submit Job**. A few seconds later a notification should appear indicating a successful job submission.\n", "\n", "6. Next, in the Launcher, open the View Jobs tool (next to the Submit Jobs tool). If you do not see your test Job, you may need to refresh the table that opens up by pressing the blue button next to the \"Last updated\" message.\n", "![Job List](_static/run-4-viewjobs.png)\n", "\n", "7. Your Job should finish shortly (use the refresh button to update the table as needed). Click on your Job in the table and the bottom panel will show the Job Details for that Job. 
Explore the various sections on your own to familiarize yourself with the information available.\n", "\n", "If you select the Outputs section, you can select \"Open in Workspace\" to open the file panel in Jupyter at the output path, as shown here.\n", "![Output File Browser](_static/run-5-viewoutputs.png)\n", "\n", "One way to get your output files is to right-click in the File Browser and choose \"Download\".\n", "![Download File](_static/run-6-download.png)\n", "\n", ".. note::\n", "Congratulations, you have run your first DPS Algorithm in the cloud!\n", "\n", "#### Using maap-py\n", "\n", "To simplify connecting to the MAAP system from a Jupyter notebook, a helper library called `maap-py` provides Python-native calls to the underlying RESTful MAAP API. Often a separate Jupyter notebook is used to run and monitor jobs with API calls.\n", "\n", "You can find documentation on [using maap-py](../system_reference_guide/jobs_maappy.ipynb) with Python notebooks in the System Reference Guide.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scaling Up\n", "\n", "This basic example demonstrates the execution of a single job. You may be wondering how you would manage the cloud execution if you wanted to run many jobs at once. The answer is that you can simply keep submitting more jobs, and the system will handle the parallelization and scaling for you.\n", "\n", "You can press Submit Job repeatedly to create additional executions of the same algorithm (perhaps changing the `input_file` for each Job), and a queue will be created that begins executing your Jobs in parallel on the cloud.\n", "\n", "If you need more compute power for each job (e.g. your algorithm is computationally and/or memory intensive, or it requires a GPU), select a different Resource to run on."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "\n", "At this point you have gone through the basic steps of setting up and using the MAAP to register and execute an Algorithm in the DPS. This is an example of the first iteration of an algorithm development process that includes writing code, registering it, testing it, making modifications (re-writing code) and re-registering it, and so on.\n", "\n", "Next you may want to explore the [science example notebooks](../science_examples.rst) or the [DPS in-depth tutorial](../technical_tutorials/dps_tutorial/dps_tutorial_demo.ipynb).\n", "\n", "If you have questions or problems to discuss, please join us at the [MAAP Community site](https://github.com/orgs/MAAP-Project/discussions/categories/platform)!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13.0 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "vscode": { "interpreter": { "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" } } }, "nbformat": 4, "nbformat_minor": 4 }