Running Algorithms at Scale

To run algorithms in the scaled-up cloud compute environment, you must first “register” them in the Algorithm Catalog. Registration makes them available to other MAAP users, clearly defines their inputs and outputs, and prepares them to be run in the Data Processing System (DPS).

Figure: Running algorithms overview in context.

Register an Algorithm

Clone a test algorithm

This is an example algorithm you can use for this getting started guide: https://github.com/MAAP-Project/dps-unit-test

The repo contains a few files that are either required or typical for an algorithm:

  • algorithm_config.yml is a required file that has a description of the inputs and outputs of the algorithm along with other parameters like the run command.
  • run_test.sh is the run command for this algorithm. It is typical to have a shell script that tells the system how to run the algorithm and sets some environment variables.

To clone the repo and prepare your own copy:

  1. Make a new folder for your test algorithm. Open a terminal here (File > New > Terminal or use the blue “+” button above the Jupyter file browser).
  2. Copy the GitHub clone link (the .git link) from https://github.com/MAAP-Project/dps-unit-test
  3. Open the built-in Jupyter GitHub UI to the left of the file browser. Choose “Clone a Repository” and paste in the .git link you copied from the GitHub repository.
  4. You should see a new folder created with the repo you cloned. If you browse to that folder and open the Jupyter GitHub UI again, it will show you some information about that repo.
  5. If you want to make changes to the code and have your own copy of it to register, clone the code into your MAAP GitLab. The git link to your code is set by the repository_url field in algorithm_config.yml. If you would prefer to skip this for now, leave repository_url pointed at the “root” user (repository_url: https://repo.dit.maap-project.org/root/dps-unit-test.git).
  6. Rename the algorithm to personalize it. To do this, open the algorithm_config.yml file and change the algo_name field.
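
The rename in step 6 can also be scripted. The following is a minimal sketch, assuming algo_name appears as a top-level `algo_name: value` line in algorithm_config.yml. The helper name rename_algorithm, the sample name dps_unit_test, and the new name my_dps_unit_test are hypothetical, and the sample text shows only the two fields named in this guide, not a complete configuration.

```python
import re


def rename_algorithm(config_text: str, new_name: str) -> str:
    """Return config_text with the top-level algo_name value replaced.

    Assumes the config contains a line of the form `algo_name: <value>`.
    """
    pattern = re.compile(r"^algo_name:.*$", flags=re.MULTILINE)
    if not pattern.search(config_text):
        raise ValueError("no top-level algo_name field found")
    # A function replacement avoids treating backslashes in new_name as escapes.
    return pattern.sub(lambda _match: f"algo_name: {new_name}", config_text, count=1)


# Hypothetical excerpt of algorithm_config.yml (only fields named in this guide).
sample = (
    "algo_name: dps_unit_test\n"
    "repository_url: https://repo.dit.maap-project.org/root/dps-unit-test.git\n"
)

renamed = rename_algorithm(sample, "my_dps_unit_test")
print(renamed)
```

To apply this to the real file, you could read algorithm_config.yml into a string, pass it through the helper, and write the result back. Editing the file by hand in Jupyter, as described in step 6, works just as well.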

Register the algorithm

  1. Make sure your code is ready and saved, then right-click the file and choose “Register as MAS Algorithm”.
  2. This automatically creates an algorithm_config.yml file with the presets if one is not already present (in this example, it is already present). Each directory can have only one such file. At this point you would normally edit the configuration file, then repeat step 1 and click “OK” to register. For this example, we already edited it in steps 5 and 6 of the previous section.
  3. Outputs (if any) should be written to a folder named outputs. There are none in the example we are using here.
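
For an algorithm that does produce outputs, the convention in step 3 could look like the sketch below. It assumes the outputs folder is created relative to the directory the algorithm runs in, which this guide does not state explicitly. The helper name write_output and the file name result.txt are hypothetical.

```python
from pathlib import Path


def write_output(filename: str, text: str, base_dir: str = ".") -> Path:
    """Write text to <base_dir>/outputs/<filename>, creating the folder if needed."""
    out_dir = Path(base_dir) / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(text)
    return out_path


if __name__ == "__main__":
    # Hypothetical result file written into ./outputs
    path = write_output("result.txt", "algorithm finished\n")
    print(f"wrote {path}")
```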

Note

It can take some time to register an algorithm. Registration is complete when the algorithm appears in the Jobs UI (see below) or in the menus under DPS/MAS Operations > List Algorithms.

Run the Algorithm as a Job and Monitor it

The Jobs UI

MAAP is configured to run up to 4,000 concurrent jobs. You can run a job via the Jobs UI in the Launcher or via a call to the maap-py Python library.

The Jobs UI lets you run and monitor jobs easily. Full documentation is in the System Reference Guide for the Jobs UI, and the System Reference Guide FAQs cover how to submit jobs and how to monitor them.


Some alternative methods of running the job are found below.

Pop-up

  • Click DPS/MAS Operations menu -> Execute DPS Job
  • Select your algorithm from the dropdown
  • A new popup will ask for inputs; if the algorithm doesn’t take inputs, the popup will say so.
  • Click OK again to view the ID for the job just submitted.

OR

maap-py

Import the maap-py library: if you are in a Jupyter notebook, click the small blue MAAP button in the top left corner to insert the code automatically. If you are using a script, add these lines manually at the top of your script:

from maap.maap import MAAP
maap = MAAP()

Pass your algorithm’s name, version, required inputs, and username to the function maap.submitJob. The identifier has the form job-algo_name:algo_version. To check the result, call maap.getJobResult().
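
As a rough illustration of the values described above, the sketch below assembles them into keyword arguments. The helper build_submit_kwargs is hypothetical, and the keyword names algo_id, version, and username are assumptions about the maap.submitJob signature rather than something this guide confirms; check the maap-py documentation for your installed version. The algorithm name, version, and username shown are placeholders.

```python
def build_submit_kwargs(algo_name, algo_version, username, **inputs):
    """Collect the values this guide says maap.submitJob needs.

    The keyword names below are assumptions, not a confirmed signature.
    """
    return {
        # Identifier format from this guide: job-<algo_name>:<algo_version>
        "identifier": f"job-{algo_name}:{algo_version}",
        "algo_id": algo_name,      # assumed keyword name
        "version": algo_version,   # assumed keyword name
        "username": username,      # assumed keyword name
        **inputs,                  # any inputs the algorithm requires
    }


# Placeholder values; the example algorithm in this guide takes no inputs.
kwargs = build_submit_kwargs("dps_unit_test", "master", "your_username")
print(kwargs["identifier"])

# With the maap object created earlier, submission would then look like:
#   job = maap.submitJob(**kwargs)
# The result check is maap.getJobResult(); its arguments are not
# specified in this guide, so they are left out here.
```

Keeping the argument assembly in a plain function makes it easy to inspect the identifier and inputs before anything is sent to the DPS.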