Running Algorithms at Scale

To run algorithms in the scaled-up cloud compute environment, you must first "register" them in the Algorithm Catalog. Registration makes them available to other MAAP users, clearly defines their inputs and outputs, and prepares them to be run easily in the Data Processing System (DPS).

(Figure: Running Algorithms overview in context diagram)

Register an Algorithm

To register an algorithm that can be run in the DPS, first place its code in a public Git repository on GitHub or GitLab.

  1. Open the Launcher -> Register Algorithm tool in the MAAP Extensions section (screenshot: Register Algorithm tool in Launcher).

  2. First, fill in the public code-repository information.

  • The Repository URL is the .git URL.

  • The Repository Branch is used as the algorithm's version when it is registered.

  • The Run and Build Commands must be the full paths of the scripts the DPS will use to build and execute the algorithm. Typically these are repository_name/script_name.sh (screenshot: Register Algorithm repository information).

  3. Once that is complete, move on to “Add General Information”.

  • The Algorithm Name will be the unique identifier for the algorithm in the MAAP system.

  • The Algorithm Description is free-form text describing what the algorithm does.

  • Disk Space is the minimum amount of disk space you expect the job to need, including all inputs, scratch files, and outputs. It gives the DPS an approximation that helps optimize the run.

  • The Container URL is the URL of the Stack (workspace image environment) you are using as the base for the algorithm. In this example we use https://mas.maap-project.org/root/maap-workspaces/base_images/vanilla:v3.0.1 (screenshot: Register Algorithm general information).
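The fields from the repository and general-information steps can be summarized in a small Python sketch. Everything here is hypothetical: the RegistrationForm class, its field names, and the checks in problems() are illustrative assumptions, not part of the MAAP registration API. It encodes only the conventions this guide describes: a .git repository URL, run and build commands that typically start with the repository name, a required algorithm name and branch, a disk-space estimate covering inputs, scratch, and outputs, and a container URL pinned to a specific Stack version. The 25% headroom in estimate_disk_space_gb is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


def repo_name_from_url(repo_url: str) -> str:
    """Return 'my_algo' for a URL such as https://github.com/user/my_algo.git."""
    last = urlparse(repo_url).path.rstrip("/").rsplit("/", 1)[-1]
    if not last.endswith(".git"):
        raise ValueError(f"expected a .git URL, got {repo_url!r}")
    return last[: -len(".git")]


def estimate_disk_space_gb(input_gb: float, scratch_gb: float, output_gb: float,
                           headroom: float = 1.25) -> float:
    """Inputs + scratch + outputs, padded by an assumed safety margin."""
    return (input_gb + scratch_gb + output_gb) * headroom


@dataclass
class RegistrationForm:
    """Hypothetical mirror of the Register Algorithm form fields."""
    repository_url: str
    repository_branch: str  # used as the algorithm version
    run_command: str
    build_command: str
    algorithm_name: str
    description: str
    disk_space_gb: float
    container_url: str

    def problems(self) -> List[str]:
        """Return human-readable warnings; an empty list means nothing was flagged."""
        issues: List[str] = []
        try:
            repo = repo_name_from_url(self.repository_url)
        except ValueError as err:
            issues.append(str(err))
            repo = None
        if repo is not None:
            for label, cmd in (("run", self.run_command), ("build", self.build_command)):
                # Heuristic only: the guide says these are *typically* repo_name/script.sh
                if not cmd.startswith(f"{repo}/"):
                    issues.append(f"{label} command {cmd!r} does not start with '{repo}/'")
        if not self.algorithm_name.strip():
            issues.append("algorithm name is required")
        if not self.repository_branch.strip():
            issues.append("repository branch (version) is required")
        if self.disk_space_gb <= 0:
            issues.append("disk space must be positive")
        # The ':tag' in the final path segment pins a specific Stack version.
        if ":" not in self.container_url.rstrip("/").rsplit("/", 1)[-1]:
            issues.append("container URL should name a specific Stack version (image:tag)")
        return issues
```

For example, a form whose repository URL ends in my_algo.git and whose run command is my_algo/run.sh passes the repository check, while a bare run.sh is flagged.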

Container URLs

To find another Container URL, go to https://repo.maap-project.org/root/maap-workspaces/container_registry (if you start from the main maap-workspaces area, choose Packages and Registries > Container registry). Find your base Stack and drill down until you can copy the link for the specific Stack version you need (screenshots: Container registry, Container vanilla, Container copy link).
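Because the copied link must point at one specific version, it can help to split it into its parts before pasting it into the form. The parse_container_url helper below is a hypothetical, minimal sketch (not a MAAP-provided function), assuming the link has the registry/path/to/image:tag form of the example URL above, with or without an https:// prefix.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ContainerRef(NamedTuple):
    registry: str  # host serving the image, e.g. mas.maap-project.org
    image: str     # path of the Stack image within the registry
    tag: str       # the specific Stack version


def parse_container_url(url: str) -> ContainerRef:
    """Split a container link into registry host, image path, and version tag."""
    if "://" not in url:
        url = "https://" + url  # links copied from the registry UI may lack a scheme
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    if ":" not in path.rsplit("/", 1)[-1]:
        raise ValueError(f"{url!r} has no version tag; copy the link of a specific version")
    image, tag = path.rsplit(":", 1)
    return ContainerRef(parsed.netloc, image, tag)
```

Applied to the example URL, this yields the registry mas.maap-project.org, the image root/maap-workspaces/base_images/vanilla, and the tag v3.0.1.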

  4. Fill in the Input section. There are File Inputs and Positional Inputs (command-line parameters that adjust how the algorithm runs). In our example we have one File Input called input_file. For each input you can add a Description and a Default Value, and mark whether it is required or optional.

Input files are copied into /inputs in the working directory of your job.

(Screenshot: Register Algorithm file inputs)
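Inside a running job, your code can then pick up whatever the DPS copied into the inputs directory. The sketch below is hypothetical Python that a run script could call; it assumes only that input files land in an inputs directory under the job's working directory (as stated above) and that the job starts in that working directory. The helper names list_input_files and single_input are illustrative.

```python
from pathlib import Path
from typing import List, Optional


def list_input_files(workdir: Optional[Path] = None) -> List[Path]:
    """Return every file under <workdir>/inputs, sorted by path.

    workdir defaults to the current directory, assumed to be the job's
    working directory when the run script starts.
    """
    base = Path(workdir) if workdir is not None else Path.cwd()
    inputs_dir = base / "inputs"
    if not inputs_dir.is_dir():
        raise FileNotFoundError(f"{inputs_dir} does not exist; is this running inside a DPS job?")
    return sorted(p for p in inputs_dir.rglob("*") if p.is_file())


def single_input(workdir: Optional[Path] = None, suffix: str = "") -> Path:
    """Return the one input file (optionally filtered by suffix), e.g. for input_file."""
    matches = [p for p in list_input_files(workdir) if p.name.endswith(suffix)]
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one input file, found {len(matches)}")
    return matches[0]
```

Failing loudly when the directory is missing or the match is ambiguous makes misconfigured inputs show up in the job's error output instead of producing silently wrong results.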

  5. When everything looks correct, press Register Algorithm at the bottom of the page. A few seconds later you should see a modal dialog with a link to the algorithm registration process (screenshot: Register Algorithm submitted).

  6. Open that link in a new page or tab to monitor the progress of registration and see any error messages. Keeping it in a separate tab or window leaves the Register Algorithm tool open, so you can correct any errors and re-submit with the same values.

An example error message is shown in the screenshot "Register Algorithm error".

If the process completes without failing (this may take some time), you will eventually see “Job succeeded” (screenshot: Register Algorithm success).

  7. Now that the algorithm is registered, you will see it in the Algorithm drop-down menu of the View & Submit Jobs tool. It is labeled with the name you entered in the Algorithm Name field of the registration form (in this example, rob_test_registration with version/branch main).

(Screenshot: Jobs UI with new Algorithm)

Run the Algorithm as a Job and Monitor it

MAAP is configured to run up to 4,000 concurrent jobs. There are two main ways to run a job: via the Jobs UI in the Launcher, or via a call to the maap-py Python library.

The Jobs UI lets you run and monitor jobs easily. Full documentation for the Jobs UI is in the System Reference Guide, and documentation for using maap-py with Python is in the System Reference Guide FAQs.

(Screenshot: Jobs UI access)
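Because MAAP runs at most 4,000 jobs at once, a very large workload may need to be submitted in stages. The batches helper below is a minimal, hypothetical sketch of one way to split a list of job inputs into groups no larger than that limit; how long to wait between groups, and whether the limit is shared with other users, are outside what this guide states.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

MAX_CONCURRENT_JOBS = 4000  # MAAP's configured concurrency limit, per this guide


def batches(items: Iterable[T], size: int = MAX_CONCURRENT_JOBS) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

For example, 9,000 input files would be split into groups of 4,000, 4,000, and 1,000.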

Alternative methods of running the job are described below.

  • Click the DPS/MAS Operations menu -> Execute DPS Job.

  • Select your algorithm from the drop-down.

  • A new popup asks for the algorithm's inputs; if it takes no inputs, the popup says so.

  • Click OK to view the ID of the job you just submitted.

OR

Import the maap-py library. In Jupyter, click the small blue MAAP button in the top left corner to insert the code automatically. Otherwise, add these lines manually at the top of your script or notebook:

from maap.maap import MAAP
maap = MAAP()

Pass your algorithm’s name, version, required inputs, and username to maap.submitJob (the job identifier is job-algo_name:algo_version). To check the result, call maap.getJobResult() with the job ID.
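These steps can be wrapped in two small helpers. This is a hypothetical sketch, not part of maap-py: dps_job_type builds the job-algo_name:algo_version identifier, and wait_for_job polls a status function you supply until the job reaches a terminal state. The status strings ("Succeeded", "Failed") and the polling interval are assumptions; check what your maap-py version actually returns before relying on them.

```python
import time
from typing import Callable, Iterable


def dps_job_type(algo_name: str, algo_version: str) -> str:
    """Build the identifier in the job-algo_name:algo_version form."""
    return f"job-{algo_name}:{algo_version}"


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],
    terminal: Iterable[str] = ("Succeeded", "Failed"),  # assumed status names
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until it returns a terminal status or the timeout elapses."""
    terminal_set = set(terminal)
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in terminal_set:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

In a MAAP workspace, get_status could be a small wrapper around maap-py's job-status call that extracts the status string; that wrapper is left out because its return format is not described in this guide. Passing sleep as a parameter lets the loop be tested without actually waiting.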