Running Algorithms at Scale
In order to run algorithms in the scaled-up cloud compute environment, they must first be “registered” in the Algorithm Catalog. This will make them available to other MAAP users, clearly define their inputs and outputs, and prepare them to be run easily in the Data Processing System (DPS).
Register an Algorithm
To register an Algorithm that can be run in the DPS, the code should be placed in a public Git repo (either GitHub or GitLab).
Open the Launcher -> Register Algorithm tool in the MAAP Extensions section.
First, fill in the public code-repository information.
The Repository URL is the .git URL.
Repository Branch is used as a version when this algorithm is registered.
The Run and Build Commands must be the full paths of the scripts that the DPS will use to build and execute the algorithm. Typically these take the form repository_name/script_name.sh, as demonstrated in the accompanying screenshot.
Once that is complete, move on to “Add General Information”.
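As a rough illustration of the repository_name/script_name.sh convention, the sketch below derives the repository name from a .git URL and joins it with a script name. The repository URL and the script names run.sh and build.sh are hypothetical placeholders, not values from the MAAP system.

```python
from urllib.parse import urlparse
import posixpath


def repo_name_from_git_url(git_url: str) -> str:
    """Return the repository name from a .git URL (the last path segment without '.git')."""
    path = urlparse(git_url).path  # e.g. "/example-user/my-algorithm.git"
    name = posixpath.basename(path.rstrip("/"))
    return name[:-4] if name.endswith(".git") else name


def script_path(git_url: str, script_name: str) -> str:
    """Build a 'repository_name/script_name.sh' style path for the registration form."""
    return f"{repo_name_from_git_url(git_url)}/{script_name}"


# Hypothetical repository and script names, for illustration only.
REPO_URL = "https://github.com/example-user/my-algorithm.git"
run_command = script_path(REPO_URL, "run.sh")
build_command = script_path(REPO_URL, "build.sh")

print(run_command)
print(build_command)
```

The resulting strings are the kind of values you would paste into the Run Command and Build Command fields. Check them against the actual layout of your repository.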
The Algorithm Name will be the unique identifier for the algorithm in the MAAP system.
Algorithm Description is additional free-form text to describe what this algorithm does.
Disk Space is the minimum amount of space you expect the job to need, including all inputs, scratch, and outputs. It gives the DPS an approximation to help optimize the run.
The Container URL is a URL of the Stack (workspace image environment) you are using as a base for the algorithm. In this example we use:
https://mas.maap-project.org/root/maap-workspaces/base_images/vanilla:v3.0.1
Container URLs
To find another Container URL, go to https://repo.maap-project.org/root/maap-workspaces/container_registry (or choose Packages and Registries > Container registry if you start from the main maap-workspaces area). Find your base Stack and drill down until you can copy the link to the specific version of the Stack that you need, as demonstrated in the accompanying screenshots.
Fill in the Input section. There are File Inputs and Positional Inputs (command-line parameters that adjust how the algorithm runs). In our example we have one File Input called input_file. For each input you can add a Description and a Default Value, and mark whether it is required or optional.
Input files are copied into /inputs in the working directory of your job.
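Because file inputs are staged for the job rather than passed as paths, the algorithm's own code has to locate them. Here is a minimal sketch of one way to do that. It assumes the staged files sit in a directory named inputs relative to the job's working directory, which is one reading of the sentence above. The helper find_input and the file name sample_scene.tif are hypothetical.

```python
import tempfile
from pathlib import Path


def find_input(name_hint: str = "", inputs_dir: str = "inputs") -> Path:
    """Return the first file (sorted by name) in inputs_dir whose name contains name_hint."""
    directory = Path(inputs_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"input directory not found: {directory}")
    candidates = sorted(
        p for p in directory.iterdir() if p.is_file() and name_hint in p.name
    )
    if not candidates:
        raise FileNotFoundError(f"no input file matching {name_hint!r} in {directory}")
    return candidates[0]


# Demonstration: a throwaway directory stands in for the job's working directory.
with tempfile.TemporaryDirectory() as workdir:
    staged = Path(workdir) / "inputs"
    staged.mkdir()
    (staged / "sample_scene.tif").write_bytes(b"")  # hypothetical staged input
    found_name = find_input("scene", inputs_dir=str(staged)).name

print(found_name)
```

In a real run script you would call find_input() with the default directory and fail early with a clear message if nothing was staged.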
When it looks good, press Register Algorithm at the bottom of the page. A few seconds later you should see a modal dialog with a link to the algorithm registration process.
If you open that link in a new page or tab, you can monitor the progress of registration and see any error messages. By opening it in a new tab/window you can keep the Register Algorithm tool open and re-submit with the same values to correct any errors.
Here is an example error message:
If the process continues without failing (this may take some time) you will ultimately see “Job succeeded”:
Now that the algorithm has been registered, you will see it in the Algorithm drop-down menu of the View & Submit Jobs tool. It will be labeled with the name you entered in the Algorithm Name field of the registration form you just submitted (in this example, rob_test_registration with version/branch main).
Run the Algorithm as a Job and Monitor it
MAAP is configured to run up to 4,000 concurrent jobs. There are two ways to run a job: via the Jobs UI in the Launcher, or via a call to the maap-py Python library.
The Jobs UI lets you run and monitor jobs easily. Full documentation for the Jobs UI is in the System Reference Guide, and guidance on using maap-py with Python is in the System Reference Guide FAQs.
Some alternative methods of running the job are found below.
Click DPS/MAS Operations menu -> Execute DPS Job
Select your algorithm from the dropdown
A new popup will ask for inputs; if the algorithm doesn’t take inputs, the popup will say so.
Click OK again to view the ID for the job just submitted.
OR
Import the maap-py library: if in Jupyter, click the small blue MAAP button in the top left corner to automatically insert the code. If using a script, add these lines manually at the top of your script:
from maap.maap import MAAP
maap = MAAP()
Pass your algorithm’s name, version, required inputs, and username to the function maap.submitJob (the identifier is job-algo_name:algo_version).
Check result: maap.getJobResult()
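The submission step above can be sketched as follows. The block only assembles the arguments, so it runs without maap-py installed. The keyword names identifier, algo_id, version, and username are assumptions about how maap.submitJob expects them, so check them against the maap-py version you have. The username and input URL are hypothetical.

```python
def job_identifier(algo_name: str, algo_version: str) -> str:
    """Format the job identifier as job-algo_name:algo_version."""
    return f"job-{algo_name}:{algo_version}"


def build_submit_kwargs(algo_name: str, algo_version: str, username: str, **inputs) -> dict:
    """Collect the algorithm name, version, username, and inputs into one dict of keyword arguments."""
    reserved = {"identifier", "algo_id", "version", "username"}
    clashes = reserved & inputs.keys()
    if clashes:
        raise ValueError(f"input names clash with reserved keys: {sorted(clashes)}")
    return {
        "identifier": job_identifier(algo_name, algo_version),
        "algo_id": algo_name,      # assumed keyword name for the algorithm name
        "version": algo_version,   # the branch used at registration
        "username": username,
        **inputs,                  # e.g. input_file=...
    }


# Example values: the algorithm name and version come from the registration
# example above; the username and input URL are hypothetical.
submit_kwargs = build_submit_kwargs(
    "rob_test_registration",
    "main",
    "example_user",
    input_file="https://example.com/data/sample_scene.tif",
)
print(submit_kwargs["identifier"])

# With maap-py available in the workspace, the call would look roughly like:
#   job = maap.submitJob(**submit_kwargs)
```

Keeping argument assembly separate from the call makes it easy to print and inspect what will be submitted before a job is actually queued.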