Running Algorithms at Scale¶
In order to run algorithms in the scaled-up cloud compute environment, they must first be “registered” in the Algorithm Catalog. This will make them available to other MAAP users, clearly define their inputs and outputs, and prepare them to be run easily in the Data Processing System (DPS).
Register an Algorithm¶
Clone a test algorithm¶
This is an example algorithm you can use for this getting started guide: https://github.com/MAAP-Project/dps-unit-test
The repo contains a few files that are either required or typical for an algorithm:
- algorithm_config.yml is a required file that describes the inputs and outputs of the algorithm, along with other parameters such as the run command.
- run_test.sh is the run command for this algorithm. It is typical to use a shell script to tell the system how to run the algorithm and to set some environment variables.
- Make a new folder for your test algorithm. Open a terminal here (File > New > Terminal or use the blue “+” button above the Jupyter file browser).
- Copy the GitHub clone link from https://github.com/MAAP-Project/dps-unit-test
- Open the built-in Jupyter GitHub UI to the left of the file browser. Choose “Clone a Repository” and paste in the .git link you copied from the GitHub repository.
- You should see a new folder created with the repo you cloned. If you browse to that folder and open the Jupyter GitHub UI again, it will show you some information about that repo.
- If you want to make changes to the code and have your own copy of it to register, clone the code into your MAAP GitLab. The git link to your code is indicated in algorithm_config.yml. If you would prefer to skip this for now, leave the repository_url in algorithm_config.yml pointed at the “root” user (repository_url: https://repo.dit.maap-project.org/root/dps-unit-test.git).
- Rename the algorithm to personalize it. You do this by opening the algorithm_config.yml file and changing the algo_name field.
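The renaming step above can also be scripted. The following is a minimal sketch, assuming that algo_name and repository_url appear as simple unindented `key: value` lines in algorithm_config.yml. The helper name `set_config_field` and the sample values are hypothetical, and a YAML library would be a more robust choice for anything beyond a one-line change.

```python
import re


def set_config_field(config_text: str, key: str, value: str) -> str:
    """Return config_text with the top-level `key: ...` line set to value.

    Assumes the key sits on its own unindented line; raises KeyError if
    no such line exists, so a typo does not silently do nothing.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(lambda _m: f"{key}: {value}", config_text, count=1)
    if count == 0:
        raise KeyError(f"{key!r} not found as a top-level field")
    return new_text


# Illustrative input only; the real file has more fields than this.
sample = (
    "algo_name: dps-unit-test\n"
    "repository_url: https://repo.dit.maap-project.org/root/dps-unit-test.git\n"
)
renamed = set_config_field(sample, "algo_name", "my-dps-unit-test")
print(renamed)
```

To apply this to the real file, read it with `Path("algorithm_config.yml").read_text()`, pass the text through the helper, and write the result back.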
Register the algorithm¶
- Make sure your code is ready and saved, then right-click the file and choose “Register as MAS Algorithm”.
- This automatically creates an algorithm_config.yml file with the presets if one is not already present (in this example, it is). There is only one per directory. At this point you would normally edit the configuration file, then repeat the first step and click “OK” to register. For this example, we did this in step 5 of the previous section.
- Outputs (if any) should be written to a folder named outputs. There are none in the example we are using here.
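As a rough illustration of the outputs convention above, an algorithm written in Python might create the folder before writing to it. This is a sketch only: it assumes the outputs folder is resolved relative to the directory the run command starts in, and `write_output` and its parameters are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def write_output(filename: str, text: str, base_dir: Optional[Path] = None) -> Path:
    """Write text to <base_dir>/outputs/<filename> and return the path.

    base_dir defaults to the current working directory (an assumption
    about where the job's run command starts).
    """
    out_dir = (base_dir or Path.cwd()) / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)  # safe if it already exists
    out_path = out_dir / filename
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

A call such as `write_output("result.txt", "done\n")` would then leave the file in the outputs folder under the working directory.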
Note
It can take some time to register an algorithm. You can determine if it has completed when you see it appear in the Jobs UI (see below) or in the menus under DPS/MAS Operations > List Algorithms.
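Because registration does not finish immediately, a script that registers an algorithm and then runs it needs to wait in between. The sketch below is a generic polling helper and not part of MAAP: `wait_until` and its parameters are hypothetical, and the `check` callable (for example, something that inspects the algorithm list) is left to the reader.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 600.0,
               interval_s: float = 15.0) -> bool:
    """Call check() repeatedly until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        # Stop if another sleep would carry us past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```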
Run the Algorithm as a Job and Monitor it¶
The Jobs UI¶
MAAP is configured to run up to 4,000 concurrent jobs. There are two ways to run a job: via the Jobs UI in the Launcher, or via a call to the maap-py Python library.
The Jobs UI lets you run and monitor jobs easily. You can find full documentation in the System Reference Guide for the Jobs UI. You can also find specific documentation on how to submit jobs and how to monitor jobs in the System Reference Guide FAQs.
Some alternative methods of running the job are found below.
Pop-up¶
- Click DPS/MAS Operations menu -> Execute DPS Job
- Select your algorithm from the dropdown
- A new popup will ask for inputs; if it doesn’t take inputs, the popup will say so.
- Click OK again to view the ID for the job just submitted.
OR
maap-py¶
Import the maap-py library: if in Jupyter, click the small blue MAAP button in the top left corner to automatically insert the code. If using a script, add these lines manually at the top of your script:
from maap.maap import MAAP
maap = MAAP()
Pass your algorithm’s name, version, required inputs, and username to the function maap.submitJob (the identifier is job-algo_name:algo_version).
Check the result with maap.getJobResult().
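Put together, the maap-py steps above might look like the following sketch. The keyword names passed to `submitJob` (`identifier`, `algo_id`, `version`, `username`) are assumptions that this guide does not confirm and that may differ between maap-py versions, so check them against the maap-py documentation. `submit_example_job` and the example argument values are hypothetical. The function takes the client as an argument, so it works with any object that has a `submitJob` method.

```python
def submit_example_job(client, algo_name, algo_version, username, **inputs):
    """Submit one DPS job through a maap-py style client and return its response.

    client is expected to be the object created by `MAAP()`.
    The keyword names below are assumptions; adjust them to your maap-py version.
    """
    return client.submitJob(
        identifier=f"job-{algo_name}:{algo_version}",
        algo_id=algo_name,
        version=algo_version,
        username=username,
        **inputs,  # the algorithm's own required inputs, if any
    )


# Typical use inside MAAP (commented out so the sketch has no hard dependency):
# from maap.maap import MAAP
# maap = MAAP()
# response = submit_example_job(maap, "my-dps-unit-test", "main", "my_username")
# print(response)
```

Keeping the client as a parameter also makes it easy to try the call pattern with a stand-in object before submitting a real job.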