{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Writing and Managing Code with MAAP\n", "\n", "Writing and editing code in MAAP is done in a Jupyter workspace. Jupyter is a web-based development environment that allows for interactive coding in \"notebooks\". MAAP supports notebooks written in Python and R with pre-configured workspace types (\"Stacks\"). Jupyter has caught on with the data-science community because it makes it possible to write and share notebooks with code, commentary, and data visualizations mixed together in one place.\n", "\n", "The [previous section](getting_started.html#Creating-a-workspace) covers the basics of creating a workspace and getting oriented.\n", "\n", ".. note::\n", "If you have not used Jupyter or JupyterLab before, it is highly recommended that you [get acquainted with JupyterLab](https://jupyterlab.readthedocs.io/en/latest/).\n", "\n", "Code is version-controlled, typically using GitHub. MAAP Jupyter includes a sidebar GUI widget to help with pushing and pulling code. Git is intended to help with collaborative code development and version control.\n", "\n", ".. note::\n", "If you have not used GitHub or git before, it is highly recommended that you [get acquainted with GitHub](https://docs.github.com/en/get-started/quickstart/hello-world). For a quick reference to git commands, there is a [Git Cheat Sheet](https://training.github.com/) in a variety of languages.\n", "\n", "![Writing code overview in context diagram](_static/writing_code_overview.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Working with code repositories like GitHub and GitLab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clone a Repository with GitHub\n", "\n", "Here is an example repository you can use for this getting started guide:\n", "https://github.com/MAAP-Project/dps-unit-test\n", "\n", "1. 
Copy the GitHub clone link from https://github.com/MAAP-Project/dps-unit-test\n", "![Copy .git link](../_static/clone_demo2.png)\n", "\n", "2. Open the built-in Jupyter GitHub UI to the left of the file browser. Choose \"Clone a Repository\" and paste in the .git link you copied from the GitHub repository. You can also access this menu through the **Git** tab at the top of the Jupyter window. \n", "![Clone a Repository](../_static/clone_demo3.png)\n", "![Paste .git link](../_static/clone_demo4.png)\n", "\n", "3. You should see a new folder created with the repo you cloned. If you browse to that folder and open up the Jupyter GitHub UI again, it will show you some info about that repo.\n", "![Algorithm folder was created](../_static/clone_demo5.png)\n", "![Browse to folder](../_static/clone_demo6.png)\n", "![Look at GitHub UI](../_static/clone_demo7.png)\n", "\n", "While developing an algorithm, this is how you would manage pushes and pulls to and from GitHub. For the purpose of this guide, there is no need to push or pull changes.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### The MAAP GitLab Code repository\n", "\n", "After creating your MAAP account, you can create a code repository by navigating to the MAAP GitLab account at https://repo.maap-project.org. This GitLab account is connected to your ADE workspaces automatically when you sign in to the ADE.\n", "\n", "You can then follow the same steps above to clone a repository from the MAAP GitLab. 
\n", "\n", "Typically, scientists have been storing code in GitHub, but the built-in GitLab is available if you prefer.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing an Algorithm in the Workspace\n", "\n", "To make sure that an Algorithm is functioning as expected, we can run it in the Jupyter terminal in a way that mimics how the scaled DPS (Data Processing System) will run it.\n", "\n", "To do this, you will want to run the `run-test.sh` script from another folder. For the sake of this test, let's try running it in `/tmp/my_test_run`. You can call the folder whatever you'd like, maybe with your name in it to make it unique (e.g. `/tmp/robs_test_run`).\n", "\n", "When the DPS runs an algorithm, it will first copy input files into a folder called `input/`. It will write outputs into a folder called `output/`. In order to mimic this, let's copy a test file from the `dps-unit-test` repo into a temporary test-run folder `/tmp/my_test_run/input`. (When we get to the [actual run in the DPS](running_at_scale.ipynb#Run-the-Algorithm-as-a-Job-and-Monitor-it), you will specify an input file URL and that file will be downloaded into the `input/` folder of the cloud-worker before execution of the algorithm.)\n", "\n", "If you cloned the demo repository to `~/algorithms`, then the files should be somewhere like `~/algorithms/dps-unit-test`. 
For the sake of the example below, we will assume this is where the demo repo has been cloned.\n", "\n", "First, copy the example input file into a folder called `input/`, from inside your temporary test-run folder.\n", "```\n", "mkdir /tmp/my_test_run\n", "cd /tmp/my_test_run\n", "mkdir input\n", "cp ~/algorithms/dps-unit-test/input_file.txt input\n", "ls input\n", "```\n", "\n", "This should show that the `input/` folder contains the file `input_file.txt`.\n", "\n", "Then we can execute a test run of the `run-test.sh` script found in the repository you cloned.\n", "```\n", "cd /tmp/my_test_run\n", "~/algorithms/dps-unit-test/run-test.sh\n", "```\n", "\n", "If the run was successful, you should see the following output:\n", "```\n", "Testing writing output product\n", "Testing opening input file\n", "Opening input file input/input_file.txt success\n", "```\n", "\n", "Then if you do `ls *` you should see the content of the `input/` and `output/` folders:\n", "```\n", "> ls *\n", "input:\n", "input_file.txt\n", "\n", "output:\n", "write-output.txt\n", "```\n", "\n", "If the run was not successful, you may not have created the input folder with the input file in it. This may also happen if you run the `run-test.sh` script from a different folder (where there is no `input/` folder). 
In that case you will see an error like this:\n", "```\n", "ls: cannot access 'input/*': No such file or directory\n", "Testing writing output product\n", "Testing opening input file\n", "Traceback (most recent call last):\n", " File \"/projects/algorithms/dps_unit_test/dps-unit-test/test-input-file.py\", line 6, in <module>\n", " input_file = sys.argv[1]\n", " ~~~~~~~~^^^\n", "IndexError: list index out of range\n", "```\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Customizing your Workspace Environment\n", "\n", "Your Jupyter workspace has a set of pre-installed libraries, depending on [which Stack you selected](getting_started.ipynb#Creating-a-workspace). If you need libraries that are not pre-installed, we suggest using an environment manager; `conda` is pre-installed to help with this. \n", "\n", "Full [documentation on configuring conda environments](../system_reference_guide/custom-environments.ipynb) may be found in the [System Reference Guide](../system_reference.rst).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using maap.py to access MAAP functionality from Python notebooks\n", "\n", "The MAAP platform offers a variety of functionality to run and monitor large-scale processing jobs. This functionality is accessed via the underlying [RESTful MAAP API](https://api.maap-project.org/api/). In a Python notebook, you will typically use this API via a helper library called `maap.py`, which exposes MAAP platform features through Python syntax. Examples include registering algorithms, running batches of jobs, monitoring jobs, and accessing data.\n", "\n", "Much of the `maap.py` functionality is documented in the [Technical Tutorials section](../technical_tutorials.rst) and in-context in the [Science Examples](../science_examples.rst). 
The [maap-py GitHub page](https://github.com/MAAP-Project/maap-py) has additional usage documentation.\n" ] }, { "cell_type": "markdown", "metadata": { "vscode": { "languageId": "plaintext" } }, "source": [ "## Helpful Templates while developing Algorithms in MAAP\n", "\n", " - This [algorithm repository example](https://github.com/MAAP-Project/dps-unit-test) is a good starting point for a new algorithm, as it contains the various accessory files that facilitate running the algorithm at scale.\n", " - Which templates will help you? Let the development or documentation team know!\n", " - For example: conda.yml with some default packages, run_script.sh\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Registering the Algorithm\n", "\n", "This section briefly demonstrated how you will sync algorithm code with GitHub as you develop it. Once your algorithm is ready to be run at scale, you will need to register it with the MAAP DPS.\n", "\n", "The next section covers this process." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.12.0 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" }, "vscode": { "interpreter": { "hash": "7500c3e1c7c786e4ba1e4b4eb7588219b4e35d5153674f92eb3a82672b534f6e" } } }, "nbformat": 4, "nbformat_minor": 4 }