{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Environments\n", "\n", "This document guides MAAP users through selecting or extending existing environments (the sets of libraries available for analysis) and creating custom environments.\n", "\n", "## Workspaces\n", "\n", "The MAAP Hub offers several workspace options. Each workspace comes with its own environment, which has essential libraries for computing and geospatial analysis pre-installed, as well as MAAP-specific extensions. At the time of writing, the options are: \n", "- Pangeo image: Built from the Pangeo notebook image: https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml with VEDA packages https://github.com/NASA-IMPACT/pangeo-notebook-veda-image/blob/main/environment.yml \n", "- isce3 image: Built from the Pangeo base notebook with these packages installed: https://github.com/MAAP-Project/maap-workspaces/blob/main/base_images/2i2c/isce3/environment.yml \n", "- R image: [py-rocket-geospatial-2](https://nmfs-opensci.github.io/py-rocket-geospatial-2/) base image with a couple of Python packages: https://github.com/MAAP-Project/maap-workspaces/blob/main/base_images/2i2c/r/environment.yml and these R packages: https://github.com/MAAP-Project/maap-workspaces/blob/main/base_images/2i2c/r/scripts/install_cran_packages_r.sh \n", "- QGIS image: Built from quay.io/2i2c/nasa-qgis-image; it does not have MAAP extensions\n", "- PyTorch image (access on request): Built from quay.io/pangeo/pytorch-notebook\n", "- Tensorflow2 image (access on request): Built from quay.io/pangeo/ml-notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bringing your own image\n", "See https://docs.openveda.cloud/user-guide/scientific-computing/custom-environments.html for more information about creating and bringing your own image to the MAAP Hub.\n", "Those docs cover Python and R 
images.\n", "If you would like to build on MAAP images, you can use this repo as an example: https://github.com/MAAP-Project/repo2docker-maap-images" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Extending existing environments\n", "\n", "Users may need libraries for their analysis that are not present in the environments of the workspace options offered. In this case, the ideal steps are: \n", "\n", "1. The user explores their environment needs by extending the environment of an existing workspace or by creating a custom environment in an existing workspace (see the next sections).\n", "2. Once that is done, the user submits a ticket to, or coordinates with, the platform team to create a new workspace option with the requested, finalized environment.\n", "\n", "This approach is ideal because modifications to the pre-defined workspace environment do not survive a workspace restart (see the next sections), and because sharing newly tested environments is valuable.\n", "\n", "The next sections explain how to extend environments or create custom environments and, to that end, introduce the environment management solution we use. \n", "\n", "### Package manager\n", "\n", "We use `conda` with the libmamba solver as a package manager to install, update, or remove packages (libraries). `conda` works with 'environments', which are directories in your local file system that each contain a set of packages. Working 'in a given environment' means that your programs look for dependencies in that environment's `conda` directory. All workspaces launch with a default environment that has all the pre-installed libraries for that workspace. 
The default environment is named `notebook`: \n", "\n", "![conda environment](../_static/notebook_conda_environment.png)\n", "\n", "As the screenshot shows, the `notebook` `conda` environment is activated by default, and its libraries are located in `/srv/conda/envs/notebook`. \n", "\n", "### Extending the default environment in a given workspace session\n", "\n", "*Modifications to the default workspace environment, or to the `base` environment, do not survive a workspace restart.*\n", "\n", "Extending an existing `conda` environment means adding packages on top of what it contains, which works provided there are no dependency conflicts. Use the `conda install` command to install additional packages in your current environment (run `conda --help` to learn more about `conda` commands). Note that installing packages with `conda` requires a workspace with at least 14.8 GB of RAM, which is the default memory option. All `conda install` commands should use `-c conda-forge`; otherwise the installation is unlikely to work, because most of the packages already installed come from conda-forge. For example:\n", "\n", "```\n", "conda install -c conda-forge xarray\n", "```\n", "\n", "libmamba is the default solver, but you can switch to the \"classic\" solver with: \n", "\n", "```\n", "conda install --solver=classic -c conda-forge xarray\n", "```\n", "\n", "However, we recommend using configuration files for reproducibility and shareability. 
With this approach, assuming your configuration file is named `config.yml`, the command is: \n", "\n", "```\n", "conda env update -f config.yml\n", "```\n", "\n", "For more details on configuration files, see the [Custom environments section](#Custom-environments). For an example of this command, see the [subsection about updating an environment with a configuration file](#Updating-an-existing-environment-with-a-configuration-file).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IMPORTANT NOTE\n", "Bringing your own image is preferred over custom environments because the changes persist after your workspace restarts. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Custom environments\n", "\n", "*For the rest of this README, each section provides a link to download an example YAML configuration file.*\n", "\n", "You can use the `conda` CLI to create a new, custom environment. The parameters (the list of libraries, where to search for them, etc.) can be passed either from a YAML configuration file or directly on the command line. We recommend the first option, because a YAML file is easier to share and modify. \n", "\n", "### Basic custom environment\n", "\n", ".. note::\n", "Example config file for a basic custom environment [here](./example_conda_configuration_files/env-example.yml).\n", "\n", "This configuration installs specific versions of `python`, `pandas`, and `geopandas` from `conda-forge`. If a version isn't specified, the latest compatible version is installed. We recommend always specifying versions for reproducibility. The basic command to create this environment is:\n", "\n", "```\n", "conda env create -f env-example.yml\n", "```\n", "\n", "### Updating an existing environment with a configuration file\n", "\n", ".. 
note::\n", "Example config file for updating the `notebook` environment [here](./example_conda_configuration_files/env-extend.yml).\n", "\n", "You can also *update* an existing environment with a configuration file. For example, suppose you have a `conda` environment with a set of packages already installed (such as the `pangeo` environment, or another default workspace environment), but it lacks `xarray` and `geopandas`. Using the linked example config: \n", "\n", "```\n", "conda env update -f env-extend.yml\n", "```\n", "\n", "This command updates the active environment by adding `xarray` and `geopandas`, provided that doing so does not cause conflicts with the existing libraries. \n", "\n", "\n", "### Using `pip` for python packages\n", "\n", ".. note::\n", "Example config file for using pip install [here](./example_conda_configuration_files/env-with-pip.yml).\n", "\n", "Some Python packages might not be available in the channel you are using, or in any `conda` channel. If such a package is on `PyPI` (the official Python package repository), you can use `pip` within a `conda` environment to install it. The recommended way is to specify this in the configuration file. In the linked example, we add `stackstac` as a dependency to install from `PyPI` because it is not available in the `conda-forge` channel. \n", "\n", "### Using custom environments in jupyter notebooks\n", "\n", ".. 
note::\n", "Example config file for this section [here](./example_conda_configuration_files/env-with-ipykernel.yml).\n", "\n", "The following steps are for Python kernels.\n", "\n", "- Make sure `ipykernel` is listed as a dependency in your configuration file.\n", "- Create your environment using the linked configuration file.\n", "- With the new environment activated, install it as a kernel by running the following command (parameter values follow the linked example):\n", " ```\n", " python -m ipykernel install --user --name env-with-ipykernel --display-name \"Python env-with-ipykernel\"\n", " ```\n", " The above command installs the environment as a kernel in Jupyter, making it available in notebooks under the display name \"Python env-with-ipykernel\".\n", "- Wait around 30 seconds and launch a new notebook. Among the kernel options, you should see \"Python env-with-ipykernel\" listed. The screenshot below shows what this step looks like:\n", "![Register a kernel with a conda environment and launch a notebook with it](../_static/launch_custom_kernel_conda.png)\n", "- To remove the kernel, run `jupyter kernelspec list` to find its name, then run `jupyter kernelspec remove` followed by that name.\n", "\n", "### Suggested packages for custom environment\n", "\n", ".. note::\n", "Example config file for installing maap-py via pip [here](./example_conda_configuration_files/env-with-maap-py.yml)\n", "\n", "MAAP users typically use the Python `maap-py` library. It is pre-installed in the default environment of every workspace. Any custom environment should list it as a dependency; otherwise it will not be accessible from that environment. However, `maap-py` is not packaged in a public package repository such as `PyPI` or `conda-forge`. It can be installed directly from its GitHub repository with `pip`, though. See the linked configuration example. Note that in the example, `maap-py` is 'versioned' using one of the `maap-py` git version tags. 
You can find the most recent `maap-py` tags on the [\"releases\" page of the GitHub repository](https://github.com/MAAP-Project/maap-py/releases):\n", "\n", "![git version tags](../_static/git_tags_maap_py.png)\n", "\n", "The relevant MAAP extensions for algorithms and jobs on PyPI are:\n", "- maap-jupyter-server-extension (needs to be enabled with `jupyter server extension enable maap_jupyter_server_extension`)\n", "- maap-dps-jupyter-extension\n", "- maap-algorithms-jupyter-extension\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.11.1 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "metadata": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } }, "vscode": { "interpreter": { "hash": "5c7b89af1651d0b8571dde13640ecdccf7d5a6204171d6ab33e7c296e100e08a" } } }, "nbformat": 4, "nbformat_minor": 4 }