Environments

This document guides MAAP users through selecting an environment (the set of libraries available for analysis), extending an existing environment, or creating a custom one.

Workspaces

The MAAP ADE offers various workspace options. Each workspace comes with its own environment, in which essential libraries for computing and geospatial analysis are pre-installed. At the time of writing, the options are:

[Figure: Workspace image options]

For example, the MAAP RGEDI Stable and MAAP R Stable workspace options come with various pre-installed R packages.

For more information: each of these options relies on a Docker image built from a Dockerfile that is publicly available in the MAAP workspace repository. To learn which libraries each image contains, check out that repository.

Extending environments

Users may need libraries for their specific analysis that are not present in the environments of the workspace options offered. In this case, the ideal steps are:

  1. The user explores their environment needs by extending the environment of an existing workspace or by creating a custom environment in an existing workspace (see next sections).
  2. Once that is done, the user submits a ticket to, or coordinates with, the platform team to create a new workspace option with the requested, finalized environment.

The above approach is ideal because modifications to the pre-defined workspace environment do not survive a workspace restart (see next sections), and because sharing newly tested environments is valuable to other users.

The next sections explain how to extend environments or create custom environments, and introduce the environment management solution we use.

Package manager

We use mamba (a fast, drop-in replacement for conda) as the package manager to install, update, or remove packages (libraries). mamba works with ‘environments’, which are directories in your local file system that each contain a set of packages. Working ‘in a given environment’ means that your programs look for dependencies in that environment’s directory. All workspaces launch with an environment called base, a mamba environment that holds all the pre-installed libraries. If you open a terminal launcher after creating a Basic Stable workspace:

[Figure: Base conda environment location]

Notice that the base mamba environment is activated and that its libraries are located in /opt/conda.
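
To check this from a terminal yourself, a minimal sketch follows. It assumes the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda-style activation sets; the `command -v` guard is not part of the original guide and is only there so the snippet also runs harmlessly on a machine without mamba.

```shell
# Install directory of the active environment ("none" if nothing is activated)
active_prefix="${CONDA_PREFIX:-none}"
# Name of the active environment, as set by the activation script
active_name="${CONDA_DEFAULT_ENV:-none}"
echo "Active environment: ${active_name} (${active_prefix})"

# List all environments known to mamba, if mamba is on the PATH
if command -v mamba >/dev/null 2>&1; then
  mamba env list
fi
```

In a fresh workspace, the first line of output would be expected to name base and /opt/conda, matching the description above.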

Extending the base environment in a given workspace session

Note: any modification to the base environment does not survive a workspace restart. In other words, modifications to /opt/conda disappear after a workspace restart.

Extending an existing mamba environment means adding packages on top of what it already contains, which works provided there are no dependency conflicts. You can add packages to your current environment with the mamba install command (run mamba --help to learn more about mamba commands). For example:

mamba install xarray

However, we recommend using configuration files, which make an environment reproducible and easy to share. With this approach, assuming your configuration file is named config.yml, the command is:

mamba env update -f config.yml

For more details on configuration files, see the Custom environments section; for an example of this command, refer to the subsection on updating an existing environment with a configuration file.

Custom environments

For the rest of this README, in each section we provide a link to download an example YAML configuration file.

You can use the mamba CLI to create a new, custom environment. The parameters (the list of libraries, the location where to search for them, etc…) can be passed either from a configuration YAML file or directly on the console. We recommend using the first option (a YAML file is easier to share and modify).

Basic custom environment

Example config file for this section here.

This configuration installs specific versions of python, pandas and geopandas from conda-forge. If a version isn’t specified, the latest one is installed. We recommend always specifying versions for reproducibility. The basic command to create this environment would be:

mamba env create -f env-example.yml

However, this stores the environment’s files under /opt/conda, a directory that is recreated when the workspace restarts, so the custom environment would be lost. Instead, specify a storage location in your user directory with the --prefix parameter:

mamba env create -f env-example.yml --prefix /projects/env

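To activate it, pass the same path, because an environment created with --prefix is identified by its location rather than by a name:

mamba activate /projects/env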
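
For reference, a configuration file along the lines described in this section could look like the sketch below. This is not the linked example file: the environment name and the pinned version numbers are illustrative assumptions. The heredoc simply writes env-example.yml to the current directory so that the create command from this section can read it.

```shell
# Write a minimal environment file; the versions below are illustrative, not prescribed
cat > env-example.yml <<'EOF'
name: env-example
channels:
  - conda-forge        # where mamba searches for the packages
dependencies:
  - python=3.10
  - pandas=2.0.3
  - geopandas=0.13.2
EOF

# Show the file that was written
cat env-example.yml
```

Pinning each package keeps the environment reproducible when the file is shared or reused after a workspace restart.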

Updating an existing environment with a configuration file

Example config file for this section here.

You can also update an existing environment with a configuration file. For example, assume you have a mamba environment with a set of packages already installed (for example the base environment), but without xarray and geopandas. Using the linked example config:

mamba env update -f env-extend.yml

This command will update base by adding xarray and geopandas, provided it does not cause conflicts with the existing libraries.
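
A file that extends an environment in this way might look like the following sketch. It is an assumption about the shape of such a file, not a copy of the linked example: in particular, setting `name: base` to target the base environment and the pinned versions are illustrative choices.

```shell
# Write a configuration file that adds two packages to an existing environment
cat > env-extend.yml <<'EOF'
name: base             # assumed: the environment to update
channels:
  - conda-forge
dependencies:
  - xarray=2023.6.0    # illustrative version
  - geopandas=0.13.2   # illustrative version
EOF

# Show the file that was written
cat env-extend.yml
```

Only the packages to add need to be listed; packages already present in the environment are kept unless they conflict with the new ones.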

Using pip for python packages

Example config file for this section here.

Some Python packages might not be available in the channel you are using, or in any mamba channel. If such a package is on PyPI (the official Python package repository), you can use pip within a mamba environment to install it. The recommended way is to specify this in the configuration file. In the linked example, we add stackstac as a dependency to install from PyPI because it is not available in the conda-forge channel.
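
As a sketch of how a PyPI dependency can be declared, conda-style environment files accept a nested `pip:` list under `dependencies`. The file name env-with-pip.yml, the environment name, and the version numbers below are hypothetical and chosen only for illustration.

```shell
# Write an environment file that mixes conda-forge and PyPI packages
cat > env-with-pip.yml <<'EOF'
name: env-with-pip
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip                  # pip itself must be installed in the environment
  - pip:
      - stackstac==0.5.0 # installed from PyPI; illustrative version
EOF

# Show the file that was written
cat env-with-pip.yml
```

Note that entries under `pip:` use pip's `==` pinning syntax, whereas conda-style entries use a single `=`.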

Using custom environments in jupyter notebooks

Example config file for this section here.

The following steps apply to Python kernels.

  • Make sure ipykernel is listed as a dependency in your configuration file.

  • Create your environment using the linked configuration file.

  • Install the environment as a kernel by running the following command (parameter values follow the example mentioned):

    python -m ipykernel install --user --name env-with-ipykernel --display-name "Python env-with-ipykernel"

    The above command installs the environment as a kernel in Jupyter, making it accessible in the notebook with a display name of “Python env-with-ipykernel”.

  • Wait around 30 seconds and launch a new notebook. Among the kernel options, you should see “Python env-with-ipykernel” listed.

    [Figure: Register a kernel with a conda environment and launch a notebook with it]
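
The first step above, listing ipykernel as a dependency, could be sketched as follows. The file name env-with-ipykernel.yml and the version numbers are assumptions made for illustration; only the presence of ipykernel in the dependency list comes from the steps themselves.

```shell
# Write an environment file that can later be registered as a Jupyter kernel
cat > env-with-ipykernel.yml <<'EOF'
name: env-with-ipykernel
channels:
  - conda-forge
dependencies:
  - python=3.10
  - ipykernel=6.25     # required to install the environment as a kernel; illustrative version
EOF

# Show the file that was written
cat env-with-ipykernel.yml
```

After the kernel has been installed, running `jupyter kernelspec list` in a terminal is one way to confirm that it was registered.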

Suggested packages for custom environment

Example config file for this section here.

MAAP users typically use the Python library maap-py. It is pre-installed in the base mamba environment of every workspace, but a custom environment must list it explicitly; otherwise it will not be accessible from that environment. maap-py is not published in a public package repository such as PyPI or conda-forge, but it can be installed directly from its GitHub repository with pip. See the linked configuration example, in which maap-py is ‘versioned’ using a commit hash (at the end of the GitHub URL).
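
A rough sketch of such a configuration is shown below. The repository URL is an assumption about where maap-py is hosted, `<commit-hash>` is a placeholder to replace with the commit you want to pin, and the file and environment names are hypothetical; refer to the linked example for the values actually used.

```shell
# Write an environment file that installs maap-py from GitHub through pip
cat > env-with-maap-py.yml <<'EOF'
name: env-with-maap-py
channels:
  - conda-forge
dependencies:
  - python=3.10          # illustrative version
  - pip
  - pip:
      # Assumed repository URL; <commit-hash> is a placeholder, not a real commit
      - git+https://github.com/MAAP-Project/maap-py.git@<commit-hash>
EOF

# Show the file that was written
cat env-with-maap-py.yml
```

Pinning to a commit hash plays the same role as a version number: everyone who builds the environment from this file gets the same maap-py code.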