Environments¶
This document guides MAAP users through selecting an existing environment (the set of libraries available for analysis), extending one, or creating a custom environment.
Workspaces¶
The MAAP ADE offers various workspace options. Each workspace comes with its own environment, which has essential libraries for computing and geospatial analysis pre-installed. At the time of writing this guide, here are the options:
For example, the MAAP RGEDI Stable and MAAP R Stable workspace options come with various pre-installed R packages.
For more information: each of these options relies on a Docker image built from a Dockerfile that is publicly available in the MAAP workspace repository. To learn more about which libraries each image contains, check out this repository.
Extending environments¶
Users may need libraries for their specific analysis that are not present in the environments of the workspace options on offer. In this case, the ideal steps are the following:
- The user explores their environment needs by extending the environment of an existing workspace or by creating a custom environment in an existing workspace (see next sections).
- Once that is done, the user submits a ticket to (or coordinates with) the platform team to create a new workspace option with the requested, finalized environment.
This approach is ideal because modifications to the pre-defined workspace environment do not survive a workspace restart (see next sections), and because sharing newly tested environments is valuable.
The next sections explain how to extend environments or create custom environments and, to that end, introduce the environment management solution we use.
Package manager¶
We use mamba (a fast drop-in replacement for conda) as a package manager to install, update or remove packages (libraries). mamba works with ‘environments’, which are directories in your local file system containing a set of packages. When you work ‘in a given environment’, your programs look for dependencies in that environment’s directory. All workspaces launch with an environment called base, which is a mamba environment that has all the pre-installed libraries. If you open a terminal launcher after creating a Basic Stable workspace, you will notice that the base mamba environment is activated and that its libraries are located in /opt/conda.
Extending the base environment in a given workspace session¶
Note: any modification to the base environment does not survive a workspace restart. In other words, modifications to /opt/conda disappear after a workspace restart.
Extending an existing mamba environment means adding packages on top of what it already contains, which works provided there are no dependency conflicts. You can use a configuration file that lists all the new libraries to add. This is the recommended approach because it is reproducible and easy to share. See the sections below for configuration file usage.
Alternatively, you can use the mamba install command to add packages to your current environment (run mamba --help to learn more about how to use mamba commands). For example:
mamba install xarray
Custom environments¶
You can use the mamba CLI to create a new, custom environment.
The parameters (the list of libraries, the channels in which to search for them, etc.) can be passed either from a YAML configuration file or directly on the command line. We recommend the first option, since a YAML file is easier to share and modify.
Basic custom environment¶
Here is an example configuration file that we name env.yml:
# env.yml
name: env
channels:
- conda-forge
dependencies:
- python=3.8
- pandas=1.5.3
- geopandas=0.12.2
It installs specific versions of python, pandas and geopandas from either conda-forge or defaults. If a version isn’t specified, the latest one is installed. We recommend always specifying the version for reproducibility. The basic command to create this environment would be:
mamba env create -f env.yml
However, this stores the environment’s files in /opt/conda, a directory that is reset when the workspace restarts, so the custom environment would be lost. Instead, specify a storage location in your user directory with the --prefix parameter:
mamba env create -f env.yml --prefix /projects/env
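and to activate it, refer to it by its path (activating it by name only works if /projects is listed in mamba’s envs_dirs setting):
mamba activate /projects/env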
Updating an existing environment with a configuration file¶
For this section and the next, you can find example configuration files in example_conda_configuration_files, in the same folder as this notebook.
You can also update an existing environment with a configuration file. For example, assume you have a mamba environment with a set of packages already installed (for example the base environment), but it lacks xarray and geopandas. You can create a file such as base.yml that specifies updates to the base environment. Then, running mamba env update -f base.yml will update base by adding xarray and geopandas, provided this does not cause conflicts with the existing libraries.
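As an illustration, a base.yml for this scenario could look like the sketch below. The contents are an assumption made for this example, not a copy of the file shipped in example_conda_configuration_files. The name field matches the environment to update, and versions are left unpinned here only because the versions compatible with your base environment are not known in advance; pin them once you have found a working combination.

```yaml
# base.yml (hypothetical contents, for illustration only)
# "name" must match the environment that mamba env update should modify.
name: base
channels:
  - conda-forge
dependencies:
  # Packages to add on top of what base already contains.
  - xarray
  - geopandas
```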
Using pip for python packages¶
Some python packages might not be available in the channel you are using, or in any mamba channel. If such a package is on PyPI (the official python package repository), you can use pip within a mamba environment to install it. The recommended way is to specify this in the configuration file. You can find an example in env2.yml, which creates an environment named env2. In that example, we add stackstac as a dependency to install from PyPI because it is not available in the conda-forge channel.
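A configuration file following this pattern might look like the sketch below. It is an assumed reconstruction for illustration and may differ from the actual env2.yml; in particular, the python version is simply carried over from the env.yml example above. The key points are that pip itself is listed as a dependency, and that PyPI packages go in a nested pip list.

```yaml
# env2.yml (hypothetical contents, for illustration only)
name: env2
channels:
  - conda-forge
dependencies:
  - python=3.8
  # pip must be installed in the environment for the nested section to work.
  - pip
  # Packages listed under "pip:" are installed from PyPI, not from a channel.
  - pip:
      - stackstac
```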
Using custom environments in jupyter notebooks¶
To make your environment accessible in a Jupyter notebook, you need to register a kernel for your environment. You can use the ipykernel install command for this, which means you must list ipykernel as a dependency in your configuration file. An example is in env3.yml, which creates an environment named env3. After creating and activating it, run the kernel registration command this way:
python -m ipykernel install --user --name env3 --display-name "Python env3"
Once this is done, wait around 30 seconds for the registration to propagate. Then, click on a new launcher button. Among the notebook options, you should see “Python env3”. Selecting it spins up a notebook with your environment. Below you can find a screenshot showing the commands on the notebook:
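For reference, a configuration file for a notebook-ready environment could be sketched as follows. This is an assumption about what a file like env3.yml contains, not its actual contents; the only requirement stated above is that ipykernel appears among the dependencies, and the other packages are placeholders borrowed from the env.yml example.

```yaml
# env3.yml (hypothetical contents, for illustration only)
name: env3
channels:
  - conda-forge
dependencies:
  - python=3.8
  # Required so that "python -m ipykernel install" is available in the environment.
  - ipykernel
  # Placeholder analysis package; replace with what your notebook needs.
  - pandas=1.5.3
```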
Suggested packages for custom environment¶
MAAP users typically use the python package maap-py. It is pre-installed in the base mamba environment of all workspaces, but any custom environment must list it explicitly; otherwise it will not be accessible from that environment. However, maap-py is not published in a public package repository such as PyPI or conda-forge. It can still be installed directly from its GitHub repository with pip. An example configuration file can be found at env4.yml. Note that inside it, maap-py is ‘versioned’ using a commit hash (at the end of the GitHub URL).
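The sketch below shows what such a file might look like. It is an assumption for illustration, not the contents of env4.yml: the repository URL is assumed to be the MAAP-Project/maap-py repository on GitHub, and <commit-hash> is a placeholder that you must replace with the hash of the commit you want to pin. Installing from a git URL also assumes that git is available in the workspace.

```yaml
# env4.yml (hypothetical contents, for illustration only)
name: env4
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      # pip's VCS syntax: git+<repository URL>@<revision>.
      # Replace <commit-hash> with a real commit hash to pin the version.
      - git+https://github.com/MAAP-Project/maap-py.git@<commit-hash>
```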