Memory Profiling Python Scripts in the MAAP ADE

Authors: Rajat Shinde (UAH), Alex Mandel (DevSeed), Jamison French (DevSeed), Sheyenne Kirkland (UAH), Brian Freitag (NASA MSFC), Chuck Daniels (DevSeed)

Date: February 7, 2024

Description: Memory profiling a Python script shows how much memory it needs and which lines allocate it. This is useful once you have working code and want to estimate the size of the DPS worker to run it on. It also points to the parts of the code that are worth optimizing.

In this tutorial, we use memory-profiler to profile a sample Python script, demo_memory_profiling.py. We also show how to write the profiling output to a .log file.

Run This Notebook

To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.

Disclaimer: It is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.

Additional Resources

  1. https://github.com/pythonprofilers/memory_profiler

Installation

We will begin by installing memory-profiler in the current working environment.
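To confirm which version of memory-profiler an environment actually has (for example, the environment a DPS job will run in), the standard library's importlib.metadata can be queried. This is a minimal sketch; the helper name installed_version is hypothetical and not part of memory-profiler.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="memory-profiler"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if memory-profiler is installed, otherwise None
    print(installed_version())
```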

[1]:
# !pip install -U memory-profiler
Collecting memory-profiler
  Using cached memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Requirement already satisfied: psutil in /opt/conda/envs/vanilla/lib/python3.10/site-packages (from memory-profiler) (5.9.7)
Installing collected packages: memory-profiler
Successfully installed memory-profiler-0.61.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Add Decorator

Line-by-line memory usage is usually what you need when analyzing code. For this example, we create two placeholder functions to profile, my_function and my_other_function.

You may add the @profile decorator to individual functions that you want to profile. This allows you to limit which parts of your program are profiled, thus limiting the volume of profiling output.

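However, this requires you to modify your code. Running a script with python -m memory_profiler your_script.py saves you the import, because profile is then provided automatically, but only functions carrying the @profile decorator are reported. If it is not yet obvious which parts of your code consume the most memory and you would rather not modify it at all, the mprof command that ships with memory-profiler (mprof run your_script.py) records the memory usage of the whole run over time instead.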
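When a script is launched with python -m memory_profiler, memory_profiler makes profile available as a builtin, so the script needs no import. Running the same script with plain python then fails, because profile is undefined. A common pattern, sketched below and not part of memory_profiler itself, defines a no-op stand-in so one decorated script runs either way.

```python
# Under `python -m memory_profiler script.py`, `profile` is already defined.
# Under plain `python script.py`, it is not, so fall back to a no-op decorator.
try:
    profile
except NameError:
    def profile(func):
        """No-op stand-in: return the function unchanged."""
        return func


@profile
def my_function():
    # Perform some potentially memory-intensive computation
    return 0


if __name__ == "__main__":
    print(my_function())
```

Note that this stand-in only covers the bare @profile form; it does not accept arguments such as stream.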

[13]:
from memory_profiler import profile

@profile
def my_function():
    # Perform some potentially memory-intensive computation

    return 0

@profile
def my_other_function():
    # Perform some potentially memory-intensive computation

    return 0

def main():
    my_function()
    my_other_function()

    #...

if __name__ == "__main__":
    main()
ERROR: Could not find file /tmp/ipykernel_314/3429381877.py
ERROR: Could not find file /tmp/ipykernel_314/3429381877.py

Running Memory Profiler

The ERROR lines above appear because memory_profiler looks for the source file of each decorated function, and code defined in a notebook cell has no such file on disk, so no line-by-line report is produced. To profile from a Jupyter notebook, we therefore copied the code snippet above into a file named demo_memory_profiling.py in the working directory and ran it as a script. The output shows, for each profiled line, the memory usage and the increment caused by that line.

[10]:
# With @profile decorator in the script

!python demo_memory_profiling.py
Filename: /projects/maap-documentation/docs/source/technical_tutorials/user_data/demo_memory_profiling.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     43.6 MiB     43.6 MiB           1   @profile
     7                                         def my_function():
     8                                             # Include each line of the script which needs to be profiled
     9                                             # under this function
    10
    11     43.6 MiB      0.0 MiB           1       return 0


Filename: /projects/maap-documentation/docs/source/technical_tutorials/user_data/demo_memory_profiling.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    13     43.6 MiB     43.6 MiB           1   @profile
    14                                         def my_other_function():
    15                                             # Include each line of the script which needs to be profiled
    16                                             # under this function
    17
    18     43.6 MiB      0.0 MiB           1       return 0
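When estimating a DPS worker size, one rough input is the largest value in the Mem usage column, together with the line that adds the most memory. The sketch below extracts both from report text (captured standard output or a log file). The helper name summarize_report is hypothetical, and the parsing assumes the default report layout shown above, with values in MiB.

```python
import re

# One profiled row: line number, Mem usage, Increment, Occurrences, source text.
# Rows for lines that were not executed carry no numbers and do not match.
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+) MiB\s+(-?[\d.]+) MiB\s+(\d+)\s+(.*)$")


def summarize_report(text):
    """Summarize memory_profiler line-by-line report text.

    Returns (peak_mib, worst): peak_mib is the largest "Mem usage" value, and
    worst is (line number, increment in MiB, source) for the largest "Increment"
    among non-decorator rows, or None if there are no such rows.
    """
    peak_mib = 0.0
    worst = None
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        lineno, usage, increment, _occurrences, source = match.groups()
        peak_mib = max(peak_mib, float(usage))
        source = source.strip()
        # Heuristic: the decorator row's increment equals the memory already in
        # use when the function was entered, not an allocation, so skip it here.
        if source.startswith("@"):
            continue
        if worst is None or float(increment) > worst[1]:
            worst = (int(lineno), float(increment), source)
    return peak_mib, worst
```

For the report printed above, this returns (43.6, (11, 0.0, 'return 0')). Line-level values may miss short-lived spikes inside a single line, so treat the peak as a lower bound when sizing a worker.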


Logging the Output

By default, memory_profiler prints its report to standard output, so it appears in the cell output or on the command line. To store it in a log file instead, open a file and pass it to the decorator's stream argument, as in the cell below. See the memory_profiler documentation for more options.
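The stream argument is usually given an open file. Assuming memory_profiler only calls write() on the stream when emitting its report (our reading of the library, not a documented guarantee), an adapter that forwards each report line to Python's logging module should also work, which lets you reuse existing handlers such as rotating log files. This is a minimal sketch; the class name LoggerStream is hypothetical.

```python
import logging


class LoggerStream:
    """File-like adapter that forwards each complete line written to it to a logger.

    Hypothetical helper for profile(stream=...), assuming memory_profiler only
    needs write() (and possibly flush()) on the stream object.
    """

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level
        self._pending = ""  # text written since the last newline

    def write(self, text):
        self._pending += text
        lines = self._pending.split("\n")
        self._pending = lines.pop()  # keep the trailing partial line buffered
        for line in lines:
            if line.strip():  # skip the blank separator lines in the report
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit whatever partial line is still buffered
        if self._pending.strip():
            self.logger.log(self.level, self._pending)
        self._pending = ""


# Example wiring (requires memory_profiler):
# logging.basicConfig(filename="memory_profiler.log", level=logging.INFO)
# @profile(stream=LoggerStream(logging.getLogger("memory")))
# def my_function(): ...
```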

[14]:
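from memory_profiler import profile

# Open the log file once; both decorated functions write their reports to it
fp = open('memory_profiler.log', 'w+')

@profile(stream=fp)
def my_function():
    # Perform some potentially memory-intensive computation

    return 0

@profile(stream=fp)
def my_other_function():
    # Perform some potentially memory-intensive computation

    return 0

def main():
    my_function()
    my_other_function()

    #...

if __name__ == "__main__":
    main()
    fp.close()  # flush the reports to disk and release the file handle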
ERROR: Could not find file /tmp/ipykernel_314/1328302985.py
ERROR: Could not find file /tmp/ipykernel_314/1328302985.py

To test the logging, we will run memory profiling on the demo_memory_profiling_logging.py script saved in the working directory.

[12]:
!python demo_memory_profiling_logging.py

After the script runs, its memory profiling report is saved in the memory_profiler.log file. You can also send the reports of different functions to different log files by opening a separate file for each function and passing it to that function's @profile(stream=...) decorator.
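The !python shell escape used above only works inside IPython or Jupyter. From a plain Python driver, such as a run script for a DPS job, the standard library's subprocess module can launch the profiled script with the same interpreter and then read the log file it produced. This is a minimal sketch; the helper name run_and_read_log and its defaults are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_and_read_log(script, log_file="memory_profiler.log", cwd="."):
    """Run a profiled script with the current interpreter and return its log text.

    Raises subprocess.CalledProcessError if the script exits with a non-zero
    status, and FileNotFoundError if the script did not create the log file.
    """
    # The script path is resolved relative to cwd, where the log is also expected
    subprocess.run([sys.executable, script], cwd=cwd, check=True)
    return (Path(cwd) / log_file).read_text()


if __name__ == "__main__":
    print(run_and_read_log("demo_memory_profiling_logging.py"))
```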