Memory Profiling Python Scripts in the MAAP ADE
Authors: Rajat Shinde (UAH), Alex Mandel (DevSeed), Jamison French (DevSeed), Sheyenne Kirkland (UAH), Brian Freitag (NASA MSFC), Chuck Daniels (DevSeed)
Date: February 7, 2024
Description: Memory profiling a Python script is good practice for understanding its resource requirements. It is especially useful once you have working code and want to estimate the size of the DPS worker to request, and it also helps you find where the code could use less memory.
In this tutorial, we use memory-profiler to profile a sample Python script, demo_memory_profiling.py, and show how to log the output to a .log file.
Run This Notebook
To access and run this tutorial within MAAP’s Algorithm Development Environment (ADE), please refer to the “Getting started with the MAAP” section of our documentation.
Disclaimer: It is highly recommended to run a tutorial within MAAP’s ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors.
Additional Resources
Installation
We will begin by installing memory-profiler in the current working environment.
# !pip install -U memory-profiler
Collecting memory-profiler
Using cached memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Requirement already satisfied: psutil in /opt/conda/envs/vanilla/lib/python3.10/site-packages (from memory-profiler) (5.9.7)
Installing collected packages: memory-profiler
Successfully installed memory-profiler-0.61.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
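The !pip shell command runs whichever pip comes first on the PATH, which is not always the environment behind the notebook kernel. A quick way to confirm that the kernel itself can import the package is sketched below using only the standard library; the helper name installed_version is hypothetical.

```python
import importlib.metadata
import importlib.util


def installed_version(module_name="memory_profiler", dist_name="memory-profiler"):
    """Return the installed version of a package, or None if this interpreter can't import it."""
    # find_spec checks whether the module is importable without importing it.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        # Read the version from the installed distribution's metadata.
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata under that name.
        return None


if __name__ == "__main__":
    print("memory-profiler version:", installed_version())
```

If this prints None right after a successful pip install, the package most likely went into a different environment than the one the kernel is using.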
Add Decorator
Line-by-line memory usage is usually what you need when analyzing code. For this example, we create two dummy functions to profile, named my_function and my_other_function.
You may add the @profile decorator to individual functions that you want to profile. This allows you to limit which parts of your program are profiled, thus limiting the volume of profiling output.
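However, this requires you to modify your code. Running the script with python -m memory_profiler spares you the from memory_profiler import profile line, because memory-profiler then provides profile as a builtin, but the functions you want to analyze must still carry the @profile decorator. If it is not yet obvious which parts of your code consume too much memory, the mprof run command that ships with memory-profiler records the memory usage of the whole process over time without any changes to your code.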
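Because python -m memory_profiler defines profile as a builtin, a script can use @profile without importing it. The sketch below assumes you also want the same file to run under plain python; the try/except fallback that turns profile into a no-op is a common idiom, not part of memory-profiler itself, and the file name demo_no_import.py is hypothetical.

```python
# demo_no_import.py (hypothetical file name)
# Profile with:  python -m memory_profiler demo_no_import.py
# Run plainly:   python demo_no_import.py

try:
    # Under `python -m memory_profiler`, `profile` is already defined as a builtin.
    profile
except NameError:
    # Plain `python`: fall back to a no-op decorator so the script still runs.
    def profile(func):
        return func


@profile
def my_function():
    # Allocate a list of one million items as a stand-in for real work
    data = [0] * 1_000_000
    return len(data)


@profile
def my_other_function():
    # Sum a range as a stand-in for real work
    return sum(range(1_000_000))


def main():
    my_function()
    my_other_function()


if __name__ == "__main__":
    main()
```

With this pattern the decorators can stay in the code permanently: the line-by-line report only appears when the script is launched through memory_profiler.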
from memory_profiler import profile

@profile
def my_function():
    # Perform some potentially memory-intensive computation
    return 0

@profile
def my_other_function():
    # Perform some potentially memory-intensive computation
    return 0

def main():
    my_function()
    my_other_function()
    #...

if __name__ == "__main__":
    main()
ERROR: Could not find file /tmp/ipykernel_314/3429381877.py
ERROR: Could not find file /tmp/ipykernel_314/3429381877.py
Running Memory Profiler
memory-profiler reads the source of each decorated function from its file, so it cannot report on functions defined in a notebook cell; this is why the cell above prints "Could not find file" errors. To profile an existing Python script from a Jupyter notebook, we therefore copied the code snippet above into a file named demo_memory_profiling.py in the working directory and ran it with python. For each line, the output shows the memory usage after that line runs and the increment it caused.
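The copy step can also be done from the notebook itself. A minimal sketch, assuming the script should mirror the cell above: it writes the code to demo_memory_profiling.py and runs it with sys.executable, the interpreter behind the current kernel, rather than whichever python is first on the PATH. The helper names write_demo_script and run_script are hypothetical.

```python
import subprocess
import sys
import textwrap
from pathlib import Path

# Same structure as the notebook cell above; the decorated functions must live
# in a real file so memory-profiler can read their source lines.
DEMO_SOURCE = textwrap.dedent(
    """\
    from memory_profiler import profile

    @profile
    def my_function():
        # Perform some potentially memory-intensive computation
        return 0

    @profile
    def my_other_function():
        # Perform some potentially memory-intensive computation
        return 0

    def main():
        my_function()
        my_other_function()

    if __name__ == "__main__":
        main()
    """
)


def write_demo_script(path="demo_memory_profiling.py"):
    """Write the demo code to disk and return its path."""
    script = Path(path)
    script.write_text(DEMO_SOURCE)
    return script


def run_script(script):
    """Run a script with the kernel's interpreter and return its standard output."""
    result = subprocess.run(
        [sys.executable, str(script)], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, print(run_script(write_demo_script())) should print the same kind of line-by-line report as the !python cell, provided memory-profiler is installed in the kernel's environment.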
# With @profile decorator in the script
!python demo_memory_profiling.py
Filename: /projects/maap-documentation/docs/source/technical_tutorials/user_data/demo_memory_profiling.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     43.6 MiB     43.6 MiB           1   @profile
     7                                         def my_function():
     8                                             # Include each line of the script which needs to be profiled
     9                                             # under this function
    10
    11     43.6 MiB      0.0 MiB           1       return 0

Filename: /projects/maap-documentation/docs/source/technical_tutorials/user_data/demo_memory_profiling.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    13     43.6 MiB     43.6 MiB           1   @profile
    14                                         def my_other_function():
    15                                             # Include each line of the script which needs to be profiled
    16                                             # under this function
    17
    18     43.6 MiB      0.0 MiB           1       return 0
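In this report, Mem usage is the memory used by the Python interpreter after the line has executed, Increment is the change relative to the previous line, and Occurrences counts how often the line ran. For sizing a DPS worker, the largest Mem usage value is the figure of interest. It is measured at line granularity, so short-lived allocations inside a single line may not show up; leave some headroom. A minimal sketch that extracts the data rows from a saved report, assuming the column layout shown above; the names parse_report and peak_mib are hypothetical.

```python
import re
from typing import NamedTuple

# One data row of a memory-profiler report, e.g.
#      6     43.6 MiB     43.6 MiB           1   @profile
ROW = re.compile(
    r"^\s*(?P<line>\d+)\s+(?P<mem>[\d.]+) MiB\s+(?P<inc>-?[\d.]+) MiB"
    r"\s+(?P<occ>\d+)\s+(?P<code>.*)$"
)


class Row(NamedTuple):
    line: int
    mem_mib: float
    increment_mib: float
    occurrences: int
    code: str


def parse_report(text):
    """Return the report rows that carry memory figures (source-only lines are skipped)."""
    rows = []
    for raw in text.splitlines():
        match = ROW.match(raw)
        if match:
            rows.append(
                Row(
                    int(match["line"]),
                    float(match["mem"]),
                    float(match["inc"]),
                    int(match["occ"]),
                    match["code"].strip(),
                )
            )
    return rows


def peak_mib(text):
    """Largest 'Mem usage' value in the report, or None if no data rows were found."""
    return max((row.mem_mib for row in parse_report(text)), default=None)
```

Applied to the report above, peak_mib returns 43.6, the memory of the process when the functions were entered.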
Logging the Output
By default, memory-profiler writes its report to standard output, so it appears in the cell output or on the command line. To store the report in a log file instead, pass an open file object to the decorator's stream argument. For more details, refer to the memory-profiler documentation.
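If you would rather not change the decorators, another option is to capture the script's standard output in a file, the Python equivalent of python demo_memory_profiling.py > memory_profiler.log in a shell. A minimal sketch; the helper name run_to_log and its extra_args parameter are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_to_log(script, log_path="memory_profiler.log", extra_args=()):
    """Run `script` with the current interpreter and write its stdout to `log_path`.

    extra_args go between the interpreter and the script, e.g. ("-m", "memory_profiler").
    stderr is not redirected.
    """
    log_file = Path(log_path)
    with log_file.open("w") as log:
        subprocess.run(
            [sys.executable, *extra_args, str(script)],
            stdout=log,
            check=True,
        )
    return log_file
```

For example, run_to_log("demo_memory_profiling.py") should leave the line-by-line report in memory_profiler.log, since the decorated script prints it to standard output.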
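from memory_profiler import profile

# Open the log file once; both decorators write their reports to it
fp = open('memory_profiler.log', 'w+')

@profile(stream=fp)
def my_function():
    # Perform some potentially memory-intensive computation
    return 0

@profile(stream=fp)
def my_other_function():
    # Perform some potentially memory-intensive computation
    return 0

def main():
    my_function()
    my_other_function()
    #...

if __name__ == "__main__":
    main()
    # Close the file so the buffered report is flushed to disk
    fp.close()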
ERROR: Could not find file /tmp/ipykernel_314/1328302985.py
ERROR: Could not find file /tmp/ipykernel_314/1328302985.py
To test the logging, we will run memory profiling on the demo_memory_profiling_logging.py script saved in the working directory.
!python demo_memory_profiling_logging.py
After running the script, the memory profiling output is saved in the memory_profiler.log file. You can also write the output for different functions to different log files by opening a separate file for each function and passing it as the stream argument of that function's @profile decorator.
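A sketch of the per-function variant, assuming one log file per function in a hypothetical profiling_logs directory. Each log is opened before the decorators are applied and closed at the end so buffered output reaches the disk. The try/except fallback is an assumption added only so the sketch runs where memory-profiler is not installed; in that case the decorators do nothing and the log files stay empty.

```python
from pathlib import Path

try:
    from memory_profiler import profile
except ImportError:
    # Fallback for this sketch only: a no-op with the same call shape as
    # profile(stream=...), so the code runs without memory-profiler installed.
    def profile(func=None, stream=None, precision=1):
        if func is None:
            return lambda f: f
        return func


LOG_DIR = Path("profiling_logs")  # hypothetical output directory
LOG_DIR.mkdir(exist_ok=True)

# One open stream per function; each decorator writes its report to its own stream.
my_function_log = open(LOG_DIR / "my_function.log", "w")
my_other_function_log = open(LOG_DIR / "my_other_function.log", "w")


@profile(stream=my_function_log)
def my_function():
    # Perform some potentially memory-intensive computation
    return 0


@profile(stream=my_other_function_log)
def my_other_function():
    # Perform some potentially memory-intensive computation
    return 0


def main():
    try:
        my_function()
        my_other_function()
    finally:
        # Close (and flush) both logs so the reports are written to disk.
        my_function_log.close()
        my_other_function_log.close()


if __name__ == "__main__":
    main()
```

Keeping one stream per function means each log contains only that function's report, which makes it easier to compare runs of the same function over time.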