๐ Setting up subprocess.Popen for streaming child process messages to stdout ๐งโ๐ป and a to a log file ๐
One day we needed to run a Python CLI application that loaded test data to a database. We wanted to run this script as part of the unittests
set up that we were building for automating a section of our application end to end tests. This CLI program had the following characteristics:
- Printed logs to stdout and stderr.
- Lots of logs were printed.
- It took about 5 minutes to complete on a standard developer laptop (2022).
Here we have a bare-bones version of the CLI application (child_process.py
):
# child_process.py
import contextlib
import logging
import sys
from time import sleep
from typing import Generator, Optional, TextIO
logger = logging.getLogger(__name__)
LOG_FORMAT = "%(asctime)s %(levelname)s:%(name)s:%(lineno)d:%(message)s"
@contextlib.contextmanager
def set_logging(
console_stream: Optional[TextIO] = None,
) -> Generator[None, None, None]:
if console_stream is None:
console_stream = sys.stderr
console_handler = logging.StreamHandler(console_stream)
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(logging.Formatter(LOG_FORMAT))
root = logging.getLogger()
root.addHandler(console_handler)
old_level = root.level
root.setLevel(logging.DEBUG)
try:
yield
finally:
root.setLevel(old_level)
root.removeHandler(console_handler)
console_handler.flush()
console_handler.close()
if __name__ == "__main__":
with set_logging():
for i in range(5):
logger.debug(f"Stderr: Spam! {i}")
# Sleep to simulate some other work
sleep(1)
if i % 2:
print(f"Stdout: Ham! {i}")
Note that the log set up sends the logs to stderr and that there are some messages going to stdout via the print
function.
Since it took a while for the process to complete, we wanted to see the application logs while the process was executing in case runtime problems occurred at this set up step, for example, misuse of the CLI application parameters, or missing environment variables.
Next, we have a version of test code that we were building to automate parts of our end to end tests (tests/test_child_process.py
):
# tests/test_child_process.py
import subprocess
import sys
from typing import List
import unittest
def run_process(args: List[str]) -> subprocess.CompletedProcess:
return subprocess.run(
args,
stdout=sys.stdout,
stderr=sys.stdout,
encoding="utf-8",
check=True,
)
class TestLogCapture(unittest.TestCase):
@classmethod
def setUpClass(cls) -> None:
cmd = ["python", "child_process.py"]
completed_proc = run_process(cmd)
print("Stream done")
print(completed_proc)
def test_smoke(self):
print("Test 1")
Note that we have asked our helper run_process
function to route the child_process.py
stdout and stderr to the system's stdout. In this way, we will be able to see the logs as they occur, without having to wait for the program to complete (or fail). Running this test we get:
$ python -m unittest -v tests/test_child_process.py
2024-04-29 18:24:45,710 DEBUG:__main__:44:Stderr: Spam! 0
2024-04-29 18:24:46,711 DEBUG:__main__:44:Stderr: Spam! 1
Stdout: Ham! 1
2024-04-29 18:24:47,713 DEBUG:__main__:44:Stderr: Spam! 2
2024-04-29 18:24:48,714 DEBUG:__main__:44:Stderr: Spam! 3
Stdout: Ham! 3
2024-04-29 18:24:49,715 DEBUG:__main__:44:Stderr: Spam! 4
Stream done
CompletedProcess(args=['python', 'child_process.py'], returncode=0)
test_smoke (tests.test_child_process.TestLogCapture) ... Test 1
ok
----------------------------------------------------------------------
Ran 1 test in 5.044s
OK
Great! With this subprocess.run
configuration we can see the logs as they occur. Note that the lines with "Stderr: Spam! 1" and "Stdout: Ham! 1" print one after the other, meaning that stdout and stderr messages from the child process are sent to the main process stdout as soon as they are received (line buffered).
๐พ Persisting the logs
Monitoring the process logs as they occur is nice, but what if we want to route those same logs to a file and to stdout at the same time?
This turned out to be a bit more tricky than I anticipated, and required us to reach for subprocess.Popen
๐ซฃ
However, I'll also list here the alternative ideas I had which didn't work, in case I run into a similar problem in the future hoping to discard them more quickly or improve them if I've had acquired the missing knowledge:
Passing a file handle created with open
๐กIdea: subprocess.run
stdout argument accepts a file handle. When the text mode parameters are passed in, the open
function returns a TextIO
file handle. Because we also want to send the message to stdout
we could monkey patch TextIO.write
method to write a message to a file and to stdout.
๐Sticking point: Looking at the underlying write
method code, it seems to be written in C ๐ชwith a lot of low level dependencies relying on it. Where do I even begin? Where is the right place to do this monkey patching? I can't patch a monkey if I don't even know where it is ๐.
Passing the sys.stdout
stream
๐กIdea: subprocess.run
stdout argument accepts a stream. This is what we are using in the bare-bones test module. I learned from the wikiversity that sys.stdout
is backed by an actual file (os.ttyname(0)
). Could we copy the content of this file into our target test file after the process runs?
๐Sticking point: I tried copying the file after the child process executed in the set up method, but the copying step hangs ๐ It might have to do with Python's buffering, or that the source file is locked, I honestly don't know, so we will have to try something else.
Using subprocess.Popen
๐ซฃ
Well, with great power comes great responsibility ๐ธ๏ธ Initially, I mostly tip toed my way around this option owing to my previous experience with it mainly hanging while waiting for Popen.communicate
to do its work, which is not well suited for our purpose here because we want our solution to work with a lot of log messages. However, since that past experience I came across Luca's blog post that gave me a clear sight to a potential solution.
popen
stuff, I suggest you read it too. If you are facing these kinds of problems, it is sure to inspire you and perhaps lead you to a solution.Where were we?... Oh yes, the code. Back to our testing module, I made the following changes
# tests/test_child_process.py
import os
from pathlib import Path
import subprocess
import sys
import tempfile
from typing import List
import unittest
def run_process(args: List[str], log_file_path) -> subprocess.CompletedProcess:
env = os.environ
env["PYTHONUNBUFFERED"] = "True"
with open(log_file_path, "at", encoding="utf-8") as fh:
with subprocess.Popen(
args,
bufsize=1,
text=True,
encoding="utf-8",
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
env=env,
) as proc:
while (ret_code:=proc.poll()) is None:
line = proc.stdout.readline()
sys.stdout.write(line)
fh.write(line)
return ret_code
class TestLogCapture(unittest.TestCase):
@classmethod
def setUpClass(cls) -> None:
cls.tmpdir = tempfile.TemporaryDirectory()
cls.addClassCleanup(cls.tmpdir.cleanup)
cls.test_child_log_path = Path(cls.tmpdir.name) / "test_child-process.log"
cmd = ["python", "child_process.py"]
completed_proc = run_process(cmd, cls.test_child_log_path)
print("Stream done")
print(f"Process return code: {completed_proc}")
def test_smoke(self):
print("Test 1")
def test_child_process_written_to_file(self):
print("\n")
print("Printing from test_child_process_written_to_file")
with open(self.test_child_log_path, "r", encoding="utf-8") as fh:
for line in fh:
print(line.rstrip())
Note that our helper run_process
function now accepts a file handle and uses subproces.Popen
๐ซฃ, as promised. The keys to get the desired behavior are:
- Setting
bufsize
to 1, so the stdout and stderr pipe file objects are opened in line buffered mode (passed behind the scenes to our friend theopen
function). - Setting
text
to True, so the stdout and stderr pipe file objects are text streams and not byte streams. - Setting
stderr
tosubprocess.STDOUT
, so the child process stderr is captured on the same file as its stdout. - Setting the
PYTHONUNBUFFERED
environment variable so the print statements of the child process are written (flushed) as they occur in the code. This Real Python blog post was the source of inspiration here.
In addition to these changes, I also altered the test class to include a temporary directory and log file in the setUpClass
, and added a second "test" that prints the content of the log file.
Running this test module we get:
$ python -m unittest tests.test_child_process -v
2024-05-10 17:00:14,293 DEBUG:__main__:44:Stderr: Spam! 0
2024-05-10 17:00:15,294 DEBUG:__main__:44:Stderr: Spam! 1
Stdout: Ham! 1
2024-05-10 17:00:16,295 DEBUG:__main__:44:Stderr: Spam! 2
2024-05-10 17:00:17,296 DEBUG:__main__:44:Stderr: Spam! 3
Stdout: Ham! 3
2024-05-10 17:00:18,298 DEBUG:__main__:44:Stderr: Spam! 4
Stream done
Process return code: 0
test_child_process_written_to_file (tests.test_child_process.TestLogCapture) ...
Printing from test_child_process_written_to_file
2024-05-10 17:00:14,293 DEBUG:__main__:44:Stderr: Spam! 0
2024-05-10 17:00:15,294 DEBUG:__main__:44:Stderr: Spam! 1
Stdout: Ham! 1
2024-05-10 17:00:16,295 DEBUG:__main__:44:Stderr: Spam! 2
2024-05-10 17:00:17,296 DEBUG:__main__:44:Stderr: Spam! 3
Stdout: Ham! 3
2024-05-10 17:00:18,298 DEBUG:__main__:44:Stderr: Spam! 4
ok
test_smoke (tests.test_child_process.TestLogCapture) ... Test 1
ok
----------------------------------------------------------------------
Ran 2 tests in 5.048s
OK
Great success! Note that the stdout and stderr are written in the correct order (intertwined) while the child process runs, and that the log file captured the messages in the same manner ๐
๐ Resources
- โ๏ธ Python's subprocess module.
- ๐ฎ๐น Luca's Python's subprocess buffers blog post.
- ๐ Real Python blog post: How to Flush the Output of the Python Print Function.
- ๐ Wikiversity's console output lesson.
- ๐จโ๐ณ Python's logging cookbook.
๐ Summary
In this post we learned:
- How to configure Python's
subprocess.run
to stream the child process stdout and stderr to the main process stdout as they occur. - Configuring
subprocess.run
to stream the child messages to stdout and to a file can be deceptively challenging. - Python's
subprocess.Popen
class gives great flexibility for accomplishing the previous task, but we must be aware of the limitation of its methods (e.g.popen.communicate
) that might be blocked waiting for the stream of messages to flush. - We can use the
PYTHONUNBUFFERED
environment variable to make the child process messages (logs and prints) flush every time a new line character is encountered.