Lately we have found the need to focus some effort on boosting the performance of our services. In the past, we focused only on correctness, whether our software did the task it was supposed to do. However, as we gain users and activity within our system increases, we need to also ensure our software is FAST.

One specific part of our system, the FTP server, requires access to a user permissions. These permissions drive pretty much everything the FTP server does, so it often queries these values from our MySQL server. In some cases like a directory listing, permissions can be retrieved many times for a single operation. This is the perfect opportunity for caching. The strategy we chose was to fetch all of a users’ permissions when they log in, and then refer to the cached permissions throughout the session. Each FTP server process would maintain it’s own cache for the users logged into it.

Stale data is of little concern, since permissions update each time the user logs in. 99% of the time, the permissions for all logged in users fit comfortably in RAM. However, the big risk is that if a huge number of users are logged in simultaneously, the amount of RAM needed to cache all of their permissions could be too great. Because of this, we decided to limit the number of permission sets we would cache, fetching from our data store if the permissions were not found (cache miss).

We would prefer to keep the most frequently accessed permissions (busiest users) in the cache, while evicting those which are infrequently used. There is a device called an LRU, (Least Recently Used) for just this case. All cached items are associated with an age, the age is reset whenever an item is accessed. When the cache size must be reduced (perhaps to accomodate new items) the oldest items are evicted (the oldest items are the ones used longest ago).

Thus began my search for a Python LRU, I found some examples, but I did not find a package that I could readily use. I put together a working LRU pretty quickly and installed it into our FTP server code base. This had the desired effect, MySQL queries happened only when a user logged in, throughout their session no further access was needed to the database.

A few days later, while reading an article, I came across a link to a ready-made python LRU. I was disappointed that I had not initially found this package, but happy to compare it to my own. In the end, I decided to replace my implementation with the one from pylru. This decision was driven in no small part by the fact that pylru is up to three times faster than my naive version, I benchmarked both of them and pylru came out on top. Of course, the other reason to use someone else’s implementation is that it is maintained by someone else!

Leave a Reply

Your email address will not be published. Required fields are marked *