
Python Multi Threading

Speeding up Python code using multithreading

We often write Python code that makes remote requests, reads multiple files, or processes some data. In many of those cases I have seen programmers use a simple for loop, which can take a long time to finish. For example:
import requests
from time import time 
url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
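    "https://via.placeholder.com/530",
]
def download_file(url):
    # Fetch the URL and return only its HTTP status code; the with block
    # closes the streamed response so the connection is released.
    with requests.get(url, stream=True) as response:
        return response.status_code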
start = time()
for url in url_list:
    print(download_file(url))

print(f'Time taken: {time() - start}')

Output:
Time taken: 4.128157138824463

This is a simple example: the code opens each URL, waits for the response, prints its status code, and only then moves on to the next URL. Almost all of that time is spent waiting on the network, which makes this kind of code a very good candidate for multithreading.
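Modern systems can run many threads with very low overhead, which means you can have multiple tasks in flight at once. In CPython, the global interpreter lock (GIL) prevents threads from running Python bytecode in parallel, but it is released while a thread waits on network or file I/O, so threads can overlap their waiting time. Why don't we make use of that to process the URLs above faster?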
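To see why overlapping the waits helps, here is a minimal, network-free sketch. It assumes that time.sleep is a fair stand-in for waiting on a server (both block without holding the GIL); fake_download, run_sequential and run_threaded are hypothetical names used only for illustration, and the example uses the low-level threading module rather than the thread pool introduced next.

```python
import threading
import time

def fake_download(delay, results, index):
    # Hypothetical stand-in for a network request: time.sleep waits without
    # holding the GIL, just as a thread blocked on a socket would.
    time.sleep(delay)
    results[index] = f"done after {delay}s"

def run_sequential(delays):
    # One "request" at a time: total time is roughly the sum of the delays.
    results = [None] * len(delays)
    for i, delay in enumerate(delays):
        fake_download(delay, results, i)
    return results

def run_threaded(delays):
    # One thread per "request": the waits overlap, so total time is roughly
    # the longest single delay.
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_download, args=(delay, results, i))
        for i, delay in enumerate(delays)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return results

delays = [0.2] * 5
for runner in (run_sequential, run_threaded):
    start = time.perf_counter()
    runner(delays)
    print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

With five 0.2-second waits, the sequential version should take about a second while the threaded one should finish in roughly the time of a single wait.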
We will use ThreadPoolExecutor and as_completed from the concurrent.futures module. It is super easy to use. Let me show you some code and then explain how it works.

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import time
url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]
def download_file(url):
    # Same helper as before: return the status code and close the response.
    with requests.get(url, stream=True) as response:
        return response.status_code
start = time() 
processes = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for url in url_list:
        processes.append(executor.submit(download_file, url)) 
for task in as_completed(processes):
    print(task.result())
print(f'Time taken: {time() - start}')

Output:
<--truncated-->
Time taken: 0.4583399295806885

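We just sped up our code by a factor of about 9 (from 4.13 s to 0.46 s), and we didn't do anything very involved. With more URLs the absolute time saved would be even larger, although the speedup is roughly capped by the number of worker threads (max_workers=10 here) and by how many concurrent requests the server is willing to handle.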
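A rough way to see the max_workers cap is to time batches of simulated requests. This sketch assumes a hypothetical fake_request that sleeps 0.1 s in place of download_file, and it uses executor.map (a real ThreadPoolExecutor method that returns results in input order) instead of submit plus as_completed, just to keep the timing code short.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    # Hypothetical stand-in for download_file: each "request" waits 0.1 s
    # and pretends the server answered with status 200.
    time.sleep(0.1)
    return 200

def timed_run(n_tasks, max_workers):
    # Run n_tasks fake requests on a pool and measure the wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_request, range(n_tasks)))
    return results, time.perf_counter() - start

for n in (10, 20, 40):
    _, elapsed = timed_run(n, max_workers=10)
    # With 10 workers, at most 10 fake requests are waiting at any moment.
    print(f"{n} tasks with 10 workers: {elapsed:.2f}s")
```

Doubling the number of tasks beyond the pool size should roughly double the elapsed time, because the extra tasks queue up until a worker thread is free.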
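So what is happening? When we call executor.submit, we schedule download_file(url) to run on one of the pool's worker threads and immediately get back a Future object that represents the pending task. We store each future in the processes list. When the with block ends, the executor waits for all submitted tasks to finish; then we iterate over the futures and print each result.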
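Here is a small sketch of what a Future gives you, using a hypothetical slow_square task in place of download_file. It shows that submit returns a concurrent.futures.Future, that result() blocks until the task finishes, and that an exception raised inside the task is re-raised when you call result().

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time

def slow_square(n):
    # Hypothetical stand-in task: waits briefly, then returns n * n,
    # or raises ValueError for negative input.
    time.sleep(0.1)
    if n < 0:
        raise ValueError("negative input")
    return n * n

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns immediately with a Future; the work runs on a pool thread.
    good = executor.submit(slow_square, 4)
    bad = executor.submit(slow_square, -1)
    print(isinstance(good, Future))  # True
    print(good.result())             # blocks until the task is done: 16
    try:
        bad.result()  # an exception raised inside the task is re-raised here
    except ValueError as exc:
        print(f"task failed: {exc}")  # task failed: negative input
```

This is why the loop in the main example calls task.result(): if a request raised an exception (say, a connection error), that is the line where it would surface.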
The as_completed function yields the futures in the processes list as soon as they complete, in whatever order they finish. A future counts as completed once its task has either finished (by returning a value or raising an exception) or been cancelled. We could also pass a timeout parameter to as_completed: if some tasks are still unfinished when that many seconds have passed since the call, the iterator raises a TimeoutError instead of yielding them.
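A minimal sketch of the timeout behaviour, assuming a hypothetical sleepy task that just waits and returns its delay. Note that TimeoutError here is the one exported by concurrent.futures, and the cancel_futures argument to shutdown requires Python 3.9 or later.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

def sleepy(seconds):
    # Hypothetical stand-in task that simply waits, then returns its input.
    time.sleep(seconds)
    return seconds

def collect_with_timeout(delays, timeout):
    finished = []
    executor = ThreadPoolExecutor(max_workers=len(delays))
    futures = [executor.submit(sleepy, d) for d in delays]
    try:
        # as_completed yields each future as soon as it finishes; if the
        # timeout (measured from this call) expires first, it raises TimeoutError.
        for future in as_completed(futures, timeout=timeout):
            finished.append(future.result())
    except TimeoutError:
        print(f"gave up after {timeout}s; {len(finished)} task(s) had finished")
    finally:
        # Don't block here: cancel tasks that have not started yet (3.9+);
        # tasks already running keep going in the background.
        executor.shutdown(wait=False, cancel_futures=True)
    return finished

print(collect_with_timeout([0.05, 0.05, 0.5], timeout=0.25))
```

The two short tasks are collected, while the slow one is never yielded: the loop stops with TimeoutError once the 0.25-second budget runs out.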
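You should explore multithreading a bit more. For simple I/O-bound tasks like this one it is often the quickest way to speed up your code; for CPU-bound work, threads help little in CPython because of the GIL. If you want to learn more, read the official concurrent.futures docs. They are super helpful.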
