Multi-processing vs Multi-threading

In Python, multiprocessing and multithreading are the two common approaches when it comes to improving application performance through concurrent and parallel programming.

Let’s look at each term individually.

Multithreading: the ability to run several threads of execution inside a single Python process. In many languages this is a common way to spread the workload across the available CPU cores. In CPython, however, there is a big problem with this. If you have spent time in the Python world, you have probably come across the GIL (Global Interpreter Lock): a lock that a thread must hold whenever it executes Python bytecode. So no matter how many CPU cores there are, only one thread executes Python code at any given moment, and the threads take turns, one after another. This means that if you try to scale a CPU-bound application by adding more threads, you will always be limited by the GIL.
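
To see the GIL at work, here is a minimal Python 3 sketch (not part of the original benchmarks) that times a pure-Python countdown run four times sequentially and then in four threads. The helper names `count_down`, `run_sequential` and `run_threaded` are illustrative. On a standard CPython build, expect the threaded run to take about as long as the sequential one, or slightly longer; the exact numbers depend on the machine.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop: the running thread holds the GIL throughout.
    while n > 0:
        n -= 1


def run_sequential(n, times):
    # Run the work `times` times, one after another, in the main thread.
    start = time.perf_counter()
    for _ in range(times):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, times):
    # Run the same total work split across `times` threads.
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(times)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N, TIMES = 2_000_000, 4
    seq = run_sequential(N, TIMES)
    thr = run_threaded(N, TIMES)
    print("sequential: {:.2f}s  threaded: {:.2f}s".format(seq, thr))
```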

To let another thread run, the interpreter periodically switches from one thread to the next (a context switch). This happens so fast that the threads appear to be executing in parallel when, in fact, they are not. It's very important to understand that with multithreading, Python code is still effectively limited to a single worker, i.e. one CPU core at a time. To get more computation done in parallel, we have to employ more workers, i.e. more processes.
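
How often these switches happen is an interpreter setting. In Python 3.2 and later it is a time interval, readable with `sys.getswitchinterval()`; Python 2.7, used for the benchmarks in this post, instead counted bytecode "ticks" (`sys.getcheckinterval()`). A short Python 3 sketch that reads and temporarily changes the interval:

```python
import sys

# CPython asks the running thread to release the GIL after this many seconds,
# so that a waiting thread can take over (the default is 0.005).
interval = sys.getswitchinterval()
print("switch interval: {} s".format(interval))

# The interval is tunable: a larger value means fewer, longer time slices.
sys.setswitchinterval(0.01)
print("new interval: {} s".format(sys.getswitchinterval()))

sys.setswitchinterval(interval)  # restore the previous value
```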

Let’s look at some thread code and its performance:

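    from __future__ import print_function  # lets this run on Python 2.7 and 3

    import threading
    import random

    output = []  # shared by all threads: threads share the process's memory

    def compute():
        # CPU-bound work: sum 100,000 random integers
        output.append(sum([random.randint(1, 100)
                           for i in range(100000)]))

    # five threads, each running compute() once
    workers = [threading.Thread(target=compute)
               for i in range(5)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # wait for every thread to finish

    print('Output: {}'.format(output))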

command:

    time python2.7 thread_perf.py

output:

    Output: [5065611, 5067336, 5046412, 5052144, 5055886]
    python thread_perf.py 1.50s user 0.65s system 126% cpu 1.699 total

The program above was run with Python 2.7 on a machine with a 4-core CPU. This means Python could have used up to 400% CPU, but it only managed 126%, which is roughly 30% of the hardware's capacity.

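Multiprocessing: very similar in nature to threads. The multiprocessing module deliberately mirrors much of the threading API, so it lets us do pretty much everything threads can do. The difference is that each worker is a separate operating-system process with its own Python interpreter and its own GIL, so it is not bound to a single CPU core. This means that on a 4-core CPU, multiprocessing can put all four cores to work.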
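
To illustrate how closely the two APIs match, here is a Python 3 sketch (an illustration, not the benchmark used below) that rewrites the threaded example with `multiprocessing.Process`. Because processes don't share memory, results come back through a `multiprocessing.Queue` instead of a shared list; `run()` is a hypothetical helper name.

```python
import multiprocessing
import random


def compute(results):
    # Runs in a child process with its own interpreter and its own GIL.
    # Memory is not shared, so the result is sent back through a Queue.
    results.put(sum(random.randint(1, 100) for _ in range(100000)))


def run(n_workers=5):
    results = multiprocessing.Queue()
    # Same start()/join() pattern as threading.Thread.
    workers = [multiprocessing.Process(target=compute, args=(results,))
               for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    # Drain the queue before join(): a child that has put data on a queue
    # may not exit until that data has been consumed.
    output = [results.get() for _ in workers]
    for worker in workers:
        worker.join()
    return output


if __name__ == "__main__":
    print("Output: {}".format(run()))
```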

Multiprocessing code:

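    from __future__ import print_function  # lets this run on Python 2.7 and 3

    import multiprocessing
    import random

    def compute(n):
        # n (the task index) is unused; every task does the same CPU-bound work
        return sum([random.randint(1, 100)
                    for i in range(1000000)])

    if __name__ == '__main__':
        # Required where child processes re-import this module (the "spawn"
        # start method, e.g. on Windows); stops children creating pools too.
        pool = multiprocessing.Pool(5)  # five worker processes
        print("Output: {}".format(pool.map(compute, range(5))))
        pool.close()
        pool.join()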

command:

    time python2.7 process_perf.py 

output:

    Output: [50475871, 50520878, 50519976, 50513625, 50485645]
    python process_perf.py 16.71s user 0.31s system 302% cpu 5.618 total

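Based on these results, multiprocessing managed to use ~300% CPU, compared with 126% for multithreading, which is roughly 75% of the hardware's total capacity. Note that the two scripts do not do the same amount of work: each process task sums 1,000,000 numbers while each thread task sums only 100,000 (hence the ~10x larger outputs), so the total run times are not directly comparable; CPU utilization is the meaningful figure here.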
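
The percentages follow directly from the `time` output. A small sketch, assuming (as the figures suggest) that the cpu column is (user + system) / elapsed; it reproduces 126% and 302% to within the rounding of the reported times. `cpu_utilization` is a hypothetical helper:

```python
def cpu_utilization(user, system, elapsed, cores):
    """Return (percent of one core, fraction of all cores) for a `time` run."""
    # Assumed formula for the cpu column: (user + system) / elapsed, as a %.
    percent_of_one_core = 100.0 * (user + system) / elapsed
    return percent_of_one_core, percent_of_one_core / (100.0 * cores)


# Figures copied from the two runs above, on the 4-core machine.
threads = cpu_utilization(user=1.50, system=0.65, elapsed=1.699, cores=4)
processes = cpu_utilization(user=16.71, system=0.31, elapsed=5.618, cores=4)

print("threads:   {:.0f}% cpu, {:.0%} of 4 cores".format(*threads))
print("processes: {:.0f}% cpu, {:.0%} of 4 cores".format(*processes))
```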

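Under the hood, multiprocessing runs one main (parent) process that spawns multiple child processes, not sub-threads. Each child has its own interpreter and its own memory space, so data such as function arguments and return values is pickled and passed between processes; Pool.map does this for you.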
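
A small Python 3 sketch of that separation, using only the standard library: each worker reports its process ID and increments a module-level counter, while the parent's copy of the counter stays untouched. The names `worker` and `counter` are illustrative.

```python
import multiprocessing
import os

counter = 0  # module-level state; each process works on its own copy


def worker(_):
    global counter
    counter += 1  # changes only this child process's copy
    return os.getpid(), counter


if __name__ == "__main__":
    pool = multiprocessing.Pool(2)
    try:
        # Arguments and return values are pickled and sent between processes.
        results = pool.map(worker, range(4))
    finally:
        pool.close()
        pool.join()
    print("parent pid:", os.getpid(), "child results:", results)
    print("parent counter is still", counter)  # 0
```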

Conclusion

Multiprocessing is the better solution whenever the workload needs to be spread across multiple CPU cores, in other words when the program is CPU-bound. Multithreading, on the other hand, should be used when the program is I/O-bound: for example, downloading files from the internet, or listening on multiple sockets and waiting for input. In those cases the threads spend most of their time waiting, and CPython releases the GIL during blocking I/O, so other threads can run in the meantime.
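
For the I/O-bound case, here is a minimal Python 3 sketch using `concurrent.futures.ThreadPoolExecutor` (not used in the original post). `fake_download` and the URLs are hypothetical, with `time.sleep()` standing in for network waits since it also releases the GIL while waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Stand-in for a network call: time.sleep() releases the GIL while it
    # waits, just as blocking socket reads do.
    time.sleep(0.2)
    return url


# Hypothetical URLs; nothing is actually fetched.
urls = ["https://example.com/file{}".format(i) for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fake_download, urls))
elapsed = time.perf_counter() - start

# The five 0.2 s waits overlap, so this should take roughly 0.2 s, not 1.0 s.
print("fetched {} files in {:.2f}s".format(len(results), elapsed))
```

For CPU-bound work, the same module offers `ProcessPoolExecutor` with the same `map()` interface, which makes it easy to switch between the two approaches.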

 