CALCULATE PI USING MONTECARLO SIMULATION – PHASE III – aka "Multicarlo"

Introduction

After Phase II, we discovered that Numpy wasn’t going to help. Also, getting a half way decent approximation of Pi was going to take millions of paths / attempts. So on a slimmed down calculation where the [x,y] position of each random try was not kept, what could be done to make this as quick and accurate as possible?

Multi-Processing?

Can I use the 16 Cores on my machine to go 16X faster? Am I being naiive here? If I am, how and what is the most that can be done? I would like to see if the infamous G.I.L will Get me Into Logjam or not.

First Attempt [Disaster]

from multiprocessing import Pool
import os

num_cpus = os.cpu_count()
print('Num of CPUS: {}'.format(num_cpus))

try:
    pool = Pool(num_cpus) # on 8 processors
    data_outputs = pool.map(calc_pi_quick, [10])
    print (data_outputs)
finally: # To make sure processes are closed in the end, even if errors happen
    pool.close()
    pool.join()

Disappointingly, not only is it not any quicker; in fact, doesn’t finish running AT ALL.

Awkward, lets look at why that is happening first?

  • I quite like Medium Articles and this one seems to hit the spot.
  • Also this from StackOverflow

Second Attempt

from random import random

def calc_pi_quick(num_attempts: int) -> float:
    from random import random

    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            inside += 1

    return 4 * inside / num_attempts


def multi_calc_pi(num_attempts: int, verbose=False) -> float:
    from multiprocessing import Pool
    from random import random
    import os
    from statistics import mean

    # lets leave one spare for OS related things so everything doesn't freeze up
    num_cpus = os.cpu_count() - 1
    # print('Num of CPUS: {}'.format(num_cpus))

    try:
        attempts_to_try_in_process = int(num_attempts / num_cpus)
        pool = Pool(num_cpus)
        # what I am trying to do here is get the calc happening over n-1 Cpus each with an even
        # split of the num_attempts. i.e. 150 attempts over 15 CPU will lead to 10 each then the mean avg
        # of the returned list being returned.
        data_outputs = pool.map(calc_pi_quick, [attempts_to_try_in_process] * num_cpus)
        return mean(data_outputs)
    finally:  # To make sure processes are closed in the end, even if errors happen
        pool.close()
        pool.join()

To get multiprocessing to work on Jupyter Notebooks and Windows 10, I had to save this code to a different file to be called from the Jupyter Notebook, and executed within __main__. However once I did, this did lead to a significant performance increase.

Collect the Stats

import calcpi
from timeit import default_timer as timer
from datetime import timedelta
import matplotlib.pyplot as plt

if __name__ == '__main__':
    
    num_attempts = [50, 500, 5_000, 50_000, 500_000, 5_000_000, 50_000_000]
    multi_timings = []
    single_timings = []
    
    for x in num_attempts:
    
        start = timer()
        multi_pi = calcpi.multi_calc_pi(x)
        end = timer()  
        multi_timings.append((end-start))
        print("MultiCarlo Pi: {}: {} took {}".format(multi_pi, x, timedelta(seconds=end-start)))

        start = timer()    
        pi = calcpi.calc_pi_quick(x)
        end = timer()
        single_timings.append((end-start))
        print("MonteCarlo Pi: {}: {} took {}".format(pi, x, timedelta(seconds=end-start)))

    plt.plot(num_attempts, multi_timings)
    plt.plot(num_attempts, single_timings)

    plt.show()

Display the stats in a more meaningful way

import plotly.graph_objects as go

import numpy as np

# lets pretty this up so the axis label is more readable
x_axis = ['{:,}'.format(x) for x in num_attempts]

fig = go.Figure(data=[
    go.Bar(name='Multi Process', x=x_axis, y=multi_timings),
    go.Bar(name='Single Process', x=x_axis, y=single_timings)
])
# Change the bar mode
fig.update_layout(xaxis_type='category',barmode='group')
fig.show()
Single vs Multiprocessing for increasing simulation size

Early on in the exponentially increasing simulation size, the overhead of creating the Pool, asking the os for CPU count the splitting the attempts up to the pool seemed to result in a floor of 0.4s. However, at that level the Pi value returned was nonsense anyway. So I think the overhead for any reasonable approximation of Pi is worth the set up cost in this case.

Baby, can I get your Numba?

Can we go even further? I think we could give Numba a try.

After some research and reading around the internet for a day or so while avoiding Covid-19 I found out it is incredibly simple.

4 little chars that change everything. @jit

from random import random
from numba import njit, jit

@jit
def calc_pi_numba(num_attempts: int) -> float:
    inside = 0

    for _ in range(num_attempts):
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            inside += 1

    return 4 * inside / num_attempts

That is it. Just one import and one teensy little decorator and we are away. How does this stack up agianst my mighty multi monte carlo from earlier though?

Lets get some timings

import calcpi
from timeit import default_timer as timer
from datetime import timedelta
import matplotlib.pyplot as plt
import plotly.graph_objects as go

if __name__ == '__main__':
    
    num_attempts = [50, 500, 5_000, 50_000, 500_000, 5_000_000, 50_000_000]
    function_list = [calcpi.calc_pi_quick, calcpi.calc_pi_numba, calcpi.multi_calc_pi]
    results_dict = {}

    for func in function_list:

        timings = []
        
        for x in num_attempts:
        
            start = timer()
            pi = func(x)
            end = timer()  
            timings.append((end-start))
            # print("{} => Pi: {} Attempts: {} Time: {}".format(func.__name__, pi, x, timedelta(seconds=end-start)))

        results_dict[func.__name__] = timings
    
    go_data = []
    # lets pretty this up so the axis label is more readable
    x_axis = ['{:,}'.format(x) for x in num_attempts]
    
    for func in function_list:
    
        plt.plot(num_attempts, results_dict[func.__name__])
        
        go_data.append(go.Bar(name=func.__name__, x=x_axis, y=results_dict[func.__name__]))

    plt.show()

    fig = go.Figure(data=go_data)
    # Change the bar mode
    fig.update_layout(xaxis_type='category',barmode='group')
    fig.show()
        

Results visulisation

X: Attempts Y: Seconds to run
X: Attempts Y: Seconds to run

Oh… it absolutely smashes it.

Out of the park for

  • Speed
  • Readability
  • Implementation simplicity