# CALCULATE PI USING MONTECARLO SIMULATION – PHASE II ## Context

Carrying on from Phase I, I wanted to see how I could take the initial answer and make it more efficient. The first port of call then is… how expensive is it at the moment. Will any changes I make have a significant impact on that baseline.

## timeit

I dont think using %%timeit as an ipython coding secret on the cell will cut the mustard, but we shall see. I would like to be able to single out code sections for timing too.

#### Calc Pi Unoptimised

```def calc_pi(num_attempts, display_plot=False):

import matplotlib
import matplotlib.pyplot as plt
from random import random

inside = 0

x_inside = []
y_inside = []
x_outside = []
y_outside = []

for _ in range(num_attempts):
x = random()
y = random()
if x**2+y**2 <= 1:
inside += 1
x_inside.append(x)
y_inside.append(y)
else:
x_outside.append(x)
y_outside.append(y)

pi = 4*inside/num_attempts

if display_plot:
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.scatter(x_inside, y_inside, color='g')
ax.scatter(x_outside, y_outside, color='r')

return pi
```

#### Calc Pi Quick (Version 1)

Quick Python only changes, remove array recording and if we never plan to plot remove the if statement

```def calc_pi_quick(num_attempts: int):

inside = 0

for _ in range(num_attempts):
x = random()
y = random()
if x**2+y**2 <= 1:
inside += 1

return 4*inside/num_attempts
```

#### Initial comparison

```from timeit import default_timer as timer
from datetime import timedelta

start = timer()
calc_pi(50000000, True)
end = timer()
print(timedelta(seconds=end-start))

start = timer()
calc_pi(50000000, False)
end = timer()
print(timedelta(seconds=end-start))

start = timer()
calc_pi_quick(50000000)
end = timer()
print(timedelta(seconds=end-start))
```

#### Output

```calc_pi with plotly display time: 0:00:08.318734
calc_pi no display time: 0:00:00.625193
calc_pi_quick time: 0:00:00.470770```

We start to see there are some decent performance benefits. Lets check if these are linear or not. However to see some really decent improvements I think we can start to think about threading perhaps? Also if we still wanted to record the hit and miss positions, would numpy be quick enough that we can keep this data without a significant performance hit?

### Numpy Optimisation Attempts

Spoiler Alert… my first attempt was absolutely terrible. However, in a way that’s great news. I now know what not to do with numpy on a non trivial problem, and now so do you.

```def calc_pi_numpi_bad(num_attempts):

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from random import random

inside = 0

x_inside = np.array([])
y_inside = np.array([])
x_outside = np.array([])
y_outside = np.array([])

for _ in range(num_attempts):
x = random()
y = random()
if x**2+y**2 <= 1:
inside += 1
x_inside = np.append(x_inside, x)
y_inside = np.append(y_inside, y)
else:
x_outside = np.append(x_outside, x)
y_outside = np.append(y_outside, y)

pi = 4*inside/num_attempts

return pi
```

How bad? Well lets see…

```def measure_performance_verbose(num_attempts: int) -> None:

from timeit import default_timer as timer
from datetime import timedelta

start = timer()
calc_pi(num_attempts, False)
end = timer()
time_taken = end-start
print('calc_pi time: {}'.format(timedelta(seconds=time_taken)))

start = timer()
calc_pi_quick(num_attempts)
end = timer()
quick_time_taken = end-start
print('calc_pi_quick time: {}'.format(timedelta(seconds=quick_time_taken)))

percentage_improvement = quick_time_taken / time_taken * 100

print('Precentage improvement: {}'.format(percentage_improvement))

start = timer()
calc_pi_numpi(num_attempts)
end = timer()
numpi_time_taken = end-start
print('calc_pi_numpi time: {}'.format(timedelta(seconds=numpi_time_taken)))

start = timer()
calc_pi_numpi_bad(num_attempts)
end = timer()
numpi_bad_time_taken = end-start
print('calc_pi_numpi_bad time: {}'.format(timedelta(seconds=numpi_bad_time_taken)))

measure_performance_verbose(500_000)
```
```calc_pi time: 0:00:00.636658
calc_pi_quick time: 0:00:00.462309
Precentage improvement: 72.6149660351135
calc_pi_numpi time: 0:00:00.616958
calc_pi_numpi_bad time: 0:12:52.784511
```

Whoa, what just happened there. It takes TWELVE seconds to run for 500,000 attempts. That is atrocious.

Lets look at the documentation for this function.

Ahh, that explains it. Every append copies the existing array into a new one. I did not know that, shame on me. I have always used np arrays as data converted from pandas or existing python lists, not created and appended to them on the fly.

However, we can fix it by pre-allocating the mem to the max num required and then mask the results if zero in plotly.

```def calc_pi_numpi(num_attempts, display_plot=False, display_masked_plot=False):

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from random import random

inside = 0

x_inside = np.zeros(num_attempts)
y_inside = np.zeros(num_attempts)
x_outside = np.zeros(num_attempts)
y_outside = np.zeros(num_attempts)

for i in range(num_attempts):
x = random()
y = random()
if x**2+y**2 <= 1:
inside += 1
x_inside[i] = x
y_inside[i] = y
else:
x_outside[i] = x
y_outside[i] = y

if display_masked_plot:

fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.scatter(np.ma.masked_equal(x_inside,0), np.ma.masked_equal(y_inside,0), color='g')
ax.scatter(np.ma.masked_equal(x_outside,0), np.ma.masked_equal(y_outside,0), color='r')

if display_plot:

x_inside2 = x_inside[x_inside !=0]
y_inside2 = y_inside[y_inside !=0]
x_outside2 = x_outside[x_outside !=0]
y_outside2 = y_outside[y_outside !=0]

fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.scatter(x_inside2, y_inside2, color='g')
ax.scatter(x_outside2, y_outside2, color='r')

pi = 4*inside/num_attempts

return pi
```

There are two booleans so that we can test that both work.

```calc_pi_numpi(256, True, True)
``` maskednumpy zero removal
```start = timer()
calc_pi_numpi(500_000, True, False)
end = timer()
time_taken = end-start
print('calc_pi normal display time: {}'.format(timedelta(seconds=time_taken)))

start = timer()
calc_pi_numpi(500_000, False, True)
end = timer()
time_taken = end-start
print('calc_pi masked display time: {}'.format(timedelta(seconds=time_taken)))
```
```calc_pi normal display time: 0:00:02.356013
calc_pi masked display time: 0:00:04.012019```

Much much better.

```start = timer()
calc_pi_numpi(500_000, False, False)
end = timer()
time_taken = end-start
print('calc_pi no display time: {}'.format(timedelta(seconds=time_taken)))

start = timer()
calc_pi(500_000)
end = timer()
time_taken = end-start
print('calc_pi no display time: {}'.format(timedelta(seconds=time_taken)))
```
```calc_pi no display time: 0:00:00.622075
calc_pi no display time: 0:00:00.655379```

So… numpy doesn’t make things better, but used correctly, doesn’t make things any worse. So this is a dead end. Also we don’t want to plot for the numbers higher than a million as it looks to the human eye to be , a complete quarter circle. in plotly anyway so we dont need to store large amounts of array data anyway

While I am here I wanted to utilise the “functions are first class objects” paradigm in python. I have hinted at it by showing the initial fuction as _verbose but its much cleaner to call that same code within a functor ( of sorts ) loop as below.

```def measure_performance(num_attempts: int) -> None:

from timeit import default_timer as timer
from datetime import timedelta

funcs_to_call = [calc_pi, calc_pi_quick, calc_pi_numpi, calc_pi_numpi_bad]

for func in funcs_to_call:
start = timer()
calc_pi(num_attempts, False)
end = timer()
time_taken = end-start
print('{} time: {}'.format(func.__name__, timedelta(seconds=time_taken)))

measure_performance(50000)
```

In Phase III -> aka PiMultiCarlo I want to loo at what we can do locally on a decent multi core machine to get some decent speed improvements for only calculating the number only, not storing the guesses.

Tagged with: