1. Introduction
Monte Carlo simulations are a powerful in numerous fields, including operations research, game theory, physics, mathematics, actuarial sciences, finance, among others. It is a technique used to measure risk and uncertainty when making a decision. To put it simple: a Monte Carlo simulation runs an enormous number of statistical experiments with random numbers generated from an underlying distribution based on a given time series. Brownian motion, or random walk, is the main driver for forecasting the future price.
On this article, I will present some results for Bitcoin close price prediction one week from now, including convergence tests for the number of simulations as well as sensitivity analysis for the historical data sample range. All code was written in Python and will be provided at the end of this article.
2. Method
The method consist in obtaining a mean and standard deviation from a given sample data (time series), on this particular case the close price data of Bitcoin from a certain time range. For this article, I will be working with the daily close price.
Step 1: download Bitcoin historical data
The Bitcoin historical data was downloaded and saved as .csv file using the Binance API. For this step, it is needed a Binance account and generation of API keys imported from a config.py file with the strings API_KEY and API_SECRET.
import config
from binance.client import Client
from binance.enums import *
import pandas as pdclient = Client(config.API_KEY, config.API_SECRET)# valid intervals — 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w, 1Mdef print_file(pair, interval, name):# request historical candle (or klines) data
bars = client.get_historical_klines(pair, interval, “27 Jun, 2018”, “27 Jun, 2021”)
# delete unwanted data — just keep date, open, high, low, close
for line in bars:
del line[5:]
# save as CSV file
df = pd.DataFrame(bars, columns=[‘date’, ‘open’, ‘high’, ‘low’, ‘close’])
df.set_index(‘date’, inplace=True)
df.to_csv(name)print_file(‘BTCUSDT’, ‘1d’, “BTC_1d.csv”)
Step 2: obtain percentage variation and apply logarithm
This step involves essentially reading the .csv file and processing the close price data by obtaining its percentage variation and applying the natural logarithm.
Step 3: compute drift
The drift is the direction that rates of returns have had in the past, i.e., the expected return. This parameter is obtained from the mean (μ) and standard deviation (σ) of the given sample.
Step 4: compute volatility
The volatility is the historical standard deviation multiplied by a random variable (Z), assuming normal distribution.
Step 5: compute the expected returns
The expected returns are the given by the natural exponential of the drift plus the volatility.
Observation: since we are dealing with large numbers, there is a high risk of occurring overflow. To deal with this, the natural logarithm of the percentage variation was obtained and then transformed back using the inverse (exponential function) after computing drift and volatility. This mathematical “trick” was used in order to avoid overflow error.
Step 6: compute the future price
The future price is then obtained by multiplying the price from the past time by the expected return.
This operation is repeated until the desired future date is achieved and then repeated N times to achieve better accuracy. Usually N = 10,000 is a sufficient number of simulations to achieve convergence of probabilities.
Step 7: compute probabilities
From the obtained price data, it is possible to compute probabilities for the future price with a given price range. This was implemented with the function below.
def cdf(array, b, a): # p(r) -> a < r < b
mean = statistics.mean(array)
std = statistics.stdev(array)
p1 = 0.5 * (1 + math.erf((a - mean)/math.sqrt(2 * std**2)))
p2 = 0.5 * (1 + math.erf((b - mean)/math.sqrt(2 * std**2)))
return (p2-p1)
3. Results
Convergence tests
Convergence tests were made for different values of N, beginning with N = 10 up to N = 10,000 in order to show convergence of probabilities. The tests were made using Bitcoin daily close price obtained from Binance in the period from 27 of July of 2018 to 27 of July of 2021, which corresponds to a 3 years time series. Forecasting will be made for one week from now.
The time series obtained is plotted in the figure below.
Below, is plotted the histogram of the daily expected returns of Bitcoin’s close price, with a black swan event highlighted which probably corresponds to February of 2020 market crash. The drift value obtained from these series was equal to 0.0716%.
In the following figures, it was plotted the predicted time series and the respective histogram of price forecasting one week from now for N = 10, N = 100, N = 1,000 and N = 10,000.
Considering the histograms shown, it can be seen that, as N increased, the histogram converged to a normally distributed curve. This was expected, since in these simulations a normal price variation distribution is assumed.
The figure below show the convergence of the probability that the future price of Bitcoin, one week from now, will be between $30,000 and $40,000.
It is observed that convergence is achieved for N = 10,000, as was previously stated, with P($30,000 ≤ close price ≤ $40,000) = 80,57%.
Sensitivity analysis
For this section, forecasting of the Bitcoin close price one day from today (09/10/2021) was made through Monte Carlo simulations with N = 10,000 on the hourly chart. Six different sample sizes were analyzed: 1 day, 1 week, 1 month, 3 months, 6 months and one year. The results are shown in the table and chart below.
As seen from the results, one day sample range gives the most outliers compared to the other data sample ranges. A convergence of probabilities is achieved for a sample range of 6 months, with maximum relative difference less than 0,1%.
4. Conclusion
In this article, it was presented a demonstration of how a Monte Carlo simulation can be used to forecast future Bitcoin prices and its probability of being in a given price range. Tests were made for different numbers of simulations and convergence was shown to be obtained for N = 10,000. Sensitivity analysis shown that, for a daily forecasting of the Bitcoin price, a sample size of 6 months was shown to yield satisfactory results.
It’s important to note that the Monte Carlo simulations assume a normal distribution of returns; however, that is not necessarily the case. While similar, a most precise distribution would be the Cauchy distribution. Nevertheless, for most purposes, the assumption of normal distribution is accurate enough for a large number of trials.
Source code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from datetime import datetime
import math
import statistics
from datetime import timedeltadef cdf(array, b, a): # p(r) -> a < r < b
mean = statistics.mean(array)
std = statistics.stdev(array)
p1 = 0.5 * (1 + math.erf((a - mean)/math.sqrt(2 * std**2)))
p2 = 0.5 * (1 + math.erf((b - mean)/math.sqrt(2 * std**2)))
return (p2-p1)start_date = datetime(2018, 6, 27)
end_date = datetime(2021, 6, 28)
dt = timedelta(days = 1)
dates = np.arange(start_date, end_date, dt).astype(datetime)file = pd.read_csv("BTC_1d.csv", parse_dates=True)
closes = file.close
plt.plot(dates, closes)
plt.show()log_returns = np.log(1 + closes.pct_change())u = log_returns.mean()
var = log_returns.var()
drift = u - (0.5*var)
print("Drift: {}".format(drift))stdev = log_returns.std()
days = 7
iterations = 10000
Z = norm.ppf(np.random.rand(days, iterations))
daily_returns = np.exp(drift + stdev * Z)price_paths = np.zeros_like(daily_returns)
price_paths[0] = closes.iloc[-1]
for t in range(1, days):
price_paths[t] = price_paths[t-1]*daily_returns[t]start_date = datetime(2021, 6, 27)
end_date = datetime(2021, 7, 4)
dt = timedelta(days = 1)
future_dates = np.arange(start_date, end_date, dt).astype(datetime)plt.plot(dates[-14:], closes[-14:], linestyle='dashed')
plt.plot(future_dates, price_paths)
plt.ylabel("Close price (BTC/USDT)")
plt.show()sns.histplot(price_paths[-1,:], bins = 20, stat = "frequency")
plt.xlabel("Close price (BTC/USDT)")
plt.ylabel("Frequency")
plt.show()pricemin = 30000
pricemax = 40000
print("Probability that the price will be between {} and {}: {}".format(pricemin, pricemax, cdf(price_paths[-1,:], pricemax, pricemin)))
Github: https://github.com/araujo88