February 5, 2018

Monte Carlo Simulation and Risk Analysis of Bitcoin

Note: This post does not represent expert guidance on finance or bitcoin. It is a wrap-up of a class project on stock market risk analysis.

– Data Preparation from Stack Overflow: link
– language: Python
– Example code for Stock Market Project, Inspired by Jose Portilla’s Github Repo


The daily return chart shows that about 25% of the time the daily return is close to 0%. The distribution has a very long tail, with daily returns as high as 400%.

The Monte Carlo simulation puts the final price in a range of roughly $5,600 to $9,000.
The most interesting part is the Value at Risk. Taking the 10% empirical quantile of the simulated final price distribution, the Value at Risk for the Bitcoin price comes out to $925.49 for every investment of $6837.31, or about 13.5% of the investment.

This means that, according to the Monte Carlo simulation, for every coin purchased at $6837.31 we can be 90% confident that the loss will not exceed about $925.
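
The quantile-based Value at Risk described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: it assumes a geometric Brownian motion price model, and the drift `mu`, volatility `sigma`, horizon `days`, and number of `runs` are made-up placeholder values. Only the starting price of $6837.31 comes from the post.

```python
import math
import random

def simulate_final_prices(start_price, mu, sigma, days, runs, seed=0):
    """Simulate final prices under a geometric Brownian motion model (daily steps)."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        price = start_price
        for _ in range(days):
            # Daily log-return: drift term plus a normally distributed shock
            shock = rng.gauss(0.0, sigma)
            price *= math.exp(mu - 0.5 * sigma ** 2 + shock)
        finals.append(price)
    return finals

def empirical_quantile(values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

start_price = 6837.31  # starting price quoted in the post
# mu, sigma, days and runs are hypothetical values chosen for illustration
finals = simulate_final_prices(start_price, mu=0.001, sigma=0.04, days=30, runs=2000)

q10 = empirical_quantile(finals, 0.10)   # 10% empirical quantile of final prices
value_at_risk = start_price - q10        # loss not exceeded in 90% of simulated runs
print(f"10% quantile: {q10:.2f}, VaR: {value_at_risk:.2f} "
      f"({value_at_risk / start_price:.1%} of the investment)")
```

With the figures from the post, the same ratio is 925.49 / 6837.31, which is about 13.5%.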

Automatically download finance data via Google API

As Yahoo! Finance deprecated the API in late 2017 and Google changed their Finance URL, below is an alternative to download finance data.

from pandas_datareader.google.daily import GoogleDailyReader

# Override the reader's url property so it points at Google's new Finance URL
@property
def url(self):
    return 'http://finance.google.com/finance/historical'

GoogleDailyReader.url = url

# get data
import pandas_datareader as pdr
from datetime import datetime
start = datetime(2014,1,1)
end = datetime(2018,2,5)
ret = pdr.get_data_google(['FB'], start, end)

  • Documentation on alternative APIs for downloading finance data: here
  • GitHub discussion about Google's new Finance URL: here

September 25, 2017

How to convert milliseconds or seconds into date format in Presto?

DATE_FORMAT(FROM_UNIXTIME(column_name /1000),'%Y-%m-%d')

Please note that '/1000' is needed only when the column stores milliseconds, because FROM_UNIXTIME expects seconds.
Suppose the column "purchased_date_epoch" is stored in numeric format, in seconds, and we want to convert the value "1442287036" to a human-readable format.

SELECT purchased_date_epoch FROM table
return: 1442287036
SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch),'%Y-%m-%d %T') FROM table
return: 2015-09-15 03:17:16
SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch),'%Y-%m-%d') FROM table
return: 2015-09-15
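
To sanity-check the expected output outside Presto, the same conversion can be reproduced in Python. This is only a cross-check sketch: the helper name `epoch_to_date` is hypothetical, and it assumes the timestamps are interpreted in UTC, which matches the output shown above.

```python
from datetime import datetime, timezone

def epoch_to_date(value, fmt="%Y-%m-%d", milliseconds=False):
    """Format a Unix epoch value as a date string, interpreted in UTC."""
    # Divide by 1000 only when the value is in milliseconds
    seconds = value / 1000 if milliseconds else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

print(epoch_to_date(1442287036, "%Y-%m-%d %H:%M:%S"))   # 2015-09-15 03:17:16
print(epoch_to_date(1442287036))                        # 2015-09-15
print(epoch_to_date(1442287036000, milliseconds=True))  # 2015-09-15
```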

September 24, 2017

How to perform two-sample one-tailed t-test in Python

In Python, we can use scipy.stats.ttest_ind to perform a two-sample one-tailed t-test. Assume that our hypotheses are:
H0 (Null Hypothesis): P1 >= P2
Ha (Alternative Hypothesis): P1 < P2

In this case, we know that one sample comes from a normal distribution with mean equal to 3 and standard deviation equal to 2, with 400 data points (P2 in the code below). The other sample comes from a normal distribution with mean equal to 6 but the same sigma and size (P1 in the code below).

How can we interpret the results?

According to Stat Trek, when the null hypothesis is 6 >= 3, the t-score should be equal to 21.2, with 798 degrees of freedom and a standard error (SE) of 0.1414. The Stat Trek calculator gives us a p-value equal to 1.
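
These figures can be reproduced by hand from the population parameters. The sketch below assumes a pooled-variance (equal variance) t-test, and the function name `pooled_t_score` is a hypothetical helper, not part of any library.

```python
import math

def pooled_t_score(mean1, mean2, sd1, sd2, n1, n2):
    """Return (t, degrees of freedom, standard error) for a pooled-variance t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se

t, df, se = pooled_t_score(6, 3, 2, 2, 400, 400)
print(round(t, 2), df, round(se, 4))  # 21.21 798 0.1414
```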

You might notice that whether we write ttest_ind(P1,P2) or ttest_ind(P2,P1), the sign of the t-statistic changes but the p-value does not. Why? By default, SciPy's ttest_ind computes the p-value for a two-tailed two-sample test, and SciPy versions before 1.6 give no option to perform a one-tailed two-sample test.

Therefore, the correct way to test our null hypothesis in Python is as below.
import numpy as np
from scipy import stats
P1 = np.random.normal(6,2,400)
P2 = np.random.normal(3,2,400)
stats.ttest_ind(P1, P2, axis=0, equal_var=True)
And you will see a result like the one below.
Ttest_indResult(statistic=21.374858126615408, pvalue=1.6807582123709593e-80)

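The real p-value for our null hypothesis P1 >= P2 is

real_pvalue = 1 - Ttest_indResult.pvalue/2 = 1 - 1.6807582123709593e-80/2 = 1 - 0.84e-80 ≈ 1

This formula applies because the t-statistic is positive, which is the opposite direction from the alternative hypothesis P1 < P2. If the t-statistic were negative, the one-tailed p-value would be pvalue/2 instead.

As the real p-value is so close to 1, we cannot reject the null hypothesis that P1 >= P2 (6 >= 3).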
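
The conversion from a two-tailed to a one-tailed p-value can be wrapped in a small helper. This is a sketch: `one_tailed_pvalue` is a hypothetical function, not part of SciPy. In SciPy 1.6 and later, ttest_ind also accepts an `alternative` argument ('less' or 'greater') that returns the one-tailed p-value directly.

```python
def one_tailed_pvalue(t_stat, two_sided_p, alternative="less"):
    """Convert a two-sided p-value into a one-tailed one.

    alternative="less" tests Ha: mean1 < mean2;
    alternative="greater" tests Ha: mean1 > mean2.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = two_sided_p / 2
    # The statistic supports Ha only when its sign matches the tested direction
    supports_ha = t_stat < 0 if alternative == "less" else t_stat > 0
    return half if supports_ha else 1 - half

# Values taken from the ttest_ind result shown above
p = one_tailed_pvalue(21.374858126615408, 1.6807582123709593e-80, "less")
print(p)  # 1.0
```

Swapping the alternative to "greater" for the same result gives 0.84e-80, which would lead us to reject a null hypothesis of P1 <= P2.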

October 19, 2016

Why do you need to take Tableau Certificate Desktop Exam?

You might have already read my previous blog post about How to prepare for the Tableau Certificate Exam. Before you invest time and money in the exam, you may wonder whether it is worth taking.

The fact is that there are not many certificates in the market for data science. Tableau Software has become a very popular business intelligence tool in the past couple of years despite its decreasing stock price. According to the Tableau Report Fiscal Year 2015, "88% of Fortune 500 companies, such as Cisco, Wells Fargo and Capital One, use Tableau, which bodes well for our land and expand strategy".