Viz Panda

Posts

How to calculate power with the Welch t-test formula

- December 21, 2019

This is based on a two-tailed t-test when the two groups have different means and variances.

    power_cal_welch_t_test <- function (mean_control, mean_treatment,
                                        sample_size_control, sample_size_treatment,
                                        sd_control, sd_treatment) {
      m2 <- mean_treatment
      m1 <- mean_control
      n1 <- sample_size_control
      n2 <- sample_size_treatment
      s1 <- sd_control
      s2 <- sd_treatment
      true_mean <- m2-m1
      se_pool <- sqrt(s1*s1/n1+s2*s2/n2)
      ## Below is the welch t-test degree of freedom
      degree_of_up <- (s1*s1/n1 + s2*s2/n2)^2
      degree_of_buttom <- (s1*s1)^2/(n1*n1*(n1-1)) + (s2*s2)^2/(n2*n2*(n2-1))
      degree_of <- degree_of_up/degree_of_buttom
      ## End of calculating degree of freedom
      t_critical <- abs(qt(0.025,df=degree_of))
      margin_error <- t_critical*se_pool
      t_left <- (-true_mean+margin_error)/se_pool
      t_right <- (-true_mean-...
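The standard error and Welch degrees of freedom used above can be sketched in Python with only the standard library. This is a rough illustration, not the author's function: the names `welch_se_and_df` and `approx_power` are hypothetical, and the power step swaps in a normal approximation instead of the t-distribution that the R code uses, so its values will differ slightly for small samples.

```python
import math
from statistics import NormalDist


def welch_se_and_df(s1, n1, s2, n2):
    """Return the standard error of the mean difference and the
    Welch-Satterthwaite degrees of freedom (hypothetical helper)."""
    v1 = s1 ** 2 / n1  # variance of the first sample mean
    v2 = s2 ** 2 / n2  # variance of the second sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


def approx_power(m1, m2, s1, n1, s2, n2, alpha=0.05):
    """Two-tailed power under a NORMAL approximation (an assumption made
    here for simplicity; not the exact t-based calculation)."""
    se, _ = welch_se_and_df(s1, n1, s2, n2)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (m2 - m1) / se  # true difference in standard-error units
    # Probability of landing in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


se, df = welch_se_and_df(s1=2, n1=400, s2=2, n2=400)
print(round(se, 4), round(df, 1))
print(approx_power(3, 6, 2, 400, 2, 400))
```

With equal standard deviations and equal sample sizes, the degrees of freedom reduce to 2(n-1), which is a quick sanity check on the formula.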

Monte Carlo Simulation and Risk Analysis of Bitcoin

- February 05, 2018

Note: This post does not represent expert guidance on finance or Bitcoin. It is a wrap-up of a class project on stock market risk analysis.
– Data preparation from Stack Overflow: link
– Language: Python
– Example code for the Stock Market Project, inspired by Jose Portilla’s GitHub repo

TL;DR
The daily return chart shows that 25% of the time the return is around 0%. The tail of the daily return distribution is very long, reaching as much as 400%. The Monte Carlo simulation puts the final price in a range of $5600 to $9000. The most interesting part is this: taking the 10% empirical quantile of the final price distribution to estimate the Value at Risk for the Bitcoin price gives $925.49 for every investment of $6837.31, which is 13.5% of the investment. In other words, for every coin purchased at $6837.31, the Monte Carlo simulation suggests that 90% of the time the loss would not exceed about $925.
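The Value at Risk idea in the TL;DR can be sketched as follows. This is a minimal illustration and not the project's code: the function names, the drift `mu`, the volatility `sigma`, and the number of days and runs are all made-up assumptions; only the starting price of 6837.31 and the 10% quantile come from the post.

```python
import math
import random
import statistics


def simulate_final_prices(start_price, mu, sigma, days, runs, seed=0):
    """Simulate `runs` price paths with random daily shocks and return the
    final price of each path. mu and sigma are hypothetical drift and
    volatility over the whole horizon."""
    rng = random.Random(seed)
    dt = 1 / days
    finals = []
    for _ in range(runs):
        price = start_price
        for _ in range(days):
            shock = rng.gauss(0, sigma * math.sqrt(dt))
            drift = mu * dt
            price += price * (drift + shock)
        finals.append(price)
    return finals


def value_at_risk(start_price, final_prices, quantile_pct=10):
    """Start price minus the empirical quantile of the final prices."""
    cut_points = statistics.quantiles(final_prices, n=100)  # 99 percentiles
    return start_price - cut_points[quantile_pct - 1]


finals = simulate_final_prices(6837.31, mu=0.0, sigma=0.5, days=30, runs=1000)
var_10 = value_at_risk(6837.31, finals)
print(f"10% quantile VaR: {var_10:.2f} ({var_10 / 6837.31:.1%} of the investment)")
```

Dividing the Value at Risk by the starting price gives the share of the investment at risk, which is how $925.49 on $6837.31 becomes 13.5%.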

Automatically download finance data via Google API

- February 05, 2018

As Yahoo! Finance deprecated its API in late 2017 and Google changed its Finance URL, below is an alternative way to download finance data.

    from pandas_datareader.google.daily import GoogleDailyReader

    # Point the reader at the new Google Finance URL
    @property
    def url(self):
        return 'http://finance.google.com/finance/historical'

    GoogleDailyReader.url = url

    # get data
    import pandas_datareader as pdr
    from datetime import datetime

    start = datetime(2014,1,1)
    end = datetime(2018,2,5)
    ret = pdr.get_data_google(['FB'], start, end)

Reference
- Documentation about other alternative APIs for downloading finance data: here
- GitHub discussion about Google's new Finance URL: here

How to convert milliseconds or seconds into date format in Presto?

- September 25, 2017

Milliseconds: DATE_FORMAT(FROM_UNIXTIME(column_name / 1000), '%Y-%m-%d')
Seconds: DATE_FORMAT(FROM_UNIXTIME(column_name), '%Y-%m-%d')

Please note that '/1000' must be added when the value is stored in milliseconds, because FROM_UNIXTIME expects seconds.

We have the column "purchased_date_epoch" stored in a numeric format. Let's say we want to convert the "purchased_date_epoch" column value "1442287036" to a human-readable format.

    SELECT purchased_date_epoch FROM table
    return: 1442287036

    SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch), '%Y-%m-%d %T')
    return: 2015-09-15 03:17:16

    SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch), '%Y-%m-%d')
    return: 2015-09-15 ...
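For checking a value outside of Presto, the same conversion can be sketched in Python. The helper name `epoch_to_date` is hypothetical, and the sketch assumes the timestamps should be read as UTC, which is what the example output in this post corresponds to.

```python
from datetime import datetime, timezone


def epoch_to_date(value, unit="s", fmt="%Y-%m-%d"):
    """Format an epoch value as a date string in UTC.

    unit="ms" divides by 1000 first, mirroring the '/1000' in the query.
    """
    seconds = value / 1000 if unit == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


print(epoch_to_date(1442287036, fmt="%Y-%m-%d %H:%M:%S"))  # 2015-09-15 03:17:16
print(epoch_to_date(1442287036000, unit="ms"))             # 2015-09-15
```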

How to perform two-sample one-tailed t-test in Python

- September 24, 2017

In Python, we can use ttest_ind to perform a two-sample one-tailed test. Assume that our hypotheses are:
Ho (Null Hypothesis): P1 >= P2
Ha (Alternative Hypothesis): P1 < P2
In this case, the 1st normal distribution has a mean of 3 and a standard deviation of 2, with 400 data points. The 2nd normal distribution has a mean of 6 but the same sigma and size as the 1st. How can we interpret the results? According to Stat Trek, when the null hypothesis is 6 >= 3, the t score should be 21.2 with 798 degrees of freedom and an SE of 0.1414. The Stat Trek calculator gives us a p-value of 1. You might notice that whether we write ttest_ind(P1,P2) or ttest_ind(P2,P1), the t-statistic changes sign but the p-value does not change. Why? By default, the Python SciPy library does not give us an option to perform a one-tailed two-sample test. The p-value is compu...
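One common way to get a one-tailed p-value from a two-tailed result is to halve the two-tailed p-value when the sign of the t-statistic agrees with the alternative hypothesis, and to use one minus that half otherwise. The sketch below illustrates this rule with a hypothetical helper, `one_tailed_p`; it takes the t-statistic and two-tailed p-value as plain numbers, so it does not depend on SciPy. Recent SciPy releases also accept an `alternative` argument on ttest_ind, which may make the manual step unnecessary.

```python
def one_tailed_p(t_stat, p_two_sided, alternative="less"):
    """Convert a two-tailed p-value into a one-tailed one.

    alternative="less" tests Ha: mean(P1) < mean(P2), where t_stat comes
    from comparing P1 against P2 in that order.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two_sided / 2
    if alternative == "less":
        supports_alternative = t_stat < 0
    else:
        supports_alternative = t_stat > 0
    return half if supports_alternative else 1 - half


# Illustrative numbers only: a negative t supports Ha: P1 < P2.
print(one_tailed_p(-2.1, 0.04, alternative="less"))
print(one_tailed_p(2.1, 0.04, alternative="less"))
```

This also matches the observation in the post: when the t score is large and points away from the alternative, the one-tailed p-value comes out close to 1.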

Why do you need to take the Tableau Certificate Desktop Exam?

- October 19, 2016

You might have already read my previous blog post about How to prepare for the Tableau Certificate Exam. Before you invest time and money in the exam, you may wonder whether it is worth taking. The fact is that there are not many certificates on the market for data science. Tableau Software has become a very popular business intelligence tool in the past couple of years despite its decreasing stock price. According to the Tableau Report Fiscal Year 2015, "88% of Fortune 500 companies, such as Cisco, Wells Fargo and Capital One, use Tableau, which bodes well for our land and expand strategy".

How to Add Mixpanel or Google Analytics on a Shiny App Correctly?

- October 05, 2016

Before you start reading this post, I recommend reading the blog post on shiny.rstudio.com about how to add Google Analytics to your Shiny app. It's very likely that you followed all the steps in that post but it did not work. Three steps are missing from the blog post on the RStudio website. I will walk you step by step through adding Mixpanel to Shiny correctly:

Data tells you the secrets of extramarital affairs

- January 23, 2016

Recently I found an interesting dataset about the extramarital affairs of women in 1974. The dataset was built from a survey and is now available to download via the Pandas package in Python. Some interesting facts emerge from data visualization alone. If you are interested in the interactive dashboard, please click here.
