Posts

How to Prepare for Tableau Desktop 9 Qualified Associate Exam

Last week I passed the Tableau Desktop Qualified Associate exam, so I decided to give back to the Tableau community by sharing my study experience. I studied for about 1-2 weeks, averaging about 2 hours per day, even though I already had about 2 years of experience using Tableau. The exam tests how familiar a candidate is with the software rather than how good the candidate's visualization skills are. A community post named 'Rumor has it...' discussed the fact that some candidates with a few years of working experience in Tableau failed the exam. If you are wondering whether you should invest time and money in the exam (it's quite expensive), I recommend reading my other post, Why Do You Want to Take the Tableau Certificate Exam.

How to calculate power with the Welch t-test formula

This is based on a two-tailed t-test when the two groups have different means and variances.

    power_cal_welch_t_test <- function (mean_control, mean_treatment, sample_size_control, sample_size_treatment, sd_control, sd_treatment) {
      m2 <- mean_treatment
      m1 <- mean_control
      n1 <- sample_size_control
      n2 <- sample_size_treatment
      s1 <- sd_control
      s2 <- sd_treatment
      true_mean <- m2-m1
      se_pool <- sqrt(s1*s1/n1+s2*s2/n2)
      ## Below is the Welch t-test degrees of freedom
      degree_of_up <- (s1*s1/n1 + s2*s2/n2)^2
      degree_of_buttom <- (s1*s1)^2/(n1*n1*(n1-1)) + (s2*s2)^2/(n2*n2*(n2-1))
      degree_of <- degree_of_up/degree_of_buttom
      ## End of calculating degrees of freedom
      t_critical <- abs(qt(0.025,df=degree_of))
      margin_error <- t_critical*se_pool
      t_left <- (-true_mean+margin_error)/se_pool
      t_right <- (-true_mean-margin_error)/se_pool
      power_val <- 1 - pt(t_left, degree_of) + pt(t_right, deg
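As a cross-check, the standard error and the Welch degrees of freedom used in the function above can be sketched in Python with only the standard library. The function names here are hypothetical, and the power estimate swaps the t distribution for a normal approximation (an assumption that is only reasonable for large samples), so treat it as a rough sketch rather than a replacement for the R code.

```python
import math
from statistics import NormalDist


def welch_se_and_df(n1, n2, s1, s2):
    """Return the unpooled standard error and Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1  # variance of the first sample mean
    v2 = s2 ** 2 / n2  # variance of the second sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


def approx_power(mean_control, mean_treatment, n1, n2, s1, s2, alpha=0.05):
    """Two-tailed power using a normal approximation in place of the t distribution."""
    se, _ = welch_se_and_df(n1, n2, s1, s2)
    normal = NormalDist()
    z_critical = normal.inv_cdf(1 - alpha / 2)
    shift = (mean_treatment - mean_control) / se
    # Probability of landing in either rejection region under the true difference.
    return 1 - normal.cdf(z_critical - shift) + normal.cdf(-z_critical - shift)


se, df = welch_se_and_df(400, 400, 2, 2)
print(f"SE = {se:.4f}, df = {df:.1f}")
print(f"approximate power = {approx_power(3, 6, 400, 400, 2, 2):.4f}")
```

With equal sample sizes and equal standard deviations, the Welch degrees of freedom reduce to n1 + n2 - 2, which is a quick sanity check on the formula.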

Monte Carlo Simulation and Risk Analysis of Bitcoin

Note: This post does not represent any expert guidelines in finance or bitcoin. This article is a wrap-up of a class project regarding stock market risk analysis.
– Data Preparation from Stack Overflow: link
– Language: Python
– Example code for Stock Market Project, inspired by Jose Portilla’s GitHub repo

TL;DR
The daily return chart shows that the return is around 0% about 25% of the time. The tail of the daily return distribution is very long, reaching as much as 400%. The Monte Carlo simulation puts the final price in a range between $5600 and $9000. The most interesting part is here: if we take the 10% empirical quantile of the final price distribution to estimate the Value at Risk for the Bitcoin price, it comes to $925.49 for every investment of $6837.31, which is 13.5% of the investment. In other words, for every coin purchased at $6837.31, the Monte Carlo simulation suggests with 90% confidence that the loss will not exceed about $925.
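The quantile-based Value at Risk described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the normally distributed daily returns, and the values of daily_mu and daily_sigma are all assumptions made for the example, and the long-tailed returns noted above mean a normal model is a simplification.

```python
import random


def simulate_final_prices(start_price, daily_mu, daily_sigma, days, runs, seed=0):
    """Simulate `runs` price paths and return the final price of each path."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        price = start_price
        for _ in range(days):
            # Assumption: each daily return is an independent normal draw.
            price *= 1.0 + rng.gauss(daily_mu, daily_sigma)
        finals.append(price)
    return finals


def empirical_quantile(values, q):
    """Return the sorted sample value at fraction q (lower rank, no interpolation)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def value_at_risk(start_price, final_prices, q=0.10):
    """Loss relative to the start price at the q empirical quantile of final prices."""
    return start_price - empirical_quantile(final_prices, q)


# Hypothetical drift and volatility, chosen only to make the example run.
finals = simulate_final_prices(6837.31, daily_mu=0.001, daily_sigma=0.02,
                               days=365, runs=500, seed=42)
var_10 = value_at_risk(6837.31, finals)
print(f"VaR at the 10% quantile: {var_10:.2f} ({var_10 / 6837.31:.1%} of the start price)")
```

The ratio printed at the end corresponds to the 13.5% figure in the summary, which is simply 925.49 / 6837.31.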

Automatically download finance data via Google API

As Yahoo! Finance deprecated its API in late 2017 and Google changed its Finance URL, below is an alternative way to download finance data.

    from pandas_datareader.google.daily import GoogleDailyReader

    @property
    def url(self):
        return 'http://finance.google.com/finance/historical'

    GoogleDailyReader.url = url

    # get data
    import pandas_datareader as pdr
    from datetime import datetime

    start = datetime(2014,1,1)
    end = datetime(2018,2,5)
    ret = pdr.get_data_google(['FB'], start, end)

Reference
Documentation about other alternative APIs for downloading finance data: here
GitHub discussion about Google's new Finance URL: here

How to convert milliseconds or seconds into date format in Presto?

Milliseconds: DATE_FORMAT(FROM_UNIXTIME(column_name / 1000), '%Y-%m-%d')
Seconds: DATE_FORMAT(FROM_UNIXTIME(column_name), '%Y-%m-%d')

Please note that '/ 1000' must be added when the column stores milliseconds, because FROM_UNIXTIME expects seconds.

We have the column "purchased_date_epoch" stored in numeric format. Let's say we want to convert the "purchased_date_epoch" column value "1442287036" to a human-readable format.

    SELECT purchased_date_epoch FROM table
    return: 1442287036

    SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch), '%Y-%m-%d %T')
    return: 2015-09-15 03:17:16

    SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch), '%Y-%m-%d')
    return: 2015-09-15
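To double-check a conversion outside Presto, the same arithmetic can be sketched in Python. The helper name epoch_to_text is hypothetical, and the sketch assumes the timestamps should be read as UTC, which matches the result shown for the example value above.

```python
from datetime import datetime, timezone


def epoch_to_text(value, unit="s", fmt="%Y-%m-%d"):
    """Format an epoch timestamp given in seconds ("s") or milliseconds ("ms")."""
    # Milliseconds must be divided by 1000 first, mirroring the '/ 1000' in the query.
    seconds = value / 1000 if unit == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


print(epoch_to_text(1442287036))                                  # seconds
print(epoch_to_text(1442287036000, unit="ms"))                    # milliseconds
print(epoch_to_text(1442287036, fmt="%Y-%m-%d %H:%M:%S"))         # with time of day
```

If the two-argument call with unit="ms" is forgotten for a millisecond column, the result lands tens of thousands of years in the future or raises an error, which is usually the first sign that the division by 1000 is missing.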

How to perform two-sample one-tailed t-test in Python

In Python, we can use ttest_ind to perform a two-sample one-tailed test. Assume that our hypotheses are:
H0 (Null Hypothesis): P1 >= P2
Ha (Alternative Hypothesis): P1 < P2
In this case, the 1st normal distribution has a mean equal to 3 and a standard deviation equal to 2, with 400 data points. The 2nd normal distribution has a mean equal to 6 but the same sigma and size as the 1st normal distribution. How can we interpret the results? According to Stat Trek, when the null hypothesis is 6 >= 3, the t-score should be equal to 21.2, with 798 degrees of freedom and SE equal to 0.1414. The Stat Trek calculator gives us a p-value equal to 1. You might notice that whether we write ttest_ind(P1,P2) or ttest_ind(P2,P1), the t-statistic changes sign but the p-value does not change. Why? By default, the Python SciPy library does not give us an option to perform a one-tailed two-sample test. The p-value is computed based on the assumption of two
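One common way to get a one-tailed result from a two-sided one is to halve the two-sided p-value when the t-statistic points in the direction of the alternative, and to use one minus that half otherwise. A minimal sketch follows; the helper name one_tailed_p is hypothetical and it takes the statistic and two-sided p-value as plain numbers, so it does not depend on SciPy. Newer SciPy releases also accept an alternative argument on ttest_ind, which may make this conversion unnecessary.

```python
def one_tailed_p(t_stat, two_sided_p, alternative="less"):
    """Convert a two-sided p-value into a one-tailed p-value.

    alternative="less" tests Ha: mean(first sample) < mean(second sample);
    alternative="greater" tests the opposite direction.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = two_sided_p / 2.0
    # The statistic supports the alternative only if its sign matches the direction.
    supports = t_stat < 0 if alternative == "less" else t_stat > 0
    return half if supports else 1.0 - half


# Illustrative numbers only: a t-statistic of magnitude 21.2 with a tiny two-sided p-value.
print(one_tailed_p(-21.2, 1e-80, alternative="less"))  # first sample clearly smaller
print(one_tailed_p(21.2, 1e-80, alternative="less"))   # arguments swapped
```

With the arguments swapped, the statistic is positive and the one-tailed p-value for "less" is essentially 1, which is consistent with the p-value of 1 reported for the 6 >= 3 null hypothesis.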

Why do you need to take Tableau Certificate Desktop Exam?

You might have already read my previous blog post, How to prepare for the Tableau Certificate Exam. Before you invest time and money in the exam, you may wonder whether it is worth taking. The fact is that there are not many certificates on the market for data science. Tableau has become a very popular business intelligence tool in the past couple of years, despite the company's decreasing stock price. According to the Tableau Report Fiscal Year 2015, "88% of Fortune 500 companies, such as Cisco, Wells Fargo and Capital One, use Tableau, which bodes well for our land and expand strategy".

How to Add Mixpanel or Google Analytics on a Shiny App Correctly?

Before you start reading this post, I recommend reading the blog post on shiny.rstudio.com about how to add Google Analytics to your Shiny app. It's very likely that you followed all the steps in that post but it did not work. Three steps are missing from the post on the RStudio website. I will walk you through adding Mixpanel to Shiny correctly, step by step:

Data tells you the secrets of extramarital affairs

Recently I found an interesting dataset about women's extramarital affairs from 1974. The dataset was built from a survey and is now available to download via the Pandas package in Python. Some interesting facts emerge from pure data visualization. If you are interested in the interactive dashboard, please click here.