Why do you need to take the Tableau Desktop Certificate Exam?

You may have already read my previous post, How to prepare for the Tableau Certificate Exam. Before you invest time and money in the exam, you may wonder whether it is worth taking.

The fact is that there are not many certificates on the market for data science. Tableau Software has become a very popular business intelligence tool in the past couple of years despite its falling stock price. According to the Tableau Report Fiscal Year 2015, "88% of Fortune 500 companies, such as Cisco, Wells Fargo and Capital One, use Tableau, which bodes well for our land and expand strategy".

1) Who should pursue the Tableau Desktop 9 Qualified Associate Exam?

A candidate who wants to take the Tableau Desktop Professional Exam must pass the Qualified Associate Exam first. Therefore, anyone who is considering a career in data visualization and business intelligence should consider taking this exam.

2) Do tech companies value the certificate during interviews?

No interviewer asked me whether I had this certificate. However, it caught the interviewers' attention and interest when I mentioned that I had taken the exam. As a Tableau license costs $999+, a lot of start-ups are not willing to pay for the software.
Based on what I learned at the Tableau Software Annual Meeting 2014, here are some of the well-known tech companies that presented during the conference: Facebook, Intuit, Amazon, and Cisco. Please check the full list on the Tableau website. To summarize, Tableau skills may not make a candidate stand out at start-ups, but they can at some big tech companies.

3) Does the exam improve your data visualization skills?
The exam is multiple choice. When I took it, it did not ask about any data visualization concepts or theory.

How to Add Mixpanel or Google Analytics to a Shiny App Correctly?

Before you start reading this post, I recommend reading this blog post on how to add Google Analytics to your Shiny app. It's very likely that you followed all the steps in that post but it did not work.

Two steps are missing from the blog post on the RStudio website:

1) You must put mixpanel.js and google_analytics.js into a folder named "www". All images and JavaScript files need to go into this folder. You cannot name it "WWW" in capitals or "Myfolder". The folder name has to be www.

The file structure of a Shiny app should look like this: the app's R files at the top level, and the www folder beside them containing mixpanel.js and google_analytics.js.

2) When you use dashboardPage, you have to put "tags$head(tags$script(src="mixpanel.js"))" inside dashboardBody. The includeScript function might not work, so load the file through the src argument of tags$script instead.

dashboardPage(
   dashboardHeader(title = "Dashboard"), dashboardSidebar(),
   dashboardBody(tags$head(tags$script(src = "mixpanel.js")))
  ,skin = "blue")


3) If you have internet access and run the app locally in RStudio, the Mixpanel or Google Analytics website should receive the traffic immediately. If no visitor shows up after you refresh the page, please try again.

Data tells you the secrets of extramarital affairs

Recently I found an interesting dataset about the extramarital affairs of women in 1974. The dataset was built from a survey and is now available to download via the Pandas package in Python.

Some interesting facts emerged from data visualization alone. If you are interested in the interactive dashboard, please click here.

How to group the dimension and customize the group name in Tableau

'Edit group name' is hidden in Tableau. Here are the instructions for grouping the members of a dimension and customizing the group name. I used the default Superstore dataset in Tableau Desktop 9 as an example. I want to group together the sub-categories that have fewer than 200 records.

Step 1: Build a sheet by dragging the dimension and measures.

What's the probability of forming a triangle by breaking a stick twice?

Question: Suppose that you break a stick of length 1 twice, which gives 3 pieces. What is the probability that the 3 pieces can form a triangle?

Code in Python:


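import random as rd

n = 1500  # number of simulated sticks
success = 0
fail = 0

for i in range(n):
    # x and y are the two break points on the stick
    x = rd.random()
    y = rd.random()
    # a triangle is possible only if every piece is shorter than 0.5
    if (y>0.5 and x<0.5 and y-x<0.5) or (x>0.5 and y<0.5 and x-y<0.5):
        success = success+1
    else:
        fail = fail+1

probability = success/n
print(probability)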
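
As a cross-check on the simulation, here is a minimal sketch that tests the triangle condition directly instead of through the break-point inequalities. Three lengths that sum to 1 can form a triangle exactly when the longest piece is shorter than 0.5, and the exact probability for two uniform break points is 1/4. The function name, the seed, and the trial count are my own choices for illustration.

```python
import random


def triangle_probability(trials=100_000, seed=0):
    """Estimate the chance that three pieces of a unit stick form a triangle."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Two independent uniform break points, sorted left to right.
        a, b = sorted((rng.random(), rng.random()))
        pieces = (a, b - a, 1.0 - b)
        # The pieces sum to 1, so the triangle inequality holds
        # exactly when the longest piece is shorter than 0.5.
        if max(pieces) < 0.5:
            hits += 1
    return hits / trials


estimate = triangle_probability()
print(estimate)  # should land close to the exact answer, 0.25
```

With more trials the estimate settles closer to 0.25, so 1,500 trials as in the code above gives only a rough figure.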

Data Visualization with Seaborn in Python: FANG Stock Correlation Analysis

FANG, the stock market's shorthand for Facebook, Amazon, Netflix, and Google, was considered a very good investment in 2015. Thanks to the Python packages Pandas and Seaborn, I was able to gather the adjusted closing price and the daily volume of the FANG stocks for last year.

1. Volume and Adjusted Closing Price pattern of FANG in 2015

Amazon and Google each had 3 significant volume peaks in 2015. Interestingly, the adjusted closing prices of Google and Amazon followed almost the same pattern in 2015. It seems that Wall Street investors became confident in both Google and Amazon at the same time.

How to Prepare for Tableau Desktop 9 Qualified Associate Exam

Last week I passed the Tableau Desktop Qualified Associate Exam, so I decided to give back to the Tableau community by sharing my study experience. I studied for about 1-2 weeks, averaging about 2 hours per day, even though I had about 2 years of experience using Tableau.

The purpose of the exam is to test how familiar a candidate is with the software rather than how good the candidate's visualization skills are. A post named 'Rumor has it...' in the community discussed the fact that some candidates with a few years of working experience in Tableau failed the exam. If you are wondering whether you should invest time and money in the exam (as it's quite expensive), I recommend reading my other post, Why Do You Want to Take the Tableau Certificate Exam.

Measure or Dimension?

This article in the official guide says: 'For instance, you might calculate the Sum of “Sales” for every “State”. In this case, the Sales field is acting as a measure because you want to aggregate the field for each state. But measures could also result in a non-numeric result. For instance, you might create a calculated measure called “Sales Rating” that results in the word “Good” if sales are good and “Bad” otherwise. In this case the “Sales Rating” field acts as a measure even though it produces a non-numeric result. It is considered a measure because it is a function of the dimensions in the view'.

I used the calculation below to create the 'Sales Rating' field:

IF [Sales] > 2000 THEN 'good'
ELSEIF [Sales] <= 2000 AND [Sales] > 500 THEN 'medium'
ELSEIF [Sales] <= 500 THEN 'ok'
END


Tableau automatically sets Sales Rating as a dimension instead of a measure. So is the official Tableau guide incorrect?
How do you interpret the last sentence from the guide, 'It is considered a measure because it is a function of the dimensions in the view'?

Automatically stream and download the Yahoo! Finance stock price using Python

Thanks to the Pandas package in Python, we can now pull stock prices from Yahoo! automatically within 1 second. And of course it's free!

I am going to show you an example that downloads the stock prices of US Oil, Facebook, Best Buy, and Expedia from Jan 1st, 2014 to Dec 1st, 2015 and saves the data to a CSV file on your local drive.

======== Code Start======== 
import pandas_datareader.data as pdweb
import datetime
#import the data reader (older pandas versions shipped it as pandas.io.data)

stockprice = pdweb.get_data_yahoo(['USO','FB','BBY','EXPE'],start = datetime.datetime(2014,1,1),end=datetime.datetime(2015,12,1))['Adj Close']
#set the stocks you would like to download and the time frame

stockprice.to_csv('stockprice.csv')
#save it to your local drive (the file name is only a placeholder). It's in the same folder where you run the python.
======== Code End======== 

K-means Clustering Analysis of Red Wine Quality

I have long been a big fan of sweet wine. During Thanksgiving, I took a road trip to Napa Valley in California. What counts as good wine differs from person to person, but wine experts can summarize the indicators of good quality.

In Jun 2014, Business Insider published an article listing three main markers of a high-quality red wine: complexity, intensity, and balance. In 2009, a dataset created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis provided 1599 red wine samples with 10 scientific attributes associated with quality. The data is available for download from the University of California Irvine machine learning web page.

3 Books for Data Scientist Interviews

Recently I made an infographic to show my classmates' current companies. I graduated with an M.S. in Business Analytics in 2014. A lot of my classmates pursued careers in Data Science. 5 years ago, neither business analytics nor data science was offered as a major at college. It's an emerging subject that brings programming and statistics together to find the story in big data.

(To see the high-resolution version of the graphic or to learn more about the story, please click here.) 

1. Data Science Interview Exposed
     Authors: Jane You, Iris Wang, Yanping Huang, Ian Gao, Feng Gao
     1) The book includes many sample questions, with ANSWERS, across a variety of topics: data modeling case studies, coding in Python, fair-coin probability, machine learning, and so on.
     2) The book compares the similar roles of Data Engineer, BI Engineer, Data Scientist, and related positions. If you are fresh out of college and wonder how the job duties differ among all these seemingly similar positions, this book is a good place to start.

How to get approximate value of pi with Python

The goal of the project is to estimate the value of pi without using the constant math.pi.

Assume that you throw 18,000 balls into a square of 2 feet by 2 feet. The radius of the largest circle inside the square is 1 foot. Assume that each ball is equally likely to land anywhere within the square. The fraction of balls that land inside the circle then approximates the ratio of the two areas, pi/4, so at the end we divide the number of balls inside the circle by 18,000 and multiply by 4 to get the approximate value of pi.
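
The idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not necessarily the script from the full post: the square is centered at the origin so the circle has radius 1, and the function name and seed are made up for the example.

```python
import random


def estimate_pi(balls=18_000, seed=0):
    """Estimate pi by throwing balls uniformly into a 2 x 2 square."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    inside = 0
    for _ in range(balls):
        # Square centered at the origin: both coordinates in [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The ball is inside the circle of radius 1 if x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / balls approximates (circle area) / (square area) = pi / 4.
    return 4.0 * inside / balls


print(estimate_pi())
```

With 18,000 balls the estimate is typically good to about one decimal place; more balls narrow the error slowly, roughly with the square root of the count.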

Use Pascal Function to Get Approximate P-value of A/B Testing

If you are looking for a script that gives you an approximate p-value for your A/B test, this is the place for you. Or, if you need a reference for calculating the p-value and z-value with the limited math functions available in SQL, the Python script can be translated into a SQL user-defined function.
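
To give a feel for what such a script can look like, here is a minimal sketch of a two-proportion z-test that uses only arithmetic, exp, and sqrt, so it ports easily to SQL. It is not the method named in the title: for the normal tail it swaps in the Abramowitz and Stegun polynomial approximation (formula 26.2.17). The function names and the sample counts are hypothetical.

```python
import math


def normal_upper_tail(z):
    """Approximate P(Z > z) for z >= 0 (Abramowitz-Stegun 26.2.17)."""
    t = 1.0 / (1.0 + 0.2316419 * z)
    # Fifth-degree polynomial in t, evaluated with Horner's scheme.
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
               + t * (-1.821255978 + t * 1.330274429))))
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return density * poly


def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Standard error under the null hypothesis of equal conversion rates.
    # Note: this is zero (and the division fails) if nobody or everybody converts.
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (rate_b - rate_a) / std_err
    return z, 2.0 * normal_upper_tail(abs(z))


# Hypothetical counts: 200 of 1,000 visitors convert on A, 250 of 1,000 on B.
z_value, p_value = ab_test_p_value(200, 1000, 250, 1000)
print(z_value, p_value)
```

The approximation's absolute error on the normal CDF is below about 1e-7, which is more than enough for deciding significance in an A/B test.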