December 13, 2015

Automatically stream and download the Yahoo! Finance stock price using Python

Attention: Yahoo! Finance deprecated the API in late 2017. Please refer to this article and learn how to use Google Finance as an alternative. 

Thanks to the Pandas package in Python, now we can stream the stock price from Yahoo! automatically within 1 second. And of course, it's free! 

I am going to show you an example of downloading the stock prices of US Oil, Facebook, Best Buy, and Expedia from Jan 1st, 2014 to Dec 1st, 2015 and saving the data to a CSV file on your local drive. 

======== Code Start======== 
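# Import the libraries you need. pandas.io.data was the pandas data-reader
# module at the time of writing; it later moved to the separate
# pandas-datareader package (import pandas_datareader.data as pdweb).
import datetime
import pandas.io.data as pdweb

# Set the stocks you would like to download and the time frame
stockprice = pdweb.get_data_yahoo(
    ['USO', 'FB', 'BBY', 'EXPE'],
    start=datetime.datetime(2014, 1, 1),
    end=datetime.datetime(2015, 12, 1))['Adj Close']

# Save it to your local drive, in the same folder where you run Python.
# The file name is a placeholder; the original post's name is not shown here.
stockprice.to_csv('stockprice.csv')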
======== Code End======== 

K-means Clustering Analysis of Red Wine Quality

I have long been a big fan of sweet wine. During Thanksgiving, I took a road trip to Napa Valley in California. What makes a good wine differs from person to person, but wine experts can summarize the indicators of good quality. 

In June 2014, Business Insider published an article listing three main explanations for high-quality red wine: complexity, intensity, and balance. In 2009, a dataset created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis provided 1599 red wines with 10 scientific attributes associated with quality. The data can be downloaded from the University of California, Irvine machine learning web page.

December 9, 2015

3 Books for Data Scientist Interviews

Recently I made an infographic to show my classmates' current companies. I graduated with an M.S. in Business Analytics in 2014. A lot of my classmates pursued careers in Data Science. Five years ago, neither business analytics nor data science was considered a college major. It is an emerging subject that brings programming and statistics together to find the story in big data. 

(To see the high-resolution version of the graph or learn more about the story, please click here.) 

1. Data Science Interview Exposed
    Purchase at
     Authors: Jane You, Iris Wang, Yanping Huang, Ian Gao, Feng Gao
     1) The book includes many sample questions on a variety of topics, along with ANSWERS, covering data modeling case studies, coding with Python, fair-coin probability, machine learning, and more. 
     2) The book compares the similar roles of Data Engineer, BI Engineer, Data Scientist, and related positions. If you are fresh out of college and wonder how the job duties differ among these seemingly similar positions, this book is a good place to start. 

December 2, 2015

How to get approximate value of pi with Python

The goal of the project is to estimate the value of pi without using the constant math.pi.

Assume that you throw 18,000 balls into a square of 2 feet by 2 feet. The radius of the largest circle inside the square is 1 foot. Assume that each ball is equally likely to land anywhere within the square, so the chance of landing inside the circle equals the ratio of the circle's area to the square's area, which is pi/4. At the end, we count the number of balls inside the circle, divide by 18,000, and multiply by 4 to get the approximate value of pi.
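
The simulation can be sketched as follows. This is a minimal illustration rather than the full script from the post: the function name estimate_pi and its seed parameter are assumptions introduced here. The circle covers pi/4 of the square's area, so the fraction of balls landing inside the circle is multiplied by 4.

```python
import random


def estimate_pi(num_balls=18000, seed=None):
    """Estimate pi by throwing balls uniformly at random into a 2 x 2 square.

    The square is centered at the origin, so the inscribed circle has radius 1.
    (estimate_pi and seed are hypothetical names used for this sketch.)
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_balls):
        # Each coordinate is drawn uniformly from [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The ball is inside the circle if its distance to the center is <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / num_balls approximates pi / 4, so scale by 4.
    return 4.0 * inside / num_balls


if __name__ == "__main__":
    print(estimate_pi())
```

With 18,000 balls the estimate typically lands within a few hundredths of pi. Throwing more balls narrows the error, but only in proportion to the square root of the number of balls.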

November 24, 2015

Use Pascal Function to Get Approximate P-value of A/B Testing

If you are looking for a script that gives you an approximate p-value for your A/B test, this is the place for you. And if you need a reference for calculating the p-value and z-value given the limited math functions available in SQL, the Python script can be translated into a SQL user-defined function. 
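
For orientation, here is a minimal sketch of how a z-value and a two-sided p-value can be computed for an A/B test. It uses the standard pooled two-proportion z-test with math.erfc from the Python standard library, which is not necessarily the approximation used in the post's script. The function name and its parameters are assumptions introduced for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, p_value) for a pooled two-proportion z-test.

    conv_a, conv_b: number of conversions in groups A and B.
    n_a, n_b: number of visitors in groups A and B.
    (Hypothetical helper for illustration, not the post's script.)
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference; zero if pooled is exactly 0 or 1.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(z, p)
```

A database without an error function would need a numerical approximation of the normal tail in place of math.erfc.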

May 6, 2015

Data Science & Business Intelligence

Plenty of research offers fancy definitions of Data Science and Business Intelligence. I interviewed a few friends from my master's program, most of whom are working as Data Scientists, Business Intelligence Engineers, or Data Engineers in the tech industry. Since I work at Amazon rather than in an academic research environment, I will combine my experience and my friends' interviews to discuss the responsibilities of a Data Scientist or a BI Engineer. 

February 19, 2015

Instagram – The Best Real Time Weather App in National Parks

Big Four Ice Cave is a beautiful hiking destination in the Mount Baker-Snoqualmie National Forest, northeast of Seattle. Despite the avalanche on Big Four Snow Mountain on Dec 31st, 2014, the parking lot of the ice cave trail was fully packed at 3pm on Jan 1st, 2015. I never considered Instagram the best real-time weather app until New Year’s Day 2015.
The location of Big Four Ice Cave was marked on the Instagram map more than 50 miles away from the real location, but as soon as I tapped the location feature on Instagram, all the images shot at Big Four Ice Cave were organized at the bottom of the map. When I tapped an image, Instagram told me how many hours or days ago the user had posted it.

WeChat Lucky Money Game in Chinese Lunar New Year: An Aggressive Way to Get Investment

Responding to the huge success of WeChat Lucky Money in 2014, Alibaba, the e-commerce giant in China, also launched a Lucky Money Giveaway on its official website, delivering 1 billion yuan (about 160 million USD) to all Alipay users. You don’t have to take any silly 10-minute survey or fill out any application forms to participate on the Alibaba website.