December 13, 2015

Automatically stream and download the Yahoo! Finance stock price using Python

Attention: Yahoo! Finance deprecated the API in late 2017. Please refer to this article and learn how to use Google Finance as an alternative. 

Thanks to the Pandas package in Python, now we can stream the stock price from Yahoo! automatically within 1 second. And of course, it's free! 

I am going to show you an example of downloading the stock prices of US Oil, Facebook, Best Buy, and Expedia from Jan 1st, 2014 to Dec 1st, 2015 and saving the data to a CSV file on your local drive.

======== Code Start======== 
import datetime
import pandas.io.data as pdweb
#import the libraries you need; pandas.io.data is the pandas module that provided get_data_yahoo at the time of writing

stockprice = pdweb.get_data_yahoo(['USO','FB','BBY','EXPE'], start=datetime.datetime(2014,1,1), end=datetime.datetime(2015,12,1))['Adj Close']
#set the stocks you would like to download and the time frame

stockprice.to_csv('stockprice.csv')
#save it to your local drive. It's in the same folder where you run Python ('stockprice.csv' is a placeholder file name).
======== Code End======== 

K-means Clustering Analysis of Red Wine Quality

I have been a big fan of sweet wine. During Thanksgiving, I took a road trip to Napa Valley in California. What makes a good wine differs from person to person, but wine experts can summarize the indicators of good quality.

In June 2014, Business Insider published an article listing three main explanations for high-quality red wine: complexity, intensity, and balance. In 2009, a dataset created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis provided 1599 types of red wine with 10 scientific attributes associated with the quality. The data is available for download on the University of California Irvine machine learning web page.

December 9, 2015

3 Books for Data Scientist Interviews

Recently I made an infographic to show my classmates' current companies. I graduated with an M.S. in Business Analytics in 2014. A lot of my classmates pursued careers in Data Science. Five years ago, neither business analytics nor data science was considered a college major. It's an emerging subject that brings programming and statistics together to find the story in big data.

(To see a high-resolution version of the graph or learn more about the story, please click here.)

1. Data Science Interview Exposed
    Purchase at
     Authors: Jane You, Iris Wang, Yanping Huang, Ian Gao, Feng Gao
     1) The book includes a lot of sample questions, with ANSWERS, covering a variety of topics: data modeling case studies, coding with Python, probability with fair coins, machine learning, etc.
     2) The book compares the similar roles of Data Engineer, BI Engineer, Data Scientist, and related positions. If you are fresh out of college and wonder how the job duties differ among all these seemingly similar positions, this book is a good place to start.

December 2, 2015

How to get approximate value of pi with Python

The goal of the project is to estimate the value of pi without using the constant math.pi.

Assume that you throw 18,000 balls into a square of 2 feet by 2 feet. The radius of the largest circle inside the square is equal to 1, so the circle has an area of pi and the square has an area of 4. Assume that each ball is equally likely to land anywhere within the square. The fraction of balls that land inside the circle is then approximately pi/4. At the end, we divide the number of balls inside the circle by 18,000 and multiply the result by 4 to get the approximate value of pi.
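
Here is a minimal sketch of the idea, not necessarily the exact code used in the project. It assumes the square is centered at the origin, so each ball gets an x and a y coordinate drawn uniformly between -1 and 1, and a ball is inside the circle when x*x + y*y <= 1. Since the circle covers pi/4 of the square, the fraction of balls inside is multiplied by 4. The names estimate_pi, num_balls, and seed are my own choices for illustration.

```python
import random


def estimate_pi(num_balls=18000, seed=None):
    """Estimate pi by throwing balls uniformly into a 2 x 2 square centered at the origin."""
    rng = random.Random(seed)  # a seed makes the run repeatable (an assumption for illustration)
    inside = 0
    for _ in range(num_balls):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # the ball is inside the circle of radius 1 if its distance to the center is at most 1
        if x * x + y * y <= 1.0:
            inside += 1
    # fraction inside is about (circle area) / (square area) = pi / 4
    return 4.0 * inside / num_balls


if __name__ == "__main__":
    print(estimate_pi())
```

Each run gives a slightly different number because the throws are random. With 18,000 balls the estimate is typically within a few hundredths of pi, and throwing more balls narrows the gap.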