May 6, 2015

Data Science & Business Intelligence

Plenty of research offers polished definitions of Data Science and Business Intelligence. I interviewed a few friends from my master's program, most of whom work as Data Scientists, Business Intelligence Engineers, or Data Engineers in the tech industry. Since I work at Amazon rather than in an academic research environment, I will combine my own experience with my friends' interviews to discuss the responsibilities of a Data Scientist and a BI Engineer.

Forrester Research describes Business Intelligence as follows:
Forrester defines the business intelligence (BI) market as a set of methodologies, processes, architectures, and technologies that leverage the output of information management processes for analysis, reporting, performance management, and information delivery. Research coverage includes executive dashboards as well as query and reporting tools.

As for Data Science, the definition is:
In general terms, Data Science is the extraction of knowledge from data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information theory and information technology, including signal processing, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling, data warehousing, data compression and high performance computing. Methods that scale to Big Data are of particular interest in data science, although the discipline is not generally considered to be restricted to such data. The development of machine learning, a branch of artificial intelligence used to uncover patterns in data from which predictive models can be developed, has enhanced the growth and importance of data science.

What kinds of skills are required for both?
SQL - Extract the data
This is definitely the most important and most general skill for both positions. However, SQL is not designed for building statistical models. It is best suited to extracting data and doing simple calculations such as counts, sums, and averages.
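
To make "simple calculations" concrete, here is a minimal sketch that runs a SQL aggregation through Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are made up for illustration; a real data warehouse would have its own schema and SQL dialect.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 30.0), (2, 15.0)],
)

# Per-customer order count and total: the kind of summary SQL handles well.
rows = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

for customer_id, order_count, total_amount in rows:
    print(customer_id, order_count, total_amount)

conn.close()
```

Anything beyond counts, sums, and averages, such as fitting a model, is usually done after the extract, in a separate tool.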

Tableau - Simple reporting
Tableau, a Seattle-based company, makes incredibly user-friendly software for generating reports. A private Tableau Server lets you share your dashboards with your internal customers. The newest version of Tableau allows you to run Java and custom SQL on top of your original tables.

RStudio - Build complex statistical models
In RStudio, you can run linear regression, neural network regression, F-tests, and other complex statistical models that are impractical to run in SQL.
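
As a rough illustration of what the simplest of these models does, here is a sketch of one-variable linear regression by ordinary least squares. It is written in plain Python rather than R so that it runs without extra packages; in R the same fit is typically a single call to `lm(y ~ x)`. The `fit_line` helper and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Even this small model needs means, deviations, and a ratio of sums, which is why a statistical tool is more comfortable for this work than a query language.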

Excellent programming skills open the door to a career as a Business Intelligence Engineer. However, in the real world, especially in a business-driven working environment, critical thinking is much more important. For example, what kinds of metrics would you use to measure a user's interaction on Facebook? The length of time a user stayed on the page? How many likes a user gave in one day? Whether a user downloaded an app recommended in a Facebook ad? Why is the number of likes a user gave more important than the length of time spent on the News Feed page? What other metrics should we include in the Weekly Business Report delivered to the Finance team? How about the report delivered to the Accounting team? When the metrics you want are not available in the data warehouse, how could you come up with a SQL query that uses other metrics to derive them?
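
The last question can be sketched with a small example. Suppose the warehouse has no "likes per active day" metric but does store raw like events. The query below derives the metric from what is available. The `like_events` table, its columns, and the rows are all hypothetical, and sqlite3 stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw event table; a real warehouse schema will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE like_events (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO like_events VALUES (?, ?)",
    [
        (1, "2015-05-01"),
        (1, "2015-05-01"),
        (1, "2015-05-02"),
        (2, "2015-05-01"),
    ],
)

# Derived metric: total likes divided by the number of distinct days
# on which the user liked something. Multiplying by 1.0 forces a
# floating-point division instead of integer division.
likes_per_active_day = conn.execute(
    """
    SELECT user_id,
           COUNT(*) * 1.0 / COUNT(DISTINCT event_date) AS likes_per_active_day
    FROM like_events
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

for user_id, value in likes_per_active_day:
    print(user_id, value)

conn.close()
```

Whether this ratio is the right metric is exactly the kind of judgment call described above; the query only shows that a missing metric can often be built from existing ones.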

In a business-driven environment, delivery speed is more important than the quality of the report. Because a private company's goal is to make more profit, its leadership has to make big decisions quickly and correctly. Therefore, a BIE's working pace is very fast. Summarizing data in a report sounds straightforward and simple, but overcoming all the data engineering problems and thinking critically about how the metrics in the data warehouse relate to one another takes a lot of hard work and review.

You might wonder whether a Data Scientist's job is more research-based. Most of my friends with the title of Data Scientist use SQL to extract the data and another tool, such as Excel, RStudio, or SAS, to analyze customer behavior.