September 25, 2017

How to convert milliseconds or seconds into date format in Presto?

Milliseconds:
DATE_FORMAT(FROM_UNIXTIME(column_name /1000),'%Y-%m-%d')
Seconds:
DATE_FORMAT(FROM_UNIXTIME(column_name),'%Y-%m-%d')

Note that the '/1000' is needed only when the column stores milliseconds, because FROM_UNIXTIME expects a value in seconds.
Suppose the column "purchased_date_epoch" is stored in a numeric format, and we want to convert its value "1442287036" to a human-readable format.


SELECT purchased_date_epoch FROM table
return: 1442287036
SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch), '%Y-%m-%d %T') FROM table
return: 2015-09-15 03:17:16
SELECT DATE_FORMAT(FROM_UNIXTIME(purchased_date_epoch), '%Y-%m-%d') FROM table
return: 2015-09-15
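
As a cross-check outside Presto, the same conversion can be sketched in Python. This is only an illustration: the helper name format_epoch and its parameters are hypothetical, and it assumes the result should be shown in UTC, which matches the output above.

```python
from datetime import datetime, timezone

def format_epoch(value, unit="s", fmt="%Y-%m-%d"):
    """Format a Unix epoch value (seconds or milliseconds) as a UTC string."""
    # Milliseconds must be divided by 1000 first, like the '/1000' in the query.
    seconds = value / 1000 if unit == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

print(format_epoch(1442287036, fmt="%Y-%m-%d %H:%M:%S"))  # 2015-09-15 03:17:16
print(format_epoch(1442287036000, unit="ms"))              # 2015-09-15
```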



September 24, 2017

How to perform two-sample one-tailed t-test in Python

In Python, we can use ttest_ind from scipy.stats to perform a two-sample one-tailed t-test. Assume that our hypotheses are:
H0 (Null Hypothesis): P1 >= P2
Ha (Alternative Hypothesis): P1 < P2

In this case, the first sample has 400 data points drawn from a normal distribution with mean 3 and standard deviation 2. The second sample is drawn from a normal distribution with mean 6 and the same standard deviation and size as the first.




How can we interpret the results?

According to Stat Trek, when the null hypothesis is 6 >= 3, the t-score should be 21.2, with 798 degrees of freedom and a standard error (SE) of 0.1414. The Stat Trek calculator gives us a p-value equal to 1.
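
The figures quoted from Stat Trek can be reproduced by hand from the population parameters. The sketch below assumes a common standard deviation of 2 (the second argument passed to np.random.normal later in this post) and uses the true means rather than sample estimates; the variable names are illustrative only.

```python
import math

n1 = n2 = 400          # sample sizes
sigma = 2.0            # common standard deviation (assumed)
mean_diff = 6.0 - 3.0  # difference between the two population means

# Standard error of the difference between two independent sample means
se = math.sqrt(sigma**2 / n1 + sigma**2 / n2)
t_score = mean_diff / se
df = n1 + n2 - 2       # degrees of freedom for the pooled two-sample t-test

print(round(se, 4), round(t_score, 1), df)  # 0.1414 21.2 798
```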



You might notice that whether we write ttest_ind(P1, P2) or ttest_ind(P2, P1), the sign of the t-statistic changes but the p-value does not. Why? By default, SciPy's ttest_ind computes the p-value for a two-tailed two-sample test. SciPy versions before 1.6 offer no option for a one-tailed two-sample test; later versions accept an alternative argument.
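
A minimal sketch of this symmetry, assuming NumPy and SciPy are installed. The seed is arbitrary and is there only so that the run is repeatable; the exact numbers will differ from those quoted elsewhere in this post.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
P1 = rng.normal(6, 2, 400)      # mean 6, standard deviation 2
P2 = rng.normal(3, 2, 400)      # mean 3, standard deviation 2

# Swap the argument order and compare the two results.
t_12, p_12 = stats.ttest_ind(P1, P2, equal_var=True)
t_21, p_21 = stats.ttest_ind(P2, P1, equal_var=True)

print(t_12, p_12)  # positive t-statistic
print(t_21, p_21)  # same magnitude with the opposite sign, same p-value
```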

Therefore, the correct way to test our null hypothesis in Python is as follows.
import numpy as np
from scipy import stats

P1 = np.random.normal(6, 2, 400)
P2 = np.random.normal(3, 2, 400)
stats.ttest_ind(P1, P2, axis=0, equal_var=True)
You will see a result similar to the one below (the exact numbers vary because the samples are random):
Ttest_indResult(statistic=21.374858126615408, pvalue=1.6807582123709593e-80)

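The real p-value for our null hypothesis, P1 >= P2, is computed from the two-tailed result. Because the t-statistic is positive, the one-tailed p-value for Ha: P1 < P2 is 1 minus half of the two-tailed p-value; if the t-statistic were negative, it would simply be half of the two-tailed p-value.

result = stats.ttest_ind(P1, P2, axis=0, equal_var=True)
real_t_score = result.statistic
# With the result shown above: 1 - 1.6807582123709593e-80/2 = 1 - 0.84e-80, which is effectively 1
real_pvalue = 1 - result.pvalue / 2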

Because the real p-value is so close to 1, we cannot reject the null hypothesis that P1 >= P2 (6 >= 3).
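
The conversion from the two-tailed result to a one-tailed p-value can be wrapped in a small helper that checks the sign of the t-statistic. This is a sketch under stated assumptions: one_tailed_pvalue_less is a hypothetical name, not part of SciPy, and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

def one_tailed_pvalue_less(a, b):
    """Return (t, p) for H0: mean(a) >= mean(b) versus Ha: mean(a) < mean(b)."""
    t, p_two = stats.ttest_ind(a, b, equal_var=True)
    # Half of the two-tailed p-value lies in each tail; choose the tail by the sign of t.
    p_one = p_two / 2 if t < 0 else 1 - p_two / 2
    return t, p_one

rng = np.random.default_rng(1)  # arbitrary seed, for repeatability only
high = rng.normal(6, 2, 400)
low = rng.normal(3, 2, 400)

t_a, p_a = one_tailed_pvalue_less(high, low)  # H0 holds: p-value close to 1
t_b, p_b = one_tailed_pvalue_less(low, high)  # H0 is false: p-value close to 0
print(t_a, p_a)
print(t_b, p_b)
```

On SciPy 1.6 or later, passing alternative='less' to stats.ttest_ind should give the same one-tailed p-value directly.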