2015/11/24

Use a Pascal Function to Get the Approximate P-value of an A/B Test

If you are looking for a script that gives you an approximate p-value for your A/B test, this is the place for you. And if you need a reference for calculating the z-value and p-value in SQL, where the available math functions are limited, the Python script can be translated into a SQL user-defined function.


The script interacts with you in the terminal: you enter your data, and it returns the p-value and the conclusion of your test. The script is for a test of proportions, as in Example 1. If your test compares averages, as in Example 2, this script is not for you.


Example 1: In Group A, 38 out of 100 customers opened a bank account, while in Group B, 102 out of 200 customers opened one. Is the marketing campaign equally effective for both groups?
Example 2: We have the grades of all students in the data mining classes taught by Mr. Lynn and Mr. Robert. The average grade is 90 in Mr. Lynn's class compared to 88.5 in Mr. Robert's class. Do students perform better in Mr. Lynn's class than in Mr. Robert's class if grade is the only standard for performance evaluation?
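
As a worked illustration, Example 1 can be computed step by step with the pooled two-proportion z-test that the script uses. This is only a sketch, not the script itself: the variable names are illustrative, and it uses `math.erfc` from the Python standard library for the normal tail instead of the series approximation shown later in this post.

```python
import math

# Example 1: 38 of 100 customers in Group A, 102 of 200 in Group B
p1, n1 = 38, 100
p2, n2 = 102, 200

prop_a = p1 / n1
prop_b = p2 / n2

# Pooled proportion under the null hypothesis that both groups share one rate
pooled = (p1 + p2) / (n1 + n2)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

z = abs(prop_b - prop_a) / standard_error

# Two-tailed p-value: 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
p_value = math.erfc(z / math.sqrt(2))

print("z =", round(z, 3))
print("p-value =", round(p_value, 3))
```

Here z comes out at roughly 2.13, which is above 1.960, so at the 0.05 significance level the two proportions would be judged different.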

How to use this script?
1. If you have Jupyter, you can run the script directly as a Python 3 IPython Notebook. If not, download the script and run it in the terminal.
2. Enter the numbers when prompted.
3. After you have entered all your inputs, the result is shown in your terminal window.

Nice! The result of the script matches the number from the online z-value calculator at http://www.socscistatistics.com/tests/ztest/Default2.aspx
Part of the code can be used in SQL:
If you are looking for a UDF or query in SQL to get the approximate p-value, the lines below can be translated directly. The series they use comes from the Pascal function in the Wikipedia article on the normal distribution: https://en.wikipedia.org/wiki/Normal_distribution
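# In Python this needs "import math"; in SQL, use the equivalent square root and exponential functions.
# Multiplying by 1.00 forces decimal division where integer division is the default.
proportion_control = (p1*1.00)/(n1*1.00)
proportion_treatment = (p2*1.00)/(n2*1.00)
diff = proportion_treatment - proportion_control
# Pooled sample proportion and standard error of the difference
p = ((p1 + p2)*1.00)/((n1 + n2)*1.00)
standard_error = math.sqrt(p*(1-p)*(1.00/n1+1.00/n2))
z = abs(diff/standard_error)
# Series approximation of the standard normal CDF at z (100 terms)
v1 = z
v2 = z
for i in range(1,101):
    v2 = (v2*z*z/(2*i+1))
    v1 = v1+v2
cpvalue = (0.5+v1*(math.exp(-(z*z)/2)/math.sqrt(2*math.pi)))
# Two-tailed p-value
pvalue = 1 - abs(cpvalue-0.5)*2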
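
To see how close the series gets, the loop above can be wrapped in a function and compared with the exact normal CDF. The following is a minimal sketch under assumptions: the function names `normal_cdf_series` and `normal_cdf_erfc` are made up for this illustration, and the comparison uses `math.erfc` from the Python standard library, which is not part of the original script.

```python
import math

def normal_cdf_series(z, terms=100):
    """Approximate the standard normal CDF at z with the series used above."""
    term = z   # current series term, starts at z
    total = z  # running sum z + z^3/3 + z^5/(3*5) + ...
    for i in range(1, terms + 1):
        term = term * z * z / (2 * i + 1)
        total += term
    return 0.5 + total * math.exp(-(z * z) / 2) / math.sqrt(2 * math.pi)

def normal_cdf_erfc(z):
    """Reference value from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Compare the two on a grid of z values from 0.0 to 5.0 in steps of 0.1
max_gap = max(
    abs(normal_cdf_series(k / 10) - normal_cdf_erfc(k / 10))
    for k in range(0, 51)
)
print("largest gap for z in [0, 5]:", max_gap)
```

In SQL there is usually no `erfc`, which is why the series form is the one worth translating; the comparison here is only a sanity check in Python.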


Python Code:
It's available to download on GitHub at https://github.com/evavon/approximate_p_value/blob/master/approximate_p_value_proportion.ipynb


import math

print("Enter the numbers for the A/B testing experiment")
print(" ")
p1 = int(input("Total number of individuals who share the same characteristic in the control group: "))
n1 = int(input("Total population in the control group: "))
print(" ")
p2 = int(input("Total number of individuals who share the same characteristic in the treatment group: "))
n2 = int(input("Total population in the treatment group: "))
print(" ")
# ci holds the significance level (alpha), not a confidence interval
ci = float(input("The significance level (type 0.1, 0.05, or 0.01): "))
while ci not in (0.1, 0.05, 0.01):
    print("try again")
    ci = float(input("The significance level (type 0.1, 0.05, or 0.01): "))
print("The significance level:", ci)
print("The decision for the test is based on a", round((1-ci)*100), "% confidence level.")
# Two-tailed critical z-value for the chosen significance level
if ci == 0.1:
    z_sig = 1.645
    print("z-value of 0.1 significance level: 1.645.")
elif ci == 0.05:
    z_sig = 1.960
    print("z-value of 0.05 significance level: 1.960.")
elif ci == 0.01:
    z_sig = 2.576
    print("z-value of 0.01 significance level: 2.576.")
# Sample proportions and their difference
proportion_control = (p1*1.00)/(n1*1.00)
proportion_treatment = (p2*1.00)/(n2*1.00)
diff = proportion_treatment - proportion_control
# Pooled sample proportion and standard error of the difference
p = ((p1 + p2)*1.00)/((n1 + n2)*1.00)
standard_error = math.sqrt(p*(1-p)*(1.00/n1+1.00/n2))
z = abs(diff/standard_error)
# Series approximation of the standard normal CDF at z (100 terms)
v1 = z
v2 = z
for i in range(1,101):
    v2 = (v2*z*z/(2*i+1))
    v1 = v1+v2
cpvalue = (0.5+v1*(math.exp(-(z*z)/2)/math.sqrt(2*math.pi)))
# Two-tailed p-value
pvalue = 1 - abs(cpvalue-0.5)*2

print("Null hypothesis: proportion of control group = proportion of treatment group")
print(" ")
print("Conclusion for two-tailed test: ")
if z > 5:
    print("We have almost 100% confidence to reject the null hypothesis.")
    print("We cannot accept the null hypothesis.")
    print("P-value =", round(pvalue,3))
else:
    if z < z_sig:
        print("We fail to reject the null hypothesis.")
        print("P-value =", round(pvalue,3), ">", ci)
    else:
        print("We have", round((1-ci)*100), "% confidence to reject the null hypothesis.")
        print("The impact of the product is not equally effective on both groups.")
        print("P-value =", round(pvalue,3), "<", ci)
print("Absolute z-value:", round(z,3))
print("Standard error:", round(standard_error,3))
print("Pooled sample proportion:", round(p,3))
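
The script divides by the standard error, which is zero when nobody or everybody in both groups shares the characteristic, and it does not check that the counts make sense. A minimal sketch of such a check follows; the helper name `pooled_standard_error` and its error messages are hypothetical and are not part of the original script.

```python
import math

def pooled_standard_error(p1, n1, p2, n2):
    """Return the pooled standard error, or raise ValueError for unusable counts."""
    for successes, total in ((p1, n1), (p2, n2)):
        if total <= 0:
            raise ValueError("group size must be positive")
        if not 0 <= successes <= total:
            raise ValueError("count must be between 0 and the group size")
    pooled = (p1 + p2) / (n1 + n2)
    # With a pooled proportion of 0 or 1 the standard error is 0 and z is undefined
    if pooled in (0, 1):
        raise ValueError("z-value is undefined when the pooled proportion is 0 or 1")
    return math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

print(round(pooled_standard_error(38, 100, 102, 200), 4))
```

Calling a check like this right after the inputs are read would turn a division-by-zero crash into a readable message.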

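The critical z-values 1.645, 1.960 and 2.576 are hard-coded in the script. As a cross-check, they can be reproduced from the inverse normal CDF. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library; the dictionary name `critical` is only illustrative.

```python
from statistics import NormalDist

# Two-tailed test: put alpha / 2 in each tail of the standard normal
critical = {
    alpha: NormalDist().inv_cdf(1 - alpha / 2)
    for alpha in (0.1, 0.05, 0.01)
}

for alpha, z_critical in critical.items():
    print("alpha =", alpha, "critical z =", round(z_critical, 3))
```

A function like this is usually not available in SQL, so a small lookup of fixed values, as in the script, is a reasonable choice there.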

References:
1. Hypothesis Test: Difference Between Proportions, from Stat Trek: http://stattrek.com/hypothesis-test/difference-in-proportions.aspx?Tutorial=AP
2. Pascal Function for Normal Distribution: https://en.wikipedia.org/wiki/Normal_distribution
