Hasan's Post

16 January 2022

Basic statistics

by Hasan

Measure of central tendency

Population and Sample

Mean

\[\text{Mean} = \frac{\text{sum of all observations}}{\text{number of observations}}\]
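
As a small illustration of the formula, the sketch below computes the mean by hand and with Python's standard-library `statistics.mean`. The `observations` list is made-up example data, not taken from this post.

```python
import statistics

# Hypothetical observations (illustrative only)
observations = [2, 4, 4, 5, 10]

# Mean = sum of all observations / number of observations
manual_mean = sum(observations) / len(observations)

# The standard library gives the same result
library_mean = statistics.mean(observations)

print(manual_mean, library_mean)
```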

Median

Mode

Measure of Dispersion

Standard deviation and variance

Random variables and Probability Distributions

Normal distribution

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x_axis = np.arange(-4, 4, 0.1)
plt.plot(x_axis, norm.pdf(x_axis, 0, 1))
plt.show()

T distribution

Probability of getting a high or low teaching evaluation.

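import scipy.stats
# X is the value of interest; mean and std describe the distribution
z_score = (X - mean) / std
prob = scipy.stats.norm.cdf(z_score)  # P(value <= X)
prob_greater_than = 1 - prob  # P(value > X)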
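
To make the calculation concrete, here is a minimal worked sketch with invented numbers: `mean`, `std`, and `X` are assumptions chosen for illustration, not values from the teaching-evaluation data. It uses the standard-library `statistics.NormalDist` so it runs without extra packages. `scipy.stats.norm.cdf` applied to the same z-score should give the same value.

```python
from statistics import NormalDist

# Hypothetical values (assumptions for illustration only)
mean = 4.0  # assumed average evaluation score
std = 0.5   # assumed standard deviation
X = 4.5     # score of interest

# Standardize, then look up the cumulative probability
z_score = (X - mean) / std
prob = NormalDist().cdf(z_score)  # P(score <= X)
prob_greater_than = 1 - prob      # P(score > X)

print(round(prob, 4), round(prob_greater_than, 4))  # 0.8413 0.1587
```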

When to use a t-test or a Z-test

* The Z-test uses the normal distribution and the t-test uses the t distribution.
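
One way to see the practical difference is to compare critical values. The sketch below prints the two-sided 5% cutoffs from the t distribution for a few degrees of freedom, next to the normal cutoff. The degrees of freedom are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

# Two-sided 5% critical value from the standard normal distribution
z_crit = norm.ppf(0.975)

# Same cutoff from the t distribution for a few (arbitrary) degrees of freedom
t_crit = {df: t.ppf(0.975, df) for df in (5, 30, 1000)}

for df, value in t_crit.items():
    print(f'df = {df}: t critical = {value:.3f}, z critical = {z_crit:.3f}')
```

The t cutoffs are larger for small samples and approach the normal cutoff as the degrees of freedom grow.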

Dealing with tails and rejection

Equal vs Unequal Variances

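# Levene's test for equality of variances
scipy.stats.levene(
    first_var,
    second_var,
    center='mean'
)
# Independent two-sample t-test; use equal_var=False if Levene's test rejects equal variances
scipy.stats.ttest_ind(
    first_var,
    second_var,
    equal_var=True
)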
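
The two calls can be chained so that the Levene result decides the `equal_var` argument. This is only a sketch: the two samples are simulated, and the 0.05 cutoff is an assumed significance level rather than something fixed by this post.

```python
import numpy as np
import scipy.stats

# Simulated samples (hypothetical data for illustration)
rng = np.random.default_rng(0)
first_var = rng.normal(loc=4.0, scale=0.5, size=50)
second_var = rng.normal(loc=4.2, scale=1.5, size=50)

# Step 1: Levene's test; the null hypothesis is that the variances are equal
levene_stat, levene_p = scipy.stats.levene(first_var, second_var, center='mean')

# Step 2: assume equal variances only if Levene's test does not reject (assumed alpha = 0.05)
equal_var = bool(levene_p >= 0.05)

# Step 3: pooled t-test if equal_var is True, Welch's t-test otherwise
t_stat, p_value = scipy.stats.ttest_ind(first_var, second_var, equal_var=equal_var)

print(f'Levene p = {levene_p:.4f}, equal_var = {equal_var}, t-test p = {p_value:.4f}')
```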

ANOVA

# Three variables are
# forty_lower, forty_fifty_seven, fiftyseven_older
f_statistics, p_value = scipy.stats.f_oneway(
    forty_lower,
    forty_fifty_seven,
    fiftyseven_older
)
print(f'F statistics = {f_statistics}\n and P value is = {p_value}')

Correlation

scipy.stats.chi2_contingency(
    cont_table,
    correction=False
)
# first value: chi-square test statistic
# second value: p-value
# third value: degrees of freedom
# last value: array of expected frequencies
pearson_value, p_value = scipy.stats.pearsonr(
    first_variable,
    second_variable
)
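
To make the four return values of `chi2_contingency` concrete, here is a small sketch on an invented 2x2 table. The counts in `cont_table` are hypothetical.

```python
import numpy as np
import scipy.stats

# Hypothetical 2x2 contingency table (rows and columns are two categorical variables)
cont_table = np.array([[10, 20],
                       [20, 10]])

# Unpack the statistic, p-value, degrees of freedom, and expected frequencies
chi2, p_value, dof, expected = scipy.stats.chi2_contingency(
    cont_table,
    correction=False
)

print(f'chi-square = {chi2:.4f}, p-value = {p_value:.4f}, dof = {dof}')
print(expected)
```

With equal row and column totals of 30, every expected count is 15, which is what the last array shows.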

Regression in place of T-test

scipy.stats.ttest_ind(
    first_var,
    second_var,
    equal_var=True
)
import statsmodels.api as sm
X = first_var
y = second_var
X =  sm.add_constant(X)
model = sm.OLS(y, X).fit()
predictions = model.predict(X)
model.summary()
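
The equivalence can be checked numerically. In this sketch the outcome is regressed on a 0/1 group indicator, and the slope's p-value is compared with the equal-variance t-test. It uses `scipy.stats.linregress` instead of the statsmodels OLS call, purely to keep dependencies small. The two groups are simulated, hypothetical data.

```python
import numpy as np
import scipy.stats

# Simulated outcome values for two groups (hypothetical data)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=4.0, scale=1.0, size=40)
group_b = rng.normal(loc=4.5, scale=1.0, size=40)

# Classic equal-variance t-test
t_stat, t_p = scipy.stats.ttest_ind(group_a, group_b, equal_var=True)

# Same comparison as a regression: outcome ~ group indicator (0 = a, 1 = b)
group = np.concatenate([np.zeros(len(group_a)), np.ones(len(group_b))])
outcome = np.concatenate([group_a, group_b])
reg = scipy.stats.linregress(group, outcome)

# The slope is the difference in group means; its p-value matches the t-test
print(f't-test p = {t_p:.6f}, regression slope p = {reg.pvalue:.6f}')
print(f'slope = {reg.slope:.4f}, mean difference = {group_b.mean() - group_a.mean():.4f}')
```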

Regression in place of ANOVA

import statsmodels.api as sm
from statsmodels.formula.api import ols
lm = ols('beauty ~ age_group', data=ratings_df).fit()
table = sm.stats.anova_lm(lm)
print(table)

Regression in place of Correlation

import statsmodels.api as sm
X = first_var
y = second_var
X =  sm.add_constant(X)
model = sm.OLS(y, X).fit()
predictions = model.predict(X)
model.summary()
