Hypothesis Testing in Research: Concepts, Methods, and Examples

Key takeaways

Hypothesis testing uses sample data to make probabilistic inferences about population parameters.
Type I and Type II errors are inherent trade-offs controlled by alpha and power.
Every test produces a test statistic compared against a known distribution to calculate the p-value.

Hypothesis testing is the formal method researchers use to evaluate claims about populations using sample data. Instead of proving a theory absolutely, statistics asks: if there were truly no effect, how likely is a result at least as extreme as the one we observed? That probability is the p-value. Understanding the concepts behind hypothesis testing (null and alternative hypotheses, test statistics, significance levels, and error types) helps you choose methods correctly, interpret output critically, and write results chapters that examiners trust. This guide explains hypothesis testing concepts, surveys common methods, and provides worked examples from typical academic research.

The logic of hypothesis testing

Hypothesis testing is inferential, not deductive. You assume H0 is true and calculate how extreme your sample result would be under that assumption. A sufficiently extreme result (a p-value below the chosen significance level α) leads you to reject H0 in favour of Ha. You never 'prove' Ha or 'accept' H0; you either reject or fail to reject H0 based on the strength of the evidence.
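
The "assume H0, then ask how extreme the result is" logic can be made concrete with a permutation test. The sketch below is illustrative only: the two score lists are invented numbers and the function name permutation_p_value is hypothetical. It shuffles the group labels (which H0 says are interchangeable) and counts how often a shuffled mean difference is at least as large as the observed one.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical anxiety scores, invented purely for illustration
treated = [12, 14, 11, 10, 13, 9, 12, 11]
control = [15, 17, 14, 16, 13, 18, 15, 16]
p_value = permutation_p_value(treated, control)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value here means that label shuffles rarely reproduce a gap as large as the observed one, which is the evidence against H0. It does not measure the probability that H0 is true.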

Null hypothesis (H0) defined

H0 is the default position of no effect, no difference, or no relationship. Examples: μ1 = μ2 (equal population means), ρ = 0 (no population correlation), β = 0 (no regression slope). H0 is the claim you test your data against and look for evidence to reject.

Alternative hypothesis (Ha) defined

Ha is what you predict or suspect. It can be directional (one-tailed)—group A > group B—or non-directional (two-tailed)—group A ≠ group B. Choose one-tailed only with strong theoretical justification stated before analysis.

Type I and Type II errors

Type I error (α): rejecting H0 when it is true—false positive.
Type II error (β): failing to reject H0 when Ha is true—false negative.
Power (1 − β): probability of correctly rejecting false H0.
Lowering α reduces the Type I error rate but, for a fixed effect size, raises the Type II error rate (lowers power) unless the sample size increases.
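
The trade-off between α, power, and sample size can be sketched numerically. The example below assumes a one-sided, one-sample z-test with known standard deviation, which is a simplification chosen because its power has a closed form. The effect size of 0.3 standard deviations and the sample sizes are hypothetical, as is the function name z_test_power.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_test_power(effect_size, n, alpha):
    """Power of a one-sided, one-sample z-test with known sigma.

    effect_size is the true mean shift in standard-deviation units.
    """
    critical_z = STD_NORMAL.inv_cdf(1 - alpha)
    # Under Ha the test statistic is centred on effect_size * sqrt(n)
    return 1 - STD_NORMAL.cdf(critical_z - effect_size * sqrt(n))

# Hypothetical scenario: a true effect of 0.3 SD
for alpha, n in [(0.05, 50), (0.01, 50), (0.01, 100)]:
    power = z_test_power(0.3, n, alpha)
    print(f"alpha={alpha}, n={n}: power={power:.2f}")
```

With the effect held fixed, tightening α from 0.05 to 0.01 lowers power, and doubling n more than recovers it.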

Test statistics and sampling distributions

Each test produces a statistic (t, F, z, χ²) measuring how far your sample result deviates from H0 expectation. That statistic is compared to a theoretical distribution given the degrees of freedom to obtain a p-value. Larger absolute test statistics generally mean smaller p-values.
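
As a minimal sketch of the statistic-to-p-value step, the example below runs a one-sample z-test. A z-test is used only because the normal distribution is available in Python's standard library and needs no degrees of freedom; a t-test follows the same pattern with a t distribution. All input values and the function name one_sample_z_test are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, sigma, n):
    """Return (z, two-tailed p) for a one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)
    # How many standard errors the sample mean lies from the H0 value
    z = (sample_mean - null_mean) / standard_error
    # Probability of a statistic at least this far from 0 in either tail
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical values, invented for illustration
z, p = one_sample_z_test(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=64)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```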

Common hypothesis testing methods

One-sample t-test: sample mean vs known value.
Independent t-test: two independent group means.
Paired t-test: two measurements on same subjects.
One-way ANOVA: three or more group means.
Pearson correlation: linear association between two variables.
Chi-square: association between categorical variables.
Linear regression: predicting continuous outcome from predictors.
Mann-Whitney U: non-parametric two-group comparison.

Worked example 1: Independent t-test

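Study: effect of mindfulness training on anxiety scores. H0: μmindfulness = μcontrol. Ha: μmindfulness < μcontrol. n = 45 per group, so df = 45 + 45 − 2 = 88. Result: t(88) = −2.67, p = .009. Note that .009 is the two-tailed p-value for this t; because the difference lies in the predicted direction, the one-tailed p for the directional Ha is half of that, about .005. Conclusion: reject H0; the mindfulness group shows significantly lower anxiety scores. Report Cohen's d for effect size.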
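
The example reports t and the group sizes but not Cohen's d itself. For a pooled-variance independent t-test, d can be recovered from summary statistics as d = t × sqrt(1/n1 + 1/n2). The sketch below applies that conversion to the reported values. The function name cohens_d_from_t is hypothetical, and computing d from the raw group means and pooled standard deviation is preferable when the data are available.

```python
from math import sqrt

def cohens_d_from_t(t_statistic, n1, n2):
    """Cohen's d implied by a pooled-variance independent-samples t."""
    return t_statistic * sqrt(1 / n1 + 1 / n2)

n_mindfulness, n_control = 45, 45        # group sizes from the example
df = n_mindfulness + n_control - 2       # degrees of freedom for the test
d = cohens_d_from_t(-2.67, n_mindfulness, n_control)
print(f"df = {df}, Cohen's d = {d:.2f}")
```

The negative sign only reflects the direction of the difference (lower anxiety in the mindfulness group). By Cohen's conventional benchmarks, a magnitude near 0.5 is usually described as a medium effect.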

Worked example 2: Chi-square test

Study: association between department and preferred learning mode. H0: no association. Ha: association exists. χ²(2, N = 300) = 8.45, p = .015. Conclusion: reject H0; department and learning preference are significantly associated. Report Cramér's V for effect size.
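
The reported values can be checked, and Cramér's V derived, with a few lines of arithmetic. This sketch relies on two facts: for df = 2 the chi-square upper-tail probability has the closed form exp(−χ²/2), and df = (rows − 1)(columns − 1) = 2 implies a 2 × 3 or 3 × 2 table, so the smaller dimension minus one is 1. The variable names are illustrative.

```python
from math import exp, sqrt

chi_square = 8.45   # reported test statistic
n_total = 300       # reported sample size
df = 2              # reported degrees of freedom

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2)
p_value = exp(-chi_square / 2)

# df = 2 implies a 2 x 3 (or 3 x 2) table, so min(rows - 1, cols - 1) = 1
smaller_dim_minus_one = 1
cramers_v = sqrt(chi_square / (n_total * smaller_dim_minus_one))

print(f"p = {p_value:.3f}, Cramér's V = {cramers_v:.2f}")
```

The p-value matches the reported .015, and V comes out at roughly 0.17, which indicates a statistically significant but fairly weak association.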

Worked example 3: Linear regression

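Study: hours studied predicting exam score. H0: β = 0 (population slope). Ha: β > 0. Result: b = 2.4 (unstandardised slope estimate), t(148) = 5.89, p < .001, R² = .19. Conclusion: study hours significantly predict exam scores, explaining 19% of variance; each additional hour studied is associated with a predicted exam score 2.4 units higher.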
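
In a regression with a single predictor, the slope's t statistic and R² are linked by R² = t² / (t² + residual df), which gives a quick consistency check on reported output. The sketch below assumes the model has exactly one predictor plus an intercept, as the study description suggests; the variable names are illustrative.

```python
t_statistic = 5.89    # reported t for the slope
df_residual = 148     # reported degrees of freedom

# One predictor plus an intercept uses two parameters
n = df_residual + 2

# Valid for simple (one-predictor) linear regression only
r_squared = t_statistic**2 / (t_statistic**2 + df_residual)

print(f"n = {n}, R^2 = {r_squared:.2f}")
```

The result agrees with the reported R² of .19 and implies a sample of 150 participants.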

One-tailed vs two-tailed tests

Two-tailed tests detect effects in either direction and are the default in most social science research. One-tailed tests place all of α in one tail, which gives somewhat more power in the predicted direction but cannot detect an effect in the opposite direction. Examiners scrutinise one-tailed choices, so justify them theoretically before analysis.
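
The difference between the two approaches shows up in the critical values and in how a p-value is read off the same statistic. The sketch below uses a standard normal (z) statistic for simplicity; the z values of 1.8 and −1.8 are hypothetical, and the one-tailed alternative is assumed to predict a positive effect.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
one_tailed_critical = std_normal.inv_cdf(1 - alpha)      # about 1.645

def p_values(z):
    """Return (two-tailed p, one-tailed p for Ha: effect > 0)."""
    upper_tail = 1 - std_normal.cdf(z)
    two_tailed = 2 * min(upper_tail, 1 - upper_tail)
    return two_tailed, upper_tail

for z in (1.8, -1.8):  # same size of effect, opposite directions
    two, one = p_values(z)
    print(f"z = {z:+.1f}: two-tailed p = {two:.3f}, one-tailed p = {one:.3f}")
```

At z = 1.8 the one-tailed test is significant at α = 0.05 while the two-tailed test is not. At z = −1.8 the one-tailed p-value is close to 1, so an effect of the same size in the unpredicted direction goes undetected.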

Multiple comparisons problem

Running many tests inflates the overall Type I error rate. If you run 20 tests at α = 0.05 and every null hypothesis is true, expect one false positive on average (20 × 0.05 = 1). Use corrections (Bonferroni, FDR) or pre-specify primary vs secondary hypotheses. Report all tests conducted, not only significant ones.
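
The inflation and the two corrections named above can be sketched as follows. The family-wise error rate formula assumes independent tests, Benjamini-Hochberg is used here as one common FDR procedure, and the list of p-values is invented for illustration. All function names are hypothetical.

```python
def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 only where p <= alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value is <= (k / m) * alpha
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cut-off
    reject = [False] * m
    for rank, index in enumerate(order, start=1):
        reject[index] = rank <= max_rank
    return reject

fwer = family_wise_error_rate(0.05, 20)
hypothetical_p = [0.001, 0.008, 0.025, 0.041, 0.60]  # invented p-values
print(f"FWER for 20 tests: {fwer:.2f}")
print("Bonferroni:", bonferroni_reject(hypothetical_p))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(hypothetical_p))
```

Bonferroni is the stricter of the two: on these hypothetical p-values it rejects two hypotheses where Benjamini-Hochberg rejects three.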

Getting expert help with hypothesis testing

Our statistical analysis support helps researchers select valid tests, interpret output, and write APA-formatted results for dissertations and journal submissions.
