P-value Calculator

Calculate the p-value from a Z-score, t-statistic, chi-square, or F-statistic. Supports all tail types with a live normal distribution curve and significance interpretation.

Common Z Critical Values

Confidence (two-tailed)	α (two-tailed)	Z critical (two-tailed)	α (one-tailed)	Z critical (one-tailed)
90%	0.10	±1.645	0.10	1.282
95%	0.05	±1.960	0.05	1.645
99%	0.01	±2.576	0.01	2.326
99.9%	0.001	±3.291	0.001	3.090
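
The critical values in this table can be reproduced from the inverse of the standard normal CDF. The following is a minimal sketch in Python using the standard library's `statistics.NormalDist`. The helper name `z_critical` is hypothetical and is not part of this calculator.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Return the positive Z critical value for significance level alpha."""
    # A two-tailed test puts alpha/2 in each tail; a one-tailed test puts
    # all of alpha in a single tail.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

# Print alpha, the two-tailed critical value, and the one-tailed critical value.
for alpha in (0.10, 0.05, 0.01, 0.001):
    two = round(z_critical(alpha), 3)
    one = round(z_critical(alpha, two_tailed=False), 3)
    print(alpha, two, one)
```

For a left-tailed test the critical value is the negative of the one-tailed value shown.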

What Is a P-value and How Is It Calculated?

A p-value (probability value) is the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis (H₀) is true. In other words: if there really were no effect or difference in the population, how likely would it be to see data as extreme as what you observed just by chance? A small p-value means your observed data would be very unlikely under H₀, which counts as evidence against H₀. When the p-value falls below your chosen significance level α, the result is called statistically significant. For related statistics, see our Standard Deviation Calculator and Average Calculator.

The Standard Normal Distribution & Z-scores

P-value from Z-score

Two-tailed: p = 2 × Φ(−|Z|)
Right-tailed: p = 1 − Φ(Z)
Left-tailed: p = Φ(Z)

where Φ is the standard normal CDF.

Φ(Z) = area under the standard normal curve to the left of Z. This calculator uses the error function (erf) approximation for high accuracy.
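
The three tail formulas above can be sketched in a few lines of Python. This assumes the identity Φ(z) = ½ · (1 + erf(z / √2)) and the standard library's `math.erf`. The function names are hypothetical, and the calculator's own implementation may differ.

```python
import math

def normal_cdf(z):
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value_from_z(z, tail="two"):
    """P-value for a Z statistic; tail is 'two', 'right', or 'left'."""
    if tail == "two":
        return 2.0 * normal_cdf(-abs(z))
    if tail == "right":
        return 1.0 - normal_cdf(z)
    if tail == "left":
        return normal_cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Example: a two-tailed test with Z = 1.96 gives a p-value close to 0.05.
print(p_value_from_z(1.96, tail="two"))
```

For very large positive Z, `1 - normal_cdf(z)` loses precision to cancellation. Computing the right tail as `0.5 * math.erfc(z / math.sqrt(2))` is one way to avoid that.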

Supported Statistical Tests

Z-test

Used when population standard deviation is known, or sample size is large (n ≥ 30). Test statistic follows the standard normal distribution. Common for proportions and large samples.

T-test

Used when population std dev is unknown and estimated from the sample. Requires degrees of freedom (df = n−1 for one-sample; df = n₁+n₂−2 for two-sample). Follows t-distribution.
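
The page does not say how the calculator evaluates the t-distribution, so the following sketch swaps in a simple technique: integrating the t density numerically with Simpson's rule. It uses only the Python standard library, and the function names and `steps` parameter are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def p_value_from_t(t, df, tail="two", steps=2000):
    """P-value for a t statistic via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3      # area between 0 and |t|
    upper = 0.5 - central        # area beyond |t| in a single tail
    if tail == "two":
        return 2 * upper
    if tail == "right":
        return upper if t >= 0 else 1 - upper
    if tail == "left":
        return upper if t <= 0 else 1 - upper
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Example: t = 2.228 with df = 10 (one-sample test with n = 11), two-tailed.
print(p_value_from_t(2.228, df=10))
```

With df = 10, the two-tailed p-value for t = 2.228 comes out close to 0.05, whereas the same value treated as a Z-score would give a smaller p-value. This reflects the heavier tails of the t-distribution.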

Chi-Square Test

Used for categorical data: goodness-of-fit tests and tests of independence in contingency tables. Always right-tailed. df = (rows−1)×(cols−1) for contingency tables; df = k−1 for a goodness-of-fit test with k categories and no estimated parameters.
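
As an illustration of the test of independence, the sketch below computes the chi-square statistic and degrees of freedom from a table of counts, then a right-tail p-value. The p-value uses a power series for the regularized incomplete gamma function. That is an assumption chosen to keep the example within the Python standard library, not necessarily what the calculator does. All function names and the example counts are made up.

```python
import math

def chi_square_statistic(observed):
    """Return (statistic, df) for a test of independence on a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

def regularized_gamma_p(a, x, tol=1e-14, max_terms=100_000):
    """Lower regularized incomplete gamma P(a, x), summed as a power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi_square_p_value(statistic, df):
    """Right-tail p-value: P(chi-square with df degrees of freedom >= statistic)."""
    return 1.0 - regularized_gamma_p(df / 2, statistic / 2)

table = [[30, 10], [20, 40]]  # hypothetical 2x2 counts
stat, df = chi_square_statistic(table)
print(stat, df, chi_square_p_value(stat, df))
```

Because the p-value is formed as 1 − P, extremely small p-values lose relative precision. A production implementation would typically evaluate the upper tail directly.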

F-test (ANOVA)

Used to compare variances or in ANOVA to compare means across 3+ groups. Requires two degrees of freedom: df₁ (numerator) and df₂ (denominator). Always right-tailed.
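
To show where the F statistic and its two degrees of freedom come from, here is a minimal one-way ANOVA sketch in Python. The function name and the sample data are hypothetical. The sketch stops at the statistic, which would then be entered into a calculator together with df₁ and df₂.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df1, df2) for a one-way ANOVA over a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df1 = k - 1          # numerator degrees of freedom
    df2 = n_total - k    # denominator degrees of freedom
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, df1, df2

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up data for three groups
print(one_way_anova_f(groups))
```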

Significance Levels (α) Explained

The significance level α is the p-value threshold below which you reject the null hypothesis. It is the probability of a Type I error when H₀ is true: concluding there is an effect when there isn't one. Common choices:

α Level	Confidence Level	Meaning	Common Use
0.10	90%	10% false positive rate	Exploratory research, weak evidence
0.05	95%	5% false positive rate	Standard threshold in most sciences
0.01	99%	1% false positive rate	Medical trials, high-stakes decisions
0.001	99.9%	0.1% false positive rate	Very high-stakes decisions (particle physics goes further: 5σ, p ≈ 3 × 10⁻⁷, for discoveries such as the Higgs boson)

The most widely used threshold, α = 0.05, was originally proposed by Ronald Fisher in the 1920s. It means that if H₀ were true, results at least this extreme would occur no more than 5% of the time by chance. Use our Standard Deviation Calculator to prepare your test statistic from raw data.

One-tailed vs Two-tailed Tests

Choose your tail type based on your research hypothesis before seeing the data. A two-tailed test is used when you're testing for any difference (H₁: μ ≠ μ₀); it splits α between both tails (α/2 each). A right-tailed test is used when you hypothesize an increase (H₁: μ > μ₀). A left-tailed test is used when you hypothesize a decrease (H₁: μ < μ₀). One-tailed tests are more powerful when the direction is known in advance, but they are misleading if the direction is chosen after seeing the data, which is a form of "p-hacking." Two-tailed tests are more conservative and more commonly published.

Frequently Asked Questions

Common questions about p-values, hypothesis testing, and statistical significance

Q: What does p < 0.05 mean?

p < 0.05 means that if the null hypothesis were true, there would be less than a 5% probability of observing results at least as extreme as those obtained. This is the standard threshold for "statistical significance" in most scientific fields. When p < α (your chosen significance level), you reject the null hypothesis and conclude there is statistically significant evidence for your alternative hypothesis. However, p < 0.05 does not mean that the result is practically important, that the effect is large, that the probability that H₀ is true is 5%, or that the probability that your finding is a "false positive" is exactly 5%. It is a conditional probability under H₀, not a direct measure of the probability that your hypothesis is correct. Always report effect sizes alongside p-values for full context. Use our Standard Deviation Calculator to understand your data's spread.

Q: What is the difference between a Z-test and a t-test?

A Z-test uses the standard normal distribution and is appropriate when: (1) the population standard deviation (σ) is known, or (2) the sample size is large (typically n ≥ 30, where the Central Limit Theorem makes the sampling distribution of the mean approximately normal). A t-test uses the t-distribution and is appropriate when σ is unknown and must be estimated from the sample, which is almost always the case in practice. The t-distribution has heavier tails than the normal distribution, reflecting the extra uncertainty from estimating σ. As degrees of freedom increase (larger sample), the t-distribution approaches the normal distribution. For df > 30, Z and t critical values become nearly identical. The t-test requires specifying degrees of freedom (df = n−1 for a one-sample t-test). Our Average Calculator and Standard Deviation Calculator can help compute your test statistic.

Q: What does it mean if my p-value is greater than 0.05?

If p > α (e.g., p = 0.12 with α = 0.05), you fail to reject the null hypothesis. Crucially, this does NOT mean you "accept" H₀ or that H₀ is true. It means your data do not provide sufficient evidence to reject H₀ at your chosen significance level. There are several possible explanations: the null hypothesis really is true; the effect exists but your sample was too small to detect it (low statistical power); there was measurement error; or there was too much variability. A result of p > 0.05 is often reported as "not statistically significant" (NS) and should be read as a failure to find an effect, not as proof that no effect exists. Increasing sample size or reducing variability can sometimes reveal effects that were masked by insufficient power. Use our Standard Deviation Calculator to understand variance in your dataset.

Q: What is the null hypothesis?

The null hypothesis (H₀) is the default assumption, typically that there is no effect, no difference, or no relationship. For example: "the new drug has no effect on blood pressure" or "the two groups have equal means." The alternative hypothesis (H₁ or Hₐ) is what you're trying to show evidence for: that there IS an effect, difference, or relationship. For example: "the drug lowers blood pressure" (one-tailed) or "the drug changes blood pressure" (two-tailed). The p-value tests H₀: small p-values provide evidence against H₀ and in favour of H₁. You never "prove" H₁; you merely find or fail to find sufficient evidence against H₀. A p-value is essentially asking: "how consistent are these data with H₀?" A small p-value says "not very consistent," so you reject H₀.

Q: What is a Type I error?

A Type I error (false positive) occurs when you reject H₀ when it is actually true, concluding there's an effect when there isn't one. When H₀ is true, the probability of a Type I error equals α (your significance level). Choosing α = 0.05 means you accept a 5% chance of a false positive. A Type II error (false negative) occurs when you fail to reject H₀ when it is actually false, missing a real effect. The probability of a Type II error is denoted β. Statistical power = 1 − β = the probability of correctly detecting a real effect. There is a trade-off: for a fixed sample size and effect size, reducing α (a stricter standard) reduces Type I errors but increases Type II errors. In medical research where false positives are costly, α = 0.01 is often used. In exploratory research where missing effects is costly, α = 0.10 may be appropriate. This calculator helps you evaluate Type I error risk through the p-value vs α comparison.
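
The trade-off between α and β can be made concrete with a small power calculation. The sketch below assumes a two-tailed one-sample Z-test with known σ, where power = Φ(−z_crit + δ√n/σ) + Φ(−z_crit − δ√n/σ) for a true mean shift δ. The function name and the numbers in the example are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample Z-test for a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # mean of Z when the effect is real
    # Probability that Z falls in either rejection region when Z ~ N(shift, 1).
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Same effect and sample size, three significance levels.
for alpha in (0.10, 0.05, 0.01):
    power = z_test_power(effect=0.5, sigma=2.0, n=50, alpha=alpha)
    print(alpha, round(power, 3), round(1 - power, 3))  # alpha, power, beta
```

With the effect and sample size held fixed, power falls (and β rises) as α is made stricter. When the true effect is zero, the function returns α itself, which is the Type I error rate.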

Q: How do I calculate a Z-score for this calculator?

The test statistic formulas are:

One-sample Z-test: Z = (x̄ − μ₀) / (σ / √n)
Two-sample Z-test: Z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
One-sample t-test (unknown σ): t = (x̄ − μ₀) / (s / √n)

Here x̄ is your sample mean, μ₀ is the hypothesised population mean, σ is the population standard deviation, s is the sample standard deviation, and n is the sample size. Use our Average Calculator to find x̄ and our Standard Deviation Calculator to find s, then plug into the formula and enter the result here.
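
These test statistics can also be computed directly from raw data. A minimal sketch using Python's standard `statistics` module follows. The function names and the sample values are hypothetical, and `stdev` is the sample standard deviation with an n − 1 denominator.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_z(sample, mu0, sigma):
    """Z = (x_bar - mu0) / (sigma / sqrt(n)), with sigma known."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

def one_sample_t(sample, mu0):
    """Return (t, df) with t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean(sample) - mu0) / (s / sqrt(n)), n - 1

def two_sample_z(sample1, sample2, sigma1, sigma2):
    """Z = (x_bar1 - x_bar2) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    standard_error = sqrt(sigma1 ** 2 / len(sample1) + sigma2 ** 2 / len(sample2))
    return (mean(sample1) - mean(sample2)) / standard_error

data = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]  # made-up measurements
print(one_sample_z(data, mu0=5.0, sigma=0.4))
print(one_sample_t(data, mu0=5.0))
```

The t version also returns the degrees of freedom, since a t-test p-value needs both numbers.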

Q: What is p-hacking?

P-hacking (also called data dredging or fishing) refers to manipulating data analysis decisions until a significant result is found: collecting more data until p < 0.05, trying multiple tests and reporting only significant ones, choosing between one-tailed and two-tailed tests after seeing results, removing outliers selectively, or trying different subgroup analyses. This inflates the false positive rate far above α. For example, if you run 20 independent tests at α = 0.05 and every null hypothesis is true, you'd expect one false positive just by chance, and the probability of at least one is about 64% (1 − 0.95²⁰). P-hacking is a significant contributor to the "replication crisis" in psychology and social sciences. Best practices to avoid it include: pre-registering your hypothesis and analysis plan before collecting data; adjusting for multiple comparisons (Bonferroni correction: α/number of tests); and reporting all analyses conducted, not just significant ones. Always report p-values alongside effect sizes and confidence intervals for full transparency.
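
The arithmetic behind the 20-test example and the Bonferroni correction can be checked with a short sketch. It assumes the tests are independent and that every null hypothesis is true. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

alpha, n_tests = 0.05, 20
print(alpha * n_tests)                          # expected number of false positives
print(familywise_error_rate(alpha, n_tests))    # chance of at least one false positive
corrected = bonferroni_alpha(alpha, n_tests)
print(corrected, familywise_error_rate(corrected, n_tests))
```

Testing each hypothesis at the corrected level keeps the chance of any false positive across all 20 tests just under the original α.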