A confidence interval is an interval estimate used in statistics that is likely to contain an unknown population parameter. The interval is calculated from a sample of collected data.
For example, the mean x̅ of a collected sample may or may not coincide with the true population mean μ. It is therefore useful to consider a range of values around x̅ that may contain μ. The wider this interval, the more likely it is to contain μ.
Each confidence interval is associated with a percentage, called the confidence level; 90%, 95% and 99% are the most commonly used. In the image below, for example, we have a 90% confidence interval between its upper and lower limits (a and -a).
Example of a 90% confidence interval between its upper (a) and lower (-a) limits.
The Confidence Interval is one of the most important concepts in statistical hypothesis testing, as it is used as a measure of uncertainty. The term was introduced by the Polish mathematician and statistician Jerzy Neyman in 1937.
What is the relevance of a Confidence Interval?
The confidence interval indicates the margin of uncertainty (or imprecision) around an estimate. The estimate is computed from the study sample and is used to infer the actual size of the result in the source population.
Calculating a confidence interval is a way of taking sampling error into account. Together, the size of the study result and its confidence interval characterize the plausible values for the source population.
The narrower the confidence interval, the more precisely the study result pins down the true value in the source population, giving greater certainty about the result being studied.
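To illustrate what makes an interval narrower, here is a minimal sketch assuming a known population standard deviation and a z-based interval, whose half-width (margin of error) is z*·σ/√n. The value σ = 10 and the sample sizes are hypothetical; the point is only that quadrupling the sample size halves the width.

```python
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of a z-based confidence interval: z* * sigma / sqrt(n)."""
    # z* is the upper (1 - alpha/2) quantile of the standard normal distribution
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sigma / n ** 0.5

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    # Each fourfold increase in n halves the margin of error
    print(n, round(margin_of_error(sigma, n), 2))
```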
How to interpret a Confidence Interval?
Correct interpretation of the confidence interval is probably the most challenging aspect of this statistical concept. An example of the most common interpretation of the concept is as follows:
There is a 95% probability that, in the future, the true value of the population parameter (for example, the mean) will fall within the range from X (lower limit) to Y (upper limit).
The confidence interval can also be interpreted as follows: we are 95% confident that the range between X (lower limit) and Y (upper limit) contains the true value of the population parameter.
It would be incorrect, however, to state that there is a 95% probability that the interval between X (lower limit) and Y (upper limit) contains the true value of the population parameter.
This statement is the most common misconception about the confidence interval. Once an interval has been calculated, it either contains the population parameter or it does not.
The intervals vary from sample to sample, while the true population parameter stays the same regardless of the sample.
Therefore, a probability statement about confidence intervals can only be made about the procedure repeated over many samples: if the interval is recalculated for many samples, about 95% of those intervals will contain the true parameter.
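The repeated-sampling interpretation can be checked with a small simulation. This sketch assumes a hypothetical normal population with mean μ = 50 and known standard deviation σ = 10, draws many samples of size 30, builds a 95% z-based interval from each, and counts how often the intervals contain μ. Any single interval either contains μ or not; only the long-run fraction is close to 95%. The function names are illustrative.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.95):
    """z-based confidence interval for the mean, assuming sigma is known."""
    n = len(sample)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_star * sigma / n ** 0.5
    x_bar = fmean(sample)
    return x_bar - half_width, x_bar + half_width

def coverage(mu=50.0, sigma=10.0, n=30, reps=2000, level=0.95, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw a fresh sample from the hypothetical population each time
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        hits += low <= mu <= high  # each interval either contains mu or not
    return hits / reps

print(coverage())  # close to 0.95, but not exactly
```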
Steps for calculating a confidence interval
The interval is calculated using the following steps:
- Gather the sample data (sample size n);
- Calculate the sample mean x̅;
- Determine whether the population standard deviation (σ) is known or unknown;
- If the population standard deviation is known, use the z-score (z*) for the corresponding confidence level;
- If the population standard deviation is unknown, use the t statistic (t*) with n − 1 degrees of freedom for the corresponding confidence level;
- Find the lower and upper limits of the confidence interval using the following formulas:
a) Population standard deviation known:
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
b) Population standard deviation unknown:
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
where s is the sample standard deviation.
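The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function name `confidence_interval`, the sample data, and σ = 1.5 are hypothetical. The standard library has no t quantile function, so the caller supplies t*; the value 2.262 used here is the two-sided 95% critical value for 9 degrees of freedom from a standard t table (SciPy, for instance, provides it as `scipy.stats.t.ppf(0.975, 9)`).

```python
from statistics import NormalDist, fmean, stdev

def confidence_interval(sample, level=0.95, sigma=None, t_star=None):
    """Confidence interval for the mean (hypothetical helper).

    If sigma (the population standard deviation) is known, z* is derived
    from `level`. Otherwise the sample standard deviation s is used with a
    caller-supplied t critical value for n - 1 degrees of freedom; in that
    case `level` is ignored, so t_star must match the desired level.
    """
    n = len(sample)
    x_bar = fmean(sample)
    if sigma is not None:
        crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z*
        spread = sigma
    else:
        if t_star is None:
            raise ValueError("t_star is required when sigma is unknown")
        crit = t_star
        spread = stdev(sample)  # sample standard deviation s (n - 1 denominator)
    half_width = crit * spread / n ** 0.5
    return x_bar - half_width, x_bar + half_width

data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical sample, n = 10
print(confidence_interval(data, t_star=2.262))   # sigma unknown: t-based
print(confidence_interval(data, sigma=1.5))      # sigma assumed known: z-based
```

With these hypothetical data, x̅ = 13.5 and s/√n = 0.5, so the t-based interval is 13.5 ± 2.262 × 0.5, roughly 12.37 to 14.63.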
Practical example of a confidence interval
A clinical study evaluated the association between the presence of asthma and the risk of developing Obstructive Sleep Apnea in adults.
Some adults were randomly recruited from a list of state civil servants to be followed over four years.
Participants with asthma, when compared with those without, had a higher risk of developing apnea within four years.
When conducting studies like this one, researchers typically recruit a subset of the population of interest to make the study more efficient (lower cost and less time).
This subgroup of individuals, the study population, is made up of those who meet the inclusion criteria and agree to participate in the study, as shown in the image below.
Explanatory graph of the population studied in the example.
Then the study is completed and an effect size is calculated (for example, a mean difference or a relative risk) to answer the research question.
This process, called inference, involves using data collected from the study population to estimate the actual effect size in the population of interest, i.e., the source population.
In the example given, the researchers recruited a random sample of state employees (source population) who were eligible and agreed to participate in the study (study population) and reported that asthma increases the risk of developing apnea in the population studied.
To account for sampling error due to recruiting only a subset of the population of interest, they also calculated a 95% confidence interval around the estimate of 1.06 to 1.82, indicating with 95% confidence that the true relative risk in the source population lies between 1.06 and 1.82.
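Relative-risk intervals are usually computed on the log scale, so they are not symmetric around the estimate. Assuming (this is not stated for the study above) that the reported 1.06 to 1.82 interval has the form exp(ln RR ± 1.96·SE), the point estimate and the standard error of ln RR can be recovered from the two limits. The function name is hypothetical and the recovered values are illustrative only.

```python
import math

def recover_log_scale_estimate(lower, upper, z_star=1.96):
    """Back out RR and SE(ln RR) from a symmetric log-scale interval.

    Assumes the interval was built as exp(ln(RR) +/- z_star * SE).
    """
    log_low, log_high = math.log(lower), math.log(upper)
    rr = math.exp((log_low + log_high) / 2)   # geometric mean of the limits
    se = (log_high - log_low) / (2 * z_star)  # log-scale half-width divided by z*
    return rr, se

rr, se = recover_log_scale_estimate(1.06, 1.82)
print(round(rr, 2), round(se, 3))  # about 1.39 and 0.138
```

Because the whole interval lies above 1 (the value meaning no difference in risk), it is consistent with the reported increase in risk among participants with asthma.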
Confidence Interval for the Mean
When the standard deviation of a population is known, you can calculate a confidence interval for the mean of that population.
When the characteristic being measured (such as income, IQ, price, height, quantity, or weight) is numerical, the quantity of interest is usually the population mean.
We therefore estimate the population mean (μ) from a sample mean (x̅), together with a margin of error. The result of this calculation is called a confidence interval for the population mean.
When the population standard deviation is known, the formula for a confidence interval (CI) for a population mean is:
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
Where:
- x̅ is the sample mean;
- σ is the population standard deviation;
- n is the sample size;
- z* is the value from the standard normal distribution corresponding to the desired confidence level.
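A worked example with hypothetical numbers: suppose a sample of n = 25 gives x̅ = 50, and the population standard deviation is known to be σ = 10. For 95% confidence, z* = 1.96, so:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 50 \pm 1.96 \cdot \frac{10}{\sqrt{25}} = 50 \pm 3.92$$

which gives an interval from 46.08 to 53.92.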
The z* values for common confidence levels are:
Confidence level | z* value |
---|---|
80% | 1.28 |
90% | 1.645 |
95% | 1.96 |
98% | 2.33 |
99% | 2.58 |
The table above shows z* values for the given confidence levels. These values are taken from the standard normal (Z) distribution.
The area between each z* value and its negative is (approximately) the confidence percentage. For example, the area between z* = 1.28 and -1.28 is approximately 0.80. The table can therefore be extended to other confidence levels; it only shows the most commonly used ones.
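The z* values in the table can be reproduced with Python's standard `statistics.NormalDist`. This sketch (the helper name `z_star_for` is hypothetical) computes z* as the quantile that leaves (1 − level)/2 in each tail, then checks that the area between −z* and z* recovers the confidence level.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_star_for(level: float) -> float:
    """z* that leaves (1 - level) / 2 of the area in each tail."""
    return std_normal.inv_cdf(1 - (1 - level) / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    z_star = z_star_for(level)
    # Area between -z* and z* should equal the confidence level
    area = std_normal.cdf(z_star) - std_normal.cdf(-z_star)
    print(f"{level:.0%}  z* = {z_star:.3f}  area = {area:.3f}")
```

To three decimals this gives 1.282, 1.645, 1.960, 2.326 and 2.576; the table's 2.33 and 2.58 are the last two rounded to two decimals.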