 PRIDE MONTH ALERT : FLAT 20% OFF On Our Best-Selling Courses Use -  PRIDE20 - Data Science Blogs -

# An Easy to Understand the Definition of the Confidence Interval

### Introduction

In most walks of life, we talk that I’m quite confident that this thing will happen or I’m quite confident about this. All of us at some point in time have been talking about this. If we look at statements made by us, it will be obvious to anybody that we are always certain with a degree of confidence in our statement. This statement is made in natural language and is quite vague if we can put a number to this, things will become more clear.

Now, let us say that we have a record of the number of times one made a statement like this was made. As soon as we start this process, we put a number and enter the domain of statistics. In the domain of statistics, it becomes possible for us to define the level of certainly using a number. This number can belong to a particular range. This range to which the number belongs can be said to be the confidence interval.

Technically speaking, a confidence interval is the measure of uncertainty in the data. These are widely used to depict how much confidence one can place in the outcomes reflected in the results of a poll or survey. Confidence intervals are connected to confidence levels due to their inherent nature.

### Confidence Intervals – The definition:

Theoretically, the confidence interval is the range of values which have derived from a sample population and depict the behavior of an unknown sample belonging to the broader population group. As can be observed from the natural surrounding that No two groups are the same. Thus, if two different samples are taken from a single population they can also yield a difference in the range. But, if this process is repeated several times, some parts of the resulting confidence interval will be able to replicate the unknown behavior. The percentage is known as the confidence level.

### Differentiating confidence interval from confidence level:

Though the concept of the confidence interval and confidence level are intrinsically linked, they still are different concepts in the domain of statistics whereas in slang language that can still be used interchangeably. Usually, confidence intervals are depicted in percentage terms whereas confidence intervals are sheer numbers. For example, if a survey of lactating mothers is conducted to have an estimate of the number of diapers purchased by each one of them. The results yield a confidence level of 95% and we get a confidence interval of (100,200). This would mean that a lactating mother on an average buys diapers somewhere between 100 and 200 and this could be true for 95% times. Diagrammatically, it is being displayed in Fig. 1. ### Known Standard Deviation

How to calculate the confidence interval for a population with a known standard deviation:

One of the methods to compute the confidence interval is the use of error bound method. The other being the use of a central limit theorem. In this blog, we’ll be using the error bound method. This method gets its name from the fact that it gives the boundary of the interval as obtained from the standard deviation making its application a bit more straightforward. To compute the interval for a population with mean ?, where the standard deviation of the population is known, x and the margin of error is required. IN this case, the margin of error (EBM) is called the error bound for the population.

The mathematical formulation for margin of error is as: The margin of error depends upon the confidence level (CL) which is to be achieved. depicts the probability that an unknown population parameter is not contained in the confidence level.

Mathematically, =1-CL

For a population with known standard deviation, the confidence interval is based upon the assumption that sampling distribution follows approximately a normal distribution.

To achieve a confidence interval of 95% of the probability distribution, 95% of the area under the curve should be computed. I.e. one should leave a 2.5% area on the tails. This for a distribution whose mean is 10 and distribution is normal would like as in Figure 2: Figure 2: area under the curve

Steps for calculating the confidence interval using the Error Bound Method:

• First of all, calculate the mean of the sample data.
• Calculate the z-score using the standard normal table corresponding to the confidence level desired.
• Compute the error bound for EBM
• Define the confidence interval
• Explain the final output using standard English literature.

Example:

Let us say that we are interested in finding the mean scores on an examination. Now, a random sample of 40 scores is taken and mean for which is 68 and the standard deviation is 3. Here, the point of interest is to find the confidence interval with 95?curacy.

So,

To calculate the confidence interval, the sample mean is an error bound is required.

• • EBM = • ?=3;n=40 (Given)
• Confidence interval is 95% (CL = 0.95)
• Thus, ?=0.05

=> => Thus, the left area is is 0.025 and right are is 0.095

Using the probability distribution table for the standard normal distribution we can infer that for a normal distribution the value of z for this case is 1.96.

Thus, my error bound, EBM  = 1.96*( ) = 12.397

EBM = 0.93

Lower bound  = 68 – 0.93 = 67.07

Upper bound = 68 + 0.93 = 68.93

Thus, 95% confidence interval is (67.07, 68.93)

### Unknown Standard Deviation

While doing statistics over a dataset, the standard deviation of the same is rarely known. Thus, the method as described in the previous section is seldom used. But, unknown standard deviation had never been an issue, owing to the large sizes of datasets from which the standard deviation was calculated making way for calculation of confidence interval. But if the dataset had fewer observations, the final was never good enough to be trusted.

This problem was faced by William S. Gosset of Guinness brewery in Dublin when he was experimenting with hops and barley and generated very few observations. While performing the experimentation, he realized that normal distribution was not the ideal way of calculating the confidence interval. This problem resulted in the genesis of the student’s t-distribution.

The student’s t-distribution is the symmetrical curve. It's different from a normal distribution in the sense that its flat over the x-axis and hence, can take a larger number of standard deviations to capture the same amount of area under the curve. In reality, there can be an infinite number of students t curve which can be achieved by altering the number of samples. With the increase in sample size, the t-distribution takes the shape of a normal curve and with fewer samples, it flattens towards the x-axis. The relationship between normal distribution and t-distribution can be depicted as in figure 3. Figure 3: Relationship (source:openstax.org)

Developing the student t-distribution:

Step 1: Calculate t using the following formula: Where Z is the standard normal distribution and x2 is the chi-squared distribution with v degree of freedom.

Step 2: By Substituting the numerator and denominator with expanded values we get: Step 3:

Finally, for v = n-1, this equation becomes: Reiterating the confidence interval formula for the mean for scenarios where sample size is less than 30 and standard deviation is unknown, the confidence interval becomes: Here the point estimate of the population standard deviation, s has been substituted for the population standard deviation, ?, and t?,? has been substituted for Z?.

V represents the degree of freedom of the distribution and varies according to the size of sample.

### Conclusion:

A number of types of estimates are compounded from the statistics of a dataset. One of them which shows the level of certainty in talk is called confidence interval. It proposes a range of values to which an unknown parameter belongs. This is associated with the confidence level that the proposed parameter is in the range. FaceBook Twitter Google+ LinkedIn Pinterest Email

### Trending Courses AWS

• AWS & Fundamentals of Linux
• Amazon Simple Storage Service
• Elastic Compute Cloud
• Databases Overview & Amazon Route 53 Upcoming Class

4 days 08 Jul 2022 DevOps

• Intro to DevOps
• GIT and Maven
• Jenkins & Ansible
• Docker and Cloud Computing Upcoming Class

11 days 15 Jul 2022 Data Science

• Data Science Introduction
• Python & Intro to R Programming
• Machine Learning Upcoming Class

4 days 08 Jul 2022 • Architecture, HDFS & MapReduce
• Unix Shell & Apache Pig Installation
• HIVE Installation & User-Defined Functions
• SQOOP & Hbase Installation Upcoming Class

4 days 08 Jul 2022 Salesforce

• Salesforce Configuration Introduction
• Security & Automation Process
• Sales & Service Cloud
• Apex Programming, SOQL & SOSL Upcoming Class

11 days 15 Jul 2022 QA

• Introduction and Software Testing
• Software Test Life Cycle
• Automation Testing and API Testing
• Selenium framework development using Testing Upcoming Class

4 days 08 Jul 2022 • BA & Stakeholders Overview
• BPMN, Requirement Elicitation
• BA Tools & Design Documents
• Enterprise Analysis, Agile & Scrum Upcoming Class

4 days 08 Jul 2022 MS SQL Server

• Introduction & Database Query
• Programming, Indexes & System Functions
• SSIS Package Development Procedures
• SSRS Report Design Upcoming Class

4 days 08 Jul 2022 Python

• Features of Python
• Python Editors and IDEs
• Data types and Variables
• Python File Operation Upcoming Class

4 days 08 Jul 2022 Artificial Intelligence

• Components of AI
• Categories of Machine Learning
• Recurrent Neural Networks
• Recurrent Neural Networks Upcoming Class

12 days 16 Jul 2022 Machine Learning

• Introduction to Machine Learning & Python
• Machine Learning: Supervised Learning
• Machine Learning: Unsupervised Learning Upcoming Class

25 days 29 Jul 2022 Tableau

• Introduction to Tableau Desktop
• Data Transformation Methods
• Configuring tableau server
• Integration with R & Hadoop Upcoming Class

5 days 09 Jul 2022

### Browse Categories

Search Posts

Trending Posts

Related Posts

Logistic Regression is Easy to Understand 2.8k

Top 10 Data Science Influencers Who Can Help Carve Your Career in Data 1.3k

ARIMA like Time Series Models and Their Autocorrelation 1.5k

Random Forest: An Easy Explanation of the Forest 2.9k

An Easy To Understand Approach For K-Nearest Neighbor Algorithm 2.2k

Receive Latest Materials and Offers on Data Science Course