Lesson 30: Inferential Statistics

Inferential statistics refers to a collection of statistical methods in which random sample results are used to draw an inference, make a statement, or reach a conclusion about an entire population. In general, there are two basic types of inferential statistics methods: confidence interval estimation and hypothesis testing procedures.

Confidence interval estimation is used when the issue under investigation involves learning the value of an unknown population parameter. That is, we have no idea what the value of the population parameter is ahead of time, so we estimate it using confidence interval techniques. Hypothesis testing procedures are used when the issue under investigation involves assessing the validity of an assumed value of a particular population parameter. In this case, we do have an idea what the value of the population parameter is ahead of time, and as with all ideas, it's either a good idea or a bad idea, so we use our sample data to test the idea in a hypothesis testing procedure.

For instance, a student in the nursing program wanted to know the average starting salary of registered nurses. Since the student does not know the average starting salary of registered nurses, she can take a random sample of registered nurses' starting salaries and use a confidence interval to estimate the value. As another example, a counselor told a student in the nursing program that the average starting salary of registered nurses is $50,000 per year. In this case, a random sample of registered nurses' starting salaries can be used in a hypothesis testing procedure to test the validity of the counselor's stated average starting salary of registered nurses.

The only way to really know with certainty the value of a population parameter is to conduct a census by collecting data from each and every member of the entire population. It is only when 100% of the data, that is, a census, is actually known that the value of a particular population parameter can be determined with 100% accuracy.
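To make the census idea concrete, here is a minimal sketch in Python. The population is simulated, and every number in it is made up for illustration. Averaging every member gives the parameter exactly, while averaging only some members gives a value that is merely close to it.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 salaries (simulated, not real data)
population = [round(rng.gauss(50_000, 4_000)) for _ in range(1_000)]

census_mean = mean(population)        # uses every member: the parameter itself
sample = rng.sample(population, 25)   # a random sample of 25 members
sample_mean = mean(sample)            # a statistic computed from part of the data

print(f"census mean (parameter): {census_mean:.0f}")
print(f"sample mean (statistic): {sample_mean:.0f}")
```

Changing the seed changes the sample mean but not how the census mean is defined: it always uses all 1,000 values.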

Realistically speaking, in most circumstances conducting a census is either impractical or impossible to achieve. As a result, the actual data collected only comes from part of the population, that is, a sample. Since sample results are not based on data collected from 100% of the population, the population parameter cannot be determined with 100% accuracy. Nonetheless, when unbiased, representative random samples are collected from the population, the results obtained in the sample, that is, the statistics, are used to infer what the results in the population, that is, the parameters, might actually be.

In confidence interval estimation, a random sample is collected from the population, and the resulting sample statistics are used to determine the lower limit L and upper limit U of an interval that accurately estimates the actual value of the unknown population parameter (1 − α) × 100% of the time.

So to illustrate this idea of a confidence interval estimate, we'll focus on two aspects: the confidence and the interval. First, the interval. A confidence interval is a range of values from some lower limit L to some upper limit U such that the actual value of the population parameter is estimated to fall somewhere within it. Now for the confidence. The confidence refers to the likelihood that the population parameter will be contained within the interval. That is, the population parameter is estimated to be between L and U with (1 − α) × 100% confidence, where the notation (1 − α) × 100% represents what we call the confidence level. The confidence level is the probability that the confidence interval accurately estimates the population parameter.
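As a rough sketch of how a lower limit L and upper limit U might be computed from a sample, the Python below builds an interval for a mean using a normal approximation. The function name, the formula choice, and the salary figures are all assumptions made for illustration. The lesson itself does not give a formula, and with a sample this small a t-based interval would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(sample, alpha=0.05):
    """Return (lower, upper) limits for the population mean.

    Assumes a normal approximation: x_bar +/- z * s / sqrt(n).
    """
    n = len(sample)
    x_bar = mean(sample)                      # sample mean
    s = stdev(sample)                         # sample standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical starting salaries (made-up numbers, not real data)
salaries = [48500, 51200, 49800, 52300, 50100,
            47900, 53000, 49500, 50700, 51800]

lower, upper = z_confidence_interval(salaries)
print(f"95% interval: {lower:.0f} to {upper:.0f}")
```

Passing a smaller alpha, such as `alpha=0.01`, raises the confidence level and widens the interval.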

So in order to have a great deal of confidence in the estimate, we need to set our confidence level very high. Now, in statistics, the customary value used for α, which is called the level of significance, is 0.05. Thus, the customary confidence level used in statistics is 1 − 0.05, or 95%, and we can construct confidence intervals that accurately estimate the value of the population parameter 95% of the time.

Even though the value of a population parameter is usually unknown, the actual value is fixed. The exact value could be determined if a census were conducted.
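The relationship between the level of significance and the confidence level is simple arithmetic, sketched below in Python. The `z_critical` helper is an added assumption, not something the lesson defines: it shows the standard normal multiplier that a normal-based interval would use, to illustrate that higher confidence calls for a wider interval.

```python
from statistics import NormalDist

def confidence_level(alpha):
    """Confidence level in percent: (1 - alpha) * 100."""
    return (1 - alpha) * 100

def z_critical(alpha):
    """Two-sided standard normal multiplier for a given alpha (illustrative)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  "
          f"confidence level={confidence_level(alpha):.0f}%  "
          f"z={z_critical(alpha):.3f}")
```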

On the other hand, since random sample results are used to calculate the confidence interval, the resulting lower limit L and upper limit U produce varying results determined by chance due to random sampling. So to illustrate this, imagine we've collected a random sample and constructed the confidence interval. Now, it's the actual data values in the sample that we use to produce the lower limit L and upper limit U that form the confidence interval. So let's say, just by chance, we happen to get a few lower data values in the sample. The resulting confidence interval will then estimate lower values. Or let's say, just by chance, we happen to get a few larger data values. The resulting confidence interval from that sample will estimate somewhat larger values. On the other hand, we could collect sample data that contains both smaller data values and larger data values. The resulting confidence interval will then produce a much wider range of values to reflect that diversity in the sample. But if we happen to collect data that doesn't contain any small data values or any large data values, just mostly data values in the middle, the confidence interval estimate will be much more concentrated around the values in the middle.

So just due to chance alone and random sampling, we get different confidence interval estimates, because the data values in the random samples are themselves different. So keep in mind, when it comes to confidence interval estimates, the confidence interval itself is random and takes on varying results just due to chance and random sampling. But what each confidence interval is trying to do is estimate the actual value of the population parameter, and this population parameter value, although unknown, is fixed, or constant. It's the target that each of these confidence intervals is trying to hit.

So when we say that a confidence interval is accurate (1 − α) × 100% of the time, we mean that each time we construct one of our confidence intervals, it has a (1 − α) × 100% probability of hitting its target, that is, of having the actual population parameter contained within the interval. Thus, 95% of confidence intervals constructed with the customary level of confidence accurately estimate the value of the population parameter. As seen here, almost all of these confidence intervals contain the actual population parameter. On the other hand, only 5% of confidence intervals constructed with the customary level of confidence do not accurately estimate the value of the population parameter, and as we see here, we actually have one confidence interval that does not contain the actual value of the population parameter.
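The idea that about 95% of intervals hit a fixed target can be checked with a small simulation. The Python sketch below is only illustrative: the population values (mean 50,000, standard deviation 4,000), the sample size, and the normal-approximation interval formula are all assumptions chosen for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage_rate(mu=50_000, sigma=4_000, n=40, alpha=0.05,
                  trials=2_000, seed=1):
    """Fraction of simulated intervals that contain the fixed parameter mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        # Each trial draws a fresh random sample, so each interval differs.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        margin = z * stdev(sample) / sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The share should land near 0.95, with the remaining intervals missing the target purely by chance.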

In hypothesis testing procedures, a random sample is collected from the population. If the resulting sample statistics are consistent with the assumed value of the population parameter, the validity of the assumed, or hypothesized, value is supported. Alternatively, if the resulting sample statistics contradict the assumed value of the population parameter, the assumed value is considered to be invalid.

To illustrate this, in hypothesis testing procedures the population parameter is assumed to take on a certain value. Then a sample is collected and the corresponding sample statistic is calculated. If the sample statistic is consistent with the hypothesized value of the population parameter, then we conclude that this population parameter value is valid. But if the resulting sample statistic differs from the hypothesized population parameter, then this gives some evidence that maybe the hypothesized population parameter is not valid.

But you must be careful when making this sort of conclusion. For instance, if the particular population parameter being hypothesized is the mean μ, and the corresponding sample statistic is the sample mean x-bar, we know that the central limit theorem tells us that the distribution of the sample mean x-bar is approximately normal when the sample is large enough. Just due to chance alone, if the hypothesized value of μ is true, there is a 50% chance that the resulting sample mean x-bar will be less than that value, and a 50% chance that it will be larger than that value. So this is something that can just happen due to chance alone. Just because the resulting sample statistic differs somewhat from the population parameter, we don't necessarily want to conclude that the hypothesized value is invalid.

Now, in hypothesis testing, what we want is for the sample evidence to really, extremely contradict the hypothesized value. So if the resulting sample statistic is extremely different from the hypothesized population parameter, then we say that we have enough evidence to really contradict it, thus leading us to the conclusion that the hypothesized population value is invalid.
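The point that a sample mean lands below the true mean about half the time, purely by chance, can be seen in a short simulation. This Python sketch assumes a normally distributed population with made-up values (mean 50,000, standard deviation 4,000); the function name and settings are hypothetical.

```python
import random

def share_below(mu=50_000, sigma=4_000, n=30, trials=5_000, seed=2):
    """Fraction of simulated sample means that fall below the true mean mu."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar < mu:
            below += 1
    return below / trials

share = share_below()
print(f"Sample means below mu: {share:.3f}")
```

Here the hypothesized value is exactly right, yet nearly every sample mean differs from it, which is why a small difference alone is not treated as evidence against the hypothesis.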

In hypothesis testing, the probability that the sample statistic is at least as extreme as the resulting sample value is calculated under the assumption that the population parameter equals the hypothesized value. This probability is referred to as the p-value. So it's this p-value, the probability of getting a result as extreme as the sample result just due to chance alone, that determines how we'll reach our decision. When the resulting p-value is 0.05 or less, it would be considered unusual to obtain these sample results by chance alone. Therefore, the more likely explanation of these sample results is that the assumed value of the population parameter is invalid. Thus, when the resulting sample statistic is considered to be an unusual outcome, we reject the hypothesized value of the population parameter in favor of an alternative explanation that is much more consistent with the sample results, and thus a more likely explanation.

To summarize, there are two basic types of inferential statistics methods: confidence interval estimation and hypothesis testing procedures.

In upcoming lessons, we'll be learning how to construct four different types of confidence interval estimates, two of which involve the population mean μ, which is an average based on measurement data. We'll also be learning how to estimate proportions, which are percentages based on count data. For both the mean and the proportion, we'll be learning how to construct confidence interval estimates for just one population mean μ and for just one population proportion p. In upcoming lessons, we'll also be learning how to estimate the difference between two population means, μ1 − μ2, and the difference between two population proportions, p1 − p2.

In upcoming lessons, we'll also be learning how to conduct four different types of hypothesis test procedures, two of which involve the population mean μ, which is an average based on measurement data, and two of which involve proportions p, which are percentages based on count data. For both the mean and the proportion, we'll be learning how to conduct hypothesis test procedures involving just one population mean μ and tests involving just one population proportion p. We'll also cover lessons that allow us to make comparisons between two population means, μ1 and μ2, and comparisons involving two population proportions, p1 and p2.

So regardless of which inferential statistics method we apply, whether it be confidence interval estimation or hypothesis testing procedures, it's important not to lose sight of the main idea behind inferential statistics: inferential statistics uses random sample results to reach a conclusion about an entire population.
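As a closing illustration of the p-value idea from this lesson, here is a minimal Python sketch of a two-sided test for a mean. It assumes the population standard deviation is known and that the sample mean is approximately normal. The function name and all of the numbers are hypothetical, and the procedures taught in later lessons may differ in detail.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(x_bar, mu_0, sigma, n):
    """Probability of a sample mean at least as far from mu_0 as x_bar,

    computed under the assumption that the true mean equals mu_0.
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))      # standardized distance
    return 2 * (1 - NormalDist().cdf(abs(z)))   # both tails

# Hypothetical numbers: counselor's claim of 50,000 versus a sample of 36 nurses
p = two_sided_p_value(x_bar=51_500, mu_0=50_000, sigma=4_000, n=36)
decision = "reject" if p <= 0.05 else "do not reject"
print(f"p-value = {p:.4f}, so we {decision} the hypothesized value")
```

With these made-up inputs the p-value falls below 0.05, so the sample result would be considered unusual under the counselor's claim.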
