Hypothesis Testing in Statistics

Easy read on Hypothesis Testing in Statistics in under 3 mins

Learn about hypothesis testing in Statistics, confusion matrix, types of hypothesis testing errors, Type 1 and Type 2 Errors.

What is hypothesis testing in statistics? 

Statistics deals with large amounts of data, and we use Hypothesis Testing to draw sound conclusions from that data.

Definition: 

Hypothesis Testing helps evaluate two or more mutually exclusive statements on a population using a sample of data from the population. 

Examples: 

In a court of law, in a criminal case, there are two statements. One is that the defendant is guilty, and the other, mutually exclusive with the first, is that the defendant is innocent. Hypothesis testing helps in arriving at the correct result using the data or evidence from the case.

[Figure: Hypothesis Testing in Statistics]

Steps in Hypothesis Testing

Step 1: 

Make an initial assumption that a particular statement is correct. This initial assumption is the Null Hypothesis (H0). The statement that contradicts the Null Hypothesis is called the Alternate Hypothesis (H1).

Step 2: 

Start collecting information or insights from the data. In the example above, to decide whether a defendant is innocent or guilty, we have to collect DNA, fingerprint, and alibi evidence.

Step 3: 

Once we have gathered sufficient evidence, we draw an inference. Based on the evidence or data insights, we either reject the Null Hypothesis in favour of the Alternate Hypothesis or fail to reject the Null Hypothesis.

In real-world scenarios where the population is enormous, we analyse sample data drawn from the population rather than the entire population. Based on the results from the sample data, we draw conclusions about the entire population.

If the sample data does not provide enough evidence against H0, the Null Hypothesis, we fail to reject H0. If the sample data provides strong evidence against H0, we reject H0 in favour of H1, the Alternate Hypothesis.
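The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coin-flip scenario, the 60-heads-in-100-flips sample, the 0.05 significance level, and the function names `two_sided_p_value` and `decide` are all made up for this example. It uses a p-value, which is the probability of seeing a result at least as extreme as the sample if H0 were true.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (P(head) = 0.5)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected count as the
    # observed one; under H0 each sequence of flips is equally likely.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** flips


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 only when the p-value falls below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Step 1: H0 = the coin is fair, H1 = the coin is not fair.
# Step 2: hypothetical evidence, 60 heads in 100 flips.
p = two_sided_p_value(60, 100)
# Step 3: draw the inference.
print(f"p-value = {p:.4f} -> {decide(p)}")
```

Note that the function never "accepts" H0. It either rejects H0 or fails to reject it, which mirrors a court finding a defendant "not guilty" rather than "innocent".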

Confusion Matrix in Hypothesis Testing

A confusion matrix, in predictive analytics, is a two-by-two table that shows the rate of false positives, false negatives, true positives and true negatives for a test or indicator. We can make a confusion matrix if we know both the predicted and actual values for a sample set.

[Figure: Confusion Matrix in Hypothesis Testing]

Inference 1: If H0 is actually true and the evidence does not lead us to reject it, the decision is correct (1st row, 1st column).

Inference 2: If H0 is actually false and the evidence leads us to reject it in favour of H1, the decision is correct (2nd row, 2nd column).

Inference 3: If H0 is actually true but we reject it, the decision is incorrect. This is called a Type I Error (a false positive).

Inference 4: If H0 is actually false but we fail to reject it, the decision is incorrect. This is called a Type II Error (a false negative).

Type I and Type II errors can be catastrophic, so we have to take extra care to arrive at the correct results.
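The four cells of the matrix can be written down as a small lookup. The sketch below is illustrative only: the function name `classify` and the list of outcomes are hypothetical, and in practice we rarely know whether H0 is actually true.

```python
from collections import Counter


def classify(h0_is_true: bool, rejected_h0: bool) -> str:
    """Name the confusion-matrix cell for one test outcome."""
    if h0_is_true and not rejected_h0:
        return "correct: H0 true, not rejected"
    if not h0_is_true and rejected_h0:
        return "correct: H0 false, rejected"
    if h0_is_true and rejected_h0:
        return "Type I error"
    return "Type II error"


# Hypothetical outcomes as (H0 is actually true, the test rejected H0).
outcomes = [
    (True, False),
    (True, True),
    (False, True),
    (False, False),
    (True, False),
]

# Tally how many outcomes fall into each cell.
table = Counter(classify(truth, decision) for truth, decision in outcomes)
for cell, count in table.items():
    print(f"{cell}: {count}")
```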

Now that we have learnt about Hypothesis Testing in Statistics, you can read about Sampling in Statistics. If you are interested in learning Data Analysis, you can check out our R Programming course by Ampersand Academy.

Sampling in Statistics

Easy read on Sampling in Statistics in just 5 mins

Learn about Sampling in Statistics — the difference between Sample and Population, Different types of Sampling in Statistics in detail.

Sampling is a method used in statistical analysis in which we take a predetermined number of observations from a larger population.

To understand sampling, we must first discuss the difference between Sample and Population. 

Sample:

  • It’s difficult for the Analyst or the Researcher to collect every detail from the entire group. Instead, they choose a small group from the whole group.
  • A Sample is the subset of the Population that participates in the research and represents the Population.
  • Samples come from the Population.

Population:

  • It is a collection from which a sample comes. 
  • It is the complete set of individuals that the study is about.

Example of Sample and Population

To develop a vaccine for Covid, several companies have carried out research and tested their vaccines to measure efficacy. The vaccine is for the entire human race, so the Population for this study is all of us. Companies can’t run a study on the whole human race. Hence, to make the process quicker, they conduct clinical trials on a limited group called the Sample. That way, the research is quick, efficient and cost-effective.

What is Sampling in Statistics?

Sampling is a method that allows researchers or analysts to infer knowledge about the Population based on Sample results without needing to investigate every individual. 

[Figure: Sampling in Statistics]
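To see why a sample can stand in for the Population, here is a minimal sketch. The population of 100,000 made-up measurements, the sample size of 1,000, and the fixed seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical Population: 100,000 made-up measurements.
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Study only 1,000 randomly chosen individuals instead of everyone.
sample = rng.sample(population, 1_000)

population_mean = mean(population)
sample_mean = mean(sample)
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means should land close to each other even though the sample covers only 1% of the Population.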

Advantages of Sampling

  • Reduces the number of individuals in studies
  • Reduces cost
  • Reduces workload
  • High-quality information to arrive at a better conclusion from the results

Categories of Sampling in Statistics

  • Probability Sampling 
  • Non-probability Sampling

Probability Sampling

  • In this sampling, members are selected at random, and every member of the Population has a known, non-zero chance of participating in the study.
  • For example, every time we flip a fair coin, there is an equal chance of getting a “head” or a “tail”, irrespective of how many times we have flipped the coin before.

Types of Probability Sampling in Statistics

[Figure: Types of Sampling in Statistics]

Simple Random Sampling: 

  • Every member of the study has an equal chance of selection.
  • Selection depends on chance and randomness.
  • Simple Random Sampling is often abbreviated as SRS.

Example: 

Consider a meeting that has people of different age groups, ethnicity, sex, colour and creed. To create a sample using Simple Random Sampling, we choose individuals randomly and without any bias.
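A minimal sketch of Simple Random Sampling, using Python's standard `random.sample`. The attendee names, the sample size of 5, and the seed are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical meeting with 50 attendees.
attendees = [f"attendee_{i}" for i in range(1, 51)]

# Every attendee has the same chance of being picked; nobody is picked twice.
chosen = rng.sample(attendees, 5)
print(chosen)
```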

Systematic Sampling

  • In Systematic Sampling, we choose the first member randomly and subsequent members systematically

Example:

Consider a class with roll numbers 1 to 40. To create a Sample by the Systematic Sampling method, we choose the first student randomly, say roll number 10. We then select every 4th student after that: roll numbers 14, 18, 22, and so on.
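The roll-number example can be sketched as follows. The helper name `systematic_sample` is hypothetical, and the sketch follows the example above by moving forward from the starting student without wrapping around.

```python
import random


def systematic_sample(members, start_index, step):
    """Take the member at start_index, then every step-th member after it."""
    return members[start_index::step]


roll_numbers = list(range(1, 41))

# Random starting point, as the method prescribes.
rng = random.Random(0)
random_start = rng.randrange(len(roll_numbers))
print(systematic_sample(roll_numbers, random_start, 4))

# The fixed starting point from the example: roll number 10, every 4th student.
from_ten = systematic_sample(roll_numbers, roll_numbers.index(10), 4)
print(from_ten)
```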

Cluster Sampling

  • The researcher/analyst divides the Population into groups called Clusters
  • Clusters are ideally externally homogeneous (similar to one another) and internally heterogeneous (each as varied as the Population)
  • The analyst randomly selects one or more clusters
  • Members are then selected from the chosen clusters, for example by Simple Random Sampling
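A rough sketch of the cluster approach in two stages. The four clusters, their sizes, and the counts chosen at each stage are hypothetical values picked for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical Population split into 4 clusters of 10 members each.
clusters = {
    f"cluster_{c}": [f"c{c}_member_{m}" for m in range(1, 11)]
    for c in range(1, 5)
}

# Stage 1: randomly choose which clusters take part.
chosen_clusters = rng.sample(sorted(clusters), 2)

# Stage 2: Simple Random Sampling inside each chosen cluster.
sample = [
    member
    for name in chosen_clusters
    for member in rng.sample(clusters[name], 3)
]
print(chosen_clusters)
print(sample)
```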

Stratified Sampling

  • In Stratified Sampling, we divide the Population into groups called strata based on specific characteristics and then select members from each stratum.

Example: 

Consider a meeting having people of different ages, sex, ethnicity. In Stratified Sampling, we group people based on age, sex, race. Then we select members from each stratum.
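The meeting example might look like this in code. The attendee names, the age groups used as strata, and the choice of two members per stratum are all hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical attendees, each tagged with an age group (the stratum).
attendees = [
    ("Asha", "18-30"), ("Ben", "18-30"), ("Chen", "18-30"),
    ("Dev", "31-50"), ("Elena", "31-50"), ("Farid", "31-50"),
    ("Gita", "51+"), ("Hugo", "51+"), ("Ines", "51+"),
]

# Group the attendees into strata.
strata = defaultdict(list)
for name, age_group in attendees:
    strata[age_group].append(name)

# Select members from every stratum, so each group is represented.
sample = {group: rng.sample(members, 2) for group, members in strata.items()}
print(sample)
```

Unlike Cluster Sampling, every stratum contributes members to the sample.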

Non-probability Sampling

  • Sampling based on non-random criteria
  • Not every individual in the Population has a chance of being included in the study.
  • This method is easier and cheaper to implement, but it cannot provide valid statistical inferences. 

Types of Non-probability Sampling in Statistics

Convenience Sampling:

  • Based on convenience 
  • The sample consists of whoever is most accessible to the analyst or researcher.
  • It’s an easy and cost-effective way of sampling, but there is no way to prove that it accurately represents the entire Population.
  • Convenience Sampling is also called Accidental Sampling.

Snowball Sampling:

  • The first participants are selected by the researcher.
  • Those participants then recruit the subsequent participants.
  • Snowball Sampling is also called Network Sampling.

Quota Sampling

  • Used by Market Researchers.
  • Samples are tailored to match chosen traits of the Population.
  • Selection is based on the proportion of each trait in the Population

Purposive / Judgemental Sampling in Statistics

  • Selection based on Judgement
  • Selection relies on the knowledge of the Analyst or Researcher
  • Used by Media or Public Surveys

Now that you have read about Sampling in Statistics, you can check our post on Distribution in Statistics. If you wish to learn Data Analytics, check the R Programming course by Ampersand Academy.

Normal Distribution in Statistics

Easy understanding of Distribution in Statistics in 2021

Learn about Distribution in Statistics and various types of Distribution in Statistics with examples and charts

Definition of Distribution:

Distribution is a function that shows the possible values of a variable and how often they occur.

Distribution in Statistics means listing all possible values in the sample space and how frequently they occur. In statistics, the word Distribution usually means Probability Distribution. A few examples of Distribution include

  • Normal
  • Binomial 
  • Uniform
  • Skewed – Left and Right
  • Bimodal Distribution

Each probability distribution is associated with a graph describing the likelihood of occurrence of each event.

We can explain Distribution with the simple example of rolling a die.

Rolling a die can produce a 1, 2, 3, 4, 5 or 6. The chance of getting a 1 is 1/6; likewise, the probability of each other face is 1/6. The likelihood of getting a 7, on the other hand, is 0. You’ll find that the Distribution over the faces 1 to 6 is uniform. Now let’s discuss the various types of Distribution, using examples.
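The die example can be written out as a small table of outcomes and probabilities. This is only a sketch; the names `distribution` and `probability` are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Each face of a fair six-sided die has probability 1/6.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}


def probability(outcome: int) -> Fraction:
    """Probability of an outcome; anything outside 1 to 6 cannot occur."""
    return distribution.get(outcome, Fraction(0))


print(probability(1))  # probability of rolling a 1
print(probability(7))  # a 7 is impossible on a six-sided die
print(sum(distribution.values()))  # all probabilities add up to 1
```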

Normal Distribution:

Normal Distribution looks like a Bell Curve, with the maximum probability of occurrence at the centre and the minimum at the tails. If you pick an observation at random, there is a good chance it will come from near the centre. Take the classic example of player heights in the Indian Cricket Team: when we plot them, we see that cricketers are rarely shorter than 1.5 m or taller than 1.9 m. Most of the players fall between 1.7 m and 1.8 m. This curve is symmetric, as shown below.

[Figure: Normal Distribution in Statistics]

Uniform Distribution:

Uniform Distribution is where the probability of occurrence of all the possible outcomes is equal. In the example of rolling a die, the probability of getting each of 1 to 6 is 1/6. The Uniform Distribution looks like the graph given below.

[Figure: Uniform Distribution in Statistics]

Bimodal Distribution: 

Bimodal Distribution has two peaks, or modes, around which the other values are distributed. The following chart is the classic example of Bimodal Distribution:

[Figure: Bimodal Distribution in Statistics]

Skewed Distribution:

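Skewed Distribution is where one tail of the Distribution is longer than the other, so most of the data sits on one side. If most of the data sits on the right and the long tail extends to the left, it is called Left Skewed Distribution; if most of the data sits on the left and the long tail extends to the right, it is called Right Skewed Distribution.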

[Figure: Skewed Distribution in Statistics]
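One way to tell the two kinds of skew apart numerically is the sign of a skewness coefficient. The sketch below uses the moment-based coefficient (third central moment divided by the cube of the standard deviation), which is one common definition among several. The two small data sets are made up.

```python
from statistics import mean


def skewness(values):
    """Moment-based skewness: positive = right skewed, negative = left skewed."""
    m = mean(values)
    n = len(values)
    second_moment = sum((v - m) ** 2 for v in values) / n
    third_moment = sum((v - m) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical data: most values are small, with one long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]
# Mirror image: most values are large, with one long tail to the left.
left_skewed = [11 - v for v in right_skewed]

print(f"right skewed: {skewness(right_skewed):+.2f}")
print(f"left skewed:  {skewness(left_skewed):+.2f}")
```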

Now that you’ve read Distribution in Statistics, you can also read about the Data types in Statistics. If you’re looking to enrol for Data Analytics, check our training institute Ampersand Academy

Data Types in Statistics

Easy read on Data Types in Statistics in just 2 mins

Learn about Data Types in Statistics

In general, data is a group of individual data points. Statistics distinguishes two main types of data.

  1. Categorical Data Types
  2. Numerical Data Types

In this section, we will discuss each of the Data Types with examples from Cricket, since I watch cricket the most and find it intriguing. Each of the data types is explained in detail below.

Categorical Data:

Categorical data is a data type stored as groups or categories with the help of names or labels. There are two types of Categorical Data:

Nominal: Categories with no inherent order

Ordinal: Categories with an inherent order

Numerical Data:

Numerical data is a data type that represents values that are measured or counted and can be put into a logical order. There are two types of Numerical Data:

Discrete: Discrete data takes only certain values. It cannot be measured, but it can be counted.

Continuous: Continuous data can take any value within a range. It is measured rather than counted.

[Figure: Data Types in Statistics]

Examples of each Data Type:

Categorical – Nominal Data Type:

Which team does “Virat Kohli” play for?

{India, Pakistan, Bangladesh, Sri Lanka}

The data inside the braces is called the Sample Space

Categorical – Ordinal Data Type:

At which position does “Virat Kohli” bat?

{1 down, 2 down, 3 down, 4 down}

Numerical – Discrete Data Type:

How many runs did “Virat Kohli” score in World Cup?

{150, 250, 350, 450}

Numerical – Continuous Data Type:

What is the height of “Virat Kohli”?

{5.5, 5.6, 5.7, 5.8}
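As a rough illustration, the four sample spaces above could be represented in Python like this. The choice of containers and the helper `bats_earlier` are assumptions made for this sketch, not a standard convention.

```python
# Nominal: labels with no inherent order, so an unordered set is enough.
teams = {"India", "Pakistan", "Bangladesh", "Sri Lanka"}

# Ordinal: labels whose order matters, so keep them in an ordered list.
batting_positions = ["1 down", "2 down", "3 down", "4 down"]


def bats_earlier(a: str, b: str) -> bool:
    """Ordinal values can be compared by their place in the order."""
    return batting_positions.index(a) < batting_positions.index(b)


# Discrete: counted values, stored as whole numbers.
runs = [150, 250, 350, 450]

# Continuous: measured values, stored as floating-point numbers.
heights = [5.5, 5.6, 5.7, 5.8]

print(bats_earlier("1 down", "3 down"))
print(sum(runs) / len(runs))        # averaging makes sense for numerical data
print(sum(heights) / len(heights))
```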

Now that you have learnt the basics of data types in statistics, we will discuss Distribution, Sampling & Estimation, Hypothesis Testing and P values in statistics. If you’re looking to enrol for a Data Analytics course, check the R Programming course from our institute. Also, read about the Installation of SAS University Edition.