Statistics

Statistics is the area of study dealing with the presentation, analysis and interpretation of data. Facts or figures, collected with a definite purpose, are called data. Data can be presented graphically in the form of bar graphs, histograms and frequency polygons.

Measures of Central Tendency

Central tendency refers to a single value that represents the "center" of a data set. There are three common ways of measuring the central value.

  1. Mean: It is found by adding the values of all the observations and dividing the sum by the total number of observations.
  2. Median: It is the value of the middle-most observation.
  3. Mode: It is the most frequently occurring observation.

Mean (Arithmetic Mean or Simple Average)

Add up the numbers and divide by how many numbers there are.

Example: What is the mean of the numbers 3, 8, 5, 13, 20, 23?

The sum of these numbers is 72. There are 6 numbers. The mean is 72/6 = 12.
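
The calculation above can be sketched in a few lines of Python. The function name arithmetic_mean is only an illustrative choice, not something defined by the text.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


numbers = [3, 8, 5, 13, 20, 23]  # the numbers from the example above
print(sum(numbers))              # 72
print(arithmetic_mean(numbers))  # 12.0
```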

Mean of Grouped Data
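
A commonly used estimate for grouped data, assumed here, treats every observation in a class as if it sat at the class mid value, so the mean becomes the frequency-weighted average of the mid values. The sketch below uses made-up classes and frequencies, and the name grouped_mean is hypothetical.

```python
def grouped_mean(classes, frequencies):
    """Estimate the mean of grouped data from class mid values and frequencies.

    classes     -- list of (lower, upper) pairs, one per class
    frequencies -- number of observations in each class
    """
    total = sum(frequencies)
    if total == 0:
        raise ValueError("there are no observations")
    # Assumption: every observation in a class sits at the class mid value.
    weighted_sum = sum(
        ((lower + upper) / 2) * freq
        for (lower, upper), freq in zip(classes, frequencies)
    )
    return weighted_sum / total


# Made-up grouped data, for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 7, 6, 3]
print(grouped_mean(classes, frequencies))  # 19.0
```

Because the individual observations are replaced by mid values, the result is an approximation of the mean of the raw data.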

Median (Middle Value)

List all the numbers in ascending order and select the middle one. If there are two middle numbers, take the average of both.

Example: What is the median of 13, 7, 21, 23, 40, 23, 12, 29?

Arranged in order, the numbers are 7, 12, 13, 21, 23, 23, 29, 40. There are eight numbers, so the two middle numbers are 21 and 23. The median is the average of both, i.e. (21 + 23)/2 = 22.
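
The same procedure, sort and then pick the middle value or average the two middle values, can be sketched in Python. The function name median_value is an illustrative choice.

```python
def median_value(values):
    """Return the middle value of the sorted data.

    With an even number of values, return the average of the two middle ones.
    """
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


numbers = [13, 7, 21, 23, 40, 23, 12, 29]  # the numbers from the example above
print(sorted(numbers))        # [7, 12, 13, 21, 23, 23, 29, 40]
print(median_value(numbers))  # 22.0
```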

Median of Grouped Data

Mode (Most Often Value)

The mode is the value that occurs most often. If more than one value occurs most often, there can be more than one mode. A data set with two modes is called bimodal; one with three or more modes is called multimodal.

Example: Find the mode of 13, 7, 21, 23, 23, 40, 23, 12, 29.

Arrange these numbers in order and you will find that 23 appears most often (three times). Hence, the mode is 23.
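
A minimal Python sketch of finding the mode by counting occurrences follows. It returns a list so that bimodal and multimodal data are handled too; the function name modes is an illustrative choice.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


numbers = [13, 7, 21, 23, 23, 40, 23, 12, 29]  # the numbers from the example above
print(modes(numbers))          # [23]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal data set
```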

Mode of Grouped Data

Frequency Distribution

When there is a large amount of data, it is grouped together and a frequency distribution table is created. A frequency distribution is defined when the following two pieces of information are specified:

  1. The values which the variable takes
  2. The number of times (i.e. the frequency) each value is taken by the variable.
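
As a rough illustration, the two pieces of information listed above (each value and its frequency) can be tabulated in Python. The data below are made up for the example, and frequency_table is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


# Made-up observations, for illustration only.
data = [2, 3, 3, 5, 2, 3, 5, 5, 5, 1]
table = frequency_table(data)
for value, frequency in table:
    print(value, frequency)
# Prints:
# 1 1
# 2 2
# 3 3
# 5 4
```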

The cumulative frequency of a class is the frequency obtained by adding the frequency of the given class to the frequencies of all the classes preceding it.
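
The sketch below computes cumulative frequencies as a running total, assuming the usual "less than" convention in which a class's cumulative frequency counts that class together with all classes before it. The class labels and frequencies are made up for illustration.

```python
from itertools import accumulate

# Made-up grouped data, for illustration only.
class_labels = ["0-10", "10-20", "20-30", "30-40"]
frequencies = [4, 7, 6, 3]

# Running total: each entry adds the current class to all preceding classes.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(class_labels, frequencies, cumulative):
    print(label, freq, cum)

print(cumulative)  # [4, 11, 17, 20]
```

The last cumulative frequency equals the total number of observations, which is a quick way to check the table.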

Class Limit: The starting and end values of each class are called “lower limit” and “upper limit” of that class respectively.

Class Interval: The difference between the upper and lower boundaries of a class is called the class interval or size of the class. It can also be defined as the difference between the lower limits (or the upper limits, or the corresponding boundaries) of two consecutive classes.

Histogram: For a frequency distribution, if the true limits (boundaries) of the classes are taken on the x-axis, the corresponding frequencies on the y-axis, and adjacent rectangles are drawn, the diagram is called a histogram.

Frequency Polygon and Frequency Curve: If the points given by the mid values of the classes of a frequency distribution and the corresponding frequencies are plotted on a graph sheet and joined by straight lines, the figure formed is called a frequency polygon. If these points are joined by a smooth curve, the figure formed is called a frequency curve.

Cumulative Frequency Curves: If the points given by the upper boundaries of the classes of a frequency distribution and the corresponding cumulative frequencies are plotted on a graph sheet and joined by a smooth curve, the figure formed is called a cumulative frequency curve (also known as an ogive).
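
To make the definitions concrete, the following sketch computes the class sizes and the (mid value, frequency) points that would be joined to draw a frequency polygon. It only prepares the coordinates and does no plotting; the classes, frequencies and the name polygon_points are made up for illustration.

```python
def polygon_points(classes, frequencies):
    """Return (mid value, frequency) points, one per class."""
    return [
        ((lower + upper) / 2, freq)
        for (lower, upper), freq in zip(classes, frequencies)
    ]


# Made-up class boundaries and frequencies, for illustration only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 7, 6, 3]

# Class size: upper boundary minus lower boundary.
class_sizes = [upper - lower for lower, upper in classes]
print(class_sizes)  # [10, 10, 10, 10]

points = polygon_points(classes, frequencies)
print(points)  # [(5.0, 4), (15.0, 7), (25.0, 6), (35.0, 3)]
```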

Relationship Between Mean, Median & Mode

In the case of a symmetrical distribution, Mean = Median = Mode

In the case of a moderately asymmetrical distribution, the following empirical relation holds approximately: Mode ≈ 3 Median - 2 Mean
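
The empirical relation can be used to estimate the mode when only the mean and median are known. The sketch below is a direct transcription of that formula; the input values are hypothetical, and the result is only an approximation for moderately asymmetrical data.

```python
def estimate_mode(mean, median):
    """Empirical estimate of the mode: 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# Hypothetical summary values for a moderately asymmetrical data set.
print(estimate_mode(mean=20, median=22))  # 26

# When mean and median coincide, the estimate coincides with them as well.
print(estimate_mode(mean=15, median=15))  # 15
```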

Measures of Dispersion / Spread

The dispersion of data is a measure of how spread out (scattered) the data are about some central value. Common measures of dispersion are:

  1. Range
  2. Quartile Deviation
  3. Mean Deviation
  4. Standard Deviation
  5. Variance

Range

The range is the difference between the highest and lowest values, i.e. between the two extreme observations of a distribution.

Range = Maximum value - Minimum value

Example: What is the range of 4, 6, 9, 3, 4, 11?

The lowest value is 3 and the highest value is 11. Range = 11-3 = 8.
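
In Python the same calculation is a one-line subtraction. The function name value_range is an illustrative choice (picked to avoid shadowing the built-in range).

```python
def value_range(values):
    """Return the maximum value minus the minimum value."""
    if not values:
        raise ValueError("the range of an empty list is undefined")
    return max(values) - min(values)


numbers = [4, 6, 9, 3, 4, 11]  # the numbers from the example above
print(min(numbers), max(numbers))  # 3 11
print(value_range(numbers))        # 8
```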

Quartile Deviation

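After arranging the data in ascending order of magnitude, find the first quartile Q1 (the value below which one quarter of the observations lie) and the third quartile Q3 (the value below which three quarters of the observations lie). The quartile deviation is half the difference between them: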

QD = (Q3 - Q1)/2
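
A sketch of the calculation using Python's standard statistics module is shown below. Textbooks and software differ in how they locate quartiles, so the choice of method="inclusive" is an assumption and other conventions can give slightly different values of Q1 and Q3. The data are made up for illustration.

```python
import statistics


def quartile_deviation(values):
    """Return (Q3 - Q1) / 2 using the 'inclusive' quartile convention.

    Assumption: other quartile conventions may give slightly different results.
    """
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


# Made-up observations, for illustration only (quantiles() sorts them itself).
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q3)                    # 7.0 14.0
print(quartile_deviation(data))  # 3.5
```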

Standard Deviation and Variance

The standard deviation is a measure of how spread out numbers are. It is represented by the Greek letter sigma, σ. The standard deviation is the square root of the variance.

Variance

Variance is defined as the average of the squared differences from the mean. To calculate the variance:

  • First, find the mean (simple average) of the numbers
  • Then for each number, subtract the mean and square the result (squared difference)
  • Find the mean of these squared differences
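
The three steps above can be sketched as follows. Dividing by the number of values, as the steps describe, gives what is usually called the population variance; a sample variance would divide by one less than that. The data and the function name are made up for illustration.

```python
def population_variance(values):
    """Return the mean of the squared differences from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty list is undefined")
    # Step 1: find the mean of the numbers.
    mean = sum(values) / n
    # Step 2: subtract the mean from each number and square the result.
    squared_differences = [(x - mean) ** 2 for x in values]
    # Step 3: find the mean of the squared differences.
    return sum(squared_differences) / n


# Made-up observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sum(data) / len(data))      # 5.0 (the mean)
print(population_variance(data))  # 4.0
```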

Standard Deviation

To find the standard deviation, take the square root of the variance.
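
As a minimal sketch, the standard deviation follows from the variance with one extra step. The function below computes the population variance (dividing by the number of values) and then takes its square root; the data and the function name are made up for illustration.

```python
import math


def population_std_dev(values):
    """Return the square root of the population variance."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty list is undefined")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


# Made-up observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(data))  # 2.0
```

Unlike the variance, the standard deviation is expressed in the same units as the original data, which makes it easier to interpret.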