What Is Central Tendency, and How Do We Measure It?
What is Central Tendency?
In the age of artificial intelligence, the world revolves around data, and understanding that data is key. When working with numerical data, we often want to represent the whole data set with a single value (such as the average of the data), which we may call the "central value" or "middle value".
How do we measure Central Value or Central Tendency?
There are different ways to measure the central value of a data set, depending on how we want to look at the data. For example:
- Do we need to calculate an average of the data?
- Do we need to split the data into two equal parts?
- Do we need to identify the value that occurs most often in the data?
Based on what we need to look at, below are the corresponding measures.
- Mean
- Median
- Mode
Let's now look at what each measure is and how to calculate it, using an example.
Mean
The mean is also called the arithmetic mean or arithmetic average of the data. It is the sum of all the data values (numbers) divided by the total number of data points.
Mean = (Sum of all the data) / (Number of data points)
Let's look at an example:
8, 10, 15, 6, 18, 10, 22, 16, 2, 7
We have 10 numbers provided in this example.
Mean = (Sum of the data) / (Number of data points)
Mean = (8 + 10 + 15 + 6 + 18 + 10 + 22 + 16 + 2 + 7) / 10
= 114 / 10
= 11.4
As the above example shows, the mean is not necessarily one of the values in the data set.
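The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: `arithmetic_mean` is a hypothetical helper written for illustration, and `statistics.mean` from Python's standard library is used only to cross-check it.

```python
from statistics import mean

# Example data from this post
data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

def arithmetic_mean(values):
    """Sum of all values divided by the number of values (hypothetical helper)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

print(arithmetic_mean(data))          # 11.4
print(mean(data))                     # 11.4, via the standard library
print(arithmetic_mean(data) in data)  # False: the mean need not be a data point
```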
Now, let's generalize the formula. If we have n data points x1, x2, x3, . . . , xn, the mean is calculated as:
Mean = (x1 + x2 + x3 + . . . + xn) / n
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
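In machine learning, the mean is used to normalize features, for example by mean-centering (subtracting a feature's mean from each of its values) as part of standardization.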
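As a small illustration, the sketch below mean-centers a single feature by subtracting its mean from every value. The feature values reuse this post's example data, the variable names are hypothetical, and full standardization (which also divides by the standard deviation) is left out.

```python
from statistics import mean

# A hypothetical feature column (reusing this post's example data)
feature = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

feature_mean = mean(feature)  # 11.4

# Subtract the mean from every value so the feature is centered on 0
centered = [x - feature_mean for x in feature]

print([round(x, 1) for x in centered])
print(abs(mean(centered)) < 1e-9)  # True: the centered feature has mean ~0
```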
Now that we understand how to calculate the mean, let's look at what the median is and how to calculate it.
Median
Unlike the mean, the median is the middle value of the data set. To calculate the median, the data must first be ordered (sorted). The median splits the data set so that 50% of the data lies below it and 50% lies above it.
Let's now look at the same example we used for calculating the mean.
8, 10, 15, 6, 18, 10, 22, 16, 2, 7
This data is not ordered, so let's sort it in ascending order.
2, 6, 7, 8, 10, 10, 15, 16, 18, 22
We have a total of 10 numbers here (an even count), so how do we identify the middle value?
If we take the 5th value, there are 4 numbers below it and 5 numbers above it, so it does not split the data into two equal halves.
So, if the total number of data points is even, the median is the average of the two middle numbers.
In this case, that is the average of the 5th and 6th numbers, i.e., (10 + 10) / 2 = 10.
Now suppose we add another number, so that there are 11 data points in total. With an odd count, the middle number is the median.
2, 6, 7, 8, 10, 10, 15, 16, 18, 22, 25
In the above example, the 6th number, 10, is the median.
Let's now look at the general formula for the median of n numbers sorted in ascending order.
If n is even,
Median = [(n/2)th entry + ((n/2) + 1)th entry] / 2
If n is odd,
Median = ((n + 1)/2)th entry
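The two cases of the median formula translate directly into code. The sketch below uses a hypothetical helper, `median_of`, which converts the 1-based positions in the formula (the (n/2)th entry and so on) to Python's 0-based indices, and cross-checks the result against `statistics.median` from the standard library.

```python
from statistics import median

def median_of(values):
    """Median via the even/odd formula (hypothetical helper)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    if n % 2 == 0:
        # Average of the (n/2)th and ((n/2) + 1)th entries;
        # in 0-based indexing these sit at n//2 - 1 and n//2
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # The ((n + 1)/2)th entry sits at 0-based index (n + 1)//2 - 1, i.e. n//2
    return ordered[n // 2]

even_data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]     # n = 10
odd_data = [2, 6, 7, 8, 10, 10, 15, 16, 18, 22, 25]  # n = 11

print(median_of(even_data), median(even_data))  # 10.0 10.0
print(median_of(odd_data), median(odd_data))    # 10 10
```

Note that `median_of` sorts a copy of the input, so the caller does not need to order the data beforehand.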
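Because the median is far less affected by extreme values than the mean, it is useful in data preprocessing for machine learning, for example as a robust reference point when looking for outliers or as a value for filling in missing numeric entries.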
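To see how the median behaves when outliers are present, the sketch below appends one extreme value (1000, a made-up outlier) to the example data and compares how the mean and the median react. The mean jumps from 11.4 to about 101.27, while the median stays at 10.

```python
from statistics import mean, median

data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]
with_outlier = data + [1000]  # hypothetical extreme value

print(mean(data), median(data))                            # 11.4 10.0
print(round(mean(with_outlier), 2), median(with_outlier))  # 101.27 10
```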
Let's now look at what the mode is.
Mode
The mode is a bit different from the mean and the median. The mean is the average of all the numbers (data points), and the median is the middle entry of the ordered data set. The mode is the most frequently occurring value in a data set.
Let's take a look at it with an example.
8, 10, 15, 6, 18, 10, 22, 16, 2, 7
In the above data, every number appears only once except 10, which appears twice.
So, the mode of the above data is 10.
A data set can also have more than one mode, as in the example below.
8, 10, 15, 6, 18, 10, 22, 16, 2, 7, 18
In this example, both 10 and 18 appear twice, so the modes of this data are 10 and 18.
A data set with only one mode is called unimodal, and one with more than one mode is called multimodal.
A data set does not necessarily have a mode: if every value is unique and no entry is repeated, we say the data set has no mode. See the example below.
8, 10, 15, 6, 18, 11, 22, 16, 2, 7, 13
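Finding the mode only requires counting how often each value occurs. The sketch below uses `collections.Counter` and a hypothetical helper, `find_modes`, which returns every mode and follows this post's convention that a data set with no repeated values has no mode (an empty list). Note that the standard library's `statistics.multimode` behaves differently here: for all-unique data it returns every value.

```python
from collections import Counter

def find_modes(values):
    """Return all modes, or [] when no value repeats (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value is unique, so by this post's convention there is no mode
        return []
    # Sorted for a readable order (assumes the values are comparable)
    return sorted(value for value, count in counts.items() if count == highest)

unimodal = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]
multimodal = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7, 18]
no_mode = [8, 10, 15, 6, 18, 11, 22, 16, 2, 7, 13]

print(find_modes(unimodal))    # [10]
print(find_modes(multimodal))  # [10, 18]
print(find_modes(no_mode))     # []
```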
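The mode also applies to categorical (non-numeric) data, which makes it useful in machine learning classification tasks, for example to find the most common class label or to fill in missing categorical values.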
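Since the mode only counts occurrences, it works on non-numeric data such as class labels. Below is a minimal sketch with made-up labels, using `statistics.mode` from the standard library (since Python 3.8, it returns the first mode encountered when there is a tie). Filling missing labels (`None`) with the most common one is shown only as an illustration.

```python
from statistics import mode

# Hypothetical class labels from a classification data set
labels = ["cat", "dog", "cat", "bird", "cat", "dog"]
most_common_label = mode(labels)
print(most_common_label)  # cat

# Fill missing labels (None) with the most common known label
partially_labeled = ["cat", None, "dog", "cat", None]
known = [label for label in partially_labeled if label is not None]
fill_value = mode(known)  # "cat"
filled = [label if label is not None else fill_value for label in partially_labeled]
print(filled)  # ['cat', 'cat', 'dog', 'cat', 'cat']
```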
Wrapping Up
Mean, median and mode are the "3 M's" for identifying the central point of a numeric data set. I hope this has given you some insight into what central tendency is and how we measure it. Do you think something more should be added here, or would you like a different topic covered in the next blog post? Drop a comment or use the contact form, and let's keep decoding together!