What are the Measures of Variability in Statistics?
Data is everything in AI today. Knowing the middle value, like the mean or median, is useful, but we also need to know how much the data spreads. Let's take a look at the different measures of variability: range, variance, standard deviation, and interquartile range. This post continues from understanding the central tendency of data (if you have not read that post yet, have a quick look at “What is Central Tendency?”).
What is Variability?
Variability shows how far data points are from each other or from the center. For example, saying a data set has a mean of 11.4 doesn't tell us whether the numbers sit close to it or are scattered all over. In AI, spread matters: big swings might mean bad data for a model. For example, a wide spread in an application's job run times could point to system issues.
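To make the point about the mean concrete, here is a minimal sketch with two made-up data sets (they are not from this post, only an illustration) that share a mean of 11.4 but spread very differently:

```python
# Two hypothetical data sets (not from the article) with the same mean.
tight = [11, 11, 12, 11, 12]
wide = [1, 2, 11, 20, 23]

for name, values in (("tight", tight), ("wide", wide)):
    mean = sum(values) / len(values)
    print(name, "mean:", mean, "min:", min(values), "max:", max(values))
# tight mean: 11.4 min: 11 max: 12
# wide mean: 11.4 min: 1 max: 23
```

The mean alone cannot tell these two apart, which is why we need the measures below.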
We measure variability with:
- Range
- Variance
- Standard Deviation
- Interquartile Range
Let’s use the same numbers from the last post to see how they work.
The Measures
Range
Range is easy—it’s the biggest number minus the smallest. It shows the full spread but not what’s in between.
Range = Biggest number - Smallest number
Data:
8, 10, 15, 6, 18, 10, 22, 16, 2, 7
Re-arranging this data in ascending order: 2, 6, 7, 8, 10, 10, 15, 16, 18, 22
Biggest = 22, Smallest = 2
Range = 22 - 2 = 20
Range is quick to compute and handy in AI for checking data limits in a model, but a single odd value distorts it. For example, all of our data points are 22 or less. If one entry were 200, the range would become 198 (200 - 2), which no longer represents the typical spread.
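The calculation above can be sketched in a few lines of Python. The helper name value_range and the extra 200 entry are assumptions for illustration only:

```python
# Job times from the article.
data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

def value_range(values):
    """Return the largest value minus the smallest."""
    return max(values) - min(values)

print(value_range(data))  # 20

# Hypothetical odd value: one extra entry of 200 stretches the range.
with_outlier = data + [200]
print(value_range(with_outlier))  # 198
```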
Variance
Variance looks at how far each number is from the mean, squares it, then averages those squares. It’s exact but uses squared units.
Variance = (Sum of (each number - mean)²) / total numbers
The mean of the given data is 11.4 (from the last post). Let’s calculate:
Differences: (8-11.4), (10-11.4), (15-11.4), ..., (7-11.4)
= -3.4, -1.4, 3.6, -5.4, 6.6, -1.4, 10.6, 4.6, -9.4, -4.4
Square them: 11.56, 1.96, 12.96, 29.16, 43.56, 1.96, 112.36, 21.16, 88.36, 19.36
Sum = 11.56 + 1.96 + 12.96 + 29.16 + 43.56 + 1.96 + 112.36 + 21.16 + 88.36 + 19.36 = 342.4
Variance = 342.4 / 10 = 34.24
In AI, high variance means the values are widely scattered, which can signal noisy data that needs cleaning before training. The lower the variance, the closer the data is to the mean.
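Here is a minimal sketch of the same steps in Python, checked against the standard library's statistics module. Note that the formula in this post divides by the total count (population variance). Dividing by n - 1 instead gives the sample variance, which is a different and larger number:

```python
import statistics

data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

mean = sum(data) / len(data)                     # 11.4
squared_diffs = [(x - mean) ** 2 for x in data]  # squared distance from the mean
variance = sum(squared_diffs) / len(data)        # divide by n (population variance)

print(round(variance, 2))  # 34.24

# The standard library's population variance agrees.
print(round(statistics.pvariance(data), 2))  # 34.24

# Sample variance divides by n - 1, so it comes out larger.
print(round(statistics.variance(data), 2))  # 38.04
```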
Standard Deviation
Standard deviation is the square root of variance. It’s like variance but in the same units as your data.
Standard Deviation = Square root of variance
Variance = 34.24
Standard Deviation = √34.24 ≈ 5.85
Most numbers (7 of the 10 here) fall within 11.4 ± 5.85, that is, between 5.55 and 17.25. In AI, it’s used to scale data or spot odd values.
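As a quick check, the sketch below computes the population standard deviation with statistics.pstdev and counts how many values land within one standard deviation of the mean. The variable names are just for illustration:

```python
import statistics

data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

mean = statistics.mean(data)       # 11.4
std_dev = statistics.pstdev(data)  # square root of the population variance

low, high = mean - std_dev, mean + std_dev
within = [x for x in data if low <= x <= high]

print(round(std_dev, 2))              # 5.85
print(round(low, 2), round(high, 2))  # 5.55 17.25
print(len(within), "of", len(data))   # 7 of 10
```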
Interquartile Range
Interquartile range (IQR) is the spread of the middle 50% of ordered data. It ignores extreme values.
IQR = Q3 - Q1
Q1 is the median of the lower half and Q3 is the median of the upper half. Their positions in the ordered data are:
Q1 = (n+1)/4 th entry
Q3 = 3(n+1)/4 th entry
Ordered data: 2, 6, 7, 8, 10, 10, 15, 16, 18, 22
n = 10
Q1 position = (10+1)/4 = 2.75 ≈ 3rd entry = 7
Q3 position = 3(10+1)/4 = 8.25 ≈ 8th entry = 16
IQR = 16 - 7 = 9
IQR is good for AI data prep because extreme values do not affect it.
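The sketch below follows the "median of each half" description. The function name iqr_by_halves is an assumption for illustration. Be aware that libraries often interpolate between entries instead of rounding the position, so they can return slightly different quartiles for the same data, as the statistics.quantiles call shows:

```python
import statistics

data = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the median of the lower and upper halves."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # lower half
    upper = ordered[len(ordered) - half:]   # upper half (middle value skipped if count is odd)
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1

q1, q3, iqr = iqr_by_halves(data)
print(q1, q3, iqr)  # 7 16 9

# The standard library interpolates between entries by default.
quartiles = statistics.quantiles(data, n=4)
print(quartiles)  # [6.75, 10.0, 16.5]
```

Both answers are valid. They come from different quartile conventions, so it helps to state which one you used.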
Example in Action
Suppose these are the job run times (in seconds) of an application: 8, 10, 15, 6, 18, 10, 22, 16, 2, 7. The mean is 11.4, but variability shows:
- Range = 20: Big gap, check extremes.
- Variance = 34.24: Numbers vary a lot.
- Standard Deviation = 5.85: Most times are 5.55–17.25 seconds.
- IQR = 9: Middle half is tighter. 2 and 22 are the extremes, though neither counts as an outlier under the common 1.5 × IQR rule (fences at -6.5 and 29.5).
In AI, this helps clean data for models or flag issues in an application.
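Whether the extremes 2 and 22 count as outliers depends on the rule applied. One common convention (an assumption here, since this post does not name a rule) is the 1.5 × IQR rule: flag anything below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. A minimal sketch, with the hypothetical helper find_outliers:

```python
job_times = [8, 10, 15, 6, 18, 10, 22, 16, 2, 7]

def find_outliers(values, q1, q3, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Quartiles worked out in the IQR section: Q1 = 7, Q3 = 16.
# Fences are 7 - 13.5 = -6.5 and 16 + 13.5 = 29.5.
print(find_outliers(job_times, 7, 16))  # []

# Hypothetical slow job of 200 seconds, as in the range example.
print(find_outliers(job_times + [200], 7, 16))  # [200]
```

Under this rule none of the ten job times is flagged, while a 200-second job would be.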
Wrapping Up
Range, variance, standard deviation, and IQR help us understand how data spreads. They’re key for AI and coding. See my “What is Central Tendency?” post for more about central tendency (mean, median, and mode). Got a stats tip or want another topic? Comment or use the contact form, and let’s keep it going!