  #22  
Posted Jul 23, 2014, 4:48 AM
DenverInfill
mmmm... infillicious!
 
Join Date: Jan 2004
Location: Lower Highland, Denver
Posts: 3,355
Quote:
Originally Posted by Tom Servo View Post
That's fine.

But if you have two number sets like say...

13, 16, 18, 20, 549, 757, 1006

30, 32, 44, 51, 53, 67, 79

You can't seriously tell me the bottom set represents a greater value simply because its median value is higher; the average value of the top is nearly seven times greater than the bottom.

Sorry, but I just don't understand why the median is used to analyze city data.

...

And how are real estate values bounded on the bottom end but not the top end? Both ends fluctuate.

A measure of central tendency, like mean, median, or mode, is a single number that is trying to capture or represent the essence of an entire data set. Consequently, there'll always be a loss of information when attempting to use one number to represent many, but you want to choose the measure that represents the data as best as possible, given the nature of the data set in question.

Some data sets, test scores for example, are naturally bounded on both ends, meaning you can't score lower than zero or higher than one hundred. Under normal circumstances, there will be a few people who score really poorly (outliers on the low end) and a few people who score really well (outliers on the high end), but most people's scores will be clumped in the middle, giving us the bell-shaped "normal curve" if you were to plot all the scores as a histogram. In cases like these, the mean (average) is the best measure of central tendency because of the symmetrical distribution of the data.
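To make the symmetric case concrete, here is a minimal Python sketch. The scores are made up for illustration (they are not real test data) and were chosen to be symmetric around 70, so the mean and median land on the same number.

```python
from statistics import mean, median

# Hypothetical test scores, invented for illustration only.
# They are symmetric around 70: 50/90, 60/80, 65/75, and three 70s.
scores = [50, 60, 65, 70, 70, 70, 75, 80, 90]

score_mean = mean(scores)      # arithmetic average
score_median = median(scores)  # middle value of the sorted list

print(f"mean = {score_mean}, median = {score_median}")
```

When the data are balanced like this, it hardly matters which measure you pick, because both point at the same center.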

Other data sets, income for example, are naturally bounded on only one end. You can't earn less than zero (that's the bound on the lower end), but theoretically there's no bound (upper limit) to income on the high end. If you were to plot the population's income as a histogram, you'd find that the vast majority of people are clustered at the lower end of the scale, resulting in a steep peak near the far left, with the curve then sloping gently off to the right, representing a decreasing number of people who make higher and higher incomes. That gives you a right-skewed distribution: a bell curve that is stretched or skewed off to the right due to the presence of extreme values at the high end. In a data set like this, the mean is not the best measure of central tendency because it will be influenced (pulled to the right) by the extreme values at the high end, giving you an average that doesn't really represent the essence of the distribution. Therefore, the median is a better measure, because the middle value of the entire data set is going to be down at the lower left end where most of the values are clustered.
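As a rough check, here is a short Python sketch that runs the two number sets from the quoted post through the standard library's `statistics` module. The variable names are mine; the numbers are exactly the ones quoted.

```python
from statistics import mean, median

# The two sets from the quoted post.
top = [13, 16, 18, 20, 549, 757, 1006]   # right-skewed: three very large values
bottom = [30, 32, 44, 51, 53, 67, 79]    # roughly symmetric

top_mean, top_median = mean(top), median(top)
bottom_mean, bottom_median = mean(bottom), median(bottom)

# How many values in the top set fall below that set's own mean?
top_below_mean = sum(1 for x in top if x < top_mean)

print(f"top:    mean = {top_mean:.1f}, median = {top_median}")
print(f"bottom: mean = {bottom_mean:.1f}, median = {bottom_median}")
print(f"ratio of means = {top_mean / bottom_mean:.2f}")
print(f"values in top below its mean: {top_below_mean} of {len(top)}")
```

The top set's mean is about 340, yet four of its seven values are 20 or less. The mean is being pulled up by the three large values, while the median (20) sits where most of the values actually are.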

This is why you usually see the median used for attributes such as income and housing prices, while the mean is used for things like height, temperature, and test scores. You use the measure that's the best fit for the distribution of the data at hand.
__________________
~ Ken

DenverInfill Blog
DenverUrbanism