A family of Microsoft spreadsheet software with tools for analyzing, charting, and communicating data.
Errata.... I wrote:
Maverik wrote:
I have an array of numbers have values that range from 0-50000 and I would like to find a function or way to split them up into 10 groups where their values are most similar.
[....]
Suppose your numbers are in A1:A1000. Put the following formula into B1 and copy down through B10:
=PERCENTILE(A:A,ROWS($A$1:A1)/10)
That establishes the values for the 10%ile, 20%ile, etc.
Although that does ensure at least 1 number in each percentile, it does not reliably produce reasonable results if there is a "significant number" of repeated numbers. What is "significant"? It's a lot like "most similar": it is difficult to define rigorously.
Case in point.... I have a list of more than 1800 numbers ranging from 1 to 1554; measurements of gaps in a series of numbers. I used the above paradigm to look at the distribution. The results are correct; but they surprised me. The 10th through 70th percentile are the same number (1); and the 80th and 90th percentile are another number (2).
Clearly that kind of "grouping" would not satisfy Maverik's requirements, no matter how he defines "most similar".
As it turns out, there are only 9 distinct numbers in my list. 79% are 1; about 15.6% are 2; less than 0.05% are 3 and 4; and there is one each of 5, 43, 63, 288 and 1554.
Once I realized that lopsided distribution, obviously the 9-cell array formula =FREQUENCY(A:A,{1,2,3,4,5,43,63,288,1554}) produced the more interesting "groupings".
I think that is an unusual example. Nevertheless, as always, it is best to understand the distribution before applying any method of partitioning.