Divide Records into Clusters

Balasaheb Molawade
2024-09-01T07:04:26.3566667+00:00

Hi,

We have a requirement to create clusters of records based on revenue distribution and geographic boundaries. We initially used the K-means algorithm, but it did not yield the desired results for our use case.

For example, if we have 100 records and need to divide them into 4 clusters (segments) so that the total revenue in each cluster is nearly equal while geographic boundaries are respected, is there any sample code for building such clusters? For instance, if we use postal codes as the geographic boundaries, each postal code should belong to exactly one cluster: if records with the same postal code would fall into two different clusters, all records from that postal code should be assigned to a single cluster, while still keeping the clusters nearly equal.

Thanks!

Azure Maps

2 answers

  1. IoTGirl, Microsoft Employee
    2024-09-03T17:35:31.55+00:00

    Hi Balasaheb,

    This is not a maps problem per se, but rather a data-configuration / mathematical choice. If you have a datastore, you should be able to define these clusters before ever displaying them. If you don't mind incurring the overhead, you could do a point-in-polygon calculation on the fly to handle the boundary scenario.
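    A point-in-polygon test decides whether a record's coordinates fall inside a boundary. The following is a minimal sketch of the standard even-odd (ray-casting) rule, assuming each boundary is a simple polygon without holes, given as an array of [longitude, latitude] pairs and treated as planar coordinates; the square and the points are hypothetical. For production use, a geometry library such as Turf.js (`booleanPointInPolygon`) handles more edge cases.

```javascript
// Even-odd (ray-casting) point-in-polygon test.
// Assumes `ring` is a simple polygon (no holes) as an array of [x, y] pairs,
// e.g. [longitude, latitude], treated as planar coordinates.
// Points lying exactly on an edge may be reported either way.
function pointInPolygon(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a horizontal ray from the point to +x cross the edge (j -> i)?
    // The first check also rules out horizontal edges, avoiding division by zero.
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Hypothetical example: a square boundary and two record locations.
const square = [[0, 0], [10, 0], [10, 10], [0, 10]];
console.log(pointInPolygon([5, 5], square));  // true
console.log(pointInPolygon([15, 5], square)); // false
```

    Each crossing of the boundary flips the inside/outside state, so an odd number of crossings means the point is inside.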

    Here is a simple sample that filters data inside a drawn polygon: https://samples.azuremaps.com/drawing-tools-module/select-data-in-drawn-polygon-area (your scenario should be even easier than this). With only 100 records, you could probably do this in Excel with some simple math and filters, without writing any code.

    Sincerely,

    IoTGirl


  2. rbrundritt, Microsoft Employee
    2024-09-03T18:08:16.07+00:00

    If you want to cluster by "postal code", and you have the postal code in your records, then this is a simple "group by" scenario based on the postal code string and not something that requires any geospatial calculations. It would be the same grouping logic that could be used on any property in the records. Here is a simple example:

    var records = [
    	{ id: 1, postalCode: "90210" },
    	{ id: 2, postalCode: "98052" },
    	{ id: 3, postalCode: "90210" },
    	{ id: 4, postalCode: "98052" },
    	{ id: 5, postalCode: "98052" },
    	{ id: 6, postalCode: "90210" },
    	{ id: 7, postalCode: "90210" },
    ];
    
    var clusters = {}; //key = postalCode, value = [ids]
    
    //Group the records into clusters.
    records.forEach(r => {
    	//See if a cluster already exists for the postal code.
    	if(clusters[r.postalCode]){
    		//Add the id to the list of items in the cluster.
    		clusters[r.postalCode].push(r.id);
    	} else {
    		//Create a new cluster and add the record id.
    		clusters[r.postalCode] = [r.id];
    	}
    });
    
    //Output the cluster information.
    Object.keys(clusters).forEach(key => {
    	console.log(`Cluster for postal code ${key}, contains records: ${clusters[key]}`);
    });
    

    This would output:

    Cluster for postal code 90210, contains records: 1,3,6,7
    Cluster for postal code 98052, contains records: 2,4,5
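    The grouping above keeps each postal code together, but the question also asks for clusters with nearly equal revenue. One way to sketch that, assuming each record carries a hypothetical `revenue` field (not part of the sample above), is to total the revenue per postal code and then assign the postal-code groups, largest first, to whichever cluster currently has the lowest revenue. This is the greedy "largest first" heuristic for multiway number partitioning; it is a swapped-in technique, not something from the original answer, and it gives close totals rather than a guaranteed optimum.

```javascript
// Hypothetical sample data: the `revenue` field is assumed.
const sampleRecords = [
  { id: 1, postalCode: "90210", revenue: 400 },
  { id: 2, postalCode: "98052", revenue: 300 },
  { id: 3, postalCode: "90210", revenue: 200 },
  { id: 4, postalCode: "10001", revenue: 350 },
  { id: 5, postalCode: "60601", revenue: 250 },
  { id: 6, postalCode: "98052", revenue: 100 },
  { id: 7, postalCode: "94105", revenue: 450 },
  { id: 8, postalCode: "02108", revenue: 150 },
];

// Step 1: group records by postal code and total each group's revenue,
// so that a postal code can never be split across clusters.
function groupByPostalCode(records) {
  const groups = new Map(); // key = postalCode, value = { postalCode, revenue, ids }
  for (const r of records) {
    const g = groups.get(r.postalCode) ?? { postalCode: r.postalCode, revenue: 0, ids: [] };
    g.revenue += r.revenue;
    g.ids.push(r.id);
    groups.set(r.postalCode, g);
  }
  return [...groups.values()];
}

// Step 2: greedy "largest first" assignment. Each postal-code group, from the
// highest revenue down, goes to the cluster whose running revenue is lowest.
function balanceClusters(records, k) {
  const groups = groupByPostalCode(records).sort((a, b) => b.revenue - a.revenue);
  const clusters = Array.from({ length: k }, () => ({ revenue: 0, postalCodes: [], ids: [] }));
  for (const g of groups) {
    // Find the cluster with the smallest revenue so far (the first one wins ties).
    let target = clusters[0];
    for (const c of clusters) {
      if (c.revenue < target.revenue) target = c;
    }
    target.revenue += g.revenue;
    target.postalCodes.push(g.postalCode);
    target.ids.push(...g.ids);
  }
  return clusters;
}

// Output the cluster information.
balanceClusters(sampleRecords, 2).forEach((c, i) => {
  console.log(`Cluster ${i + 1}: revenue ${c.revenue}, postal codes ${c.postalCodes.join(", ")}, records ${c.ids.join(",")}`);
});
```

    With this sample data and k = 2, both clusters end up with a revenue of 1100. The sketch only keeps each postal code whole; it does not try to make a cluster's postal codes geographically adjacent, and a single postal code with a very large revenue can still leave the clusters unbalanced.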
    

    If you don't have the postal code information, then you would either need to geocode your records ahead of time or retrieve the postal code boundaries that intersect with your records. Ideally, you would retrieve each unique postal code boundary only once, then check the remaining records against the boundaries you have already retrieved.
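    The "retrieve each boundary only once" idea can be sketched as a small cache: before asking a service for a record's boundary, check whether any boundary already retrieved contains the record. In this minimal sketch, `fetchBoundary` and `contains` are hypothetical parameters standing in for whatever boundary lookup you use and a point-in-polygon test, respectively; the rectangular "postal code" areas and the fake service exist only to make the example runnable.

```javascript
// Assign a postal code to each record, calling the (hypothetical) boundary
// service only when no previously retrieved boundary contains the record.
//   fetchBoundary(point) -> Promise<{ postalCode, boundary } | null>
//   contains(boundary, point) -> boolean
async function assignPostalCodes(records, fetchBoundary, contains) {
  const cache = []; // boundaries retrieved so far
  for (const r of records) {
    let match = cache.find(b => contains(b.boundary, r.point));
    if (!match) {
      match = await fetchBoundary(r.point);
      if (match) cache.push(match); // reuse it for later records
    }
    r.postalCode = match ? match.postalCode : null;
  }
  return records;
}

// Hypothetical stand-ins: two rectangular "postal code" areas and a fake service.
function inRect(rect, [x, y]) {
  return x >= rect.minX && x < rect.maxX && y >= rect.minY && y < rect.maxY;
}
const fakeAreas = [
  { postalCode: "11111", boundary: { minX: 0, minY: 0, maxX: 10, maxY: 10 } },
  { postalCode: "22222", boundary: { minX: 10, minY: 0, maxX: 20, maxY: 10 } },
];
let serviceCalls = 0;
async function fakeFetchBoundary(point) {
  serviceCalls++;
  return fakeAreas.find(a => inRect(a.boundary, point)) ?? null;
}

const pointRecords = [
  { id: 1, point: [1, 1] },
  { id: 2, point: [12, 3] },
  { id: 3, point: [5, 5] },
  { id: 4, point: [15, 9] },
];
assignPostalCodes(pointRecords, fakeFetchBoundary, inRect).then(out => {
  out.forEach(r => console.log(`Record ${r.id}: ${r.postalCode}`));
  console.log(`Service calls: ${serviceCalls}`);
});
```

    Run on its own, the four sample records need only two service calls, because records 3 and 4 fall inside boundaries that were already retrieved.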

