How to determine data distribution in fields

Carlton Patterson 21 Reputation points
2021-02-06T15:32:04.36+00:00

Hello Community,

I have couple samples below showing howt get determine the data distribution of a record in a field.

For example the following code will show that record, 'FIESTA' in the 'model' field represents 59.52% of all the records

SELECT
  vehicles.model
 ,COUNT(*) AS cnt
 ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
FROM dbo.vehicles
GROUP BY vehicles.model
ORDER BY distribution DESC

Likewise the following code will show that record '1.7' in the 'engine_size' field represents 15.08% of all records in the 'engine_size' field

SELECT
  vehicles.engine_size
 ,COUNT(*) AS cnt
 ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
FROM dbo.vehicles
GROUP BY vehicles.engine_size
ORDER BY distribution DESC

Can someone let me know how to combine the code samples above to allow me to get the distribution of both the 'model' and 'engine_size'?

The sample data follows:

CREATE TABLE vehicles (
     registration varchar(50),
     make varchar(50),
     model varchar(50),
     engine_size float)

 INSERT vehicles VALUES
 ('JjFw5a0','SKODA','OCTAVIA',1.8),
 ('VkfCDpZ','FORD','FIESTA',1.7),
 ('5E93ZEq','SKODA','OCTAVIA',1.3),
 ('L2PPN0m','FORD','FIESTA',1.1),
 ('9xKghxp','FORD','FIESTA',1.5),
 ('WHShdBm','FORD','FIESTA',1.4),
 ('TNRHyy7','NISSAN','QASHQAI',1.2),
 ('6RNX0XG','SKODA','OCTAVIA',1.4),
 ('tJ9bOD8','FORD','FIESTA',1.1),
 ('ablFUSC','FORD','FIESTA',1),
 ('4B7RLYL','MERCEDED_BENZ','E CLASS',1.3),
 ('tlJiwVY','FORD','FIESTA',1),
 ('Fb9lcvG','FORD','FIESTA',1.4),
 ('nW4lqBC','FORD','FIESTA',1.6),
 ('LggTmL5','HYUNDAI','I20',1),
 ('2mGgSjS','FORD','FIESTA',1.1),
 ('IDvOzcM','FORD','FIESTA',1.3),
 ('JefpXK2','FORD','FIESTA',1.5),
 ('0h1uWfZ','MERCEDED_BENZ','E CLASS',1.4),
 ('ylBoGbV','MERCEDED_BENZ','E CLASS',1.7),
 ('XzoILDK','VAUXHALL','CORSA',1.8),
 ('Xhocs1Z','FORD','FIESTA',1.5),
 ('Lh2yWGa','KIA','RIO',1.5),
 ('hM5GWA0','FORD','FIESTA',1.3),
 ('PbpxkFt','FORD','FIESTA',1.7),
 ('SDHWV2r','FORD','FIESTA',1.2),
 ('n83Je2D','FORD','FIESTA',1.8),
 ('sDN0gex','FORD','FIESTA',1.2),
 ('7EICOZY','KIA','RIO',1.5),
 ('PUuMmIH','FORD','FIESTA',1),
 ('HiBwSg2','FORD','FIESTA',1.8),
 ('1yk1vDm','KIA','RIO',1.7),
 ('cMpH72R','HYUNDAI','I20',1.1),
 ('ZgQL0gt','MERCEDED_BENZ','E CLASS',1.3),
 ('jhpamQG','KIA','RIO',1.1),
 ('pk0lU2F','VAUXHALL','CORSA',1.4),
 ('fDCUeq1','FORD','FIESTA',1.1),
 ('ono5QFC','FORD','FIESTA',1.7),
 ('VohWwGR','FORD','FIESTA',1.5),
 ('Hih8dKc','SUZUKI','SWIFT',1.2),
 ('D2RNn3h','SUZUKI','SWIFT',1.2),
 ('QaYQulE','FORD','FIESTA',1.1),
 ('xmQPxAG','FORD','FIESTA',1.8),
 ('vmTqkTO','FORD','FIESTA',1.2),
 ('lvUtVUA','MERCEDED_BENZ','E CLASS',1),
 ('SFoj00d','FORD','FIESTA',1),
 ('9S6wrWV','MERCEDED_BENZ','E CLASS',1),
 ('0SBnW0z','FORD','FIESTA',1.1),
 ('HnDHdfj','MERCEDED_BENZ','E CLASS',1),
 ('RV7q947','FORD','FIESTA',1.4),
 ('JZqCtTg','FORD','FIESTA',1.7),
 ('XVgBwgi','FORD','FIESTA',1.8),
 ('iqJDsIF','FORD','FIESTA',1.6),
 ('CMbpRFa','FORD','FIESTA',1.6),
 ('vF7K5Xg','SUZUKI','SWIFT',1.1),
 ('3j6XGDH','FORD','FIESTA',1.5),
 ('ommqugM','FORD','FIESTA',1.1),
 ('LMQkPnw','NISSAN','QASHQAI',1.4),
 ('1dKgcdd','FORD','FIESTA',1.5),
 ('hC8BxiP','MERCEDED_BENZ','E CLASS',1.1),
 ('wLTWol7','FORD','FIESTA',1.6),
 ('TY8ChYN','FORD','FIESTA',1.6),
 ('Gw1CpI8','FORD','FIESTA',1.4),
 ('L4OPAJq','FORD','FIESTA',1.1),
 ('6TyYpfi','NISSAN','QASHQAI',1.6),
 ('ozoOcGL','FORD','FIESTA',1.4),
 ('6IME19U','FORD','FIESTA',1.4),
 ('BxpmJO5','FORD','FIESTA',1.4),
 ('0zc2n5A','FORD','FIESTA',1.3),
 ('FqbBZE2','FIAT','500',1.7),
 ('2EkTOTz','FORD','FIESTA',1.4),
 ('fNBvIvg','MERCEDED_BENZ','C CLASS',1.2),
 ('u5j4R4S','KIA','RIO',1.4),
 ('zpWaUZo','FORD','FIESTA',1.1),
 ('FQPVQYc','NISSAN','QASHQAI',1.7),
 ('8RBQADq','KIA','RIO',1.7),
 ('TOz2bcT','HYUNDAI','I20',1.7),
 ('jebhCex','FORD','FIESTA',1.3),
 ('cdHA1gL','FORD','FIESTA',1.2),
 ('FoaN4AT','FORD','FIESTA',1.7),
 ('atGn288','FORD','FIESTA',1.5),
 ('es8VNdW','FIAT','500',1.3),
 ('hDWoMXa','KIA','RIO',1.4),
 ('Q9C6Br1','KIA','RIO',1.5),
 ('mFSy4aF','FORD','FIESTA',1.6),
 ('bbbKnrM','SKODA','OCTAVIA',1.5),
 ('qY7lz6I','FORD','FIESTA',1),
 ('8Ch2OeU','VAUXHALL','CORSA',1.3),
 ('dcWsjJv','VAUXHALL','CORSA',1.3),
 ('bnnoBPg','SKODA','OCTAVIA',1.8),
 ('mvDyYkK','FORD','FIESTA',1.4),
 ('KpWDYap','FORD','FIESTA',1.3),
 ('7EK9K4z','FORD','FIESTA',1.3),
 ('ZPLHtlP','FORD','FIESTA',1.6),
 ('4EpYeSB','FORD','FIESTA',1.6),
 ('O1eZ20M','FORD','FIESTA',1),
 ('WfVntKk','FORD','FIESTA',1.7),
 ('6VlkBdi','FORD','FIESTA',1.1),
 ('hFQfKjk','KIA','RIO',1.4),
 ('3Y4njNP','KIA','RIO',1),
 ('3UuNqG0','FORD','FIESTA',1.7),
 ('qpvMYAu','FORD','FIESTA',1.1),
 ('NCYJUqx','FORD','FIESTA',1.3),
 ('M0AvWzg','FORD','FIESTA',1.6),
 ('XbVmtFf','FORD','FIESTA',1.3),
 ('l8qZy0H','SKODA','OCTAVIA',1.3),
 ('EDUbxaU','MERCEDED_BENZ','E CLASS',1.6),
 ('nWLd82o','FORD','FIESTA',1.7),
 ('4AkoyWx','FORD','FIESTA',1),
 ('nOoO25v','FORD','FIESTA',1.3),
 ('VAm5aV8','NISSAN','QASHQAI',1.4),
 ('zbd3cie','FORD','FIESTA',1.5),
 ('hyAN71W','NISSAN','QASHQAI',1),
 ('FxACHDf','FIAT','500',1.7),
 ('wOZdaeV','FORD','FIESTA',1.6),
 ('gfxZl99','VAUXHALL','CORSA',1.1),
 ('06HhwEJ','SKODA','OCTAVIA',1.7),
 ('PCTgYiG','KIA','RIO',1.7),
 ('U54WXZQ','KIA','RIO',1.6),
 ('FHgrRiF','FORD','FIESTA',1.6),
 ('R3jP73p','SKODA','OCTAVIA',1.5),
 ('etVPKX9','SUZUKI','SWIFT',1.1),
 ('BE3yReB','FORD','FIESTA',1.7),
 ('zXmX878','FORD','FIESTA',1.6),
 ('wdM3P2m','FORD','FIESTA',1.7),
 ('tb727BM','FORD','FIESTA',1.1)

Thanks

Carlton

Developer technologies | Transact-SQL
Developer technologies | Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
{count} votes

Answer accepted by question author
  1. Viorel 125.7K Reputation points
    2021-02-07T20:31:54.487+00:00

    If you really need such results, then try this query:

    ;
    with M as
    (
        select model, 
            count(*) as model_cnt, 
            count(*) * 100.0 / (select count(*) from vehicles) as model_distribution
        from vehicles
        group by model
    ),
    MN as
    (
        select *, row_number() over (order by model_cnt desc) as rn
        from M
    ),
    S as
    (
        select engine_size, 
            count(*) as engine_size_cnt, 
            count(*) * 100.0 / (select count(*) from vehicles) as engine_size_distribution
        from vehicles
        group by engine_size
    ),
    SN as
    (
        select *, row_number() over (order by engine_size_cnt desc) as rn
        from S
    )
    select model, engine_size, model_cnt, engine_size_cnt, 
        cast( model_distribution as decimal(30, 2)), 
        cast( engine_size_distribution as decimal(30, 2))
    from MN
    full outer join SN on SN.rn = MN.rn
    

9 additional answers

Sort by: Most helpful
  1. Carlton Patterson 761 Reputation points
    2021-02-10T08:54:48.093+00:00

    On side note, can someone let me know why I'm getting email notifications to email address ******@olvin.com?

    0 comments No comments

  2. Carlton Patterson 761 Reputation points
    2021-02-10T09:03:12.363+00:00

    Hi Melissa,

    I have checked the link you provided regarding email notifications. My email notifications is set to ******@keyloop.co.uk,see image, but they're going to ******@olvin.com.

    66337-email.png

    I don't understand


  3. Carlton Patterson 761 Reputation points
    2021-02-10T09:15:58.493+00:00

    Hi Melissa,

    Email notifications are still going to ******@olvin.com ... so weird


  4. Carlton Patterson 761 Reputation points
    2021-02-10T09:37:32.337+00:00

    Hi Melissa,

    I'm now logged in as CarltonPatterson-4841, and my profile is set for email notifications to go to ******@keyloop.co.uk, but they're still going to ******@olvin.com .... I don't understand

    In the meantime, your solution really worked for my purposes

    I have a few questions.

    Can you let me know what the following sets out to achieve?

    SELECT ID = IDENTITY(INT,1,1),

    Also, could this have been achieved using sub queries?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.