How to determine data distribution in fields

Carlton Patterson 21 Reputation points
2021-02-06T15:32:04.36+00:00

Hello Community,

I have couple samples below showing howt get determine the data distribution of a record in a field.

For example the following code will show that record, 'FIESTA' in the 'model' field represents 59.52% of all the records

SELECT
  vehicles.model
 ,COUNT(*) AS cnt
 ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
FROM dbo.vehicles
GROUP BY vehicles.model
ORDER BY distribution DESC

Likewise the following code will show that record '1.7' in the 'engine_size' field represents 15.08% of all records in the 'engine_size' field

SELECT
  vehicles.engine_size
 ,COUNT(*) AS cnt
 ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
FROM dbo.vehicles
GROUP BY vehicles.engine_size
ORDER BY distribution DESC

Can someone let me know how to combine the code samples above to allow me to get the distribution of both the 'model' and 'engine_size'?

The sample data follows:

CREATE TABLE vehicles (
     registration varchar(50),
     make varchar(50),
     model varchar(50),
     engine_size float)

 INSERT vehicles VALUES
 ('JjFw5a0','SKODA','OCTAVIA',1.8),
 ('VkfCDpZ','FORD','FIESTA',1.7),
 ('5E93ZEq','SKODA','OCTAVIA',1.3),
 ('L2PPN0m','FORD','FIESTA',1.1),
 ('9xKghxp','FORD','FIESTA',1.5),
 ('WHShdBm','FORD','FIESTA',1.4),
 ('TNRHyy7','NISSAN','QASHQAI',1.2),
 ('6RNX0XG','SKODA','OCTAVIA',1.4),
 ('tJ9bOD8','FORD','FIESTA',1.1),
 ('ablFUSC','FORD','FIESTA',1),
 ('4B7RLYL','MERCEDED_BENZ','E CLASS',1.3),
 ('tlJiwVY','FORD','FIESTA',1),
 ('Fb9lcvG','FORD','FIESTA',1.4),
 ('nW4lqBC','FORD','FIESTA',1.6),
 ('LggTmL5','HYUNDAI','I20',1),
 ('2mGgSjS','FORD','FIESTA',1.1),
 ('IDvOzcM','FORD','FIESTA',1.3),
 ('JefpXK2','FORD','FIESTA',1.5),
 ('0h1uWfZ','MERCEDED_BENZ','E CLASS',1.4),
 ('ylBoGbV','MERCEDED_BENZ','E CLASS',1.7),
 ('XzoILDK','VAUXHALL','CORSA',1.8),
 ('Xhocs1Z','FORD','FIESTA',1.5),
 ('Lh2yWGa','KIA','RIO',1.5),
 ('hM5GWA0','FORD','FIESTA',1.3),
 ('PbpxkFt','FORD','FIESTA',1.7),
 ('SDHWV2r','FORD','FIESTA',1.2),
 ('n83Je2D','FORD','FIESTA',1.8),
 ('sDN0gex','FORD','FIESTA',1.2),
 ('7EICOZY','KIA','RIO',1.5),
 ('PUuMmIH','FORD','FIESTA',1),
 ('HiBwSg2','FORD','FIESTA',1.8),
 ('1yk1vDm','KIA','RIO',1.7),
 ('cMpH72R','HYUNDAI','I20',1.1),
 ('ZgQL0gt','MERCEDED_BENZ','E CLASS',1.3),
 ('jhpamQG','KIA','RIO',1.1),
 ('pk0lU2F','VAUXHALL','CORSA',1.4),
 ('fDCUeq1','FORD','FIESTA',1.1),
 ('ono5QFC','FORD','FIESTA',1.7),
 ('VohWwGR','FORD','FIESTA',1.5),
 ('Hih8dKc','SUZUKI','SWIFT',1.2),
 ('D2RNn3h','SUZUKI','SWIFT',1.2),
 ('QaYQulE','FORD','FIESTA',1.1),
 ('xmQPxAG','FORD','FIESTA',1.8),
 ('vmTqkTO','FORD','FIESTA',1.2),
 ('lvUtVUA','MERCEDED_BENZ','E CLASS',1),
 ('SFoj00d','FORD','FIESTA',1),
 ('9S6wrWV','MERCEDED_BENZ','E CLASS',1),
 ('0SBnW0z','FORD','FIESTA',1.1),
 ('HnDHdfj','MERCEDED_BENZ','E CLASS',1),
 ('RV7q947','FORD','FIESTA',1.4),
 ('JZqCtTg','FORD','FIESTA',1.7),
 ('XVgBwgi','FORD','FIESTA',1.8),
 ('iqJDsIF','FORD','FIESTA',1.6),
 ('CMbpRFa','FORD','FIESTA',1.6),
 ('vF7K5Xg','SUZUKI','SWIFT',1.1),
 ('3j6XGDH','FORD','FIESTA',1.5),
 ('ommqugM','FORD','FIESTA',1.1),
 ('LMQkPnw','NISSAN','QASHQAI',1.4),
 ('1dKgcdd','FORD','FIESTA',1.5),
 ('hC8BxiP','MERCEDED_BENZ','E CLASS',1.1),
 ('wLTWol7','FORD','FIESTA',1.6),
 ('TY8ChYN','FORD','FIESTA',1.6),
 ('Gw1CpI8','FORD','FIESTA',1.4),
 ('L4OPAJq','FORD','FIESTA',1.1),
 ('6TyYpfi','NISSAN','QASHQAI',1.6),
 ('ozoOcGL','FORD','FIESTA',1.4),
 ('6IME19U','FORD','FIESTA',1.4),
 ('BxpmJO5','FORD','FIESTA',1.4),
 ('0zc2n5A','FORD','FIESTA',1.3),
 ('FqbBZE2','FIAT','500',1.7),
 ('2EkTOTz','FORD','FIESTA',1.4),
 ('fNBvIvg','MERCEDED_BENZ','C CLASS',1.2),
 ('u5j4R4S','KIA','RIO',1.4),
 ('zpWaUZo','FORD','FIESTA',1.1),
 ('FQPVQYc','NISSAN','QASHQAI',1.7),
 ('8RBQADq','KIA','RIO',1.7),
 ('TOz2bcT','HYUNDAI','I20',1.7),
 ('jebhCex','FORD','FIESTA',1.3),
 ('cdHA1gL','FORD','FIESTA',1.2),
 ('FoaN4AT','FORD','FIESTA',1.7),
 ('atGn288','FORD','FIESTA',1.5),
 ('es8VNdW','FIAT','500',1.3),
 ('hDWoMXa','KIA','RIO',1.4),
 ('Q9C6Br1','KIA','RIO',1.5),
 ('mFSy4aF','FORD','FIESTA',1.6),
 ('bbbKnrM','SKODA','OCTAVIA',1.5),
 ('qY7lz6I','FORD','FIESTA',1),
 ('8Ch2OeU','VAUXHALL','CORSA',1.3),
 ('dcWsjJv','VAUXHALL','CORSA',1.3),
 ('bnnoBPg','SKODA','OCTAVIA',1.8),
 ('mvDyYkK','FORD','FIESTA',1.4),
 ('KpWDYap','FORD','FIESTA',1.3),
 ('7EK9K4z','FORD','FIESTA',1.3),
 ('ZPLHtlP','FORD','FIESTA',1.6),
 ('4EpYeSB','FORD','FIESTA',1.6),
 ('O1eZ20M','FORD','FIESTA',1),
 ('WfVntKk','FORD','FIESTA',1.7),
 ('6VlkBdi','FORD','FIESTA',1.1),
 ('hFQfKjk','KIA','RIO',1.4),
 ('3Y4njNP','KIA','RIO',1),
 ('3UuNqG0','FORD','FIESTA',1.7),
 ('qpvMYAu','FORD','FIESTA',1.1),
 ('NCYJUqx','FORD','FIESTA',1.3),
 ('M0AvWzg','FORD','FIESTA',1.6),
 ('XbVmtFf','FORD','FIESTA',1.3),
 ('l8qZy0H','SKODA','OCTAVIA',1.3),
 ('EDUbxaU','MERCEDED_BENZ','E CLASS',1.6),
 ('nWLd82o','FORD','FIESTA',1.7),
 ('4AkoyWx','FORD','FIESTA',1),
 ('nOoO25v','FORD','FIESTA',1.3),
 ('VAm5aV8','NISSAN','QASHQAI',1.4),
 ('zbd3cie','FORD','FIESTA',1.5),
 ('hyAN71W','NISSAN','QASHQAI',1),
 ('FxACHDf','FIAT','500',1.7),
 ('wOZdaeV','FORD','FIESTA',1.6),
 ('gfxZl99','VAUXHALL','CORSA',1.1),
 ('06HhwEJ','SKODA','OCTAVIA',1.7),
 ('PCTgYiG','KIA','RIO',1.7),
 ('U54WXZQ','KIA','RIO',1.6),
 ('FHgrRiF','FORD','FIESTA',1.6),
 ('R3jP73p','SKODA','OCTAVIA',1.5),
 ('etVPKX9','SUZUKI','SWIFT',1.1),
 ('BE3yReB','FORD','FIESTA',1.7),
 ('zXmX878','FORD','FIESTA',1.6),
 ('wdM3P2m','FORD','FIESTA',1.7),
 ('tb727BM','FORD','FIESTA',1.1)

Thanks

Carlton

Developer technologies | Transact-SQL
Developer technologies | Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
{count} votes

Answer accepted by question author
  1. Viorel 125.7K Reputation points
    2021-02-07T20:31:54.487+00:00

    If you really need such results, then try this query:

    ;
    with M as
    (
        select model, 
            count(*) as model_cnt, 
            count(*) * 100.0 / (select count(*) from vehicles) as model_distribution
        from vehicles
        group by model
    ),
    MN as
    (
        select *, row_number() over (order by model_cnt desc) as rn
        from M
    ),
    S as
    (
        select engine_size, 
            count(*) as engine_size_cnt, 
            count(*) * 100.0 / (select count(*) from vehicles) as engine_size_distribution
        from vehicles
        group by engine_size
    ),
    SN as
    (
        select *, row_number() over (order by engine_size_cnt desc) as rn
        from S
    )
    select model, engine_size, model_cnt, engine_size_cnt, 
        cast( model_distribution as decimal(30, 2)), 
        cast( engine_size_distribution as decimal(30, 2))
    from MN
    full outer join SN on SN.rn = MN.rn
    

9 additional answers

Sort by: Most helpful
  1. Carlton Patterson 761 Reputation points
    2021-02-10T09:46:31.617+00:00

    Hi Melissa

    I think I have fixed the email issue.

    I have already accepted an answer. Is it possible to accept your answer as well

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.