My query produces to multiple duplicate results, any ideas?

Simflex 326 Reputation points
2024-02-04T19:28:31.75+00:00

We have a survey of 8 questions. These 8 questions are broken down into 8 database tables as the query below shows. Each question can have multiple results. For instance, it is like using dynamically generated rows where an employee enters an answer into one row, clicks to to add another row and provides more answers. These tables are related by EmployeeID. In other words, the employee information is stored in Employees table with auto generated EmployeeID and each employeeID is also stored in each of the 8 questions or tables, The following screenshot shows the results of each survey question for an employee with ID of 7650. Yet, the query below produces multiple duplicate rows. I am not sure if there is a flaw in my DB design. Here is the screenshot. of query run for each question or table and the results of each question. results

The above screenshot shows when query is run for each table, For instance, select sourcename, sourceaddress from sourceDetails s inner join dateDetails d on s.EmployeeID=d.EmployeeID WHERE d.dateCreated >= '20240125' and d.dateCreated < '20240531' and e.employeeID = 7650

Here is the code we currently use that is producing the duplicate rows.:

SELECT e.employeeID,s.sourcename, s.sourceaddress,
            sp.spousename,sp.spouseaddress,
            dv.dividentName, dv.dividentAddress,
            r.reimbursementName, r.reimbursementAddress,
            h.HonorariaName,h.HonorariaAddress,
            g.giftName,g.giftAddress,
            o.orgName,o.orgAddress,
            cr.creditorName,cr.creditorAddress
From Employees e
            INNER JOIN  SourceDetails s ON e.EmployeeID = s.EmployeeID
            INNER Join  SpouseDetails sp ON e.EmployeeID = sp.employeeID
            INNER JOIN  org o ON e.employeeID = o.employeeID
            INNER Join  DividentDetails dv ON e.EmployeeID = dv.EmployeeID
            INNER JOIN  ReimbursementDetails r ON e.EmployeeID = r.employeeID
            INNER Join  Honoraria h ON e.EmployeeID = h .EmployeeID
            INNER JOIN  GiftDetails g ON e.EmployeeID = g.employeeID
            INNER Join  dateDetails d ON e.EmployeeID = d.employeeID
            INNER JOIN  creditorDetails cr ON e.employeeID = cr.employeeID
WHERE d.dateCreated >= '20240125' and d.dateCreated < '20240531' and e.employeeID = 7650


I tried using DISTINCT. I also tried GROUP BY but the query keeps displaying many duplicate records. Could you experts help figure out what I am doing wrong please? Many thanks in advance.

Developer technologies | VB
SQL Server | Other
0 comments No comments
{count} votes

Answer accepted by question author
  1. Peter Fleischer (former MVP) 19,341 Reputation points
    2024-02-08T08:07:28.9666667+00:00

    Hi,
    in SQL Server you can concatenate values to one string with following sql:

    ;WITH u AS (
    SELECT EmployeeID AS ID, STRING_AGG('Source: ' + SourceName + '=' + SourceAddress, ', ') AS Col FROM SourceDetails GROUP BY EmployeeID
    UNION ALL
    SELECT EmployeeID AS ID, STRING_AGG('Spouse: ' + SpouseName + '=' + SpouseAddress, ', ') AS Col FROM SpouseDetails GROUP BY EmployeeID
    UNION ALL
    SELECT EmployeeID AS ID, STRING_AGG('Divident: ' + DividentName + '=' + DividentAddress,', ') AS Col FROM DividentDetails GROUP BY EmployeeID
    )
    SELECT ID, STRING_AGG(Col,'; ') AS Result FROM u
    GROUP BY ID
    

    Result string in Result column::

    Divident: Dan Silver=2 Silverade Street, Divident: Kenny Lofton=! Athens University Way, Divident: Paul Fredrick=1 best mens clothing co; Source: John Doe=123 ABC rd, Source: Angel Gabriel=109 Paulica way, Source: Mike Pence=120 hang Mike Avenue; Spouse: Terrel Owns=30 San fran Dr, Spouse: Jerry Rice=351 Catches Way, Spouse: Joe Montana=1 Quarterback Avenue
    

    After splitting result string you get this:

    Divident: Dan Silver=2 Silverade Street, 
    Divident: Kenny Lofton=! Athens University Way, 
    Divident: Paul Fredrick=1 best mens clothing co; 
    Source: John Doe=123 ABC rd, 
    Source: Angel Gabriel=109 Paulica way, 
    Source: Mike Pence=120 hang Mike Avenue; 
    Spouse: Terrel Owns=30 San fran Dr, 
    Spouse: Jerry Rice=351 Catches Way, 
    Spouse: Joe Montana=1 Quarterback Avenue
    
    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Erland Sommarskog 129.6K Reputation points MVP Volunteer Moderator
    2024-02-04T19:56:10.97+00:00

    You are joining these eight tables on EmployeeID alone, and EmployeeID is not a unique key. Each row table 1 will join to each row with the same employee ID in the other rows. So that is 3x3x3x3x3x3x3x4 = 8748 rows.

    The query asks for data from all questions, so you get all combinations of sourcename/spousename/whatevername. That is not particularly meaningful.

    Then again, what is a meaningful output here? What result are you expecting? Do you have a clear idea of this?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.