Matching congruent datasets in T-SQL

pmscorca 1,052 Reputation points
2023-10-03T05:55:33.8266667+00:00

Hi,

I need two tables having congruent dataset to match between them.

The first table is named "Diagnosis_groups":

CREATE TABLE [dbo].[Diagnosis_groups]( 	[Diagnosis_group_code] [nvarchar](10) NOT NULL, 	[Diagnosis_code] [nvarchar](10) NOT NULL, 	[Diagnosis_order] [smallint] NULL,  CONSTRAINT [PK_Diagnosis_groups] PRIMARY KEY CLUSTERED  ( 	[Diagnosis_group_code] ASC, 	[Diagnosis_code] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]

the second table is named "Hospital_diagnosis":

CREATE TABLE [dbo].[Hospital_diagnosis]( 	[Hospitalization_code] [nvarchar](10) NOT NULL, 	[Diagnosis_code] [nvarchar](10) NOT NULL, 	[Diagnosis_order] [smallint] NULL,  CONSTRAINT [PK_Hospital_diagnosis] PRIMARY KEY CLUSTERED  ( 	[Hospitalization_code] ASC, 	[Diagnosis_code] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]

I need to match these data:

User's image

User's image

For example, all diagnosis of HD1 code have to match with the all diagnosis of DG1 and DG2 codes. The diagnosis order is not important, while the number of a diagnosis group is important: the HD1 diagnosis haven't match with the DG3 diagnosis, because DG3 diagnosis group has 3 diagnosis and not 2 ones.

It is important that each diagnosis of an HD code matches with each diagnosis of a DG code.

Now, any suggests to write the right T-SQL statement to implement a such matching, please?

Many thanks

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,494 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Wilko van de Velde 2,231 Reputation points
    2023-10-03T06:38:54.7366667+00:00

    Hi @pmscorca ,

    Hope I understood your question correctly, I think this will be your solution:

    DROP TABLE IF EXISTS #Diagnosis_groups 
    DROP TABLE IF EXISTS #Hospital_diagnosis
    
    CREATE TABLE #Diagnosis_groups ( 	[Diagnosis_group_code] [nvarchar](10) NOT NULL, 	[Diagnosis_code] [nvarchar](10) NOT NULL, 	[Diagnosis_order] [smallint] NULL,  CONSTRAINT [PK_Diagnosis_groups] PRIMARY KEY CLUSTERED  ( 	[Diagnosis_group_code] ASC, 	[Diagnosis_code] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
    CREATE TABLE #Hospital_diagnosis( 	[Hospitalization_code] [nvarchar](10) NOT NULL, 	[Diagnosis_code] [nvarchar](10) NOT NULL, 	[Diagnosis_order] [smallint] NULL,  CONSTRAINT [PK_Hospital_diagnosis] PRIMARY KEY CLUSTERED  ( 	[Hospitalization_code] ASC, 	[Diagnosis_code] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
    
    
    INSERT INTO #Diagnosis_groups VALUES ('DG1', 'D1', 1)
    INSERT INTO #Diagnosis_groups VALUES ('DG1', 'D2', 2)
    INSERT INTO #Diagnosis_groups VALUES ('DG2', 'D1', 1)
    INSERT INTO #Diagnosis_groups VALUES ('DG2', 'D2', 2)
    INSERT INTO #Diagnosis_groups VALUES ('DG3', 'D1', 1)
    INSERT INTO #Diagnosis_groups VALUES ('DG3', 'D2', 2)
    INSERT INTO #Diagnosis_groups VALUES ('DG3', 'D3', 3)
    INSERT INTO #Diagnosis_groups VALUES ('DG4', 'D4', 1)
    INSERT INTO #Diagnosis_groups VALUES ('DG4', 'D1', 2)
    
    INSERT INTO #Hospital_diagnosis VALUES ('HD1', 'D1', 1)
    INSERT INTO #Hospital_diagnosis VALUES ('HD1', 'D2', 2)
    INSERT INTO #Hospital_diagnosis VALUES ('HD2', 'D1', 1)
    INSERT INTO #Hospital_diagnosis VALUES ('HD2', 'D2', 2)
    INSERT INTO #Hospital_diagnosis VALUES ('HD3', 'D1', 1)
    INSERT INTO #Hospital_diagnosis VALUES ('HD3', 'D2', 2);
    
    --Count records per Diagnosis_group_code
    WITH TMP_D as
    (
    	SELECT	Diagnosis_group_code, 
    			count(1) as [count]
    	FROM	#Diagnosis_groups
    	GROUP BY Diagnosis_group_code
    ),
    --Count records per [Hospitalization_code]
    TMP_H as
    (
    	SELECT	[Hospitalization_code], 
    			count(1) as [count]
    	FROM	#Hospital_diagnosis
    	GROUP BY [Hospitalization_code]
    ),
    --take original table and add the counter
    TMP_Diagnosis_groups as
    (
    	SELECT	d.*,
    			td.[count]
    	FROM	#Diagnosis_groups d
    	LEFT JOIN TMP_D td
    	on		d.[Diagnosis_group_code] = td.[Diagnosis_group_code]
    ),
    --take original table and add the counter
    TMP_Hospital_diagnosis as
    (
    	SELECT	h.*,
    			th.[count]
    	FROM	#Hospital_diagnosis h
    	LEFT JOIN TMP_H th
    	on		h.[Hospitalization_code] = th.[Hospitalization_code]
    )
    
    SELECT	*
    FROM	TMP_Hospital_diagnosis h 
    INNER JOIN	
    		TMP_Diagnosis_groups d
    ON		
    		-- Match where the number of [Hospitalization_code] and [Diagnosis_group_code] is the samen
    		RIGHT(h.[Hospitalization_code],1) = RIGHT(d.[Diagnosis_group_code],1) 
    AND	
    		--And where there are the same number of records
    		h.[count] = d.[count]
    

    Kind regards, Wilko


    Please do not forget to "Accept the answer” wherever the information provided helps you, this can be beneficial to other community members. If you have extra questions about this answer, please click "Comment".


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.