what the fastest way or logic to compare 240 milion rows from two tables ?

ahmed salah 3,216 Reputation points
2022-09-06T20:55:08.107+00:00

I work on sql server 2019 i have table name recomendations have 16 milions rows as below

CREATE TABLE [dbo].[Recomendations](  
 [Recomendation] [int] IDENTITY(1,1) NOT NULL,  
 [OriginalPartID] [int] NULL,  
 [AlternativePartId] [int] NULL,  
 CONSTRAINT [PK_Recomendations] PRIMARY KEY CLUSTERED   
(  
 [Recomendation] ASC  
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]  
) ON [PRIMARY]  
INSERT [dbo].[Recomendations] ([OriginalPartID], [AlternativePartId]) VALUES (122221, 998712)  

and when i need to get features for every original part
i get it from table original parts as below

  CREATE TABLE [dbo].[OriginalParts](  
 [OriginalPartId] [int] NOT NULL,  
 [FeatureId] [int] NOT NULL,  
 [FeatureValue] NOT NULL  
) ON [PRIMARY]  
  
GO  
SET ANSI_PADDING OFF  
GO  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555513, 9)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555514, 91)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555515, 215)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555516, 59)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555517, 49)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555518, 211)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555519, 321)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555520, 702)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555521, 302)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555522, 321)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555523, 2171)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555524, 3021)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555525, 1234)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555526, 501)  
INSERT [dbo].[OriginalParts] ([OriginalPartId], [FeatureId], [FeatureValue]) VALUES (122221, 1555527, 301)  

and when i need to get features for every alternative part
i get it from table Alternative parts as below

CREATE TABLE [dbo].[AlternativeParts](  
[AlternativePartID] [int] NOT NULL,  
[FeatureId] [int] NOT NULL,  
[FeatureValue] int NOT NULL  
) ON [PRIMARY]  
  
GO  
SET ANSI_PADDING OFF  
GO  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555513, 9)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555514, 91)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555515, 215)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555516, 59)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555517, 49)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555518, 211)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555519, 321)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555520, 702)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555521, 302)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555522, 321)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555523, 2171)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555524, 3021)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555525, 1234)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555526, 501)  
INSERT [dbo].[AlternativeParts] ([AlternativePartID], [FeatureId], [FeatureValue]) VALUES (998712, 1555527, 301)  

table [dbo].[OriginalParts] have 4000 partid and every part have same 15 feature id with different values
so count of rows on this table 4000 * 15 =60000

table [dbo].[AlternativeParts] have 4000 partid and every part have same 15 feature id with different values
so count of rows on this table 4000 * 15 =60000

4000 parts for original parts different from 4000 rows on alternative parts
so what is fastest way to compare values from original parts to alternative parts to 16 milion rows on table recomendations

what i try is

select r.Recomendation] ,r.OriginalPartID,r.AlternativePartId  from [dbo].[Recomendations] r  
inner join [dbo].[OriginalParts] o with(nolock) on o.OriginalPartId=r.OriginalPartId  
inner join [dbo].[AlternativeParts] a with(nolock) on a.AlternativePartId =r.AlternativePartId    
where o.featureid=a.featureid and a.featurevalue>=o.featurevalue  

but it take too much time
when do count it give toomuch rows it give me 240 milion rows

select count(1)  from [dbo].[Recomendations] r  
inner join [dbo].[OriginalParts] o with(nolock) on o.OriginalPartId=r.OriginalPartId  
inner join [dbo].[AlternativeParts] a with(nolock) on a.AlternativePartId =r.AlternativePartId    
where o.featureid=a.featureid and a.featurevalue>=o.featurevalue  

are there are another logic to be fast

table recomendations have 16 milion Recomendation (4000 distinct on OriginalPartID and 4000 distinct on AlternativePartId )
table [dbo].[OriginalParts] have 4000 part original distinct but every part have 15 feature so count rows 60000 rows
table [dbo].[AlternativeParts] have 4000 part original distinct but every part have 15 feature so count rows 60000 rows

every part from both tables [dbo].[OriginalParts] and [dbo].[AlternativeParts] have same 15 features but values may be different

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,654 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,551 questions
0 comments No comments
{count} votes

Accepted answer
  1. Guoxiong 8,126 Reputation points
    2022-09-06T21:53:31.667+00:00

    Did you try to add the indexes to both tables OriginalParts and AlternativeParts:

    CREATE NONCLUSTERED INDEX [IX_OriginalParts_OriginalPartId]  
        ON [dbo].[OriginalParts]([OriginalPartId] ASC) INCLUDE([FeatureId], [FeatureValue]);  
    GO  
      
    CREATE NONCLUSTERED INDEX [IX_AlternativeParts_AlternativePartId]  
        ON [dbo].[AlternativeParts]([AlternativePartId] ASC) INCLUDE([FeatureId], [FeatureValue]);  
    GO  
    

0 additional answers

Sort by: Most helpful