Using Checksum to create random data sets
Here’s a quick tip if you don’t want to try to use random and figure out seed, etc. If you’ve got a lot of columns in a table, it is pretty likely that the checksum for all those data values are going to be random across your primary keys.
I recently needed to create a 5% random sample of my equities, so just ran the following query:
select TradingSymbol
from dbo.EquitySymbol
where EquityType=0
and CHECKSUM(*) % 20 = 0
That got me 367 out of about 7300 possible rows. If you’re doing random sampling, I suggest the following table structure.
Then, you can create a random sample by inserting a sample ID into Data Sample and then insert from your selection into the DataSampleValue table:
insert into olap.DataSample select 'Random Sample 5% for EquityType=0'
select @@IDENTITY
insert into olap.DataSampleEquity
select 18, TradingSymbol
from dbo.EquitySymbol
where EquityType=0
and CHECKSUM(*) % 20 = 0
select * from olap.DataSample
(No column name) | DataValue |
18 | AAII |
18 | AAON |
18 | ABC |
18 | ADGF |
18 | ADLS |
18 | ADP |
18 | AEC-B |
18 | AFCB |
18 | AFFM |
18 | AIMC |
18 | AIR |
18 | AKT |
18 | ALLB |
18 | ALZM |
18 | … |