Could it be, that the optimizer uses just one of the stats objects to get the value for EQ_ROWS and assumes they are the same for all objects starting with CLIENTID? And, that it would explain why it shows 1,00xxx as estimated rows for the CLIENTID (based on that particular stats object) whereas the INDX3 has a very different EQ_ROWS? If this is true, I can imagine the optimizer assumes this, after all, statistics are the thing it uses as being correct.
Yes, I think you are on to something there.
In the case there are multiple indexes with the same leading column, I would expect that the optimizer looks at all statistics for the density information. But I would only expect it to read one historgram. Reading more would only be confusing. (Just think if you were to optimize something, and you were given multiple histograms for the same data!)
One thing we should keep in mind: the optimizer reads statistics before it decides on indexes. so it may well read statistics for one index and then use a different one.
So which one statistics does it read? I don't know. I did some googling and checked some books in my bookshelf, but came back empty-handed. To compensate for this, I did some testing. By no means any thorough testing, so my results may be due to happenstance, and my conclusions may be incorrect.
What I seemed to see, though, is that if there are two statistics for the same column, it picks the one with the lowest modification counter (that is, the most recent), and if they are the same, it picks the one with the highest sample rate. This certainly makes sense. The fact that a statistics was created with FULLSCAN is of little relevance if data a lot has changed since it was produced.
I was actually able to compose a repro. To try it, you need the database BigDB, which you find at https://www.sommarskog.se/present/BigDB.bak. (It's a compress backup for SQL 2016 which expands to 20 GB.) First run this:
CREATE INDEX empid_shipper_ix ON TallOrders(EmployeeID, ShipVia)
CREATE INDEX empid_custid_ix ON TallOrders(EmployeeID, CustomerID)
UPDATE STATISTICS TallOrders empid_custid_ix
Note here that EmployeeID and ShipVia are int, while CustomerID is nchar(5), so for a query with only EmployeeID like the one below, the optimizer will always pick empid_shipper_ix since is more slender.
Run this query:
SELECT SUM(Freight) FROM TallOrders WHERE EmployeeID = 39
OPTION (RECOMPILE)
When I run it, the estimate matches the actual rows perfectly, because there is a step with 39 as RANGE_HI_KEY. Thus, statistics is taken from empid_shipper_ix. And, thus, it seems that the optimizer does not care that the statistics on empid_custid_ix was created later.
Now run:
INSERT TallOrders (OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate,
ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion,
ShipPostalCode, ShipCountry, Discount, Status, TotalAmount)
SELECT TOP 10000 OrderID + 1000000000, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate,
ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion,
ShipPostalCode, ShipCountry, Discount, Status, TotalAmount
FROM TallOrders
UPDATE STATISTICS TallOrders empid_custid_ix
Now try the query again:
SELECT SUM(Freight) FROM TallOrders WHERE EmployeeID = 39
OPTION (RECOMPILE)
The exact outcome depends on how the empid_custid_ix was sampled, but right now I see an estimate of 115138 rows which matches very well with RANGE_ROWS for a histogram step with 41 as RANGE_HI_KEY. Depending on your sampled statistics turns out, you may want to try a different value.