Because optimization is not a simple affair.
This is the model: the optimizer estimates which plan is best from statistics that in most cases have been sampled from the data. In these estimates, it makes a number of assumptions about things the statistics do not capture. Consider for instance this WHERE clause:
WHERE a = 1
AND b = 2
AND c = 3
From the statistics, the optimizer can get some idea of how many rows each condition alone will hit. But it does not know what happens when you combine them. They could be entirely uncorrelated, which means that only a few rows will be hit by the combined condition. But they may also be closely correlated, in which case quite a lot more rows will be hit.
And this is just one example. Incidentally, this is also an area where the old and new cardinality estimators make different assumptions. The old one assumes total independence, if I remember correctly, whereas the new one assumes some level of correlation.
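To put numbers on the difference, here is a small Python sketch of the arithmetic. It is only an illustration, not the optimizer's actual code. The table size and the selectivities are made up, and the function names are mine. The "backoff" formula (most selective predicate at full weight, then the square root of the next, the fourth root of the next, and so on) is my understanding of how the new CE softens the independence assumption, so treat it as an assumption too.

```python
# Sketch of the two estimation models for: WHERE a = 1 AND b = 2 AND c = 3
# All numbers and names below are hypothetical, chosen for illustration only.

def estimate_independent(total_rows, selectivities):
    """Assume the predicates are fully independent: multiply selectivities."""
    estimate = float(total_rows)
    for s in selectivities:
        estimate *= s
    return estimate

def estimate_backoff(total_rows, selectivities):
    """Assume partial correlation ("exponential backoff", as I understand it):
    sort so the most selective predicate comes first, then dampen each
    following one with an increasingly deep root (1, 1/2, 1/4, 1/8).
    Only the four most selective predicates are considered."""
    estimate = float(total_rows)
    for i, s in enumerate(sorted(selectivities)[:4]):
        estimate *= s ** (1.0 / 2 ** i)
    return estimate

def estimate_fully_correlated(total_rows, selectivities):
    """Upper bound: the predicates always hit the same rows, so the most
    selective one alone decides the row count."""
    return float(total_rows) * min(selectivities)

total_rows = 1_000_000
selectivities = [0.01, 0.01, 0.01]  # each condition alone hits 1 % of the rows

independent = estimate_independent(total_rows, selectivities)
backoff = estimate_backoff(total_rows, selectivities)
correlated = estimate_fully_correlated(total_rows, selectivities)

print(f"independence assumed: {independent:10.1f} rows")
print(f"backoff assumed:      {backoff:10.1f} rows")
print(f"fully correlated:     {correlated:10.1f} rows")
```

With these made-up figures, the independence model lands at about 1 row, the backoff model at about 316 rows, and full correlation would give 10,000 rows. Which of these is closest to the truth depends entirely on the data.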
Now, how well these assumptions work depends on your data. If in your case the conditions are entirely uncorrelated, the old cardinality estimator will work better.
And, generally, while the new CE does a better job in most cases, there are also cases where it backfires. Not least in cases where the old CE produced a good plan by chance, even if the query or the indexes were not the best.