Hello David,
Your observations about BCryptDeriveKeyCapi are correct and consistent with the documentation of CryptDeriveKey. The "Remarks" section of CryptDeriveKey describes the steps taken to derive a key; for stream ciphers this just amounts to completing the hash. (BCryptDeriveKeyCapi really adds value when porting legacy symmetric block cipher code.)
I first tried comparing the ciphertext output of the legacy and CNG code. CryptDeriveKey only generates RC4 keys with lengths between 40 and 128 bits (lengths outside of this range fail with error NTE_BAD_FLAGS). The output matched for all lengths except 40 bits.
RFC 6229 (Test Vectors for the Stream Cipher RC4) contains test vectors for 40-bit RC4 keys. The output of the CNG routines matches the documented results, but the legacy routines produce a different result (the legacy routines do match the documented results for key lengths of 56, 64, 80 and 128 bits).
Unlikely as it may seem, I think that there might be a bug in the legacy (Microsoft) code when using 40-bit keys...
Update: It is not a bug but rather "by design" behaviour - still trying to work out how to control it.
Update 2: The Flags parameter of both CryptDeriveKey and CryptImportKey supports the value CRYPT_NO_SALT. The description of this value is:
"A no-salt value gets allocated for a 40-bit symmetric key. For more information, see Salt Value Functionality."
If your old code calls CryptDeriveKey without the CRYPT_NO_SALT flag, then 11 zero bytes are used as the salt for a 40-bit key. If the same 11 zero bytes are appended to the CNG key (giving 16 bytes of key material in total), BCryptEncrypt produces the same ciphertext.
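To illustrate the effect of the salt without needing either Windows API, here is a minimal sketch in Python. It uses a textbook RC4 implementation rather than CryptoAPI or CNG, and the names rc4, legacy_40bit_key and SALT_LEN are my own for illustration, not part of any Microsoft API. The 40-bit key value is just an example. It shows that the same 5 key bytes give different ciphertext once the 11 zero bytes are appended, which is the mismatch described above.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling, then XOR the data with the keystream."""
    s = list(range(256))
    j = 0
    # Key-scheduling algorithm (KSA): the key is repeated to fill 256 steps.
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA).
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


SALT_LEN = 11  # assumption taken from the post: 11 zero bytes of salt


def legacy_40bit_key(key40: bytes) -> bytes:
    """Hypothetical helper: append the zero salt to a 40-bit key."""
    if len(key40) != 5:
        raise ValueError("expected a 40-bit (5-byte) key")
    return key40 + bytes(SALT_LEN)


key40 = bytes.fromhex("0102030405")  # example 40-bit key
plaintext = b"Attack at dawn"

unsalted = rc4(key40, plaintext)                  # plain 40-bit RC4 key
salted = rc4(legacy_40bit_key(key40), plaintext)  # key plus 11 zero bytes

print("unsalted:", unsalted.hex())
print("salted:  ", salted.hex())
print("match:   ", unsalted == salted)
```

On the CNG side the equivalent step would be to build the 16-byte buffer (5 key bytes followed by 11 zero bytes) before creating the key; that part is not shown here.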
Gary