By definition, copying memory is CPU bound: the core doing the copy can't do anything else while it runs. Memory access is also one of the slowest things a CPU does. If you split the copy across threads, don't use more threads than the machine has cores, or it will be slower.
You would need a custom hardware device to perform DMA-like transfers.
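If you do try threads, here is a rough sketch of capping the worker count at the number of cores. The `ParallelCopy` class and its `Copy` method are made-up names for illustration, and it uses `Buffer.BlockCopy` only to keep the example in safe code. Measure it against a plain single-threaded copy before trusting it, since it may well not be faster on your machine.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the framework.
static class ParallelCopy
{
    // Copies src into dst in chunks, using at most one worker per core.
    public static void Copy(byte[] src, byte[] dst)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (dst == null) throw new ArgumentNullException(nameof(dst));
        if (dst.Length < src.Length)
            throw new ArgumentException("destination is too small", nameof(dst));

        // Never use more workers than cores (or than bytes to copy).
        int workers = Math.Max(1, Math.Min(Environment.ProcessorCount, src.Length));

        // Chunk size rounded up so the chunks cover the whole array.
        int chunk = (src.Length + workers - 1) / workers;

        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };
        Parallel.For(0, workers, options, i =>
        {
            int offset = i * chunk;
            int count = Math.Min(chunk, src.Length - offset);

            // The last chunks can be empty when the length doesn't divide evenly.
            if (count > 0)
                Buffer.BlockCopy(src, offset, dst, offset, count);
        });
    }
}
```

Call it as `ParallelCopy.Copy(source, destination)`. For small buffers the cost of starting the workers will outweigh any gain, so this only makes sense for large copies.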
Note: Buffer.MemoryCopy is supposed to use C's memcpy under the covers, so it's the fastest option. But why are you moving memory in the first place? Why not use a ring buffer or some other design that avoids the copy?
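To show what I mean by a ring buffer, here is a minimal sketch. The `RingBuffer<T>` class and its members are names I made up for the example, and it assumes a single thread (no locking). The point is that data is written once into a fixed array and read from where it sits, instead of being shifted or copied to make room.

```csharp
using System;

// Hypothetical fixed-capacity ring buffer, not thread-safe.
sealed class RingBuffer<T>
{
    private readonly T[] _items;
    private int _head;   // index of the oldest item
    private int _count;  // number of items currently stored

    public RingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _items = new T[capacity];
    }

    public int Count => _count;

    // Adds an item at the tail; returns false if the buffer is full.
    public bool TryWrite(T item)
    {
        if (_count == _items.Length) return false;
        _items[(_head + _count) % _items.Length] = item;
        _count++;
        return true;
    }

    // Removes the oldest item; returns false if the buffer is empty.
    public bool TryRead(out T item)
    {
        if (_count == 0)
        {
            item = default(T);
            return false;
        }
        item = _items[_head];
        _head = (_head + 1) % _items.Length; // wrap around instead of moving data
        _count--;
        return true;
    }
}
```

The producer calls `TryWrite` and the consumer calls `TryRead`. Only the head index and the count change, so nothing is moved around in memory. If the producer and consumer run on different threads you would need to add synchronization.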