I don't have a demo to share, but I have seen this more than once. A parallel plan that runs for ages, add OPTION (MAXDOP 1) and boom! Query is instant.
In cases where I have seen this, the optimizer has gone wrong on partitioning the data between the threads, and one thread has ended up with all the data. And that with a plan which was designed for heavy crunching and take benefit parallelism. MAXDOP 1 has forced a less heavy plan which has been faster in real life.
I can't recall that I have seen a case where the serial version of the same parallel plan has been faster, but it is perfectly conceivable, since the parallelism operators add some overhead.