As Jay said, there is no HTML in Word documents.
You say you "cut and paste the HTML from old word docs". Exactly how did you do that? And which version of Word were you using?
I just thought of something. Although DOCX files are structured as XML, you do have the option of using File > Save As to save them in HTML format. I prefer "Web Page, Filtered"; it strips out a lot of the redundant junk that Word would otherwise put in a plain "Web Page" HTML file.
After that, you can open the saved file in a text editor and copy the HTML to paste elsewhere.
Let's take a high-level look at some Word history.
The old "DOC" format files were "binary", loosely equivalent to a compiled program. If you opened one in a text editor, it was unreadable; even in a hex editor, the file was still essentially unreadable.
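To make the binary-vs-ZIP difference concrete, here is a minimal Python sketch that sniffs the first bytes of a file. It assumes the standard signatures: legacy .doc files are OLE2 compound files beginning with the bytes D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives beginning with "PK\x03\x04". The function name sniff_word_format is hypothetical, and the check is only a heuristic: other OLE2 files (such as old .xls) and other ZIP files (such as .xlsx or a plain .zip) share these signatures.

```python
# Magic-number signatures (the first bytes of a file).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .doc (OLE2 compound file)
ZIP_SIGNATURE = b"PK\x03\x04"                        # .docx is a ZIP archive


def sniff_word_format(path: str) -> str:
    """Hypothetical helper: guess whether a file is a binary DOC or a ZIP-based DOCX."""
    with open(path, "rb") as f:
        header = f.read(8)  # 8 bytes covers the longer (OLE2) signature
    if header.startswith(OLE2_SIGNATURE):
        return "doc (binary OLE2)"
    if header.startswith(ZIP_SIGNATURE):
        return "docx (ZIP container)"
    return "unknown"
```

Checking the header rather than the file extension is deliberate: as described below, a DOCX can be renamed freely and is still the same ZIP underneath.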
In 2007 MS switched to the DOCX format. As Jay pointed out, a DOCX is actually a renamed ZIP file that contains a bunch of plain-text "XML" files, which can easily be read in any text editor (e.g. Notepad or WordPad). The easiest way I've found to rename it is to ADD the .ZIP file extension (rather than replace DOCX with ZIP). Inside the zip are a number of standard folders containing the various bits and bobs. The most relevant folder is /word; if the document has any pictures, they are stored in the /media folder inside it. The most relevant file is "document.xml". It contains the body text, but it is buried inside XML that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14
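As a rough illustration of digging the body text out of that XML, here is a Python sketch using only the standard library (zipfile and xml.etree.ElementTree). It opens the DOCX directly as a ZIP, so no renaming is needed, lists the parts inside, and joins the text of the w:t elements in each w:p paragraph, using the wordprocessingml namespace declared in the snippet above. The helper names are hypothetical, and the sketch deliberately ignores tabs, line breaks, footnotes and text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, as declared by xmlns:w in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def list_parts(docx_path: str) -> list[str]:
    """Hypothetical helper: names of all files stored inside the DOCX container."""
    # zipfile reads the archive directly; the .docx extension does not matter.
    with zipfile.ZipFile(docx_path) as zf:
        return zf.namelist()


def body_paragraphs(docx_path: str) -> list[str]:
    """Hypothetical helper: plain text from word/document.xml, one string per paragraph."""
    with zipfile.ZipFile(docx_path) as zf:
        # Raises KeyError if the archive has no word/document.xml.
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):  # w:p = paragraph (including those inside tables)
        # The visible text lives in w:t elements nested inside runs (w:r).
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        paragraphs.append(text)
    return paragraphs
```

For example, `for line in body_paragraphs("report.docx"): print(line)` would print the body text one paragraph per line, where "report.docx" is just a placeholder filename.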
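XML is a close cousin of HTML: both are tag-based markup languages descended from SGML, although XML is stricter and lets you define your own tags.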
Most (99.99999%) Office users don't care about the underlying XML/HTML, so MS has not bothered to provide direct access to it. MS's view is that if you need the XML, you can unzip the file and access it. Even that was something of a "secret" that MS didn't really advertise to ordinary users, although the format itself is published as the Office Open XML standard (ECMA-376, ISO/IEC 29500). A few geeks figured it out and spread the word.
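If you do want to browse the raw XML in a text editor, the "unzip it" route can be scripted too. Below is a minimal Python sketch, with hypothetical function and folder names, that extracts every part of a DOCX into a folder (no renaming to .zip required) and returns the XML files it finds there. It is intended for your own, trusted documents.

```python
import zipfile
from pathlib import Path


def unzip_docx(docx_path: str, out_dir: str) -> list[Path]:
    """Hypothetical helper: extract all DOCX parts into out_dir, return the .xml files."""
    out = Path(out_dir)
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(out)  # recreates the internal folders (word/, word/media/, ...)
    # Collect XML parts such as word/document.xml and [Content_Types].xml.
    # Relationship files (*.rels) are also XML but are not matched by this pattern.
    return sorted(out.rglob("*.xml"))
```

Each returned path can then be opened in Notepad or any other text editor.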
The point is, DOCX files are complex structures of files and subfolders. There is no simple way to directly access XML (or HTML) inside of Word.