Can't convert Word document equations into an HTML-readable text format like MathML in C#

conduct exam 0 Reputation points
2024-04-24T04:35:09.1466667+00:00

I have a .NET C# application that extracts data from a Word file and saves it to a database. To keep all of the Word content as it is, I first convert the whole Word file to HTML and then go through the HTML paragraph by paragraph.

During the conversion from Word to HTML, the equations in the Word document are converted into images.

Globals.ThisAddIn.Application.ActiveDocument.Select();
Microsoft.Office.Interop.Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument;

string result = Path.GetTempPath();
string tmpFileName = Globals.ThisAddIn.Application.ActiveDocument.FullName;

doc.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUSASCII;
if (File.Exists(result + "temp.html"))
{
    File.Delete(result + "temp.html");
}
doc.SaveAs(result + "temp.html", WdSaveFormat.wdFormatFilteredHTML);

doc.Close(Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges);

HtmlAgilityPack.HtmlDocument mangledHTML = new HtmlAgilityPack.HtmlDocument();
mangledHTML.Load(result + "temp.html");

if (File.Exists(result + "newtemp.html"))
{
    File.Delete(result + "newtemp.html");
}
mangledHTML.Save(result + "newtemp.html");

// Remove standalone CRLF: a double CRLF becomes a single CRLF and a lone CRLF
// becomes a space. "ackThbbtt " is only a temporary placeholder.
string badHTML = File.ReadAllText(result + "newtemp.html");
badHTML = badHTML.Replace("\r\n\r\n", "ackThbbtt ");
badHTML = badHTML.Replace("\r\n", " ");
badHTML = badHTML.Replace("ackThbbtt ", "\r\n");
// Replace the Unicode replacement character (U+FFFD) with a space
badHTML = badHTML.Replace('\uFFFD', ' ');
if (File.Exists(result + "finaltemp.html"))
{
    File.Delete(result + "finaltemp.html");
}
File.WriteAllText(result + "finaltemp.html", badHTML);

// Clean up temp files (the finished result is in finaltemp.html)
File.Delete(result + "temp.html");
File.Delete(result + "newtemp.html");

// Reopen the original document, which was closed above
Microsoft.Office.Interop.Word.Document originalDoc = Globals.ThisAddIn.Application.Documents.Open(tmpFileName);
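
The line-break cleanup above only works if the placeholder text "ackThbbtt " never occurs in the document itself. As a minimal sketch of the same step without a placeholder, a single regular-expression pass can handle both cases. The names HtmlLineBreaks and Collapse are hypothetical and not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class HtmlLineBreaks
{
    // Tries a double CRLF first, otherwise matches a single CRLF.
    private static readonly Regex LineBreaks = new Regex("\r\n\r\n|\r\n");

    // A double CRLF (blank line) becomes one CRLF; a lone CRLF becomes a space.
    public static string Collapse(string html)
    {
        return LineBreaks.Replace(html, m => m.Length == 4 ? "\r\n" : " ");
    }
}
```

It could replace the three Replace calls with something like badHTML = HtmlLineBreaks.Collapse(badHTML);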

Basically, I want to store each paragraph of the Word document separately in the database, together with its properties such as font size, font width, font name and font style, so that my application can show it exactly as I wrote it in the Word document.
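
As a rough sketch of what one stored row per paragraph might look like, the class below holds the text plus a few font properties. ParagraphRecord, Create and ToFlag are hypothetical names, and the database schema is not specified in this post, so this only illustrates the shape of the data. It assumes the values come from the Word object model (Range.Text and Range.Font), where mixed formatting is reported as wdUndefined (9999999) and Bold/Italic are integers (-1 for on, 0 for off).

```csharp
using System;

// Hypothetical record type: one row per Word paragraph.
public sealed class ParagraphRecord
{
    // Word uses wdUndefined (9999999) when runs in the range differ.
    private const int WdUndefined = 9999999;

    public string Text { get; set; }
    public string FontName { get; set; }  // empty when fonts are mixed
    public float? FontSize { get; set; }  // in points; null when mixed
    public bool? Bold { get; set; }       // null when mixed
    public bool? Italic { get; set; }     // null when mixed

    // Maps Word's integer flag to a nullable bool.
    public static bool? ToFlag(int wordValue)
    {
        if (wordValue == WdUndefined) return null;
        return wordValue != 0;
    }

    public static ParagraphRecord Create(
        string text, string fontName, float fontSize, int bold, int italic)
    {
        return new ParagraphRecord
        {
            // Paragraph text ends with a paragraph mark ('\r'),
            // or "\r\a" inside a table cell.
            Text = text.TrimEnd('\r', '\a'),
            FontName = fontName,
            FontSize = fontSize == WdUndefined ? (float?)null : fontSize,
            Bold = ToFlag(bold),
            Italic = ToFlag(italic)
        };
    }
}

// Possible use inside the add-in (requires the Word interop reference):
//
// foreach (Microsoft.Office.Interop.Word.Paragraph p in doc.Paragraphs)
// {
//     Microsoft.Office.Interop.Word.Font f = p.Range.Font;
//     var row = ParagraphRecord.Create(p.Range.Text, f.Name, f.Size, f.Bold, f.Italic);
// }
```

Reading the properties at paragraph level loses per-run formatting, which is presumably why the HTML route is used here; the sketch is only meant to show the properties alongside the text.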

To show it as it is, I need to convert it to HTML, and then by separating all the paragraphs I can store them in the database. But when my Word document has a paragraph that contains equations, then

Globals.ThisAddIn.Application.ActiveDocument.Select();
Microsoft.Office.Interop.Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument;

string result = Path.GetTempPath();
string tmpFileName = Globals.ThisAddIn.Application.ActiveDocument.FullName;

doc.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUSASCII;

this code converts all of the equations in my Word document into images, and because they are images I can't show the equations properly in my application.

So I tried to convert these equations to MathML, but I couldn't get it to work.
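
For reference, one direction that might be worth trying is sketched below. Word stores equations as Office Math Markup Language (OMML, the m:oMath element), and Office installs a stylesheet named OMML2MML.XSL that transforms OMML into MathML. The sketch assumes the equation XML is obtained from the Word object model (for example Range.WordOpenXML of each item in Document.OMaths). The class and method names are hypothetical, and the stylesheet path is an assumption that differs between Office versions and installations.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class EquationToMathMl
{
    // Namespace of OMML elements such as <m:oMath>.
    private static readonly XNamespace Omml =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Returns the first <m:oMath> element found in a larger XML string
    // (for example the value of Range.WordOpenXML), or null if there is none.
    public static string ExtractOmml(string wordOpenXml)
    {
        XElement math = XDocument.Parse(wordOpenXml)
                                 .Descendants(Omml + "oMath")
                                 .FirstOrDefault();
        return math == null ? null : math.ToString(SaveOptions.DisableFormatting);
    }

    // Runs the OMML2MML.XSL stylesheet over one <m:oMath> element.
    // xslPath is an assumption: point it at the copy in the local Office install.
    public static string OmmlToMathMl(string omml, string xslPath)
    {
        var transform = new XslCompiledTransform();
        transform.Load(xslPath);

        XmlWriterSettings settings = transform.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        var output = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(new StringReader(omml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(reader, writer);
        }
        return output.ToString();
    }
}

// Possible use inside the add-in (requires the Word interop reference):
//
// foreach (Microsoft.Office.Interop.Word.OMath eq in doc.OMaths)
// {
//     string omml = EquationToMathMl.ExtractOmml(eq.Range.WordOpenXML);
//     string mathMl = EquationToMathMl.OmmlToMathMl(omml, pathToOmml2MmlXsl);
// }
```

The resulting MathML string could then be stored next to the paragraph HTML in place of the image reference. Whether the application can render MathML directly depends on the browser or control used to display it.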

Tags: .NET, ASP.NET Core, C#, Office Development