Tesseract OCR with RegEx

Kmcnet
2023-12-27T02:55:13.4333333+00:00

Hello everyone, and thanks in advance for the help. I am developing a C# application that reads PNG files with Tesseract 5, and I am having problems extracting a date from the text Tesseract recognizes. The odd part is that the recognized text looks correct when it is displayed in a textbox:

 DOB: 04/02/2016 

My code to extract the text:

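        using System;
        using System.Globalization;
        using System.Text.RegularExpressions;

        public class ExtractPatientDateOfBirth
        {
            public string PatientDateOfBirth { get; set; }

            public ExtractPatientDateOfBirth(string TextToParse)
            {
                // "DOB:", one or more spaces (group 1), then a run of digits,
                // '.', '<', '/', '+' or whitespace (group 2). \s also matches
                // line breaks, so the run can spill onto the next line.
                string MatchPattern = "DOB:" + @"(\ +)([0-9\.\<\/\s+]+)";
                Regex r = new Regex(MatchPattern, RegexOptions.IgnoreCase);
                Match m = r.Match(TextToParse);
                string PatientDOB = "";
                // If "DOB:" occurs more than once, keep the last match.
                while (m.Success)
                {
                    // Take only the captured date (group 2), so "DOB:" does not
                    // need a separate, case-sensitive Replace call.
                    PatientDOB = m.Groups[2].Value;
                    m = m.NextMatch();
                }
                PatientDOB = PatientDOB.Trim();
                //var PatientDateTime = DateTime.ParseExact(PatientDOB, "M/d/yyyy", CultureInfo.InvariantCulture);
                //PatientDateOfBirth = PatientDateTime.ToString("MM/dd/yyyy");
                PatientDateOfBirth = PatientDOB;
            }
        }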

returns the value 04702/2016, in which the first forward slash has become a 7. I am not sure what is causing this or how to correct it.
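One reasoning step worth making explicit: `Regex.Match` only returns a substring of its input, and `Trim` only removes whitespace, so the 7 is most likely already present in the string passed to `ExtractPatientDateOfBirth` (possibly because it comes from a different OCR pass or page segmentation than the text shown in the textbox). The sketch below is a hypothetical helper class, not part of Tesseract or .NET; the names `DobOcrHelpers`, `DescribeChars` and `TryExtractDob` are my own. `DescribeChars` prints each character with its code point so the two strings can be compared exactly. `TryExtractDob` is an assumed workaround that accepts either '/' or '7' as a separator, rebuilds the date with real slashes, and keeps it only if `DateTime.TryParseExact` validates it in the same `M/d/yyyy` order used in the commented-out code.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class DobOcrHelpers
{
    // Lists every character with its Unicode code point, e.g. "4=U+0034 /=U+002F",
    // so the raw OCR string can be compared with what the textbox shows.
    // Control characters such as '\n' are displayed as a middle dot.
    public static string DescribeChars(string text) =>
        string.Join(" ", text.Select(c => $"{(char.IsControl(c) ? '\u00B7' : c)}=U+{(int)c:X4}"));

    // Hypothetical heuristic: after "DOB:", accept '/' or '7' as each separator,
    // since OCR may read a slash as a 7 (as in "04702/2016").
    private static readonly Regex DobPattern = new Regex(
        @"DOB:\s*(\d{1,2})[/7](\d{1,2})[/7](\d{4})\b",
        RegexOptions.IgnoreCase);

    public static bool TryExtractDob(string text, out DateTime dob)
    {
        dob = default;
        Match m = DobPattern.Match(text);
        if (!m.Success)
            return false;

        // Rebuild the date with real slashes; TryParseExact then rejects
        // impossible values such as a month of 47.
        string candidate = $"{m.Groups[1].Value}/{m.Groups[2].Value}/{m.Groups[3].Value}";
        return DateTime.TryParseExact(candidate, "M/d/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out dob);
    }
}
```

For example, `Console.WriteLine(DobOcrHelpers.DescribeChars(TextToParse))` placed just before the regex runs would show whether the 7 is already in the OCR output. The separator heuristic only guards against a misread slash; it cannot recover a digit that OCR got wrong, so results should still be checked where accuracy matters.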
