Currently we have one page with a file-upload control and a submit button. The user selects a PDF, JPEG, or GIF file and submits it.
My requirement is to read the content of the uploaded file with maximum accuracy and insert the extracted data into a SQL table. The file content and our requirement are the same as described at the link below:
https://nanonets.com/blog/how-to-ocr-purchase-orders-for-automation/#digitising-purchase-orders
I don't know how to start or how to achieve this. Please advise and share some sample code.
You need to pick an OCR vendor or API. You send the documents to it, and it returns the formatted data.
Your sample vendor has a REST API; you would probably call it with HttpClient.
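To make the HttpClient suggestion concrete, here is a minimal sketch of posting an uploaded file to an OCR service as multipart/form-data. The endpoint URL, the form field name "file", and the class and method names (OcrClientSketch, BuildRequest, ExtractAsync) are all placeholders I made up for illustration; the real URL, field names, and authentication headers come from your vendor's documentation.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class OcrClientSketch
{
    // Hypothetical endpoint: replace with the URL from your vendor's documentation.
    public const string OcrEndpoint = "https://ocr.example.com/v1/extract";

    // Reuse a single HttpClient for the lifetime of the application.
    private static readonly HttpClient Http = new HttpClient();

    // Builds a multipart/form-data POST that carries the uploaded file.
    // The form field name "file" is an assumption; vendors differ.
    public static HttpRequestMessage BuildRequest(byte[] fileBytes, string fileName, string contentType)
    {
        var filePart = new ByteArrayContent(fileBytes);
        filePart.Headers.ContentType = new MediaTypeHeaderValue(contentType);

        var form = new MultipartFormDataContent();
        form.Add(filePart, "file", fileName);

        return new HttpRequestMessage(HttpMethod.Post, OcrEndpoint) { Content = form };
    }

    // Sends the file and returns the raw response body (typically JSON).
    public static async Task<string> ExtractAsync(string filePath, string contentType)
    {
        byte[] bytes = File.ReadAllBytes(filePath);
        using (var request = BuildRequest(bytes, Path.GetFileName(filePath), contentType))
        using (var response = await Http.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From a controller you would call something like `await OcrClientSketch.ExtractAsync(filePath, "image/jpeg")` and then deserialize the returned JSON into whatever shape the vendor documents.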
Hi @coder rock,
It would help if you could share some sample images (without any private data) so that we can test the code against your actual input.
I see you've already solved it elsewhere, but I've added comments and some new code based on your needs. You can refer to the code below:
@using (Html.BeginForm("Index", "Home", FormMethod.Post, new { enctype = "multipart/form-data" }))
{
<span>Select File:</span>
<input type="file" name="postedFile" />
<input type="submit" value="Upload" />
<hr />
<span>@ViewBag.Message</span>
}
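The form above accepts any file, so it may be worth rejecting unexpected file types before running OCR. Below is a minimal sketch of an extension check for the formats mentioned in the question (pdf, jpeg, gif). UploadRules and IsAllowed are names I made up, and ".jpg" is my addition as the common short form of ".jpeg". Note that this only inspects the file name, not the actual file content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UploadRules
{
    // Extensions from the question (pdf, jpeg, gif); ".jpg" is added as an assumption.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".pdf", ".jpg", ".jpeg", ".gif" };

    // Returns true when the file name ends with one of the allowed extensions.
    public static bool IsAllowed(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName)) return false;
        return Allowed.Contains(Path.GetExtension(fileName));
    }
}
```

In the controller action you could call `UploadRules.IsAllowed(postedFile.FileName)` before saving the file and set ViewBag.Message to an error when it returns false.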
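//Compare keys case-insensitively so that OCR output such as "mobile" or "NAME" still matches.
private static readonly HashSet<string> _extractKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Name", "Mobile", "Address" };
private static readonly HashSet<string> _ignoredKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Bill" };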
public ActionResult Index(HttpPostedFileBase postedFile)
{
if (postedFile != null)
{
string filePath = Server.MapPath("~/Uploads/" + Path.GetFileName(postedFile.FileName));
postedFile.SaveAs(filePath);
string extractText = this.ExtractTextFromImage(filePath);
//Example OCR output: "logo Name raj mobile 9038874774 address 6-98 india bill auto generated"
//Split on any whitespace (spaces, tabs, line breaks) so that words on different lines are separated too.
var splitLine = extractText.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
//Dictionary<TKey, TValue> is a generic collection of key-value pairs; here both the key and the value are strings.
var pairs = new Dictionary<string, string>();
//Traverse the array to get the Key
for (var i = 0; i < splitLine.Length; i++)
{
//Take the current word as a candidate key
var candidateKey = splitLine[i];
//Check if _extractKeys contains candidateKey
if (!_extractKeys.Contains(candidateKey))
{
//If it is not included, continue execution.
continue;
}
//Traverse the array to get the Value
var value = "";
for (var v = i + 1; v < splitLine.Length; v++)
{
var candidateValuePart = splitLine[v];
//Check if next field contains _ignoredKeys or _extractKeys
if (_ignoredKeys.Contains(candidateValuePart) || _extractKeys.Contains(candidateValuePart))
{
i = v - 1;
break;
}
value = value + candidateValuePart + " ";
}
//Store the collected value under its key; the indexer overwrites a repeated key, whereas Add would throw
pairs[candidateKey] = value.Trim();
}
foreach (var kv in pairs)
{
DB.Customers.Add(new Test()
{
Key = kv.Key,
Value = kv.Value
});
}
DB.SaveChanges();
}
return View();
}
private string ExtractTextFromImage(string filePath)
{
//Folder that holds the Tesseract language data (for example eng.traineddata)
string path = Server.MapPath("~/tessdata");
using (TesseractEngine engine = new TesseractEngine(path, "eng", EngineMode.Default))
{
using (Pix pix = Pix.LoadFromFile(filePath))
{
using (Tesseract.Page page = engine.Process(pix))
{
return page.GetText();
}
}
}
}
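The parsing loop in Index can also be pulled out into a standalone method, which makes it easy to try against sample text without running OCR or touching the database. This is only a sketch of the same idea, not a drop-in replacement: KeyValueParser and Parse are names I made up, and I am assuming case-insensitive key matching and whitespace splitting, which may or may not suit your documents.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueParser
{
    // Words that start a field we want to keep.
    private static readonly HashSet<string> ExtractKeys =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Name", "Mobile", "Address" };

    // Words that end the current field but are not stored themselves.
    private static readonly HashSet<string> IgnoredKeys =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Bill" };

    public static Dictionary<string, string> Parse(string ocrText)
    {
        // Split on any whitespace so line breaks in the OCR output also separate words.
        var words = ocrText.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var pairs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        for (var i = 0; i < words.Length; i++)
        {
            if (!ExtractKeys.Contains(words[i])) continue;

            var key = words[i];
            var parts = new List<string>();
            var v = i + 1;

            // Collect words until the next known key or ignored key.
            while (v < words.Length && !ExtractKeys.Contains(words[v]) && !IgnoredKeys.Contains(words[v]))
            {
                parts.Add(words[v]);
                v++;
            }

            pairs[key] = string.Join(" ", parts); // a repeated key overwrites the earlier value
            i = v - 1;                            // resume at the word that stopped the scan
        }
        return pairs;
    }
}
```

For the sample string from the comment in Index, "logo Name raj mobile 9038874774 address 6-98 india bill auto generated", Parse returns three entries: Name = "raj", mobile = "9038874774", and address = "6-98 india".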
Best regards,
Lan Huang