Hello,
The Computer Vision Read API is Azure's latest OCR technology. It extracts printed text (in several languages), handwritten text (English only), digits, and currency symbols from images and multi-page PDF documents. It is optimized for text-heavy images and multi-page PDF documents with mixed languages, and it can detect both printed and handwritten text in the same image or document.
The Read API includes the following features:
- Printed text extraction in 73 languages
- Handwritten text extraction in English
- Text lines and words with location and confidence scores
- No language identification required
- Support for mixed languages and mixed mode (print and handwritten)
- Selection of pages and page ranges from large, multi-page documents
- Natural reading order for text lines
- Handwriting classification for text lines
- Available as a Distroless Docker container for on-premises deployment
I think this is a good fit for a novel, since a novel is a text-heavy document.
https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/overview-ocr#read-api
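In case it helps, here is a rough Python sketch of how a call to the Read API could look, using only the standard library. It assumes the v3.2 REST path (`/vision/v3.2/read/analyze`), the `pages` and `readingOrder` query parameters, and the `analyzeResult.readResults` response layout; please check these against the documentation linked above for the version you use. The function names (`build_analyze_url`, `extract_lines`, `read_document`) are my own and not part of any SDK.

```python
import json
import time
import urllib.parse
import urllib.request


def build_analyze_url(endpoint, pages=None, reading_order=None):
    """Build the Read 'analyze' URL (the v3.2 path is an assumption)."""
    query = {}
    if pages:
        query["pages"] = pages  # e.g. "1-3" to read only the first three pages
    if reading_order:
        query["readingOrder"] = reading_order  # e.g. "natural"
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    return url + ("?" + urllib.parse.urlencode(query) if query else "")


def extract_lines(result):
    """Collect the text of each line, page by page, from a finished result."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [[line["text"] for line in page.get("lines", [])] for page in pages]


def read_document(endpoint, key, document_url, pages=None, poll_seconds=1.0):
    """Submit a document URL, poll until the operation ends, return the lines."""
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": document_url}).encode("utf-8")
    submit = urllib.request.Request(
        build_analyze_url(endpoint, pages), data=body, headers=headers, method="POST"
    )
    # The service answers with an Operation-Location header to poll.
    with urllib.request.urlopen(submit) as response:
        operation_url = response.headers["Operation-Location"]

    while True:
        poll = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": key}
        )
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result.get("status") in ("succeeded", "failed"):
            break
        time.sleep(poll_seconds)

    if result.get("status") != "succeeded":
        raise RuntimeError("Read operation did not succeed")
    return extract_lines(result)
```

You would call it with your own resource endpoint and key (placeholders here), for example `read_document("https://<your-resource>.cognitiveservices.azure.com", "<your-key>", "https://example.com/novel.pdf", pages="1-3")`, and get back one list of text lines per page.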
Regards,
Yutong