status:: archived
references:: https://github.com/koreader/koreader/wiki/Dictionary-support#dictionary-lookups-in-scanned-pages
KOReader has a built-in OCR engine for recognizing words in scanned PDF/DJVU pages. To use OCR on scanned pages, you need to install the corresponding Tesseract trained data and, if your language is other than English or Chinese, add the new document languages to KOReader's defaults (koreader/defaults.lua), using the override file described below.
Download language data files for Tesseract 4.00+ and copy the appropriate language data file (e.g. eng.traineddata in the tesseract-fast repository for English and spa.traineddata for Spanish) into koreader/data/tessdata.
To add new languages, open koreader/defaults.custom.lua and add languages via their ISO 3-letter code (important: this needs to match the training data filename!) to the DKOPTREADER_CONFIG_DOC_LANGS_CODE array:
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "chi_sim"} -- language code, make sure you have corresponding training data
For example, the code is kaz for Kazakh, rus for Russian, etc. If you are unsure of the code for your language, look at the tessdata filenames first.
If you’ve never customized any advanced settings before, the file will not exist. In that case, just follow the directions below: any modified entries will appear in bold and will automatically be added to the file on exit (this also helps make sure the file is syntactically sound).
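As a sketch of the Kazakh and Russian case mentioned above, the entry might look like the following. It mirrors the assignment syntax shown in this note and assumes that kaz.traineddata and rus.traineddata have already been copied into koreader/data/tessdata, and that English and Simplified Chinese should stay available. The language selection is only an illustration. If your defaults.custom.lua uses a different layout, editing the entry through Advanced settings is the safer route.

```lua
-- koreader/defaults.custom.lua (sketch; the language selection is an assumption)
-- Each code must match a <code>.traineddata file in koreader/data/tessdata.
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {
    "eng",     -- English
    "chi_sim", -- Simplified Chinese
    "kaz",     -- Kazakh, needs kaz.traineddata
    "rus",     -- Russian, needs rus.traineddata
}
```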
If you don’t need to add new entries and simply want to modify the existing ones, you can also go to Tools > More tools > Advanced settings in the file manager’s top menu and find the DKOPTREADER_CONFIG_DOC_LANGS_CODE entry there.
The Forced OCR option makes KOReader ignore any built-in text layer that comes with a PDF/DJVU file and rely only on OCR with the installed tessdata instead.