الحالة::مؤرشفة
المراجع:: https://github.com/koreader/koreader/wiki/Dictionary-support#dictionary-lookups-in-scanned-pages
KOReader has a built-in OCR engine for recognizing words in scanned PDF/DJVU pages. In order to use OCR in scanned pages, you need to install respective Tesseract trained data and add new document languages to
koreader/defaults.lua
, if your language is other than English or Chinese.
Download language data files for Tesseract 4.00+ and copy the appropriate language data file (e.g.
eng.traineddata
in thetesseract-fast repository
for English andspa.traineddata
for Spanish) intokoreader/data/tessdata
.To add new languages, open
koreader/defaults.custom.lua
and add languages via theirISO 3-letter code
(important, this needs to match the training data filename!) to theDKOPTREADER_CONFIG_DOC_LANGS_CODE
array:DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "chi_sim"} -- language code, make sure you have corresponding training data
For example, for Kazakh these would be
kaz
; for Russian -rus
, etc. If you are unsure of the code for your language, look at the tessdata filenames first.If you’ve never customized any advanced settings before, the file will not exist, in which case, just follow the directions in the next sentence, any modified entries will appear in bold, and will automatically be added to the file on exit (this will also help making sure that file is syntactically sound).
If you don’t need to add new entries, and simply want to modify the existing ones, you can also go to
Tools
>More tools
>Advanced settings
in the file-manager’s top menu, and find theDKOPTREADER_CONFIG_DOC_LANGS_CODE
entry there.
Forced OCR
option make KOReader to ignore any built-in text layers that come with pdf/djvu and use only OCR tessdata instead.