Linux Software: OCR Software

October 9, 2018

Tesseract OCR Optical Character Recognition for Linux

Tesseract OCR is Optical Character Recognition (OCR) software for Linux that runs in the terminal as a command-line tool. If you have scanned documents such as ebooks, journals, or papers and want to convert the scanned images to text files, you should use Tesseract OCR.

Tesseract OCR performs well at converting images of scanned book pages into .txt files.

So my recommended Optical Character Recognition software for Linux is Tesseract OCR, which is available for free.

The best Optical Character Recognition software is Tesseract OCR. OCR is a technology that allows you to convert scanned images of text into plain text. This lets you save space, edit the text, and search or index it.

Tesseract OCR is among the most accurate OCR engines, but on Linux it has no graphical interface (GUI) of its own, which is a very important usability feature for a typical desktop user.

The current version of Tesseract in the Ubuntu repository is a command-line-only tool. After successful installation, the command to use is tesseract <path to image> <basename of output file>. Tesseract will automatically give the output file a .txt extension. If you have installed the language-specific data files from one of the tesseract-ocr-??? packages, you can pass the -l option followed by the language code.
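As a rough sketch of the syntax described above, the snippet below wraps the call in a small shell helper. The function name ocr_page, the DRY_RUN switch, the file name page.png, and the language code deu (German) are all assumptions made for illustration; none of them come with Tesseract itself.

```shell
# Hypothetical helper (not part of Tesseract): run, or just print, the OCR command.
#   $1 = image path, $2 = output base name, $3 = optional language code
ocr_page() {
    if [ -n "${3:-}" ]; then
        set -- tesseract "$1" "$2" -l "$3"
    else
        set -- tesseract "$1" "$2"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # show the command without running it
    else
        "$@"           # run tesseract; it writes <output base>.txt
    fi
}

# Dry run with assumed names: page.png as input, German ("deu") as the language.
# prints: tesseract page.png page -l deu
DRY_RUN=1 ocr_page page.png page deu
```

Without DRY_RUN=1 the helper actually calls tesseract, so the matching language package needs to be installed first. Running tesseract --list-langs should show which language data is available on your system.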

How to Install Tesseract OCR for Linux and How to Use It

Tesseract OCR is available in the Ubuntu repository, so you can install it by typing this command:
sudo apt-get install tesseract-ocr

How to Use Tesseract OCR in Linux

Below is an example of using Tesseract OCR from the terminal.
If you have a scanned book page saved as ScannedBook.png, you can convert the image to text by typing:
tesseract ScannedBook.png output
This will produce a file called output.txt.
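
A scanned book usually consists of many page images, so the single command above can be repeated in a loop. The following is a minimal sketch, assuming the pages are PNG files in the current directory; the helper name out_base is made up for this example.

```shell
# Hypothetical helper: derive the output base name from an image path
# by stripping the directory part and the file extension.
out_base() {
    name=${1##*/}
    printf '%s\n' "${name%.*}"
}

# OCR every PNG in the current directory, but only if tesseract is installed.
if command -v tesseract >/dev/null 2>&1; then
    for img in ./*.png; do
        [ -e "$img" ] || continue          # skip when no PNG file matches
        tesseract "$img" "$(out_base "$img")"
    done
fi
```

Each image then gets its own text file, for example page-01.png becomes page-01.txt.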

Tesseract OCR Review in Linux

Tesseract is probably the most accurate open source OCR engine available. Combined with the Leptonica Image Processing Library it can read a wide variety of image formats and convert them to text in over 60 languages.

It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google. It is released under the Apache License 2.0.

Tesseract was originally designed to recognize English text only. Efforts have been made to modify the engine and its training system to make them able to deal with other languages and UTF-8 characters. Tesseract 3.0 can handle any Unicode characters (coded with UTF-8), but there are limits as to the range of languages that it will be successful with, so please take this section into account before building up your hopes that it will work well on your particular language!

Tesseract 3.01 added top-to-bottom languages, and Tesseract 3.02 added Hebrew (right-to-left). Tesseract currently handles scripts like Arabic with an auxiliary engine called cube (included in Tesseract 3.0+).

NOTE: If you are not comfortable using commands in the Linux terminal, you should install YAGF, the GUI for Tesseract OCR.

May 31, 2016

YAGF - Front End of Tesseract OCR in Linux Open Source

YAGF is an open source graphical interface for the cuneiform and tesseract text recognition tools on the Linux platform. This guide explains how to install YAGF on Linux and briefly reviews it.

With YAGF you can scan images using software called XSane, import pages from PDF documents, perform image preprocessing, and recognize text using cuneiform, all from a single command centre. YAGF also makes it easy to scan and recognize several images sequentially.

OCR stands for Optical Character Recognition, and YAGF stands for, uh, something, maybe Yet Another Graphical Frontend.

If you have been using Tesseract OCR from the command line, you can now use it through YAGF, which gives it a GUI window. Most people probably just want a simple utility that can scan their documents and extract the text, and Tesseract OCR with YAGF is the solution!

YAGF is a graphical front-end for cuneiform and tesseract OCR tools. With YAGF you can open already scanned image files or obtain new images via XSane (scanning results are automatically passed to YAGF).

Once you have a scanned image you can prepare it for recognition, select particular image areas for recognition, set the recognition language and so on. Recognized text is displayed in an editor window where it can be corrected, saved to disk or copied to the clipboard.

How to Install YAGF in Linux Mint and Ubuntu

YAGF is available in the Linux Mint and Ubuntu repositories, so you can install it from the terminal by typing:
sudo apt-get install yagf
Wait until the installation completes. You can then open YAGF by clicking Start/Menu >> Office >> YAGF.

YAGF Review - GUI Version of OCR Software in Linux Using Tesseract

Newer versions of YAGF can work with PDF files. In some cases, the files might be protected, and you might not have the option to copy text, or there might be useful information embedded inside images included in the PDF documents.

You can try online conversion tools, but perhaps YAGF can offer similar, if not better results. As a test file, I grabbed my own Linux kernel crash book, which comes with an interesting assortment of formatted text, plain-text paragraphs, as well as screenshots.

YAGF handled the 182-page document well, so this is an encouraging sign, because it means it can probably work with large data sets.

The output is, well, not as good as one might hope for. Plain text, which can just be copied and pasted, is fine. But YAGF did not handle code/command blocks and images that well. I can understand that images might pose some problem, but text boxes really shouldn’t be a challenge.