Abstract
Introduction
Product Functions
Applications of Tamil-OCR
Existing system
Proposed System
Literature Survey
Operating Environment
Software Requirements:
System Features
Language Auto Detection
Character Mapping
Font & Style Detection
MODULE
Module 1: Image Acquisition
Module 2: Preprocessing
Module 3: Segmentation
Module 4: Feature Extraction
Module 5: Classification and Recognition
Module 6: Post-processing
Design and Implementation Constraints
Assumptions and Dependencies
Use Case Diagram
Sequence Diagram
Activity Diagram
Class Diagram
Tamil OCR-GUI
Final View OCR-GUI
Output OCR-GUI
Result
Conclusion
References
Abstract
The aim of the project is to develop OCR software for Tamil character recognition. Optical character recognition (OCR) is the mechanical or electronic translation of images of typewritten or handwritten text (usually captured by a scanner) into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision. The term character recognition most often describes the ability of a computer to translate printed or handwritten text into machine-readable text. In this paper we focus on recognizing the Tamil alphabet in a scanned text document with the help of neural networks. Using the MATLAB Neural Network Toolbox, we recognize handwritten characters by projecting them onto grids of different sizes. The first step is image acquisition, which obtains the scanned image. It is followed by noise filtering, smoothing and normalization, which make the image suitable for segmentation, where the image is decomposed into sub-images. Feature extraction improves the recognition rate and reduces misclassification. We use character extraction and an edge detection algorithm to train the neural network to classify and recognize handwritten characters. Existing applications similar to ours produce many mismatches and errors; our project aims to rectify these and thereby increase the accuracy of character recognition.
IMPLEMENTATION OF TAMIL OPTICAL CHARACTER RECOGNITION USING NEURAL NETWORK
Objectives
- The objective of this project is to develop a robust OCR system for printed Tamil script that delivers the performance needed to convert legacy printed documents into an electronically accessible format.
- It should be a boon to people who currently rely on proprietary OCR software.
- It should be a welcome arrival for the magazine industry.
Introduction
- The aim of the project is to develop OCR software for Tamil character recognition.
- OCR (optical character recognition) is the translation of images of typewritten or handwritten text (usually captured by a scanner) into machine-editable text.
- In this project, the focus is on recognizing the Tamil alphabet in a scanned text document with the help of neural networks.
- Handwritten characters are recognized by projecting them onto grids of different sizes using Java.
- OCR is the process of translating images of handwritten, typewritten, or printed text into a format understood by machines for the purposes of editing, indexing/searching, and reducing storage size.
- The OCR system uses an artificial neural network as the backend to solve the classification problem.
- The input for the OCR problem is pages of scanned text.
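The grid projection mentioned above can be pictured with a minimal sketch. It assumes the character has already been cut out as a binary image (1 = ink, 0 = background) and uses nearest-neighbour sampling, which is one possible choice and not necessarily the project's method. The class name GridProjector is hypothetical.

```java
/** Hypothetical helper for illustration; not taken from the project's code. */
final class GridProjector {
    /**
     * Scales a binary image (1 = ink, 0 = background) to rows x cols cells
     * with nearest-neighbour sampling and flattens the grid row by row.
     */
    static double[] project(int[][] image, int rows, int cols) {
        int srcRows = image.length;
        int srcCols = image[0].length;
        double[] vector = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            int srcR = r * srcRows / rows;       // source row sampled for grid row r
            for (int c = 0; c < cols; c++) {
                int srcC = c * srcCols / cols;   // source column sampled for grid column c
                vector[r * cols + c] = image[srcR][srcC];
            }
        }
        return vector;
    }
}
```

Calling GridProjector.project(image, 16, 16), for example, would give a 256-element vector whatever the size of the source character; the grid size is a free parameter here.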
Product Functions
- Tamil Optical Character Recognition converts a text image into a text document.
- It includes a way to edit the recognized text directly.
- It enables users to store the text as a separate file on the system.
Applications of Tamil-OCR
- Data entry for business documents, e.g. checks, passports, invoices, bank statements and receipts
- Automatic extraction of key information from insurance documents
- Extracting business card information into a contact list
- Producing textual versions of printed documents more quickly, e.g. book scanning for Project Gutenberg
- Making electronic images of printed documents searchable, e.g. Google Books
- Defeating CAPTCHA anti-bot systems, though these are specifically designed to prevent OCR
Existing system
- Today there is a growing demand from users to convert printed documents into electronic documents in order to keep their data secure.
- Hence the basic OCR system was invented to convert data available on paper into computer-processable documents, so that the documents can be edited and reused.
- The existing system is not efficient for the Tamil language and makes many errors in detecting characters.
- The existing system takes much more time to recognize the characters in an image.
Proposed System
- The proposed Tamil Optical Character Recognition system performs a series of operations to make the recognition process easier and more accurate.
- To recognize characters faster.
- To achieve high accuracy in text recognition.
- To develop OCR for the Tamil language.
Literature Survey
[1] Kauleshwar Prasad, Devvrat C. Nigam, Ashmika Lakhotiya and Dheeren Umre (2013) “Character Recognition Using Neural Network Toolbox”, International Journal of u- and e- Service, Science and Technology
- This paper focuses on recognizing the alphabet in a scanned text document with the help of neural networks.
- It uses character extraction and an edge detection algorithm to train the neural network to classify and recognize the characters.
[2] Venu Govindaraju, Srirangaraj (Ranga) Setlur (2013) “Guide to OCR for Indic Scripts: Document Recognition and Retrieval”, International Journal of Advanced Research in Computer Science and Software Engineering
- It helps in developing a new approach to the problems posed by Indic scripts.
[3] Java Neural Network Framework Neuroph, Link: “http://sourceforge.net/projects/neuroph/?source=directory”
- This website provides information about Neuroph.
Operating Environment
Software Requirements:
- Windows/Linux: The Tamil optical character recognition application using a neural network runs on Windows (XP/7/8) and Linux. Any device that supports these versions of Windows or Linux will be able to run the software.
- NetBeans: NetBeans's extensive GUI features and toolkits make GUI development easy and flexible. The software is developed using NetBeans.
- OpenOffice: OpenOffice is a leading open-source office suite for word processing.
System Features
Language Auto Detection
- Tamil-Optical Character Recognition detects the language based on the Tamil Unicode range.
- Tamil characters fall within a specific Unicode range (the Tamil block, U+0B80 to U+0BFF).
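A range check of this kind can be sketched as follows. The Unicode Tamil block spans U+0B80 to U+0BFF; the class name TamilDetector and the "more than half of the characters" rule are assumptions made for illustration, not the project's documented behaviour.

```java
/** Hypothetical helper for illustration; not taken from the project's code. */
final class TamilDetector {
    // The Unicode Tamil block spans U+0B80 to U+0BFF.
    static final int TAMIL_START = 0x0B80;
    static final int TAMIL_END = 0x0BFF;

    static boolean isTamil(int codePoint) {
        return codePoint >= TAMIL_START && codePoint <= TAMIL_END;
    }

    /** Returns true when more than half of the non-whitespace code points are Tamil. */
    static boolean looksTamil(String text) {
        int total = 0;
        int tamil = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue;
            }
            total++;
            if (isTamil(cp)) {
                tamil++;
            }
        }
        return total > 0 && tamil * 2 > total;
    }
}
```

The majority rule tolerates digits or punctuation mixed into Tamil text; a stricter detector could require every letter to fall in the range.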
Character Mapping
- The Tamil-OCR automatically maps each character by defining a bounding box for it.
- A space acts as a delimiter.
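One way to picture the box mapping is a column scan over a single binarized text line: every run of columns that contains ink becomes a box, and a wide run of empty columns is treated as a space. This is a simplified sketch under those assumptions. The class CharacterBoxes and the minGap parameter are hypothetical, and Tamil glyphs that consist of several separated parts would need extra merging logic that is not shown.

```java
/** Hypothetical helper for illustration; not taken from the project's code. */
final class CharacterBoxes {
    /** Returns {left, top, right, bottom} (all inclusive) for each run of ink columns. */
    static int[][] find(int[][] line) {
        int cols = line[0].length;
        int[][] boxes = new int[cols][];   // there can be at most one box per column
        int count = 0;
        int left = -1;                     // start column of the open run, or -1
        for (int c = 0; c <= cols; c++) {
            boolean ink = c < cols && columnHasInk(line, c);
            if (ink && left < 0) {
                left = c;
            } else if (!ink && left >= 0) {
                boxes[count++] = tighten(line, left, c - 1);
                left = -1;
            }
        }
        return java.util.Arrays.copyOf(boxes, count);
    }

    /** True when the empty gap between two neighbouring boxes is at least minGap columns. */
    static boolean isWordGap(int[] previous, int[] next, int minGap) {
        return next[0] - previous[2] - 1 >= minGap;
    }

    private static boolean columnHasInk(int[][] line, int c) {
        for (int[] row : line) {
            if (row[c] != 0) {
                return true;
            }
        }
        return false;
    }

    /** Shrinks the box vertically to the first and last rows that contain ink. */
    private static int[] tighten(int[][] line, int left, int right) {
        int top = -1;
        int bottom = -1;
        for (int r = 0; r < line.length; r++) {
            for (int c = left; c <= right; c++) {
                if (line[r][c] != 0) {
                    if (top < 0) {
                        top = r;
                    }
                    bottom = r;
                    break;
                }
            }
        }
        return new int[] {left, top, right, bottom};
    }
}
```

The threshold passed as minGap would normally be derived from the font size, for instance a fraction of the average box width.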
Font & Style Detection
- The OCR automatically detects the font style and the font size.
MODULE
Module 1: Image Acquisition
- In image acquisition, the recognition system acquires a scanned image as its input.
- The image should be in a specific format such as JPEG or GIF.
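A possible acquisition step in Java is sketched below using the standard javax.imageio API. The class name ImageAcquisition and the decision to reject files by suffix before decoding are assumptions; the source does not say how the project loads its scans.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import javax.imageio.ImageIO;

/** Hypothetical helper for illustration; not taken from the project's code. */
final class ImageAcquisition {
    /** Returns true if an installed ImageIO reader claims the file's suffix. */
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;                       // no suffix at all
        }
        String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ImageIO.getImageReadersBySuffix(suffix).hasNext();
    }

    /** Loads the scanned image, or throws if it is unsupported or cannot be decoded. */
    static BufferedImage load(File file) throws IOException {
        if (!isSupported(file.getName())) {
            throw new IOException("Unsupported image format: " + file.getName());
        }
        BufferedImage image = ImageIO.read(file);   // returns null when no reader can decode it
        if (image == null) {
            throw new IOException("Could not decode image: " + file);
        }
        return image;
    }
}
```

Checking the suffix first gives the user a clear error message before any pixel data is read.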
Module 2: Preprocessing
- The role of preprocessing is to separate the pattern of interest from the background.
- Noise filtering, smoothing and normalization are done in this step.
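The source does not name a specific noise filter. As one common option, a 3x3 median filter removes isolated salt-and-pepper specks from a grayscale scan. The sketch below assumes pixel values from 0 to 255 stored in an int matrix; the class name MedianFilter is hypothetical.

```java
/** Hypothetical helper for illustration; the median filter is an assumed choice. */
final class MedianFilter {
    /** Applies a 3x3 median filter to a grayscale image; border pixels are copied unchanged. */
    static int[][] apply(int[][] gray) {
        int rows = gray.length;
        int cols = gray[0].length;
        int[][] out = new int[rows][cols];
        int[] window = new int[9];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (r == 0 || c == 0 || r == rows - 1 || c == cols - 1) {
                    out[r][c] = gray[r][c];          // no full neighbourhood at the border
                    continue;
                }
                int k = 0;
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        window[k++] = gray[r + dr][c + dc];
                    }
                }
                java.util.Arrays.sort(window);
                out[r][c] = window[4];               // middle of the nine sorted values
            }
        }
        return out;
    }
}
```

The filter writes into a new matrix so that already-filtered pixels never influence their neighbours.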
Module 3: Segmentation
- An image of a sequence of characters is decomposed into sub-images of individual characters.
- Labelling these sub-images provides information about the number of characters in the image.
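The labelling step can be illustrated with connected-component labelling on the binary image. This is a minimal sketch that assumes 4-connectivity and one component per character; the class ComponentLabeler is hypothetical. In practice a Tamil character may consist of more than one component, so the raw count is only an estimate until such parts are merged.

```java
/** Hypothetical helper for illustration; not taken from the project's code. */
final class ComponentLabeler {
    private static final int[] DR = {-1, 1, 0, 0};
    private static final int[] DC = {0, 0, -1, 1};

    /** Gives every 4-connected ink region its own label 1, 2, 3, ... (0 = background). */
    static int[][] label(int[][] binary) {
        int rows = binary.length;
        int cols = binary[0].length;
        int[][] labels = new int[rows][cols];
        int[] stack = new int[rows * cols];   // each pixel is pushed at most once
        int next = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (binary[r][c] == 0 || labels[r][c] != 0) {
                    continue;
                }
                next++;
                int top = 0;
                stack[top++] = r * cols + c;
                labels[r][c] = next;
                while (top > 0) {             // iterative flood fill
                    int p = stack[--top];
                    for (int d = 0; d < 4; d++) {
                        int nr = p / cols + DR[d];
                        int nc = p % cols + DC[d];
                        if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) {
                            continue;
                        }
                        if (binary[nr][nc] == 0 || labels[nr][nc] != 0) {
                            continue;
                        }
                        labels[nr][nc] = next;
                        stack[top++] = nr * cols + nc;
                    }
                }
            }
        }
        return labels;
    }

    /** The number of components equals the largest label in the map. */
    static int count(int[][] labels) {
        int max = 0;
        for (int[] row : labels) {
            for (int value : row) {
                max = Math.max(max, value);
            }
        }
        return max;
    }
}
```

An explicit stack is used instead of recursion so that large scans cannot overflow the call stack.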
Module 4: Feature Extraction
- The features of the characters that are crucial for classifying them at the recognition stage are extracted.
- Every character image is divided into equal zones.
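Zoning can be sketched as follows. The source only says that the image is divided into equal zones; using the fraction of ink pixels in each zone as the feature is an assumption, as is the class name ZoneFeatures.

```java
/** Hypothetical helper for illustration; ink density per zone is an assumed feature. */
final class ZoneFeatures {
    /**
     * Splits a binary character image into zoneRows x zoneCols equal zones and
     * returns the fraction of ink pixels in each zone, row by row.
     */
    static double[] extract(int[][] binary, int zoneRows, int zoneCols) {
        int rows = binary.length;
        int cols = binary[0].length;
        if (rows % zoneRows != 0 || cols % zoneCols != 0) {
            throw new IllegalArgumentException("image size must be a multiple of the zone grid");
        }
        int zoneHeight = rows / zoneRows;
        int zoneWidth = cols / zoneCols;
        double[] features = new double[zoneRows * zoneCols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (binary[r][c] != 0) {
                    features[(r / zoneHeight) * zoneCols + (c / zoneWidth)] += 1.0;
                }
            }
        }
        for (int i = 0; i < features.length; i++) {
            features[i] /= zoneHeight * zoneWidth;   // ink count to density in [0, 1]
        }
        return features;
    }
}
```

Densities lie between 0 and 1, which suits a network with sigmoid units without further scaling.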
Module 5: Classification and Recognition
- A feed-forward neural network trained with backpropagation is used in this work to classify and recognize handwritten characters.
- The pixels derived from the resized character in the segmentation stage form the input to the classifier.
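To make the classifier concrete, here is a minimal feed-forward network with one hidden layer, sigmoid units and plain backpropagation on squared error. It is a teaching sketch only: the class TinyMlp, its layer sizes, the weight initialization and the learning rate are all assumptions, and the project itself references the Neuroph framework, whose API is not shown here.

```java
/** Hypothetical minimal network for illustration; not the project's Neuroph setup. */
final class TinyMlp {
    private final double[][] w1;   // [hidden][inputs + 1], last column holds the bias
    private final double[][] w2;   // [outputs][hidden + 1], last column holds the bias
    private final double[] hidden; // hidden activations from the latest forward pass

    TinyMlp(int inputs, int hiddenUnits, int outputs, long seed) {
        java.util.Random rnd = new java.util.Random(seed);
        w1 = randomMatrix(hiddenUnits, inputs + 1, rnd);
        w2 = randomMatrix(outputs, hiddenUnits + 1, rnd);
        hidden = new double[hiddenUnits];
    }

    private static double[][] randomMatrix(int rows, int cols, java.util.Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) {
                row[j] = rnd.nextDouble() - 0.5;   // small weights in [-0.5, 0.5)
            }
        }
        return m;
    }

    private static void layer(double[][] w, double[] in, double[] out) {
        for (int i = 0; i < w.length; i++) {
            double z = w[i][in.length];            // bias term
            for (int j = 0; j < in.length; j++) {
                z += w[i][j] * in[j];
            }
            out[i] = 1.0 / (1.0 + Math.exp(-z));   // sigmoid activation
        }
    }

    /** Forward pass: one output activation per character class. */
    double[] predict(double[] x) {
        double[] y = new double[w2.length];
        layer(w1, x, hidden);
        layer(w2, hidden, y);
        return y;
    }

    /** One backpropagation step; returns the squared error measured before the update. */
    double train(double[] x, double[] target, double rate) {
        double[] y = predict(x);
        double[] deltaOut = new double[y.length];
        double error = 0.0;
        for (int k = 0; k < y.length; k++) {
            double diff = target[k] - y[k];
            error += 0.5 * diff * diff;
            deltaOut[k] = diff * y[k] * (1.0 - y[k]);
        }
        for (int i = 0; i < hidden.length; i++) {  // hidden layer first, using the old w2
            double back = 0.0;
            for (int k = 0; k < y.length; k++) {
                back += deltaOut[k] * w2[k][i];
            }
            double deltaHidden = back * hidden[i] * (1.0 - hidden[i]);
            for (int j = 0; j < x.length; j++) {
                w1[i][j] += rate * deltaHidden * x[j];
            }
            w1[i][x.length] += rate * deltaHidden;
        }
        for (int k = 0; k < y.length; k++) {       // then the output layer
            for (int i = 0; i < hidden.length; i++) {
                w2[k][i] += rate * deltaOut[k] * hidden[i];
            }
            w2[k][hidden.length] += rate * deltaOut[k];
        }
        return error;
    }

    /** Index of the largest activation, i.e. the predicted class. */
    static int argMax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

In use, train would be called repeatedly over all labelled character vectors, with a one-hot target per class, and argMax(predict(x)) would give the recognized class index.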
Module 6: Post-processing
- Post-processing is the final stage of the proposed recognition system.
- It prints the recognized characters in structured text form.
Design and Implementation Constraints
- Because the system is designed to identify characters from an image, it must be trained on each character many times.
- Training the network well enough to improve recognition quality poses a difficult challenge.
- Other constraints, such as noise filtering and the segmentation of each character, are also worth considering.
- The application is meant to be accurate even when dealing with noisy data, so each portion must be designed and implemented with efficiency in mind.
- It can recognize only the Tamil language.
Assumptions and Dependencies
- Training should be given for each character at various sizes.
- It is necessary to convert the image into binary format.
- The image is first converted to grayscale and then, by applying a threshold to the grayscale values, to binary.
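The grayscale and threshold conversion can be sketched as below. The sketch assumes packed 0xRRGGBB pixel values, the common 0.299/0.587/0.114 luminance weights, and a fixed threshold chosen by the caller; none of these details are stated in the source, and the class name Binarizer is hypothetical.

```java
/** Hypothetical helper for illustration; weights and threshold are assumed values. */
final class Binarizer {
    /** Converts a packed 0xRRGGBB pixel to a gray level from 0 to 255. */
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;   // integer luminance weights
    }

    /** Pixels darker than the threshold become ink (1); all others become background (0). */
    static int[][] binarize(int[][] rgb, int threshold) {
        int[][] binary = new int[rgb.length][rgb[0].length];
        for (int r = 0; r < rgb.length; r++) {
            for (int c = 0; c < rgb[r].length; c++) {
                binary[r][c] = toGray(rgb[r][c]) < threshold ? 1 : 0;
            }
        }
        return binary;
    }
}
```

A mid-range threshold such as 128 is a simple starting point; unevenly lit scans usually need a threshold computed from the image itself.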
Use Case Diagram
Sequence Diagram
Activity Diagram
Class Diagram
Tamil OCR-GUI
Final View OCR-GUI
Output OCR-GUI
Result
- The result of the project is that Tamil-OCR is implemented to recognize Tamil text in a scanned image and convert it into an editable text format.
- This increases the accuracy of the OCR process for the Tamil language.
- This encourages more of the Tamil press to migrate towards free and open-source software.
- The GUI for Tamil-OCR makes it easy for people who have no knowledge of OCR to use it effectively.
Conclusion
- Our system is developed for end users who have basic knowledge of Linux/Windows.
- It will perform its intended operation under almost all circumstances.
- The GUI for Tamil-OCR will make it easy for people who have no knowledge of OCR to use it effectively.
- Unit testing involves independent analysis of the system in parts or units.
- This project is OS independent, so people can work on their preferred operating system.
References
- Kauleshwar Prasad, Devvrat C. Nigam, Ashmika Lakhotiya and Dheeren Umre, “Character Recognition Using Neural Network Toolbox”, International Journal of u- and e- Service, Science and Technology, Vol. 6, No. 1, February 2013.
- Venu Govindaraju, Srirangaraj (Ranga) Setlur, “Guide to OCR for Indic Scripts: Document Recognition and Retrieval”.
- Java Neural Network Framework Neuroph, Link: “http://sourceforge.net/projects/neuroph/?source=directory”