Convolution Neural Network to Construct Model of Text Recognition

Paper Type: Free Essay Subject: Computer Science
Wordcount: 4781 words Published: 18th May 2020



Recognizing handwritten characters remains a difficult problem, which is why neural networks have become an essential technique for character recognition. The aim of this work is to take handwritten English characters as input, train a neural network on them, and recognize the pattern so that each handwritten character is converted into an improved, machine-readable version. Handwritten character recognition (HCR) converts images of documents into an editable format, so that the text can be edited, modified, and stored for a long period. The technique includes pre-processing, segmentation, feature extraction, classification, and recognition. In this paper we propose a convolutional neural network to construct the text recognition model.


Keywords: Convolution Neural Network, Feature Extraction, Classification, Recognition.




Handwritten character recognition (HWCR) is defined as the conversion of handwritten text into a machine-processable format. It is a computer’s ability to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch screens, and other devices. The image of the written text may be sensed “offline” from a piece of paper by optical scanning, which essentially involves optical character recognition. In addition, a complete handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words. Handwriting recognition falls into the following two major categories:


Offline Handwriting Recognition (OFHWR): In this type of handwriting recognition, the writing is available as an image, typically obtained with a scanner that captures the writing optically.


Online Handwriting Recognition (ONHWR): In this type of handwriting recognition, the two-dimensional coordinates of successive points are represented as a function of time, and the order of the strokes made by the writer is also available.

Neural networks are organized in layers. Each layer consists of a number of interconnected ‘nodes’. Patterns are presented to the network through the ‘input layer’, which communicates with one or more ‘hidden layers’ where the actual processing is done through ‘connections’. The hidden layers then connect to an ‘output layer’, where the output is shown. Neural networks are particularly useful for problems that cannot be expressed as a series of steps, such as recognizing patterns, classifying them into groups, series prediction, and data mining. Pattern recognition is the most popular use of neural networks. The network is presented with a target vector and also a vector containing the pattern information, which could be an image or handwritten data. The network then attempts to determine whether the input data matches a pattern it has memorized. A neural network trained for classification is designed to take input samples and classify them into groups. These groups may be fuzzy, without clearly defined boundaries.

In this paper, “Handwritten character recognition” is user-interactive software for recognizing characters. The input can be given as a picture with sufficient light and contrast for the computer to at least distinguish between black and white. It is a pattern recognition problem in which the patterns are characters and numbers. Although a great deal of research has already been done in this area, overall results and performance are still being improved. The system takes single-character images, processes them using machine vision techniques, and predicts the character using a pre-trained deep convolutional neural network model.



In paper [1], the author proposed a convolutional neural network (CNN) to recognize various styles of traditional Chinese calligraphy. The CNN was trained on a dataset of 5 classes consisting of 15,000 instances. The preprocessed images were 96×96 in size, and eight different models were built to test different depths and numbers of filters. The author evaluated each model’s performance, analyzed the recognition rate for each style, and visualized the reconstructed images that maximize activations as well as the feature activation maps. The author found that building deeper network models increases overall recognition accuracy, whereas increasing the number of filters did not improve accuracy. To improve the results, the author suggested better data preprocessing and a larger set of training images.


In paper [2], the authors proposed a Global Supervised Low-Rank Expansion (GSLRE) method and an Adaptive Drop-Weight (ADW) method to address the speed and storage problems of handwritten Chinese character recognition (HCCR). They first designed a nine-layer CNN for HCCR with 3,755 classes and then developed an algorithm that reduces the network’s computational cost by a factor of nine and compresses the network to 1/18 of the baseline model’s original size, with an accuracy drop of 0.21 percent. GSLRE accelerates the computations in the convolutional layers, and ADW removes redundant connections by dynamically increasing each layer’s pruning threshold. They also proposed a Connection Redundancy Analysis technique that analyzes the redundant connections in each layer to guide CNN pruning without, according to the authors, compromising the network’s performance.

In paper [3], the authors used two main approaches to accomplish this task: direct word classification and character segmentation. For the former, they used CNNs with different architectures to train a model that could classify words accurately. For the latter, they used Long Short-Term Memory (LSTM) networks with convolution to build bounding boxes for each character. However, this model struggled to segment cursive characters because the boundaries between some cursive characters break down. The authors pointed out that the segmentation model was not trained separately, so they could not attribute errors specifically to either the character classification model or the segmentation model. To make training more robust, the authors suggested applying additional preprocessing techniques such as jittering. They also suggested considering a more comprehensive but still efficient decoding algorithm, such as beam search, to improve the character segmentation model.

In paper [4], the authors proposed to improve a deep convolutional neural network (DCNN) approach to online HCCR by incorporating a variety of domain-specific knowledge, including deformation, non-linear normalization, imaginary strokes, path signature, and 8-directional features. The authors claimed a twofold contribution. First, they researched and integrated domain-specific techniques with a DCNN to create a composite network with improved performance. Second, they introduced a hybrid serial-parallel (HSP) strategy that combines the resulting DCNNs, which differ in their domain knowledge. As a result, the approach achieved accuracies of 97.20% and 96.87% on the CASIA-OLHWDB1.0 and CASIA-OLHWDB1.1 databases respectively. The authors suggested finding a better way to combine all of the domain knowledge with appropriate DCNN architectures.

In paper [5], a Deep Belief Neural Network (DBNN) is used for Arabic handwritten character recognition. This family of networks can handle large-scale inputs, which allows raw data to be used as input instead of an extracted feature vector, and can learn complex class decision boundaries. In this work the system takes the raw data as input and proceeds with a greedy layer-wise unsupervised learning algorithm. The approach was tested on two different databases. The results on the HACDB database were promising at the character level, with a classification error rate of 2.1 percent. However, the word-level assessment on the ADAB database showed an error rate that exceeded 40%. Hence, the proposed DBNN structure was not able to deal with high-dimensional data. The authors suggested reconfiguring the DBNN architecture so that it can deal with input data of varying length.

In paper [6], the authors applied a max-out-based CNN to image patterns created from online patterns and a bidirectional Long Short-Term Memory (BLSTM) network to the original online patterns, and then combined the two. The authors drew three conclusions:

1. A DNN can improve offline recognition of mathematical symbols because it can extract wider yet specific features;

2. As far as online methods are concerned, BLSTM outperforms Markov random fields because it can flexibly access the context of the entire input sequence;

3. Combining online and offline recognition methods improves classification performance by exploiting the advantages of both.


Paper [7] presents hybrid feature extraction and genetic algorithm (GA) based feature selection for off-line handwritten character recognition, using an adaptive MLPNN classifier to achieve improved overall performance on real-world recognition problems. The off-line character recognition system was developed using seven feature extraction approaches, namely the box method, diagonal distance method, mean and gradient operations, standard deviation, center of gravity, and edge detection. Two different recognition networks were built: an adaptive MLP classifier without feature reduction and an adaptive MLP classifier with feature reduction. The networks are trained and tested on the CEDAR CDROM-1 data set. The experimental results show that the network using the GA-based feature selection method improves overall recognition system performance in terms of speed and storage requirements. The proposed adaptive MLP neural network was also verified to work as a better classifier and to provide better accuracy.

In paper [8], the authors use a single-layer ANN to recognize handwritten English alphabets. This approach makes the ANN simple and easy to implement and understand. To achieve optimum accuracy, a row-wise segmentation technique was developed and used. The paper is an approach to developing a method that uses easily available resources to obtain optimized results. Row-wise segmentation helps to extract common characteristics across different people’s distinct handwriting styles.

In paper [9], the authors propose hybrid Hidden Markov Model (HMM)/Artificial Neural Network (ANN) models for recognizing unconstrained handwritten text. The structural part of the optical models is modeled with Markov chains, and a multilayer perceptron is used to estimate the emission probabilities. The paper also presents new techniques that use supervised learning methods to remove slope and slant from handwritten text and to normalize the size of text images. Slope correction and size normalization are achieved by classifying the local extrema of text contours with multilayer perceptrons. Slant is also removed in a non-uniform manner using artificial neural networks. Experiments were carried out on offline handwritten text lines from the IAM database, and the recognition rates achieved are among the best reported in the literature for the same task.


In paper [10], an off-line English character recognition system is proposed that uses a hybrid feature extraction technique and neural network classifiers. The hybrid feature extraction method combines diagonal and directional features, so that the proposed system suitably combines the salient features of the handwritten characters to enhance recognition accuracy. Several Neural Network (NN) topologies are built to classify the characters, and a k-nearest neighbor network is also built for comparison. The feed-forward NN topology shows the highest recognition accuracy and is identified as the most suitable classifier. The proposed system will support applications such as postal/package address recognition and the conversion of any handwritten document into structured text form. The performance of the recognition systems is compared extensively using test data to draw the major conclusions of the paper.


In paper [11], the authors modify the fuzzy hyperline segment neural network (FHLSNN) to maintain convexity. The modified membership function is found to be superior to the originally defined function, which gives relatively lower values to patterns that fall close to the hyperline segment (HLS) but far from the two HLS endpoints. The performance of the modified fuzzy hyperline segment neural network (MFHLSNN) is tested with two splits of the FISHER IRIS data and is found to be superior to that of FHLSNN. The modified network is also found to be superior to the general fuzzy min-max neural network (GFMM) proposed by Bogdan Gabrys and Andrzej Bargiela, and to the general fuzzy hypersphere neural network (GFHSNN) proposed by D. D. Doye, U. V. Kulkarni, and T. R. Sontakke.






Pre-processing: This step segments the pattern of interest from the background, i.e. noise filtering. Smoothing and normalization are also performed in this step. Pre-processing additionally defines a compact representation of the pattern. Its main objective is to denoise the image and enhance its quality.


Input Text

Here the input is given as characters. Each character can be supplied by the user and is then processed in the subsequent steps, as shown in the figure above.

Noise Removal

Optical scanning devices introduce noise such as disconnected line segments, bumps and gaps in lines, and filled loops. It is necessary to remove all of these noise elements prior to character recognition.
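As a rough illustration of one simple kind of noise removal, the sketch below deletes isolated ink pixels from a binary image. It is an assumption made for illustration, not the method used in this work: the function name `remove_isolated_pixels`, the 8-neighbour rule, and the sample image are all hypothetical, and real scanner noise such as gaps or filled loops needs more elaborate filtering.

```python
def remove_isolated_pixels(binary):
    """Clear ink pixels (value 1) that have no ink among their 8 neighbours."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]  # copy so the input is left unchanged
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            # Sum the 3x3 window (clipped at the borders), minus the pixel itself.
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical 4x5 image: a small stroke plus two stray specks in the corners.
noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
cleaned = remove_isolated_pixels(noisy)
for row in cleaned:
    print(row)
```

The two corner specks are removed while the connected stroke in the middle is kept.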



The main component of the pre-processing stage is normalization, which attempts to remove some of the variations in the images, which do not affect the identity of the word. Handwritten image normalization from a scanned image includes several steps, which usually begin with image cleaning, skew correction, line detection, slant and slope removal and character size normalization.





Space-domain techniques are required for compression. Two important techniques are thresholding and thinning. Thresholding reduces storage requirements and increases processing speed by converting gray-scale or color images to a binary image using a threshold value. Thinning extracts the shape information of the characters.
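A minimal sketch of thresholding is given here, under stated assumptions: the image is a list of rows of gray values in the range 0 to 255, the ink is darker than the paper, and the fixed threshold of 128 is an arbitrary illustrative choice (the essay does not say how the threshold is selected). The name `threshold` is hypothetical.

```python
def threshold(gray, t=128):
    """Binarize a gray-scale image given as a list of rows of 0-255 values.

    Pixels darker than t become 1 (ink); all other pixels become 0 (background).
    """
    return [[1 if pixel < t else 0 for pixel in row] for row in gray]


# Hypothetical 3x4 gray-scale patch with a dark stroke on light paper.
gray = [
    [250, 240, 30, 245],
    [248, 20, 25, 251],
    [252, 249, 35, 247],
]
binary = threshold(gray)
for row in binary:
    print(row)
# [0, 0, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 1, 0]
```

Each pixel now needs one bit instead of eight, which is where the storage saving comes from.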


In the segmentation stage, an image consisting of a sequence of characters is decomposed into sub-images of individual characters. The main goal is to divide an image into parts that have a strong correlation with objects or areas of the real world contained in the image. Segmentation is the most important stage in character recognition, because how well words, lines, or characters can be separated directly affects the recognition rate of the script. Image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The segmentation stage takes in a page image and separates its different logical parts, such as text from graphics, the lines of a paragraph, and the characters (or parts thereof) of a word. After the preprocessing stage, most HCR systems isolate the individual characters or strokes before recognizing them. Segmenting a page of text can be broken down into two levels: page decomposition and word segmentation. When working with pages that contain different object types such as graphics, headings, mathematical formulas, and text blocks, page decomposition separates the different page elements, producing text blocks, lines, and sub-words. While page decomposition identifies sets of logical components of a page, word segmentation separates the characters of a sub-word.

The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. Script segmentation is done by executing the following operations: line segmentation, word segmentation, and character segmentation.
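One common way to carry out line and character segmentation is with projection profiles: count the ink pixels in every row to find text lines, then count the ink pixels in every column of a line to find characters. The sketch below assumes this technique for illustration only; the essay does not state which segmentation algorithm it uses, and the function names and the tiny sample page are hypothetical. It relies on blank rows and columns between glyphs, so it would not separate touching or cursive characters.

```python
def split_runs(profile):
    """Return (start, end) pairs for each run of non-zero values; end is exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                    # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))      # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """Find text lines from the horizontal projection (ink count per row)."""
    return split_runs([sum(row) for row in binary])


def segment_characters(binary, line):
    """Find characters inside one line from the vertical projection (ink per column)."""
    top, bottom = line
    width = len(binary[0])
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return split_runs(profile)


# Hypothetical binary page: two glyphs on the first line, one on the second.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
print(lines)                                # [(1, 3), (4, 5)]
print(segment_characters(page, lines[0]))   # [(1, 3), (5, 6)]
print(segment_characters(page, lines[1]))   # [(2, 5)]
```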


Feature Extraction

Feature extraction is the method of retrieving the most significant data from the raw data. The main objective of feature extraction is to extract a set of features that maximizes the recognition rate with the least number of elements. Feature extraction is the heart of any pattern recognition application. Feature extraction techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Independent Component Analysis (ICA), Chain Code (CC), Scale-Invariant Feature Transform (SIFT), zoning, gradient-based features, and histograms may be applied to extract the features of individual characters.
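Of the techniques listed, zoning is the simplest to sketch: the character image is divided into a grid of zones and the ink density of each zone becomes one feature. The example below is an illustrative assumption, not the essay's implementation; the name `zoning_features`, the 2×2 grid, and the sample glyph are hypothetical, and the image dimensions are assumed to be divisible by the number of zones.

```python
def zoning_features(binary, zones_y=2, zones_x=2):
    """Return the ink density (fraction of 1-pixels) of each zone, row by row."""
    height, width = len(binary), len(binary[0])
    zone_h, zone_w = height // zones_y, width // zones_x
    features = []
    for zy in range(zones_y):
        for zx in range(zones_x):
            ink = sum(
                binary[y][x]
                for y in range(zy * zone_h, (zy + 1) * zone_h)
                for x in range(zx * zone_w, (zx + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# Hypothetical 4x4 binary glyph.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zoning_features(glyph)
print(features)  # [1.0, 0.0, 0.0, 0.75]
```

Sixteen pixels are reduced to four numbers, which reflects the stated goal of describing a character with as few elements as possible.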


Classification and Recognition

The classification phase is the decision-making part of the recognition system. The performance of a classifier depends on the quality of the features. This stage uses the features extracted in the previous stage to identify the character. When an input is presented to the HCR system, its features are extracted and given as input to a trained classifier such as an artificial neural network or a support vector machine. Classifiers compare the input features with stored patterns and find the best-matching class for the input.
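The idea of comparing input features with stored patterns can be illustrated with a nearest-template classifier. This is a deliberately simple stand-in, assumed for illustration, for the trained neural network or support vector machine mentioned above; the function name `classify`, the labels, and the stored feature vectors are hypothetical.

```python
def classify(features, templates):
    """Return the label whose stored feature vector is closest to `features`.

    Closeness is measured by squared Euclidean distance.
    """
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, templates[label]))

    return min(templates, key=squared_distance)


# Hypothetical stored patterns: one 4-element feature vector per class.
templates = {
    "L": [1.0, 0.0, 1.0, 1.0],
    "T": [1.0, 1.0, 0.5, 0.5],
}
predicted = classify([0.9, 0.1, 1.0, 0.8], templates)
print(predicted)  # L
```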



Write to word file

After character recognition, the recognized characters are written to a Word file.





Convolution Neural Network:

Convolutional neural networks (CNNs) are one of the most widely used types of deep artificial neural networks, applied in fields such as image and video recognition, speech processing, and natural language processing. These networks were inspired by biological processes, such as the workings of the visual cortex in cats and spider monkeys. In the 1960s, Hubel and Wiesel studied this and classified the cells of the visual cortex as simple, complex, and hypercomplex. Complex cells were found to have a receptive field, i.e. the area in which a stimulus produces a response, about twice the size of that of a simple cell. This led to the idea of using translational invariance to recognize visual imagery: detecting that an object is present matters more than its exact location in the image. Convolutional neural networks are preferable to fully connected networks in many applications because, instead of connecting every node in a layer to every node in the previous layer, each node in the mth layer is connected to only n nodes in the (m-1)th layer, where n is the size of the CNN’s receptive field. This reduces the total number of network parameters, which helps prevent overfitting, and also provides built-in invariance. LeCun et al. introduced LeNet-5, one of the earliest convolutional networks, in 1998. It was a seven-layer network used to recognize handwritten digits on bank checks. It introduced three basic architectural ideas: shared weights, local receptive fields, and pooling. The components of a CNN’s architecture are described below.
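To make the parameter saving concrete, the short calculation below compares the weight count of a fully connected mapping with that of a single shared convolution filter. The sizes (a 32×32 input, a 28×28 output map, a 5×5 filter) are assumptions chosen for illustration, biases are ignored, and the function names are hypothetical.

```python
def dense_weight_count(in_h, in_w, out_h, out_w):
    """Weights needed when every output node connects to every input node."""
    return (in_h * in_w) * (out_h * out_w)


def conv_weight_count(kernel_size, filters=1):
    """Weights needed when each filter is shared across all positions."""
    return kernel_size * kernel_size * filters


dense = dense_weight_count(32, 32, 28, 28)
conv = conv_weight_count(5)
print(dense)  # 802816
print(conv)   # 25
```

A 5×5 filter slid over a 32×32 input with stride 1 and no padding produces exactly a 28×28 map, so both counts describe layers with the same input and output sizes.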


Layers in a Convolutional Neural Network

Artificial neural networks are made up of different layers between the input and output layers. In convolutional neural networks, these layers, known as hidden layers, consist mainly of the following.


Convolutional Layer

Each convolutional neural network contains a number of convolution layers, depending on the requirements of the network. The first convolution layers are responsible for learning low-level features such as edges and corners. The output of these layers is often fed to further convolutional layers that learn higher-level features. Every neuron in this layer is connected to only a limited number of neurons in the previous layer. The number of neurons to which it is connected is known as the convolution layer’s receptive field. These layers consist of filters with a typical size of n×n. During the forward pass, each filter is slid (or, more precisely, convolved) across the width and height of the input volume, and dot products are computed between the filter entries and the input at each position, which together make up the final feature map. Since images have multiple features, multiple filters are used for convolution, so the depth of the output feature maps (a hyperparameter) is equal to the number of filters used.
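The sliding dot product can be sketched for a single filter on a single-channel image as follows. This is a minimal illustration under stated assumptions: stride 1, no padding, a square filter, and no kernel flip (the cross-correlation form used by most deep learning libraries). The name `convolve2d`, the sample image, and the hand-written vertical edge filter are hypothetical; in a trained CNN the filter values are learned.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image and take a dot product at each position."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Hypothetical 4x5 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
# [0, 3, 3]
# [0, 3, 3]
```

The feature map is zero over the flat region and positive where the window covers the edge, which is the kind of low-level feature the first layers are described as learning.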


Softmax Layer

The softmax layer is generally a linear layer with a softmax classifier that converts the activations into values between 0 and 1 whose sum is 1. It determines the CNN’s final output for a recognition or classification task by selecting the class or category with the highest probability value. It essentially represents a probability distribution.
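The softmax function itself is short enough to write out. The sketch below assumes three hypothetical class activations; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(activations):
    """Convert raw activations into probabilities that sum to 1."""
    shift = max(activations)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical activations for three character classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
print(probabilities)
print(predicted_class)  # 0
```

The predicted class is the index with the highest probability, here the class with the largest input activation.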

Pooling Layer

The pooling layer is used to decrease the spatial size (image resolution) of the feature maps, which reduces the number of parameters and thus the computational burden. It does so by reducing the number of connections between the convolution layers. Pooling layers are usually placed alternately between the convolution layers.
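A minimal sketch of pooling follows, assuming max pooling with non-overlapping 2×2 windows. The essay does not say which pooling type is used, so max pooling is an assumption here, and the name `max_pool` and the sample feature map are hypothetical.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[y * size + i][x * size + j]
                for i in range(size)
                for j in range(size)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(feature_map)
for row in pooled:
    print(row)
# [4, 2]
# [2, 8]
```

The 4×4 map shrinks to 2×2, so any layer that follows receives a quarter as many inputs.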


Fully Connected Layer

These layers are present in a CNN in varying numbers, usually right before the softmax layer. Each neuron in this layer is connected to every neuron in the previous layer. It is used to achieve linearity in the network, but it can be replaced by or converted into convolutional layers to turn the system into a fully convolutional network (FCN).
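The forward pass of a fully connected layer is a weighted sum of all inputs for each output neuron, plus a bias. The sketch below assumes no activation function; the name `dense_forward` and the weights, biases, and inputs are hypothetical values, whereas in a real network they are learned during training.

```python
def dense_forward(inputs, weights, biases):
    """Compute one output per weight row: dot(row, inputs) + bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical layer with 3 inputs and 2 output neurons.
inputs = [1.0, 2.0, 3.0]
weights = [
    [1.0, 0.0, -1.0],   # weights of output neuron 0
    [0.5, 0.5, 0.5],    # weights of output neuron 1
]
biases = [0.0, 1.0]
outputs = dense_forward(inputs, weights, biases)
print(outputs)  # [-2.0, 4.0]
```

Every output depends on every input, which is why such layers hold most of a network's weights when the input is large.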



Convolutional neural networks were selected for character recognition because of their high noise tolerance. The designed system can produce accurate results, provided that the right dataset is used when the network is trained. The software performs well in terms of both speed and accuracy. However, because the size of each block varies, locating the character is not efficient. This can be addressed by introducing weights during training on the dataset. The current system has room for improvement. The method’s performance has been tested on English text written in capital letters, and further investigation is required.




[1] Boqi Li, “Convolution Neural Network for Traditional Chinese Calligraphy Recognition”, A project work of Mechanical Engineering, Stanford University.

[2] Xuefeng Xiao, Lianwen Jin, Yafeng Yang, Weixin Yang, Jun Sun, Tianhai Chang, “Building Fast and Compact Convolutional Neural Networks for Online Handwritten Chinese Character Recognition”, joint work of School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China and Fujitsu Research & Development Center Co. Ltd., Beijing, China.

[3] Batuhan Balci, Dan Saadati, Dan Shiferaw, “Handwritten Text Recognition using Deep Learning”, A project work of Stanford University.

[4] Weixin Yang, Lianwen Jin, Zecheng Xie, Ziyong Feng, “Improved Deep Convolutional Neural Network For Online Handwritten Chinese Character Recognition using Domain-Specific Knowledge”, research work conducted at College of Electronic and Information Engineering, South China University of Technology, Guangzhou, China.

[5] Mohamed Elleuch, Najiba Tagougui, Monji Kherallah, “Arabic handwritten characters recognition using Deep Belief Neural Networks”, 12th International Multi-Conference on Systems, Signals & Devices.

[6] Hai Dai Nguyen, Anh Duc Le, Masaki Nakagawa, “Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols”, research work conducted at Tokyo University of Agriculture and Technology, Nakacho, Koganei-shi, Tokyo.

[7] Gauri Katiyar and Shabana Mehfuz, “A hybrid recognition system for off-line handwritten characters”, SpringerPlus.

[8] Rakesh Kumar Mandal, N R Manna, “Handwritten English character recognition using row wise segmentation techniques”, ISDMISC by IJCA.

[9] J. Pradeep, E. Srinivasan, S. Himavathi, “Performance Analysis of Hybrid Feature Extraction Technique for Recognizing English Handwritten Characters”, 978-1-4673-4804-1 © IEEE.

[10] Chirag I Patel, Ripal Patel, Palak Patel, “Handwritten Character Recognition Using Neural Networks”, International Journal of Scientific and Engineering Research Vol. 2

[11] Ashutosh Aggarwal, Rajneesh Rani, Renu Dhir, “Handwritten Character Recognition Using Gradient Features”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2.

[12] Kauleshwar Prasad, Devvrat C Nigam, Ashmika Lakhotiya, Dheeren Umre, “Character Recognition Using Matlab’s Neural Toolbox”, International Journal of u- and e- Service, Science and Technology.

[13] Ashutosh Aggarwal, Rajneesh Rani, Renu Dhir, “Handwritten Character Recognition Using Gradient Features”, International Journal of Advanced Research in Computer Science and Software Engineering.

[14] Dinesh Dileep, “A Feature Extraction Technique Based on Character Geometry for Character Recognition”, International Journal of Advanced Research in Computer Science and Software Engineering.

[15] Sheng Wang, “A Review of Gradient-Based and Edge-Based Feature Extraction Methods for Object Detection”, IEEE 11th International Conference.

