Intellektual'nye Sistemy.
Teoriya i Prilozheniya
(Intelligent Systems.
Theory and Applications)

Addressing the Training Challenges of Siamese Networks for Optical Character Recognition

Abstract

Developing deep learning models for classification is particularly challenging when the number of classes is large and both data and computational resources are limited. Metric learning is a promising approach to this problem, but its effectiveness is often limited by inherent shortcomings of standard loss functions, such as contrastive and triplet losses, and by suboptimal strategies for selecting training samples. This paper presents solutions to both issues: a novel auto-probabilistic mining method for example selection and an improved metric loss function. The proposed auto-probabilistic mining method helps select the most informative example pairs for training Siamese neural networks. Combined with a previously developed auto-clustering method, it improves training efficiency by maximizing data utility while minimizing computational cost. In addition, a new triplet-based metric loss function is introduced that accounts for cluster-specific characteristics and is designed to overcome specific drawbacks of the traditional contrastive and triplet losses, thereby improving the formation of feature embeddings. The proposed methods were validated in optical character recognition experiments on the PHD08 (Korean alphabet) and Omniglot datasets. For the full Korean alphabet in PHD08, the new loss function with random mining achieved a classification accuracy of \(82.6\%\), establishing a new benchmark. On a reduced alphabet, a baseline of \(88.6\%\) was set on PHD08. Applying auto-probabilistic mining alone improved accuracy to \(90.6\%\) (\(+2.0\) percentage points), and combining it with auto-clustering raised it further to \(92.3\%\) (\(+3.7\) percentage points over the baseline). On Omniglot, the proposed mining method attained \(92.32\%\), rising to \(93.17\%\) when coupled with auto-clustering. The results demonstrate that the proposed loss function and mining strategy offer a robust and effective solution for complex pattern recognition tasks, especially in scenarios with a large number of classes and limited resources.
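For orientation, the standard contrastive and triplet losses that the abstract names as the starting point can be sketched as follows. This is a minimal illustration of the textbook formulations only, not the paper's proposed cluster-aware loss or its mining method, neither of which is specified in the abstract. The function names, the margin value of 1.0, and the toy embeddings are assumptions made for the example; some formulations also scale the contrastive loss by 1/2.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same_class, margin=1.0):
    """Textbook contrastive loss for one pair of embeddings.

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until they are at least `margin` apart.
    The margin value is an illustrative assumption.
    """
    d = euclidean(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss: the anchor should be closer to the
    positive than to the negative by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical values, not from the paper).
    anchor, near, far = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
    print(contrastive_loss(anchor, far, same_class=True))   # far same-class pair is penalized
    print(contrastive_loss(anchor, far, same_class=False))  # already beyond the margin
    print(triplet_loss(anchor, near, far))                  # well-separated triplet
    print(triplet_loss(anchor, far, near))                  # violated triplet
```

In both losses, examples that already satisfy the margin contribute zero loss and therefore no gradient, which is one reason the choice of training pairs or triplets matters as much as the abstract suggests.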

Keywords: deep metric learning, optical character recognition, Siamese neural networks, pattern recognition

BibTeX
@article{IS-Mokin2026,
  author  = {Mokin, Arseniy Kirillovich},
  title   = {{Addressing the Training Challenges of Siamese Networks for Optical Character Recognition}},
  journal = {Intelligent Systems. Theory and Applications},
  year    = {2026},
  volume  = {30},
  number  = {1},
  pages   = {101--123},
}

AMSBIB
\Bibitem{IS-Mokin2026}
\by A.\,K.~Mokin
\paper Addressing the Training Challenges of Siamese Networks for Optical Character Recognition
\jour Intelligent Systems. Theory and Applications
\yr 2026
\vol 30
\issue 1
\pages 101--123
\lang In Russian

Published under Creative Commons Attribution 4.0 International (CC BY 4.0)
