Binarization of language models
Received: 25 Mar 2026
Published: 2025, vol. 29, issue 3, pp. 119–145
Abstract
Large language models are widely used in natural language processing. However, despite their high efficiency, deploying large language models is difficult due to their high computational and memory costs. One way to address this problem is neural network quantization, that is, converting the network's weights and activations to a representation with lower bit-width. A special case of quantization is binarization: compressing network parameters to a bit-width of 1 bit. This paper examines the structure of binary neural networks, surveys current methods for binarizing language models, and describes the results obtained.
Keywords: natural language processing, binary neural networks, binarization, quantization, large language models.
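The core idea of binarization mentioned in the abstract can be sketched in a few lines. The snippet below is an illustrative example only (it follows the common XNOR-Net-style scheme, not necessarily the method surveyed in the paper): weights are mapped to {-1, +1} via the sign function, and a single scaling factor equal to the mean absolute weight is kept to reduce the approximation error.

```python
import numpy as np

def binarize(W):
    """Approximate W by alpha * B, where B has entries in {-1, +1}.

    alpha = mean(|W|) is the per-tensor scaling factor that
    minimizes the L2 error of the approximation (XNOR-Net style).
    This is a conceptual sketch, not the paper's exact algorithm.
    """
    alpha = np.mean(np.abs(W))
    B = np.where(W >= 0, 1.0, -1.0)
    return alpha, B

# Example: a 2x2 weight matrix compressed to 1 bit per weight
W = np.array([[0.3, -0.7],
              [0.1, -0.2]])
alpha, B = binarize(W)
```

A full-precision matrix of 32-bit floats is thus replaced by a bitmap plus one scalar, giving roughly a 32x reduction in weight storage at the cost of approximation error, which binarization methods then try to recover during training or fine-tuning.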
BibTeX
@article{IS-Davydova2025,
author = {Davydova, Daria Nikolaevna},
title = {{Binarization of language models}},
journal = {Intelligent Systems. Theory and Applications},
year = {2025},
volume = {29},
number = {3},
pages = {119--145},
}
AMSBIB
\Bibitem{IS-Davydova2025}
\by D.\,N.~Davydova
\paper Binarization of language models
\jour Intelligent Systems. Theory and Applications
\yr 2025
\vol 29
\issue 3
\pages 119--145
\lang In Russian