000 05268nam a22006255i 4500
001 978-3-031-55389-9
003 DE-He213
005 20240423130340.0
007 cr nn 008mamaa
008 240410s2024 sz | s |||| 0|eng d
020 _a9783031553899
_9978-3-031-55389-9
024 7 _a10.1007/978-3-031-55389-9
_2doi
050 4 _aQA75.5-76.95
072 7 _aUNH
_2bicssc
072 7 _aUND
_2bicssc
072 7 _aCOM030000
_2bisacsh
072 7 _aUNH
_2thema
072 7 _aUND
_2thema
082 0 4 _a025.04
_223
100 1 _aToselli, Alejandro Héctor.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
245 1 0 _aProbabilistic Indexing for Information Search and Retrieval in Large Collections of Handwritten Text Images
_h[electronic resource] /
_cby Alejandro Héctor Toselli, Joan Puigcerver, Enrique Vidal.
250 _a1st ed. 2024.
264 1 _aCham :
_bSpringer Nature Switzerland :
_bImprint: Springer,
_c2024.
300 _aXXXIV, 344 p.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aThe Information Retrieval Series,
_x2730-6836 ;
_v49
505 0 _aPreface -- Acronyms -- Introduction -- State of the Art -- Probabilistic Indexing (PrIx) Framework -- Probabilistic Models for Handwritten Text -- Probabilistic Indexing for Fast and Effective Information Retrieval -- Empirical Validation of Probabilistic Indexing Methods -- Conclusion and Outlook -- Appendices.
520 _aThis book provides a comprehensive presentation of a recently introduced framework, named "probabilistic indexing" (PrIx), for searching text in large collections of document images and other related applications. It fosters the development of new search engines for effective information retrieval from manuscripts which, however, lack the electronic text (transcripts) that would typically be required for such search and retrieval tasks. The book is structured into 11 chapters and three appendices. The first two chapters briefly outline the necessary fundamentals and state of the art in pattern recognition, statistical decision theory, and handwritten text recognition. Chapter 3 presents approaches for indexing (as opposed to “spotting”) each region of a handwritten text image which is likely to contain a word. Next, Chapter 4 describes models adopted for handwritten text in images, namely hidden Markov models, convolutional and recurrent neural networks and language models, and provides full details of weighted finite-state transducer (WFST) concepts and methods, needed in further chapters of the book. Chapter 5 explains the set of techniques and algorithms developed to generate image probabilistic indexes which allow for fast search and retrieval of textual information in the indexed images. Chapter 6 then presents experimental evaluations of the proposed framework and algorithms on different traditional benchmark datasets and compares them with other approaches, while Chapter 7 reviews the most popular keyword-spotting approaches. Chapter 8 explains how PrIx can support classical free-text search tools, while Chapter 9 presents new methods that use PrIx not only for searching, but also to deal with text analytics and other related natural language processing and information extraction tasks. 
Chapter 10 shows how the proposed solutions can be used to effectively index very large collections of handwritten document images, before Chapter 11 finally summarizes the book and suggests promising lines of future research. The appendices detail the necessary mathematical foundations for the work and present details of the text image collections and datasets used in the experiments throughout the book. This book is written for researchers and (post-)graduate students in pattern recognition and information retrieval. It will also be of interest to people in areas like history, criminology, or psychology who need technical support to evaluate, understand or decode historical or contemporary handwritten text.
650 0 _aInformation storage and retrieval systems.
650 0 _aComputer science
_xMathematics.
650 0 _aMathematical statistics.
650 0 _aArtificial intelligence.
650 0 _aData mining.
650 1 4 _aInformation Storage and Retrieval.
650 2 4 _aProbability and Statistics in Computer Science.
650 2 4 _aArtificial Intelligence.
650 2 4 _aData Mining and Knowledge Discovery.
700 1 _aPuigcerver, Joan.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
700 1 _aVidal, Enrique.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
710 2 _aSpringerLink (Online service)
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783031553882
776 0 8 _iPrinted edition:
_z9783031553905
776 0 8 _iPrinted edition:
_z9783031553912
830 0 _aThe Information Retrieval Series,
_x2730-6836 ;
_v49
856 4 0 _uhttps://doi.org/10.1007/978-3-031-55389-9
912 _aZDB-2-SCS
912 _aZDB-2-SXCS
942 _cSPRINGER
999 _c187618
_d187618