Last update: Wed May 5 10:32:11 CEST 2021
Expected number of human labels given pattern distance from class centroid
This graph shows the distribution of the number of human labels
that can be expected, given the distance of a word or character sample
from the corresponding class centroid.
As of Feb. 24, 2017,
a total of 257576 human-labeled or human-confirmed
images had been harvested over the collections. At a
pattern distance of 0.1 and below,
at least 50 human-based labels
can be expected. For a lifelong machine learning engine
such as Monk, the challenge is to attract labelers to prospective samples
that help the
learning process enter a snowball effect,
an avalanche of label collection.
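As a rough illustration of how a curve like the one described above could be estimated, the sketch below bins samples by their distance to the class centroid and averages the number of human labels per bin. This is not the procedure used in Monk: the function name, the bin width of 0.1, the input format of (distance, label count) pairs, and the example numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_labels_by_distance(samples, bin_width=0.1):
    """Mean human-label count per distance bin (hypothetical helper).

    samples: iterable of (distance_to_centroid, human_label_count) pairs.
    Returns a dict mapping the lower edge of each non-empty bin
    to the mean label count of the samples in that bin.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for distance, n_labels in samples:
        bin_index = int(distance // bin_width)  # which bin the sample falls in
        totals[bin_index] += n_labels
        counts[bin_index] += 1
    # Round the bin edge to suppress floating-point noise such as 0.30000000000000004.
    return {round(i * bin_width, 10): totals[i] / counts[i] for i in sorted(counts)}


# Invented example values, NOT taken from the Monk collections.
example = [(0.05, 60), (0.08, 50), (0.15, 20), (0.32, 4)]
result = expected_labels_by_distance(example)
for lower_edge, mean_labels in result.items():
    print(f"distance >= {lower_edge:.1f}: {mean_labels:.1f} labels on average")
```

With real data, a falling curve of this kind would indicate that samples close to the centroid attract most of the labeling effort.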
For a general discussion of this topic, see:
Schomaker, L. (2021).
Lifelong learning for text retrieval and recognition in
historical handwritten document collections.
In: Fischer, A., Liwicki, M. & Ingold, R. (Eds.),
Handwritten Historical Document Analysis,
Recognition, and Retrieval - State of the Art and Future Trends.
Series in Machine Perception and Artificial Intelligence, Vol. 89,
World Scientific Publishing, pp. 221-248.
van Oosten, J-P. (2021).
The snowball principle for handwritten word-image retrieval:
The importance of labelled data and humans in the loop.
[Dissertation, promotor L. Schomaker]
University of Groningen. https://doi.org/10.33612/diss.160750597