In this paper, we propose a new theoretical framework for the data hiding problem of digital and printed text documents. We explain how this problem can be seen as an instance of the well known Gel¡¯fand-Pinsker problem.The main idea for this interpretation is to consider a text character as a data structure consisting of multiple quantifiable features such as shape, position, orientation, size, color, etc. We also introduce color quantization,a new semi-fragile text data-hiding method that is fully automatable, has high information embedding rate, and can be applied to both digital and printed text documents.