This paper focuses on the processing of direct speech in Belarusian electronic texts for the purpose of audiobook creation. Audiobooks are usually synthesized with a single voice. Processing direct speech opens up the possibility of multi-voice text-to-speech synthesis, which would bring audiobooks closer to conveying each character’s unique speech features.