Orthography and tokenization
- explicitly defined orthographic systems
- classified tokenization
Create a tokenizable corpus from:
- a citable corpus
- an orthographic system
Import the orthography trait as well as an implementation of it:
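// the trait
import edu.holycross.shot.mid.orthography._
// the implementation
import edu.holycross.shot.latin._
// pair a citable corpus (chapter) with an orthographic system (Latin23Alphabet)
val tokenizable = TokenizableCorpus(chapter, Latin23Alphabet)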
Two common activities in analyzing a corpus:
- generate a word list
- create a tokenized corpus
tokenizable.wordList
tokenizable.tokenizedCorpus
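To make the two ideas above concrete, here is a minimal, self-contained sketch of what "explicitly defined orthography" and "classified tokenization" can mean. It does not use the MID library: `ToyOrthography`, `ClassifiedToken`, `TokenCategory` and their members are hypothetical names invented for illustration, and the real `wordList` and `tokenizedCorpus` may behave differently.

```scala
// Hypothetical stand-ins for illustration only; these are NOT the MID library's types.
sealed trait TokenCategory
case object LexicalToken extends TokenCategory
case object PunctuationToken extends TokenCategory

// A token paired with its classification.
final case class ClassifiedToken(text: String, category: TokenCategory)

object ToyOrthography {
  // The orthography is defined explicitly as an enumerated set of characters.
  val letters: Set[Char] = ('a' to 'z').toSet
  val punctuation: Set[Char] = Set('.', ',', ';', ':', '?')

  // A character is valid only if the orthography lists it (or it is a space).
  def validChar(c: Char): Boolean = letters(c) || punctuation(c) || c == ' '

  // Classified tokenization: set each punctuation mark off with spaces,
  // split on whitespace, then label every token with a category.
  def tokenize(s: String): Vector[ClassifiedToken] = {
    val spaced = s.flatMap(c => if (punctuation(c)) s" $c " else c.toString)
    spaced.split("\\s+").toVector.filter(_.nonEmpty).map { t =>
      if (t.forall(punctuation)) ClassifiedToken(t, PunctuationToken)
      else ClassifiedToken(t, LexicalToken)
    }
  }

  // Word list: the distinct lexical tokens, sorted alphabetically.
  def wordList(s: String): Vector[String] =
    tokenize(s).collect { case ClassifiedToken(t, LexicalToken) => t }.distinct.sorted
}
```

In this sketch, `ToyOrthography.tokenize("arma cano, arma.")` yields five classified tokens (three lexical, two punctuation), and `ToyOrthography.wordList("arma cano, arma.")` reduces them to the two distinct words `arma` and `cano`.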