A tokenizer accepts a string as input, breaks it into individual words, or tokens (perhaps discarding some characters such as punctuation), and emits a token stream as output. What is interesting is the algorithm used to identify words. The whitespace tokenizer simp..
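To make the string-in, token-stream-out idea concrete, here is a minimal Python sketch. It is not the implementation of any particular library: the function names `whitespace_tokenize` and `strip_punctuation` are hypothetical, and it assumes that a whitespace-style tokenizer only splits on runs of whitespace, with punctuation removal treated as a separate, optional step.

```python
import string


def whitespace_tokenize(text: str) -> list[str]:
    """Split text on runs of whitespace; punctuation stays attached to words."""
    # str.split() with no arguments splits on any whitespace and drops empty strings.
    return text.split()


def strip_punctuation(tokens: list[str]) -> list[str]:
    """Remove leading/trailing ASCII punctuation from each token (illustrative only)."""
    stripped = (token.strip(string.punctuation) for token in tokens)
    # A token made up entirely of punctuation becomes empty and is discarded.
    return [token for token in stripped if token]


tokens = whitespace_tokenize("You're the 1st runner home!")
print(tokens)
print(strip_punctuation(tokens))
```

Splitting and punctuation handling are kept apart here so that the difference between tokenizers stays visible: they differ mainly in the rule they use to decide where one word ends and the next begins.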