GPT-4o’s Chinese token-training data is polluted by spam and porn websites
By zppiot, 05/17/2024

The new tokenizer has 200,000 tokens in total, and about 25% of the tokens are in non-English languages, says Deedy…