PCs & Hardware
The Download: GPT-4o’s polluted Chinese training data, and astronomy’s AI challenge
By zppiot, 05/20/2024
Soon after OpenAI released GPT-4o last Monday, some Chinese speakers started to notice that something seemed off about this newest…
PCs & Hardware
GPT-4o’s Chinese token-training data is polluted by spam and porn websites
By zppiot, 05/17/2024
The new tokenizer has 200,000 tokens in total, and about 25% of the tokens are in non-English languages, says Deedy…