PCs & Hardware
OpenAI’s latest blunder shows the challenges facing Chinese AI models
By zppiot, 05/22/2024
In fact, among the few long Chinese tokens in GPT-4o that aren’t either pornography or gambling nonsense, two are “socialism…

PCs & Hardware
The Download: GPT-4o’s polluted Chinese training data, and astronomy’s AI challenge
By zppiot, 05/20/2024
Soon after OpenAI released GPT-4o last Monday, some Chinese speakers started to notice that something seemed off about this newest…

PCs & Hardware
GPT-4o’s Chinese token-training data is polluted by spam and porn websites
By zppiot, 05/17/2024
The new tokenizer has 200,000 tokens in total, and about 25% of the tokens are in non-English languages, says Deedy…