A few months after the release of ChatGPT, Chinese companies released a flurry of large language models (LLMs); the number launched by Chinese companies in 2023 alone already exceeded 130.
Like many companies that achieve breakthroughs in technological innovation, OpenAI had excellent talent, massive financial backing, years of sustained investment, and a firm commitment to its goals. For a long time before ChatGPT's release, the industry and investors were largely sceptical of OpenAI, but this did not shake the company's direction. In 2023, almost everyone recognized LLMs as the direction: OpenAI, the thinking went, had already laid out the answers, and what other companies needed to do was catch up as quickly as possible, optimize continuously, and make sure they could still participate in the future.
Some attribute China's past lack of large-scale investment in LLMs to uncertain outcomes. Now that the direction is confirmed, computing power, data, and talent can all receive increased investment, and Chinese companies, which excel at engineering optimization, are expected to produce practically applicable LLM products very soon.
But is this really the case? For OpenAI, LLMs were always a definite direction, and most of its funds went to computing power. At the time, the price of Nvidia's A100 (an AI-specific chip) was much lower than it is today. According to estimates by the third-party research firm SemiAnalysis, OpenAI used about 3,617 HGX A100 servers, containing nearly 30,000 Nvidia GPUs. Having GPUs alone was not enough: Microsoft, an investor, helped OpenAI build a computing cluster customized for large models, which further improved the efficiency of those GPUs. On the data side, OpenAI has invested continuously in every link of the chain, from data collection, labelling, and cleaning to data organization and optimization. And most of the OpenAI team comes from top research institutions or tech giants.
That is to say, even with such strength and investment, OpenAI took more than eight years to create the breakthrough product GPT-4, and it still suffers from "hallucinations" (i.e., answering questions incorrectly or nonsensically).
Why, then, could Chinese companies produce LLMs claimed to rival GPT-4 in just a few months? Whose illusion is this?
In the second half of 2023, some LLMs were called out as "repackaged", i.e., built directly on foreign open-source models. These models ranked high on benchmarks testing model capabilities, with many indicators close to GPT-4. Several industry insiders told Caijing reporters that the better a model performed on the leaderboards, the higher the proportion of repackaging: once even slight adjustments were made, performance deteriorated.
"Repackaging" is just the tip of the iceberg of the current situation of China's large model industry. It reflects five problems in industrial development, which are interrelated and causal, and none of them can be solved independently. By today, the public enthusiasm for LLM has clearly declined, and in 2024, the problems of China's large model industry will be further exposed. But beneath the buzz and problems, large models have already played a valuable role in the industry.
1. Model: Original, Assembled, or Repackaged?
In November 2023, Jia Yangqing, an AI scientist and former Vice President of Technology at Alibaba, posted an article claiming that a large model developed by a major domestic company used Meta's open-source model LLaMA, changing only a few variable names. Jia said the renaming forced his team to do a great deal of work just to adapt to it.
Previously, foreign developers had pointed out that 01.AI, founded by Kai-Fu Lee, used LLaMA, renaming only two tensors, which prompted industry scepticism that 01.AI was merely "repackaging." Kai-Fu Lee and 01.AI later responded that they had adopted the open-source architecture during training, the starting point being to test the model fully and run comparative experiments for a quick start, but that the released Yi-34B and Yi-6B models were trained from scratch and involved a great deal of original optimization and breakthrough work.
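To see why "renaming two tensors" drew such scepticism, it helps to know that a model checkpoint is essentially a dictionary mapping tensor names to weight arrays; re-keying that dictionary changes nothing about how the model behaves. Below is a minimal, purely illustrative sketch in Python with PyTorch; the file names and tensor keys are hypothetical, not the actual identifiers involved in the 01.AI case.

```python
# Purely illustrative: what "renaming tensors" in a checkpoint amounts to.
# File names and tensor keys below are hypothetical.
import torch

# A checkpoint is just a dict of tensor-name -> weights.
state_dict = torch.load("open_source_checkpoint.pth", map_location="cpu")

# "Repackaging" can be as shallow as re-keying that dict: the weights
# themselves, and the architecture they parameterize, are untouched.
rename_map = {
    "layers.0.attention.wq.weight": "layers.0.self_attn.q_proj.weight",
}
renamed = {rename_map.get(name, name): weights
           for name, weights in state_dict.items()}

torch.save(renamed, "rebranded_checkpoint.pth")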
Currently, domestic LLMs fall into three categories: original models; repackaged foreign open-source models; and assembled models, which stitch previous smaller models together into what appears to be a "large model" with a high parameter count.
Among these, original LLMs are the fewest. Creating an original large model requires deep technical accumulation and sustained heavy investment, and it is risky: if the model turns out not to be strongly competitive, the massive investment is wasted. The value of large models ultimately has to be proven commercially, so when the market already has sufficiently good foundational models, other companies should explore new value points, such as applications of large models in specific fields, or the middle layer: services for model training, data processing, and computing power.
However, the current situation is that most participants are scrambling to claim so-called "original LLMs" while worrying about the high risk, which produces a large number of repackaged and assembled models. Whether a company uses open-source models directly or assembles them, there is no problem as long as it complies with the relevant standards. And when it comes to monetization, customers may not care much about originality as long as the product is useful; in fact, many customers might prefer non-original technology because it costs less.
The problem is that even when assembling or repackaging, companies keep insisting on "originality." Proving "originality" requires adjustments and modifications, which can hurt a model's ability to iterate and lead to wasted internal effort.
2. Computing Power: A Bottleneck, or Just an Unwillingness to Buy?
One of the foundations of LLMs is massive computing power, specifically advanced computing power; hence large models are also described as a kind of brute-force aesthetics. Nvidia's A100 was previously considered the chip best suited to training large models, and Nvidia has since introduced an even more advanced chip, the H100, which has not yet been launched on the Chinese market.
A long-term partner of Nvidia told Caijing reporters that in 2023 the price of the A100 doubled. To his knowledge, the Chinese companies buying A100s in bulk in 2023 were mainly large firms with real business needs, including Alibaba, Tencent, ByteDance, and Baidu, with few startups among them. Some well-known large model startups would proactively ask to establish "strategic partnerships" with him, the kind that don't pay, in order to prove to the outside world that they were investing in computing power.
Despite the U.S. government's export control rules, it is not impossible for Chinese companies to obtain Nvidia's computing power; there are currently many channels to choose from. Apart from direct purchases, they can also buy through Nvidia's partners in China. But GPUs are expensive, and the costs of deployment, operation, debugging, and use after purchase are all significant. A saying once circulated in the industry that many Chinese research institutions couldn't even afford the electricity bill for an A100.
A DGX server with eight A100s has a maximum power draw of 6.5 kW, meaning it consumes 6.5 kilowatt-hours of electricity for every hour of operation, and it requires cooling equipment that consumes roughly the same amount again. At the average industrial electricity price of 0.63 yuan per kWh, running one server around the clock costs about 200 yuan a day; 1,000 servers would cost about 200,000 yuan a day.
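For readers who want to check the figures, here is the arithmetic as a small Python sketch; it uses only the numbers cited above (6.5 kW per server, an equal cooling load, and 0.63 yuan per kWh).

```python
# Back-of-the-envelope check of the electricity figures cited above.
SERVER_KW = 6.5        # peak draw of one 8x A100 DGX server
COOLING_KW = 6.5       # cooling assumed to draw roughly the same
YUAN_PER_KWH = 0.63    # average industrial electricity price cited

daily_kwh = (SERVER_KW + COOLING_KW) * 24   # 312 kWh per server per day
daily_cost = daily_kwh * YUAN_PER_KWH       # ~196.6 yuan, i.e. "about 200"

print(f"One server:    {daily_cost:.0f} yuan/day")
print(f"1,000 servers: {daily_cost * 1000:,.0f} yuan/day")  # ~200,000 yuan
```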
This is why, apart from the large companies, few startups can purchase and deploy GPUs at scale.