Kai-Fu Lee's latest remarks on LLMs and China's AI industry
During his participation in the Beijing Academy of Artificial Intelligence Conference, Kai-Fu Lee gave an interview to Tencent Technology.
In the interview, Lee presented his serious reflections on the realities of entrepreneurship in large models: the capabilities of the models are fundamental skills that must be honed through hard work, and only authoritative and reliable benchmarks can showcase their true strength. He pointed out that driving growth through heavy investment in promotion requires strong product capabilities and retention rates; otherwise, it's just "wasting money." He mentioned that the AI assistant sector for large models in China is far from reaching its explosion point, with very low user penetration and high user education costs. Despite the numerous AI assistant products available, too many users still treat them merely as "search engines."
He also emphasized, "In the past, many innovations like mobile payments and short videos quickly took off in China, and then the U.S. followed. Why is it the other way around this time? Our most important task now is to rapidly advance market education to foster the healthy development of the entire large model ecosystem."
Tencent Technology: How can we objectively evaluate the strength of a large model, and what does the capability of a large model mean for the future development of companies using these models?
Kai-Fu Lee: When it comes to evaluating large models based on benchmarks, I don't believe every company's numbers are entirely trustworthy. There are a few ways to assess them objectively. Firstly, you can try using them yourself, integrate the APIs, and compare at least two or three different models. This method is reliable but time-consuming, and if there are 20 or 30 models, testing each one individually is impractical.
I recommend referring to a reputable third-party platform. For instance, Chatbot Arena allows millions of users to "blind test" models, which I believe is currently the fairest method. Besides Berkeley's LMSYS Chatbot Arena, there's also Stanford's AlpacaEval, an LLM-based fully automated evaluation benchmark, using machines rather than humans for testing.
So, I suggest selecting the most cost-effective models from these two platforms and then conducting your own tests. Third-party platforms ensure that models haven't "prepped" for the test, involve many real users, and employ scientific methods. They match the top models against each other, similar to how chess and Go are scored, ensuring high granularity and credibility in the results.
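The "arena" scoring Lee compares to chess and Go is, at its core, an Elo-style rating system: models battle pairwise in blind tests and ratings update after each outcome. A minimal sketch of the idea (model names and battle results here are hypothetical, not actual leaderboard data):

```python
# Elo-style rating from pairwise "arena" battles, the scoring scheme
# borrowed from chess/Go. Battle outcomes below are made up for illustration.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings: dict, battles: list, k: float = 32, base: float = 1000.0) -> dict:
    """Run Elo updates over a list of (winner, loser) battle outcomes."""
    for winner, loser in battles:
        ra = ratings.setdefault(winner, base)
        rb = ratings.setdefault(loser, base)
        ea = expected_score(ra, rb)  # how likely the winner "should" have won
        ratings[winner] = ra + k * (1 - ea)
        ratings[loser] = rb - k * (1 - ea)
    return ratings

battles = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = update_elo({}, battles)
# model_a, having won both of its battles, ends with the highest rating
print(sorted(ratings, key=ratings.get, reverse=True))
```

With thousands of real users voting blind, this kind of pairwise scoring is hard for any single vendor to game, which is why Lee considers it fairer than self-reported benchmark numbers.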
Yi-Large, a model with a trillion parameters developed by Zero One Infinity, participated in these two authoritative evaluations and achieved leading international results. Especially in the evaluation published by LMSYS on May 21, Yi-Large ranked first among Chinese large models. In the company rankings, Zero One Infinity is just behind the Silicon Valley giants OpenAI, Google, and Anthropic, making it the only Chinese company in the global top tier.
Many domestic companies claim to have beaten the best models from Google, OpenAI, and Anthropic. I suggest that before making such claims, they should have their models tested on these two "large model arenas" to ensure credibility.
Tencent Technology: Professional or enterprise users can objectively perceive the strength of a model through hands-on testing and API integration. However, C-end users may not notice the differences as clearly through AI personal assistant products. How should they choose?
Kai-Fu Lee: You're absolutely right. Among the many AI assistants in China, we recognize the models of some assistants but are not satisfied with their user experience, while for others, we are satisfied with the user experience but not with the model. There is naturally a correlation between the two. If the model itself is poor in quality, it is difficult to compensate through other means. However, starting with a fundamentally adequate model, product experience can be significantly enhanced through engineering methods. This includes the interaction process, conversation style, response formatting, and creating aesthetically pleasing charts to make responses more friendly and engaging, thereby winning users' favour.
Additionally, RAG (Retrieval-Augmented Generation) can enhance the experience. RAG uses more information databases and real-time information to compensate for the model's shortcomings. This can not only provide the latest corpus or news that large models may lack but also address factual issues and mitigate hallucination problems to some extent.
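The RAG pattern Lee describes can be sketched in a few lines: retrieve the most relevant passages, then prepend them to the prompt so the model answers from fresh context rather than stale training data. This is a toy illustration only; the word-overlap "retriever" stands in for a real vector database, and the corpus is invented:

```python
# Minimal RAG sketch: retrieve relevant documents, then build an
# augmented prompt. Word-overlap scoring is a stand-in for vector search.

def retrieve(query: str, corpus: list, top_k: int = 2) -> list:
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query: str, corpus: list) -> str:
    """Assemble the augmented prompt the LLM would actually receive."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "Yi-Large ranked first among Chinese models on the May 21 LMSYS board.",
    "Mobile payments took off in China before the US.",
    "RAG supplies models with up-to-date retrieved documents.",
]
print(build_prompt("What did RAG supply the models with?", corpus))
```

Because the model is instructed to answer only from retrieved text, factual questions get grounded in the latest corpus, which is how RAG mitigates both staleness and hallucination.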
RAG technology is very practical. In March, Zero One Infinity launched a new vector database, Descartes, based on a comprehensive navigation map, providing an efficient retrieval mechanism for RAG. It can determine user intent within 0.1 seconds, quickly retrieve information, and provide high-quality feedback. This technology is also applied to Zero One Infinity's AI "special assistant" Wanzhi. In addition to real-time internet access and integration for knowledge Q&A scenarios, providing users with the latest data and insights, the combination of Yi-Large's ultra-long context window and the leading RAG solution creates Wanzhi's "5000-page document speed reading" capability. The 600,000-word English biography "Elon Musk" once crashed many AI assistants, but Wanzhi can easily interpret it.
Some companies excel in RAG. When you ask about news facts, their responses are very accurate, but it is not their large model answering.
In summary, I believe every user has their preferred assistant. We have launched the "Wanzhi" assistant and have done well in terms of user experience, but we will continue to improve.
Tencent Technology: Why do large model companies choose to launch personal assistants for the C-end?
Kai-Fu Lee: I'm not sure why others do it, but our reason for creating a personal AI assistant is that China currently faces a significant challenge: widespread use of large models hasn't yet been achieved. Using such a ChatBot can help everyone realize how useful and intelligent it is, gradually educating the market. With this foundation, we can develop productivity tools, games, and various TOC and TOB applications.
Today, the total DAU of all large models in China might be in the millions, which is tiny compared with the daily active users of mainstream apps. This indicates that market education is far from complete. The "ChatGPT moment" came at the end of 2022, when ChatGPT rapidly gained 100 million users in two months. Such a phenomenal event has not yet happened in China.
Some Chinese assistants are quite good and can be compared to ChatGPT at that time. ChatGPT ignited the US market, making entrepreneurship, sales, and acceptance by large companies easier after the market was educated. Our most important task now is to accelerate market education to promote the healthy development of the entire large model ecosystem.
This is a goal that all peers should push for together. If this goal is not achieved, advancing TOC and TOB will be challenging.
Tencent Technology: Since China's best tools can rival the ChatGPT of that time, why haven't they ignited the market like in the US?
Kai-Fu Lee: This is worth exploring. Many things, like mobile payments and short videos, quickly exploded in China before the US caught on. Why is it different this time? I'm not sure of the answer, but it might be because ChatGPT in the US was a unique product at the time, unprecedented, with extensive media coverage, which ignited the market and gave OpenAI a lot of cheap traffic.
Today in China, several companies are doing well, but they haven't ignited the market. Therefore, I think market education is an urgent priority.
China's large model tools have only reached millions of DAUs after this long, so we must reflect on this issue. Simply spending money to buy traffic is not effective.
Today, Chinese AI assistants are not underfunded but suffer from poor retention. Why is retention poor? Because there are too many competing products, and users don't find them unique, which is a significant reason.
Another reason is that when market education is insufficient, a user might treat such a ChatBot like a search engine when first encountering it. But a ChatBot might not answer as well as a search engine. For example, asking about today's weather or the largest city in a province is something search engines have perfected over years.
Therefore, when users treat a smart assistant capable of writing essays, analyzing scenarios, and creating presentations as a search engine, its full potential is wasted.
We urge users to use AI assistants as assistants, not search engines.
Tencent Technology: Another issue might be the high usage threshold since different prompts given to the assistant can yield very different results.
Kai-Fu Lee: Yes, treating it as a search engine is one problem. Another is not knowing how to ask questions properly. For example, asking for help writing a speech about AI without giving details will result in a poorly written piece.
You need to provide many details, which is part of prompt engineering and market education. I have made many short videos to help users understand that if the results are poor, it's likely because they need to ask better questions.
In the era of large models, the most powerful person isn't the one who can write the best content but the one who can ask the best questions. Pairing a skilled questioner with a top assistant is far more powerful than the best content generator.
Tencent Technology: Given that retention is insufficient, how do you view the current phenomenon of AI applications being heavily promoted?
Kai-Fu Lee: The answer hasn't changed in the past 20 years: you can start spending on user acquisition once your product achieves Product Market Fit (PMF) and sufficient user retention. This is because acquired users will convert to retained users, who might monetize in various ways later. All business models follow a funnel model. You attract many users to try your product, some of whom will become regular users, and a portion of these will pay. The lifetime value (LTV) of these users can then be calculated.
For example, if it costs 10 units of currency to acquire a user, with a 40% retention rate and 10% of those retained users paying 100 units, the money spent on acquisition will eventually return because some users will stay and convert to paying users.
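Plugging the numbers from Lee's example into the funnel makes the trade-off concrete. Note this is a single-payment snapshot; his point about lifetime value is that the remaining gap must be closed by repeat payments over time:

```python
# The funnel arithmetic from Lee's example: spend 10 per acquired user,
# 40% of them retain, and 10% of retained users pay 100.
cac = 10.0           # cost to acquire one user
retention = 0.40     # fraction of acquired users who stay
pay_rate = 0.10      # fraction of retained users who pay
arpu_paying = 100.0  # revenue per paying user (one payment)

revenue_per_acquired = retention * pay_rate * arpu_paying  # 0.4 * 0.1 * 100
roi = revenue_per_acquired / cac
print(f"revenue per acquired user: {revenue_per_acquired:.2f}, ROI: {roi:.2f}")
# Each 10 spent returns 4 from the first payment; the spend only pays
# back if paying users' lifetime value well exceeds a single payment.
```

This is why retention sits at the center of the funnel: with poor retention, no amount of acquisition spending ever reaches break-even.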
This balancing act is crucial for anyone involved in user growth and product management. Today, some smart tools are heavily promoted, but whether their ROI is reasonable and sustainable remains to be seen. If your product attracts users who leave within a week or two, you're constantly refilling a leaky pool.
Therefore, we need to reflect on why this happens. Firstly, user perception: users might not find AI assistants impressive because multiple companies can do it, making it seem cheap and free. This market education issue can't be solved overnight.
Secondly, users treating it like a search engine and finding it inferior will lead to inevitable churn. This is a core issue.
Further down, it might be that the product features aren't strong enough, the model isn't good enough, the user experience isn't polished, and no breakthrough scenario has emerged to ignite user demand.
At Zero One Infinity, we chose to continue developing "model-driven integration," iterating top models, and refining products until we see a TC-PMF (technical cost x product market fit), indicating that the investment is constructive.
Tencent Technology: As the Chief Experience Officer of Wanzhi, do you have any interesting stories to share?
Kai-Fu Lee: Friends often present me with challenges. One friend had an argument with his wife and needed to write an apology letter. I helped by incorporating their issues into the prompt, producing a touching letter.
He made some modifications, because a large model doesn't know a couple's personal details; it only generates the most probable text from what it has seen. Combining human input with the model resulted in a heartfelt letter that made his wife cry, and they reconciled.
Tencent Technology: How do you provide product suggestions after using it? Is product development in the era of large models different from the mobile internet era, for instance, in fixing bugs?
Kai-Fu Lee: Yes, very different. In non-large model products, you fix a bug directly in the code. With large models, you can't instruct the model to respond differently next time due to its technical characteristics. You need to collect a lot of data and retrain or fine-tune the model.
Though each update may not fix every issue, problems can be repaired at a much larger scale in the large model era. A traditional app with 100,000 bugs would be nearly impossible to fix, but with a large model you can feed those 100,000 problem cases back in as training data and potentially resolve 80,000 of them by the next day.
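The loop Lee describes amounts to collecting bad responses with corrected targets and emitting a fine-tuning dataset rather than patching code. A sketch of that collection step, using a common chat-style JSONL convention (the filename, records, and format details are illustrative assumptions, not a specific vendor's API):

```python
# Sketch of "fixing bugs by retraining": gather each bad response with a
# corrected target answer, and write a chat-style JSONL fine-tuning file.
# Records and filename are hypothetical examples.
import json

bad_cases = [
    {"prompt": "What is the largest city in Guangdong?",
     "bad": "Shenzhen is a country.",
     "corrected": "Guangzhou is the largest city in Guangdong by population."},
]

with open("finetune_fixes.jsonl", "w", encoding="utf-8") as f:
    for case in bad_cases:
        record = {"messages": [
            {"role": "user", "content": case["prompt"]},
            {"role": "assistant", "content": case["corrected"]},
        ]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because thousands of such records can be batched into one fine-tuning run, a large fraction of accumulated "bugs" can be addressed in a single model update, which is the scale effect Lee points to.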
Tencent Technology: Finally, the media often describe you as the oldest entrepreneur in the AI industry. Does your team's respect for your experience lead them to follow you without question? What if you make a mistake?
Kai-Fu Lee: As a CEO, it's crucial to have self-awareness, knowing your strengths and areas for improvement. When leading a team, if you have expertise in a certain area or clear insights into company strategy, sales, product features, or technology, you should guide the team accordingly.
Given the competitive market, we can't afford internal friction, so decisiveness is necessary. However, making decisions blindly in areas you're unfamiliar with is worse than inaction.
I'm aware of my strengths and focus on critical issues that only I can solve, as no one else can replace me in these areas. These issues occupy about 80% of my time, while I delegate less critical tasks to others. Clear delegation is essential to avoid confusion within the company. Strategic adjustments should be infrequent and well-explained.
I believe I'm not only the oldest entrepreneur but also the most experienced, having witnessed various successes and failures and having the most comprehensive understanding. I know my strengths and manage the company flexibly, adapting leadership styles to specific situations and environments.
For instance, although I'm an AI expert, I won't interfere with the team's algorithm choices. I'll suggest papers to reference if necessary, but trust their professional judgment.
In product matters, as Chief Experience Officer, I have more input but ensure decisions are made by the product manager. The CEO's opinions are valued, so I must downplay my influence to avoid overstepping.
In company strategy, I make the final decisions after considering everyone's input. Major personnel appointments, development directions, resource allocation, and overall planning fall within my responsibilities. I need to focus on areas where I excel and delegate other decisions to professionals, ensuring clear communication of strategies and empowering responsible individuals.