Founder of Alibaba Cloud on AI, AI+ and AI Infrastructure
On September 5, during the opening forum of the 2024 Inclusion·Bund Conference (外滩大会), Wang Jian (王坚), an academician of the Chinese Academy of Engineering, director of Zhijiang Lab, and founder of Alibaba Cloud, shared his thoughts on AI, AI+, and AI infrastructure.
Wang highlighted the evolving role of AI in technological innovation and industrial applications. He emphasized that AI has a long history but only a short period of practical significance in its current form. His discussion focused on the transformative impact of AI, moving from foundational AI technologies to AI+ (AI combined with specific industries) and the need for robust AI infrastructure to support these advancements.
He also compared the development of AI infrastructure to past technological revolutions, noting that just as the internet and mobile technologies created new infrastructures, AI is now at a similar juncture. He stressed the importance of cloud computing as a key part of this infrastructure and how it is supporting advancements in AI applications across industries.
Key points:
Today’s AI is completely different from the AI people discussed in the early 1980s. When you can’t create something better than ChatGPT, at least two factors are holding you back: first, your technology—its foundation, its model; and second, your understanding of the problem. The biggest constraint is whether you can truly identify the key issues in the field.
Once a technology evolves into infrastructure, it becomes the ultimate form of technological penetration. Throughout human history, the technologies that have had the longest-lasting impact are those that became infrastructure.
Data is the core component of this infrastructure. It’s not just an accessory to the model or computation. Only when data, models, and computing power combine into a complete infrastructure can we achieve the next exciting wave of innovation.
When you examine AI, AI+, and AI infrastructure, you realize that not only is technology undergoing a revolution, but so are mechanisms and infrastructure. The fact that these three revolutions are happening simultaneously is incredibly exciting.
Distinguished guests, I am very grateful for this opportunity to share some ideas, lessons, and experiences from the past few years, or even decades.
The full speech:
Today, I have chosen three keywords: AI, AI+, and AI infrastructure.
These three keywords all revolve around one central term: AI. As Michael mentioned earlier, the concept of AI may vary among individuals—1,000 people may have 1,000 interpretations and ideas. But today, these three different aspects—AI, AI+, and AI infrastructure—have been brought together.
I often say, “Artificial intelligence has a long past but only a very short history.” This is a rather complex situation. What confuses me the most is that even today, we are still debating what AI really means. It’s something worth exploring.
Let me show you a chart that made me realize why AI has such a long past but only a short history. The red line on the chart indicates a period around the late 1940s to early 1950s, when Turing published his famous article *Computing Machinery and Intelligence*.
I believe this marks the beginning of AI’s long past. If you trace the history of machine intelligence, you could go back several centuries. In Turing’s article, he discussed some fascinating topics. Interestingly, this article was published in a psychological and philosophical journal, marking the first time that the relationship between machines and intelligence was explored.
At that time, the term "computer" wasn’t fully established, and people still referred to it as "computing machinery." That’s also why the Association for Computing Machinery (ACM) got its name.
Though the term "computer" didn’t exist back then, Turing was the first to use the term "digital computer" in that article. The profound significance of this article still resonates today, making us reconsider many of our early concepts about AI.
Of course, the Dartmouth Conference is frequently mentioned in AI history. Michael also mentioned *Cybernetics* earlier, which left a deep impression on me. Had the Dartmouth Conference not been held, the ideas of those ten people might have been overshadowed by cybernetics.
We might still be referring to AI as "cybernetics" today, and in some ways, the term could have been more appropriate. But AI became the more popular term.
Why did I highlight Herbert Simon’s name? My personal understanding of AI began with Herbert Simon.
He was an incredible figure—a psychologist who participated in that conference and later won the Nobel Prize in Economics. He first visited China in 1972 as a representative of the ACM, and again in the early 1980s as part of the American Psychological Association.
At that time, I was in my third year of university, and Simon came to lecture on AI. Imagine being a third-year university student in China in the early 1980s and hearing someone tell you that AI would undergo a transformative change in the next ten years.
I was incredibly excited, but after waiting ten years, nothing much happened, so I went on with life.
However, many of the foundational concepts of AI were introduced during that time, including neural networks. I vividly remember a textbook from the late 1980s called *Parallel Distributed Processing* (PDP), which was entirely about neural network theory.
Back then, the theory involved two nodes per layer and only three layers—what we now call a very basic model compared to today’s vast networks.
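To give a sense of just how small that scale is, here is a rough numerical sketch of a network of that size; the layer sizes follow the "two nodes per layer, three layers" description above, and everything else (weights, activation function) is illustrative rather than taken from the book.

```python
import numpy as np

# Illustrative sketch of a network on the scale discussed in the PDP era:
# an input layer, one hidden layer, and an output layer, two units each.
# Weight values and the sigmoid activation are assumptions for illustration.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # input (2 units) -> hidden (2 units)
W2 = rng.normal(size=(2, 2))   # hidden (2 units) -> output (2 units)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Forward pass through all three layers."""
    h = sigmoid(x @ W1)
    return sigmoid(h @ W2)

print(forward(np.array([0.0, 1.0])))  # a handful of parameters in total
```

A model like this has only eight weights, which is the contrast being drawn with today's networks of billions of parameters.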
So, one thing I want to emphasize is that the AI of today is entirely different from the AI people discussed in the early 1980s. Michael touched on this earlier, and I agree.
Why do I say AI has a very short history?
Returning to the chart I showed earlier, the red line points to 2017 when Google introduced the concept of the transformer.
That marks the point when AI re-entered the spotlight and started to have a significant impact on industries. I believe the AI before 2017 is vastly different from the AI we are discussing today. That’s why I say it only has a seven-year history.
Of course, this new history began with a single article, which we all know about. But I'd like to point out that none of the eight authors of that article are still at Google, though I heard one recently returned. For all the innovation that came out of that paper, Google itself is no longer as central to it.
There are some innovations from that article that have been overlooked today. For example, the concept of a "token" was introduced. It may not seem groundbreaking, but today, tokens are widely used to quantify commercial services. You can imagine how difficult it would be to have a solid industry if we didn’t have a clear system for pricing services. This concept also connects to the AI infrastructure topic I’ll discuss later.
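To make the pricing point concrete, here is a minimal sketch of token-metered billing; the per-token prices below are hypothetical placeholders, not any provider's actual rates.

```python
# Minimal sketch of token-metered billing for a model-serving API.
# Prices are hypothetical placeholders, not any provider's actual rates.
PRICE_PER_1K_INPUT_TOKENS = 0.002   # USD, assumed for illustration
PRICE_PER_1K_OUTPUT_TOKENS = 0.006  # USD, assumed for illustration

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request, billed per 1,000 tokens."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

if __name__ == "__main__":
    # A request with 1,200 prompt tokens and 350 generated tokens.
    print(f"${estimate_cost(1200, 350):.4f}")  # -> $0.0045
```

The point is simply that once usage is counted in tokens, a service has a clear unit to meter and price, which is what makes it viable as commercial infrastructure.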
Similarly, around the time that Google published that article, a company called OpenAI emerged. From my perspective, OpenAI made us rethink the mechanisms of innovation. This led to the release of ChatGPT in 2022.
These two events—Google’s transformer and OpenAI’s emergence—are closely linked.
I often say Google is both incredibly capable and also lacking.
Google’s capability is evident, especially in China, where we talk about "0-to-1 innovation." Google succeeded 100% in achieving 0-to-1 innovation and even exceeded expectations.
However, why is Google also lacking? As Eric Schmidt, former CEO of Google, mentioned in a recent talk at Stanford University, Google has fallen short. They failed to create something as socially impactful as OpenAI.
This forces us to reconsider the mechanisms of innovation. Innovation isn’t just about having good ideas or achieving 0-to-1 breakthroughs. The processes and mechanisms involved far exceed what scholars or even the industry currently understand.
This is our biggest challenge, and that’s why I say Google is both capable and lacking.
ChatGPT's brilliance is evident to the general public, but for industry insiders it conceals many things. Everyone knows about AlphaFold, especially with the release of AlphaFold 3.
However, not many realize that behind AlphaFold is a combination of transformers and diffusion models. While people talk about using transformers and diffusion to generate images or videos that visually please the public, few understand how fundamental the transformer model is.
So, why do I say AI only has a seven-year history?
Looking back, we’re living under the shadow—or perhaps in the bright light—of transformers. That’s why I often reflect on the fact that when the government’s work report mentioned AI and AI+ last year, it was in the context of this transformer logic.
When we talk about AI+, we often think about simply adding AI to an industry. However, from my perspective, this is the most mundane way to think about AI+. It’s important to reconsider what AI+ really means.
So when we revisit GPT or the things we’re discussing today, I think it requires a rethink. ChatGPT, within the logic of AI+, is not merely an application; it’s a platform.
Just like Office in the previous era wasn’t just an application but an application platform. If we break down GPT, as I mentioned before, the core model serves as the foundation, and "chat" is an application. So ChatGPT is essentially GPT plus chat—that’s how I understand it.
But let me emphasize that chat is not just a simple application scenario.
Everyone knows that during Microsoft’s collaboration with OpenAI, they didn’t just develop ChatGPT. They discussed many potential use cases for GPT, but only ChatGPT was revolutionary enough to become a product.
They created a lot of useful but non-revolutionary tools, which later became books. I often joke that non-revolutionary things get written into books, while revolutionary things become products. That’s what we are witnessing today.
No one understands chat better than OpenAI.
So today, I still want to say that if you can’t create something better than ChatGPT, at least two factors are holding you back. First is your technology, the foundation, the model. Second is your depth of understanding of the problem. Whether you can truly identify the core issue in a given field is the biggest constraint. Often, we misunderstand, thinking that having GPT means we can solve everything.
Of course, my main focus today is the "+" in AI+. This "+" reminds me that when chat was first developed, it was merely a reflection of Bill Gates' original vision: to make computers capable of listening and speaking. Today, with ChatGPT, that vision has materialized, and the computer has essentially become a mobile phone in terms of capability.
The real mechanism behind this "+" is ChatGPT. When we talk about "adding," it’s not just about adding something; it’s about how we add it. The more crucial factor is a mechanism of innovation. This may sound abstract, but let’s consider ChatGPT: what does it mean? The "addition" here is OpenAI itself. Without OpenAI, GPT and chat would not have become the product that has had such a profound impact.
So why is OpenAI’s structure an innovation in itself?
To this day, everyone knows OpenAI is something of a hybrid—there’s the nonprofit OpenAI, and then there’s OpenAI LP. You can imagine how odd it is that a nonprofit organization and a commercial entity coexist within the same body. So I believe that everything that has happened with OpenAI, what people usually refer to as OpenAI, is really OpenAI LP.
But when you realize that OpenAI originally started as a nonprofit organization, you can see just how complex its internal mechanisms must be. I often tell investors that you couldn’t fund a company like OpenAI through traditional methods.
The success of OpenAI also makes us rethink innovation models. For instance, Jensen Huang said that ChatGPT is the "iPhone moment" of artificial intelligence.
This statement has been quoted widely, and I was excited when I first heard it. But after thinking about it, I wasn’t sure what it actually meant. Why? Because no one has fully clarified what ChatGPT is, what artificial intelligence is, or what the iPhone represents. It just puts three unclear things together into a single sentence, which confused me for a long time.
But we shouldn't assume it's easy to explain what the iPhone is. I can point to one example: when people talk about the iPhone, they often mention the App Store as one of its most important features.
Today, everyone talks about how crucial ecosystems are, but few realize that when Steve Jobs released the first generation of the iPhone, there was no App Store. And if we look at its ecosystem, many of the companies that partnered with Steve Jobs to release the first iPhone are no longer around today. So what is the iPhone, really? It’s worth deep reflection.
I think Jensen’s statement borrows from another one I found particularly powerful. When AlphaFold 2 came out, someone said it was the "ImageNet moment" of biology. I believe that accurately captures the fundamental developments in technology.
This brings us back to that famous paper by Geoffrey Hinton and his two students.
Anyone working in machine learning or image recognition must be familiar with this paper. If you abstract it, it brings together three elements that form the foundation of today's AI conversations: ImageNet (organized data), a model (a CNN), and GPU computing. This paper was the first to combine these three elements perfectly.
Although none of these components were new at the time—ImageNet had been around for years, CNN wasn’t a new algorithm, and GPUs were found in every internet café—the paper combined them in a way that set the new standard for the industry. After this paper, GPUs became the standard not only in academia but in industry as well.
They used two ordinary GPU cards. Despite being consumer-grade, these two cards surpassed the computational power of tens of thousands of CPU cores. And these same cards were available in every internet café in China. This marked a monumental shift, showing that while computing power is important, human creativity during the innovation stage is even more crucial.
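As a rough illustration of the combination just described (organized data, a convolutional model, and GPU compute), here is a minimal PyTorch sketch. Random toy data stands in for ImageNet, a small convolutional network stands in for the model, and computation moves to a GPU when one is available; none of it reproduces the actual AlexNet setup.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# The three elements the paper combined: organized data, a CNN, GPU compute.
# Toy data stands in for ImageNet; the model is a stand-in, not AlexNet.
device = "cuda" if torch.cuda.is_available() else "cpu"

data = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

model = nn.Sequential(                      # a small convolutional network
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),
).to(device)                                # computation moves to the GPU if present

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:               # one pass over the toy dataset
    images, labels = images.to(device), labels.to(device)
    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```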
So why do we now talk about infrastructure? It’s due to scale.
As data, models, and computing power scale dramatically, new mechanisms must be introduced to manage that scale. Those working in IT or programming will recognize this concept.
The inventor of the Pascal programming language, Niklaus Wirth, once said, "A baby's speed multiplied by 1,000 is a jet plane."
This means that in our world, when something’s scale increases by 1,000 times, it leads to revolutionary changes. As we’ve seen, the scale of data, models, and computation has increased by over 1,000 times in each category. That’s why today, we can’t avoid the most fundamental concept: AI infrastructure.
When a technology becomes infrastructure, it represents the ultimate form of technological penetration. Throughout human history, the technologies that have had the most long-lasting impact are those that became part of the infrastructure.
The term "AI infrastructure" isn't my invention; it's something everyone is talking about now. So why is the progression from AI to AI+ to AI infrastructure worth deep consideration? Allow me to quickly explain. Here is a slide from Sequoia's seminar, which I've borrowed for this presentation. I only brought it to highlight the bottom row, which they call infrastructure.
Looking at the cloud era, the mobile era, and the AI era, they classify cloud computing as the infrastructure. Interestingly, they also classify Apple as part of this infrastructure. Similarly, they classify Nvidia as infrastructure today. It’s an interesting way to categorize things. And it makes sense, as many people think Nvidia should focus on cloud computing.
As someone involved in cloud computing, I find this chart very exciting. I didn’t create it, but I’ll explain it in my own way. It shows six AI unicorns in the U.S., and their underlying infrastructure support is fascinating.
For instance, OpenAI, which received a $10 billion investment, is backed by Microsoft. The second company is backed by AWS. What’s remarkable is that the companies behind these unicorns are the world’s top-ranked cloud computing providers—first, second, third, fifth, and sixth globally.
Interestingly, the fourth-ranked provider is missing. That’s Alibaba Cloud. This highlights not only the importance of infrastructure to these companies but also the gap between industries.
This reminds me of something else: I believe Microsoft is both "unimpressive" and "impressive" at the same time. On the one hand, it didn’t create something like the transformer model in AI. But on the other hand, through cloud computing and infrastructure, it played a pivotal role in creating what we now see with OpenAI. So, in another sense, while Microsoft may seem unimpressive, it’s still very impressive.
In the realm of AI, AI+, and AI infrastructure, everyone has the potential to create history.
Recently, I saw a startup that, in an effort to prove its importance, created a diagram. I found it interesting. We often talk about data, computing power, and algorithms. But imagine if these elements weren’t part of an infrastructure—they would hold no value.
In this diagram, it becomes clear that data is a core component of infrastructure. Data isn’t just an accessory to models or computation. Only when all these elements form a complete infrastructure can we achieve truly groundbreaking innovation.
If you look closely at the diagram, it distinguishes between cloud computing in the traditional IT era and cloud computing in the AI era. Although the two types of computing differ, both are forms of cloud computing. Similarly, it distinguishes between traditional data and AI-specific data, highlighting these subtle differences. I won’t go into further detail here.
In conclusion: When you consider AI, AI+, and AI infrastructure, you’ll realize that not only is technology undergoing a revolution, but so are mechanisms and infrastructure. These three revolutions happening simultaneously are incredibly exciting. I believe these revolutions are shaping the future.
Thank you all.
Source: https://mp.weixin.qq.com/s/P8e5_BuZyOq53I2OrHjDaQ