As long as you can speak, you can be a developer.
Written by Lian Ran
Editor: Zheng Xuan
Source: Geek Park
“Natural language will become the next generation programming language, and everyone can become a developer.”
On April 16, the Create 2024 Baidu AI Developer Conference was held in Shenzhen. Baidu founder, chairman and CEO Robin Li delivered a keynote speech entitled “Everyone is a Developer”, describing a world in which creation is no longer limited by coding skills and everyone can take part using natural language as the medium.
The release of the tool version of Baidu’s Wenxin large model 4.0 brings the vision Robin Li described a big step closer to reality. Through natural language interaction, the tool lets developers process complex data and files and generate charts or files, so they can quickly gain insight into the characteristics of the data, analyze trends, and get efficient and accurate support for subsequent decision-making.
At the same time, Robin Li released three major development tools in his keynote speech, including the intelligent agent development tool AgentBuilder, the AI native application development tool AppBuilder, and the model customization tool ModelBuilder. These tools have greatly lowered the development threshold, allowing developers to create powerful, easy-to-use and convenient AI applications in just a few simple steps.
Image source: Baidu
Robin Li pointed out that “intelligent agents may become the most mainstream way of using large models in the future, and the one closest to everyone. Based on a powerful foundation model, intelligent agents can be generated in batches and applied in various scenarios. Baidu has just upgraded the Wenxin intelligent agent platform. So far, more than 30,000 intelligent agents have been created, and more than 50,000 developers and tens of thousands of companies have joined the platform.”
In Robin Li’s view, developing applications will become as easy as shooting short videos: “Today, you can make an application without coding, and you can make an intelligent agent without programming. AI is creating a creativity revolution. In the future, developing applications will be as easy as shooting short videos. Everyone can be a developer and a creator.”
In his speech, Robin Li shared the specific ideas and tools that Baidu has developed in the past year to develop AI native applications. He emphasized: “Large language models themselves do not directly create value. Only AI applications developed based on large models can meet real market needs. Today I would like to share with you some specific ideas and tools for developing AI native applications based on large models. This is what we at Baidu have gained through the past year of practice, stepping on countless pitfalls, and paying a high tuition fee.”
The following is the transcript of Robin Li’s keynote speech “Everyone is a Developer”, compiled by Geek Park.
Hello everyone, welcome to the Create 2024 Baidu AI Developer Conference. This is the first time the Create conference has been held in the Guangdong-Hong Kong-Macao Greater Bay Area. More than 5,000 developers and technology enthusiasts have joined us in person today.
In the past year, I have talked with many entrepreneurs and developers, and I feel that everyone is in a state of “FOMO”, that is, Fear of Missing Out: excited, but also afraid of being left behind. Indeed, large models and generative AI will completely change the developer community.
In the past, developers changed the world with code; in the future, natural language will become the new universal programming language. As long as you can speak, you can become a developer and change the world with your creativity.
This day is not far away. We have already seen developers’ productivity improve greatly thanks to powerful foundation models and many low-threshold or even zero-threshold development tools.
For example, Comate, an intelligent code assistant based on the Wenxin model, not only supports more than 100 languages and all mainstream IDE platforms, but can also recommend code, generate code comments, find code defects, and suggest optimizations. It can also deeply interpret a code base and draw on private domain knowledge to generate new code. In the more than a year since its launch, Comate has been adopted by tens of thousands of companies such as Himalaya, Mitsubishi Elevator, and Softcom, and the adoption rate of its generated code has reached 46%. Of the new code added at Baidu every day, 27% is generated by Comate.
Today, you can create an AI application without knowing how to write code, and you can create an intelligent agent without programming. AI is creating a revolution in creativity. In the future, developing applications will be as easy as shooting a short video. Everyone is a developer and a creator.
As a technology company, Baidu’s role is to provide everyone with the development tools they need and to keep improving the creativity of society as a whole. Specifically, we provide a powerful family of foundation models, the Wenxin large model series, which includes the flagship ERNIE 3.5 and ERNIE 4.0 as well as the lightweight ERNIE Speed, Lite, Tiny, and others.
We also provide tools for developing various applications based on large models, including AgentBuilder, an intelligent agent development tool; AppBuilder, an AI native application development tool; and ModelBuilder, a tool for customizing models of various sizes. These three tools all represent advanced productivity. I will show them to you one by one.
First, let me talk about the latest progress of Wenxin Yiyan and the Wenxin large model:
Wenxin Yiyan was released on March 16 last year, and it has been one year and one month to date. Our number of users has exceeded 200 million, the average daily API call volume has also exceeded 200 million, the number of customers served has reached 85,000, and the number of AI native applications developed using the Qianfan platform has exceeded 190,000.
Let’s take a look at what everyone is doing with Wenxin Yiyan.
The real stories in the video are just the tip of the iceberg. We can see that Wenxin Yiyan is changing the work and life of more and more people.
The foundation model behind Wenxin Yiyan is the Wenxin large model. In the past year, it has evolved from version 3.0 to 3.5 and then to 4.0. Wenxin 4.0 has reached an industry-leading level in the four major capabilities of understanding, generation, logic, and memory.
In recent months, the Wenxin large model has made further significant improvements in general capabilities such as code generation, code interpretation, and code optimization, reaching an internationally leading level.
Today, we officially released the tool version of Wenxin large model 4.0. You can now try the code interpreter function in the tool version. Through natural language interaction, you can process and analyze complex data and files and generate charts or files, quickly gaining insight into the characteristics of the data, analyzing trends, and getting efficient and accurate support for subsequent decision-making.
The Wenxin model has become China’s most advanced and most widely used AI foundation model.
Not only that, compared with a year ago, the algorithm training efficiency of the Wenxin large model has increased to 5.1 times the original, the average weekly effective training rate has reached 98.8%, inference performance has increased 105-fold, and inference cost has been reduced to 1% of the original.
That is to say, if a customer used to call the service 10,000 times a day, they can now call it 1 million times at the same cost. The media may not be excited about a 99% reduction in cost. However, once enterprises or developers use the service, they are most concerned about the effect and cost.
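The arithmetic behind this claim can be checked with a short sketch. The function name and the `cost_ratio` parameter are illustrative assumptions, not part of any Baidu API; the only figures taken from the speech are the 10,000 daily calls and the drop to 1% of the original cost.

```python
def calls_at_same_budget(calls_before: int, cost_ratio: float) -> int:
    """Return how many calls an unchanged budget buys after a price change.

    cost_ratio is the new per-call cost divided by the old per-call cost,
    so 0.01 means the cost fell to 1% of the original (hypothetical parameter).
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # The budget is fixed, so the call count scales by 1 / cost_ratio.
    return round(calls_before / cost_ratio)


# Figures from the speech: 10,000 calls a day, cost reduced to 1%.
print(calls_at_same_budget(10_000, 0.01))
```

A cost that falls to 1% of the original means the same budget covers 100 times as many calls, which is where the jump from 10,000 to 1 million comes from.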
We can reduce the inference cost to 1% while improving performance because Baidu has a full-stack layout across the four-layer architecture of chips, frameworks, models, and applications. Through end-to-end optimization, we keep reducing costs, so that more people can use large models to build and run AI applications efficiently and at low cost.
There is no doubt that large models will remain a hot topic in 2024, and various technological breakthroughs will continue to emerge. The media will continue to favor exaggerated headlines such as “Shocking Release” and “Epic Update”. But I want to emphasize that large models themselves do not directly create value. Only AI applications built on large models can meet real market needs.
Today I want to share with you some specific ideas and tools for developing AI native applications based on large models. This is what we at Baidu have learned through our practice over the past year, stepping on countless pitfalls and paying a high tuition fee.
The first is MoE. In the future, large-scale AI native applications will basically be built on an MoE architecture. The MoE I mean here is not the general academic concept (mixture of experts), but a mix of large and small models, rather than relying on one model to solve every problem. Deciding when to call a small model, when to call a large model, and when to call no model at all is technically demanding and has to be matched to the application scenario.
The second is small models. Small models have low inference cost and fast response. In some specific scenarios, a small model fine-tuned with SFT (supervised fine-tuning) can match a large model. This is why we released three lightweight models: Speed, Lite, and Tiny. We compress and distill a base model from the large model and then train it further on data. The result is much better than a small model trained from scratch, and better, faster, and cheaper than a model trained on top of an open-source model.
The third is intelligent agents. Intelligent agents are a hot topic at the moment. As the capabilities of intelligent agents improve, a large number of new applications will continue to emerge. Intelligent agent mechanisms, including understanding, planning, reflection, and evolution, allow machines to think and act like humans, complete complex tasks autonomously, and continuously learn, self-iterate, and evolve in the environment. In some complex systems, we can also allow different intelligent agents to interact and collaborate with each other to complete tasks with higher quality. We have already developed these intelligent agent capabilities and are fully open to developers.
Baidu has already provided you with “out-of-the-box” tools in all three directions: MoE, small models, and intelligent agents. Below, I will introduce three tools: AgentBuilder, an intelligent agent development tool; AppBuilder, an AI native application development tool; and ModelBuilder, a tool for customizing models of various sizes.
The first is the agent development tool AgentBuilder. Agents may become the most mainstream way of using large models in the future, and the one closest to everyone. Based on a powerful foundation model, agents can be generated in batches and applied in a variety of scenarios.
Baidu has just upgraded the Wenxin intelligent agent platform. So far, more than 30,000 intelligent agents have been created, and more than 50,000 developers and tens of thousands of companies have joined the platform. Our goal is to make everyone and every organization a developer of intelligent agents and to build the most complete intelligent agent ecosystem in China.
How do we achieve this goal? We provide AgentBuilder, a zero-threshold agent development tool.
Let’s take the Singapore Tourism Board as an example to see how an intelligent agent is created.
First, we open the Wenxin Intelligent Agent Platform. The creation page has two modes: zero-code and low-code. Newbies can directly choose the “zero-code mode” and use natural language to create an intelligent agent in just a few sentences.
We first name the intelligent agent “Singapore Tourism Board”, and then write in the settings that it should create travel plans, answer questions, and provide hotel and ticket booking services. These settings guide the intelligent agent and tell it what it can do.
If only a basic agent is needed, the platform will fill in the rest automatically. But we want the “Singapore Tourism Board” to be a professional agent, so advanced configuration is required. I can add the Singapore encyclopedia entry and the official website link to the knowledge base and have them updated every day. Then I add some tools, such as hotel inquiries and attraction ticket purchases, to enhance its service capabilities. We have already partnered with Ctrip to provide travel service tools for hotels, attractions, and ticketing. In this way, a Singapore Tourism Board agent is ready and can be previewed and optimized further.
Now open the Baidu app and search for “When is the least crowded time to go to Singapore?”, because everyone wants to avoid crowds when traveling. The intelligent agent combines information from multiple sources and generates an answer: “The least crowded time is from January to March.” We can also tap on the intelligent agent to interact with it further, for example by asking what to watch out for when traveling to Singapore, asking it to recommend the top three hotels in Singapore, or having it book tickets for Universal Studios Singapore directly. Needs are met in one stop, which saves users a great deal of time.
In addition to Singapore, cultural and tourism intelligent agents for places such as Dalian and Shenyang are also online on the Wenxin intelligent agent platform. There are also intelligent agents for knowledge, creation, learning, entertainment and so on. These were all made with AgentBuilder.
When Wenxin Yiyan was first released last year, I said that Wenxin Yiyan would affect every company because of its powerful natural language understanding, expression, and reasoning capabilities, which can bring any company closer to its customers.
Today, every merchant and every customer can have their own intelligent agent in Baidu. The whole process does not require any programming. By inputting information similar to prompt words and performing a few simple steps of operation and tuning, an intelligent agent can be quickly generated and become a gold medal salesperson online 24 hours a day, 7 days a week.
Let’s take a look at how a merchant intelligent agent is created.
Qide Education is a well-known educational enterprise with more than 60 branches across the country and many overseas branches. It covers a wide range of countries and has high requirements for reception skills. How can we respond to customer inquiries 24 hours a day, improve reception level and reduce operating costs?
Qide Education used Baidu’s AgentBuilder to create its own intelligent agent.
Let’s take a look at how to create an intelligent agent with basic capabilities. It’s very simple. Fill in the intelligent agent’s avatar, name, business scope and welcome message on the platform, and then set some information that users need to provide, such as age and education. In 5 minutes, with zero threshold, an intelligent agent is ready.
Qide Education also wants this intelligent agent to be a study-abroad consultant who understands both the business and the students. It should give professional analysis and accurate answers based on each student’s situation, such as whether they want to go to the United States or Australia, whether they want to study for a master’s or a bachelor’s degree, what their IELTS or TOEFL scores are, and so on. We can create a more advanced intelligent agent by adding knowledge, roles, and tools.
In the knowledge module, upload private domain knowledge and let the platform analyze it in real time and automatically generate dialogue materials; in the role module, add destination countries that are outside the business scope to the filter settings to improve the efficiency of lead generation; in the tool module, add services such as in-store appointment booking. With these few simple steps, a professional Qide Education intelligent agent is ready.
Now, let’s search for “Australian study application requirements”. We can see that the intelligent agent quickly gives the seven necessary conditions, including the required language skills and major selection. It can also provide corresponding study abroad consulting solutions and answer all kinds of difficult questions and meet all requests.
The Qide Education intelligent agent has been very popular. In the first week after launch, it was distributed 1.55 million times and interacted with users 58,000 times. The number of lead conversions grew linearly, the cost per effective lead fell significantly, and operating efficiency improved greatly.
Next, I will introduce an intelligent agent in the home furnishing industry.
Sophia is a home furnishing brand that focuses on whole-house customization. As just shown, it too can create a basic merchant intelligent agent by filling in very simple information. But in the home furnishing industry, consumers’ offline experience matters more, so Sophia wants to create a gold-medal salesperson online that recreates the offline reception experience.
Therefore, in the further settings, it selected the digital human as the display method in the role module, and then selected the appropriate background and voice for the digital human, and combined with the platform’s intelligent analysis capabilities, automatically summarized a set of sales scripts. Finally, a gentle and friendly gold medal salesperson with professional scripts was created. She can meet the various needs of users 24 hours a day and provide a high-level service experience.
When a Baidu search user has a home decoration need, the Sophia intelligent agent uses the capabilities of the Wenxin model to answer the question first. It also proactively confirms specific needs with the customer, such as decoration type and budget, and recommends nearby offline stores.
Since the launch of the Sophia merchant intelligent agent, the cost of an effective lead has dropped by 30%. In other words, if it used to cost 100 yuan to acquire an effective customer, it now costs only 70 yuan.
Currently, more than 10,000 Baidu customers have merchant intelligent agents, covering more than 30 industries including education and training, real estate and home furnishings, machinery and equipment, and business services.
Above, through three demos, I showed how developers and merchants can use AgentBuilder to create intelligent agents in different industries.
Now, making an intelligent agent takes just a few minutes. But here comes the problem! If there is no traffic and no distribution, if the agent cannot be found and no one uses it, then developers and merchants earn no income, and without income there is no motivation. How do we solve this pain point?
Our Wenxin intelligent agent platform gives developers a way to monetize traffic. In addition to Baidu Search, other products in the Baidu ecosystem, such as Xiaodu, Maps, Tieba, and in-car systems, can all connect to intelligent agents, which takes care of traffic distribution for developers and brings them tangible benefits.
With distribution, there will be data feedback; with data feedback, the flywheel will start to turn, and the intelligent agent will be able to iterate autonomously, becoming smarter with use. The Wenxin Intelligent Agent Platform has also launched the data analysis and question-answering tuning modules for intelligent agents, and more new capabilities will be launched soon. The Wenxin Intelligent Agent Platform will drive intelligent agents to form a positive cycle with better quality, better traffic, and greater benefits through the data flywheel of distribution-diagnosis-revenue.
Next, I will introduce you to the second development tool, AppBuilder. It is currently the best AI native application development tool. On AppBuilder, we have pre-packaged and pre-set various components and frameworks required to develop AI native applications, greatly reducing the development threshold.
In just three steps, developers can develop an AI native application using natural language, and easily publish and integrate it into a variety of business environments. Let’s look at a few examples:
Earlier this year, we held an AI native application development challenge. The topic was to use AppBuilder to create an “amusement park queue planning assistant” to help tourists better understand the queue situation in the amusement park, design personalized tour routes, and get the best experience within a limited time.
The champion of this competition developed an application without writing a single line of code and won a 100,000 yuan prize from Baidu. If you can write code, it is not difficult to write one for this topic, but if you can do it without writing a single line of code, it still depends a lot on the basic model and the capabilities of the AppBuilder tool.
Let’s see how to use AppBuilder to create this AI application.
Let’s review the competition topic first. It gives an assumed waiting time and excitement index for each attraction in “Universal Studios”, so the problem to solve is how to get the highest total excitement index within a limited time.
First, open the development interface of AppBuilder and name the application “Playground Queue Assistant”; second, describe the specific requirements in the role instructions, including calling the code interpreter, calculating the best combination within a fixed time, and outputting the results; third, go to the tool components and add the code interpreter to assist with the calculation.
Now, let’s test it. Enter the question “I have 3 and a half hours. What is the most exciting way to play?” on the right. You can see that the code interpreter translates the question into code and then calls the data understanding tool to analyze the known conditions. After a series of calculations, it concludes that the best result comes from combining four attractions: “Harry Potter and the Forbidden Journey”, “Jurassic Adventure”, “Decepticon Roller Coaster”, and “Bumblebee Cyclone”. The test passes, so we click Publish, and an application is generated without any code.
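The speech does not show the code that the code interpreter generated. As a rough illustration of the kind of calculation involved, picking attractions to maximize total excitement within a time budget is a classic 0/1 knapsack problem. The sketch below is an assumption about how it could be solved, not the application’s actual code, and every waiting time and excitement value in it is invented for illustration (the competition data is not given in the speech).

```python
from typing import List, Tuple

# (name, minutes needed, excitement index). All numbers are made up.
Attraction = Tuple[str, int, int]

ATTRACTIONS: List[Attraction] = [
    ("Harry Potter and the Forbidden Journey", 60, 9),
    ("Jurassic Adventure", 50, 8),
    ("Decepticon Roller Coaster", 55, 9),
    ("Bumblebee Cyclone", 40, 6),
    ("Hypothetical attraction E", 90, 10),
    ("Hypothetical attraction F", 30, 3),
]


def best_plan(attractions: List[Attraction], budget_minutes: int) -> Tuple[int, List[str]]:
    """0/1 knapsack: maximize total excitement within budget_minutes."""
    # best[t] holds (total excitement, chosen names) achievable in at most t minutes.
    best: List[Tuple[int, List[str]]] = [(0, [])] * (budget_minutes + 1)
    for name, minutes, excitement in attractions:
        # Walk the budget downward so each attraction is visited at most once.
        for t in range(budget_minutes, minutes - 1, -1):
            candidate = best[t - minutes][0] + excitement
            if candidate > best[t][0]:
                best[t] = (candidate, best[t - minutes][1] + [name])
    return best[budget_minutes]


# 3.5 hours = 210 minutes, as in the question asked in the demo.
score, plan = best_plan(ATTRACTIONS, 210)
print(score, plan)
```

Dynamic programming is used here because it finds an exact optimum when times are whole minutes; with only a handful of attractions, simply enumerating all subsets would work just as well.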
Now, AppBuilder has been further upgraded. During the creation process, the “AI Optimized Configuration” function can automatically help developers optimize role instructions, component configuration and other aspects to further improve development efficiency.
Let’s look at another example.
At the beginning of this year, North China Electric Power University proposed to provide intelligent exclusive services for all teachers and students. Based on Baidu’s AppBuilder, we jointly created a Huadian AI assistant. Now, let me show you how the Huadian AI assistant was made:
Step 1: Open AppBuilder, enter the creation page, and click AI to automatically generate configuration. First, set basic information such as name, description, and avatar for the app.
Step 2: Describe specific requirements in role instructions using natural language, including tasks, component capabilities, requirements and restrictions.
Step 3: Insert custom components such as book borrowing query, class schedule query, and student score query to enable the campus assistant to have intelligent service capabilities. Then add an opening statement to the campus assistant, and the application is configured.
Next, we debugged the assistant in the preview interface based on user questions, such as asking about the registration time for the CET-4 and CET-6 exams, and testing the effect of automatic calling of each component.
As you can see, with these simple operations the application is built. It has been launched on a small scale and connected to high-frequency scenarios such as looking up regulations, checking courses, topping up meal cards, and borrowing books, serving the school’s teachers and students. In the future, we will work with North China Electric Power University to further deepen application cooperation and provide richer and more convenient services.
Baidu has many years of accumulated technology in cross-modal AI. AppBuilder also provides some cross-modal capabilities. With just a paragraph of text or a few sentences, you can quickly create drawing applications for comics, children’s picture books, and the like.
The process is also very simple: open AppBuilder, click “App Creation”, enter the role instructions, add the text-to-image component, enter three recommended questions, and click Publish. Once the application is ready, we only need to enter a rough idea for a character or plot, and AppBuilder automatically generates the story and outputs the pictures.
Baidu Wenku’s latest intelligent comic and picture book generation features use components of this kind provided by AppBuilder. Let’s take a look at how Baidu Wenku’s comic generation feature enables every creative person to produce good work.
Take the classic “Zhou Chu Eliminates Three Evils” as an example.
Open Baidu Wenku and enter the topic “Zhou Chu Eliminates Three Evils”, a story recorded in the “Book of Jin” and “A New Account of Tales of the World”. Wenku searches its library and generates a story based on the original text. We can also modify the story; then we click the AI toolbar on the right to start creating the comic.
In the comic production interface, Wenku automatically generates comic storyboards based on the storyline; then we choose the comic style that best fits the story from styles such as light and shadow, realism, and cartoon; finally, we choose a character image for each role, and the comic is generated.
After the comic is generated, we can browse the complete comic in Baidu Wenku’s intelligent comic editor. Wenku also supports editing and fine-tuning each picture. For example, select the picture “Zhou Chu and the white-browed tiger” on the left, click Edit, and add the picture description “The character’s face is clear and the picture is bright” to fine-tune the comic so that it better meets our needs. As you can see, Baidu Wenku’s comic feature does an excellent job of keeping characters and scenes consistent in style.
Baidu Wenku’s intelligent comic capabilities have greatly improved the efficiency of comic creation, reduced the cost and threshold of comic creation, and allowed more people with ideas and creativity to realize their dreams of comic creation.
Baidu Wenku can not only generate comics, but also help users create picture books without any barriers. You may not know that the average number of picture books read by Chinese children per year is only 10, while in Europe and the United States it is about 50. Now, AI can enable parents who have no drawing skills at all to create a children’s picture book for their children. Let’s take a look at this illustrated audio picture book!
Since last year, we have used AI to rebuild Baidu Wenku, making it the “starting point for content production” for users. Now, with the support of AppBuilder, Baidu Wenku’s newly launched smart comics and smart picture books extend this to the more interesting field of cross-modal creation.
I just used three cases to show how to use Baidu’s AppBuilder to create AI native applications. You should be able to feel the two obvious advantages of AppBuilder:
First, it is powerful. Relying on Wenxin 4.0’s ability to understand and follow instructions, AppBuilder delivers good results right from a cold start, so developers do not have to spend a long time tuning because of poor results, which greatly lowers the development threshold. Relying on retrieval-augmented generation (RAG), in typical scenarios such as knowledge Q&A, our answer accuracy and response friendliness have both reached more than 95%, far surpassing other similar products. AppBuilder also provides a rich and complete set of components: 55 in all, including Baidu Search and other AI capability components built on Baidu’s many years of accumulated technology, large model capability components, and Baidu’s exclusive business components, along with some third-party APIs for mainstream scenarios such as flight inquiries and paper inquiries. We have also just added support for custom components, so customers can directly connect any of their own proprietary tools and data. Together, these components support the efficient development of AI native applications.
Second, it is easy to use. With AppBuilder, you can quickly create an application in just three steps and distribute it with one click. We also support open source SDKs to facilitate secondary development.
Next, I will introduce a tool better suited to professional developers: ModelBuilder. It can customize models of any size according to developers’ needs, and further fine-tune them with SFT for specific scenarios to achieve better results. For developers, mastering model fine-tuning is very important for making good use of large models.
Let’s take essay correction in the education industry as an example. Essay correction has clear scoring standards, and the requirements and scoring standards differ from grade to grade. The model therefore needs to be fine-tuned so that the output of the large model better matches the specific requirements.
Let’s take a look at how ModelBuilder can be used to fine-tune the essay correction model.
Step 1: Create a data set. The effect of fine-tuning depends largely on the quality of the data. In this case, there are only 180 original samples, and their quality is not high enough. We need three functions: data cleaning, data labeling, and data augmentation. Data cleaning quickly removes problems such as missing data and garbled characters. In data labeling, we add more evaluation dimensions for each essay, such as depth of content and writing technique. Data augmentation generates similar but non-repetitive data to expand the data set. After expansion, ModelBuilder generated 920 high-quality samples for us.
Step 2: Fine-tune the model. First, we select a base model for fine-tuning; here we choose ERNIE Speed. Then we configure the parameters according to the platform’s recommended values. This time the data set is close to 1,000 samples, and the recommended number of training epochs is 10. Then the model can be trained.
Step 3: Deploy the model on the platform, and the entire fine-tuning process is completed.
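ModelBuilder performs these steps inside the platform, and its internals are not described in the speech. As a minimal sketch of what the data cleaning in Step 1 might involve, the function below drops samples with missing fields, samples containing garbled characters, and exact duplicates. The field names `essay` and `feedback` and the use of the Unicode replacement character as a sign of garbling are assumptions made for illustration.

```python
from typing import Dict, List

# U+FFFD is what a failed text decode typically leaves behind (assumed garble marker).
REPLACEMENT_CHAR = "\ufffd"


def clean_samples(samples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop samples with gaps, garbled text, or duplicate essays."""
    cleaned: List[Dict[str, str]] = []
    seen_essays = set()
    for sample in samples:
        essay = sample.get("essay", "").strip()
        feedback = sample.get("feedback", "").strip()
        if not essay or not feedback:
            continue  # missing data
        if REPLACEMENT_CHAR in essay or REPLACEMENT_CHAR in feedback:
            continue  # garbled characters
        if essay in seen_essays:
            continue  # exact duplicate of an earlier essay
        seen_essays.add(essay)
        cleaned.append({"essay": essay, "feedback": feedback})
    return cleaned
```

Labeling and augmentation are harder to sketch faithfully, since they depend on human judgment or on a generative model, so they are left out here.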
Let’s use a middle school essay to test the model after fine-tuning. The title of the essay is “Looking Forward to Growing Up”, with no restrictions on the subject matter and no less than 800 words.
We input a student’s composition and the format requirements for corrections into the two models, before and after fine-tuning. Let’s compare the output of the two models with the teacher’s comments. The fine-tuned large model reasons more like a professional teacher in its comments and complies better with the required format. An essay correction task that used to take teachers a long time can now be completed in a few seconds with the help of the large model, greatly improving the speed and efficiency of giving feedback to students. At present, fine-tuned models of this type are in use at many leading companies in the education industry.
I mentioned earlier that large-scale AI native applications are generally MoE. So, how can we combine these models to get the best cost-effectiveness? Next, I will use Xiaodu to demonstrate an example of MoE model routing. Let’s first see if Xiaodu has become smarter with the support of a large model:
Robin: Xiaodu Xiaodu
Xiaodu: I am here
Robin: What’s the news today?
Xiaodu: (plays three to five news bulletins of the day)
Robin: Xiaodu, Xiaodu. I heard that you have become more capable recently?
Xiaodu: Not only can I chat with you, but I can also be your AI schedule assistant.
Robin: Then I’m going to test you. Please help me arrange my schedule. After today’s event, I will have dinner with Mr. Zhang, and then there will be a management meeting at the hotel, which will take about an hour and a half, and I hope it can be finished before 10 o’clock. Tomorrow morning I want to go for a morning run, so check what the weather will be like, and then there is a morning meeting at 8 o’clock. Please help me arrange my schedule.
Xiaodu: Based on your needs, Xiaodu has prepared the best schedule for you, as follows:
Your dinner with Mr. Zhang is scheduled for 18:00-19:30. After that, you will attend the management meeting at 20:00, which will end around 21:30.
The alarm will remind you to get up and go for a morning run at 6 a.m. tomorrow morning. Keep your running time within 1 hour to ensure you are not late for the 8 a.m. meeting tomorrow morning.
Tomorrow morning the weather will be sunny, with a temperature of 25 degrees. It is recommended that you wear short sleeves and use sun protection.
Can I be your secretary? I have many new skills.
Robin: Very good, great (thumbs up to Xiaodu). Please sync this schedule to my phone.
Just now, I asked Xiaodu several questions, from a simple news inquiry to a complex scheduling request. It seems that I am talking to the same Xiaodu, but in fact different models are called behind the scenes. When it received my question, the small model ERNIE Tiny first did the “model routing” work. For the weather question, it took the results of the weather query and called a model fine-tuned from ERNIE Lite with SFT to give clothing suggestions. For the more complex schedule, it called the best-performing large model, Wenxin 4.0, to work out the arrangements from tonight through tomorrow morning.
Today, every question we ask Xiaodu is assigned to a suitable model for execution. When an application’s API needs to be called, the ERNIE Functions model is used. When explaining problems to children, Wenxin 3.5 or 4.0 is used. The ERNIE Character model is used to build the intelligent assistant’s persona, which improves the consistency of its personality and stimulates the user’s desire to chat.
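The routing described here can be sketched in a few lines. This is an illustration only, not how Xiaodu is built: in the demo the routing is done by the small model ERNIE Tiny, whereas the sketch below substitutes a simple keyword check. The intent labels, the keyword lists, and the choice of a default route are all assumptions; only the model names come from the talk.

```python
# Intent -> model name. The model names appear in the talk; the table itself,
# the intent labels, and the "chat" default are illustrative assumptions.
ROUTES = {
    "schedule": "Wenxin 4.0",
    "weather": "ERNIE Lite (SFT)",
    "tool_call": "ERNIE Functions",
    "tutoring": "Wenxin 3.5",
    "chat": "ERNIE Character",
}

# Keyword rules standing in for the small routing model (hypothetical).
# Order matters: complex scheduling requests are checked first, so a request
# that mentions both a schedule and the weather goes to the largest model.
KEYWORDS = {
    "schedule": ("schedule", "meeting", "arrange"),
    "weather": ("weather", "temperature"),
    "tool_call": ("turn on", "set an alarm"),
    "tutoring": ("explain", "homework"),
}


def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query."""
    q = query.lower()
    for intent, words in KEYWORDS.items():
        if any(word in q for word in words):
            return intent
    return "chat"


def route(query: str):
    """Pick an intent, then look up the model that should handle it."""
    intent = classify_intent(query)
    return intent, ROUTES[intent]


for query in (
    "Please arrange my schedule and see what the weather is like",
    "What will the weather be tomorrow morning?",
    "Set an alarm for 6 a.m.",
    "Tell me a story",
):
    print(query, "->", route(query))
```

The design point is that the router must be much cheaper than the models it dispatches to, because it runs on every request.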
Through this combination of large and small models, Xiaodu has not only successfully completed its “brain replacement operation” and installed the new AI native operating system DuerOS X, but has also achieved the best combination of effect, speed and cost. Compared with a flagship version that uses the large Wenxin models for everything, the response speed is increased by 2 times and the cost is reduced by 99%. The Xiaodu Tiantian AI tablet robot that I just talked to went on sale across major platforms yesterday. Friends who are interested can place an order right away to try it.
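Why mixing models lowers cost can be seen from simple arithmetic: the average cost per request is the router’s cost plus each model’s cost weighted by its share of traffic. The sketch below uses entirely hypothetical numbers (relative costs and traffic shares are invented for illustration) and does not attempt to reproduce the 99% figure quoted above.

```python
# All numbers are hypothetical, chosen only to illustrate the arithmetic.
# Costs are relative units per request, with the flagship model set to 1.0.
ROUTER_COST = 0.005  # small routing model, runs on every request
MODEL_COST = {"flagship": 1.0, "lightweight": 0.02}
TRAFFIC_SHARE = {"flagship": 0.05, "lightweight": 0.95}  # must sum to 1


def blended_cost(router_cost, model_cost, traffic_share):
    """Average cost per request when a router dispatches to several models."""
    total_share = sum(traffic_share.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return router_cost + sum(
        traffic_share[name] * model_cost[name] for name in traffic_share
    )


routed = blended_cost(ROUTER_COST, MODEL_COST, TRAFFIC_SHARE)
flagship_only = MODEL_COST["flagship"]
saving = 1 - routed / flagship_only
print(f"routed cost per request: {routed:.3f}")
print(f"saving vs. flagship-only: {saving:.1%}")
```

The saving grows as more traffic can be served by the lightweight models and as those models get cheaper relative to the flagship.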
These ModelBuilder examples demonstrate Baidu’s ability to produce models efficiently and at low cost. Starting from the most powerful base model, Wenxin 4.0, we can tailor smaller models to various scenarios as needed, balancing effect, response speed, and inference cost, with support for fine-tuning and post-pretraining. A model tailored down from a larger one in this way is significantly better than a same-size model fine-tuned directly from open source, and significantly cheaper at the same level of effect. People used to think that open source was cheap, but in fact, in large model scenarios, open source is the most expensive. Therefore, open source models will fall further and further behind.
In order to help you get started quickly, ModelBuilder comes preloaded with the most comprehensive and richest set of large models. These include the flagship large models ERNIE 3.5 and ERNIE 4.0, which are powerful and suited to general complex scenarios; three lightweight large models, ERNIE Speed, Lite, and Tiny; and two vertical-scenario models: ERNIE Character, suited to role-playing, and ERNIE Functions, suited to external tool use and business function calls in dialogue or question-and-answer scenarios. Of course, ModelBuilder also supports mainstream third-party models from home and abroad, with a total of 77, making it the development platform with the largest number of large models in China.
In addition to providing these development tools, we also provide financial and resource support to developers.
In May last year, Baidu launched the “Wenxin Cup” Entrepreneurship Competition, hoping to build a more prosperous and dynamic large model ecosystem and to help entrepreneurs and developers build all kinds of AI native applications. In the first “Wenxin Cup” Entrepreneurship Competition, nearly 1,000 entrepreneurial teams registered. Baidu provided nearly 100 million yuan in investment support to 15 of the winning teams, and continued to provide all-round support in technology, team and resources.
Today, I announce that the second “Wenxin Cup” Entrepreneurship Competition has officially started. This time, we will expand the scope of project screening, set up sub-venues, and recruit innovative entrepreneurial teams from the global market and from among college students. As long as your entrepreneurial direction is AI native applications, you can sign up on the official website of the competition. At the same time, we have also increased our support for entrepreneurs, providing more investment funding and richer business resources. For the first time, we have also set up a “special award”. Particularly outstanding projects will have the opportunity to receive up to 50 million RMB in cash and resource support.
Chinese entrepreneurs and developers are very good at using new technologies to develop applications. I believe that the Wenxin Big Model will become the first choice for Chinese AI entrepreneurs and developers, and more and more applications will be built on the Wenxin Big Model. I also look forward to more entrepreneurs and developers joining us to build a prosperous AI ecosystem.
Most of the tools and cases we talked about earlier are based on large language models. Looking to the future, I believe that multimodal large models, which integrate text, images, voice, video and other modalities, are a very important long-term direction for basic models and the only path to AGI. Baidu has long-term investments in these areas and will report technical progress in a timely manner.
I have a very different view: the biggest application scenario of the visual large model is autonomous driving. Baidu is the best in this field and the global leader in autonomous driving. We not only train AI to generate videos, but also train AI to understand what is happening in the real world and predict the future.
Based on more than 100 million kilometers of test mileage data on complex urban roads in China, Baidu has trained the Apollo visual perception model. It has four basic capabilities: detection, tracking, understanding, and mapping. This allows Baidu to have a smarter, more adaptable, and safer autonomous driving solution.
Baidu Maps has also taken the lead in applying the visual perception model to the field of mapping. The world’s largest lane-level map data set is now live in 360 cities across the country. Wherever Baidu Maps navigation can go, intelligent driving can go too.
After the Spring Festival this year, Baidu’s robotaxi service Apollo Go (Luobo Kuaipao) completed its “first crossing” of the Yangtze River, extending service from the north bank to the south bank. In some areas of Wuhan, we have achieved all-weather, 24/7 operations. We also plan to deploy 1,000 driverless vehicles in Wuhan this year.
This is a landmark event for the real commercialization of autonomous driving. It is no longer just a regional demonstration, but has entered a new stage of city-level application. In Wuhan, Apollo Go covers more than 3,000 square kilometers and a population of 7.7 million, making it the largest autonomous driving operation area in the world.
“Pack up and take away, use as you like.”
Just now, I showed you the Baidu Wenxin large model series and three development tools - AgentBuilder, AppBuilder, and ModelBuilder. They form a toolbox that you can pack up and take away immediately and use whenever you want.
Standing here at this moment, I am also a developer and an entrepreneur, and I am as excited as everyone else. Today, China has 1 billion Internet users, a strong basic model, plenty of AI application scenarios, and the most complete industrial system in the world. The country is also vigorously encouraging and supporting the “artificial intelligence +” action. Every person and every company only needs to make full use of these tools to unleash unlimited creativity and productivity.
Today, everyone can become a developer, and the future will surely be a future created by developers together.