I recently explored the future of generative AI, and its far-reaching impact on society, technology, and business models, in a column for THE WIRED WORLD IN 2025, Wired's annual trends issue.
Today, multibillion-dollar compute investments and costly inference are eroding generative AI's potential for innovation. To achieve the next breakthroughs, large language models (LLMs) urgently need to become lighter, more efficient, and more affordable.
I believe 2025 will be a major turning point: powered by high-performing yet lighter-weight models, a wave of AI-first applications will emerge and profoundly change how we live.
The column originally appeared in English in THE WIRED WORLD IN 2025, Wired's annual trends issue; the original text follows:
How Do You Get to Artificial General Intelligence? Think Lighter
BY KAI-FU LEE
CEO of 01.AI and Chairman of Sinovation Ventures
In 2025, entrepreneurs will unleash a flood of AI-powered apps. Finally, generative AI will deliver on the hype with a new crop of affordable consumer and business apps. This is not the consensus view today. OpenAI, Google, and xAI are locked in an arms race to train the most powerful large language model (LLM) in pursuit of artificial general intelligence, known as AGI, and their gladiatorial battle dominates the mindshare and revenue share of the fledgling GenAI ecosystem.
For example, Elon Musk raised $6 billion to launch the newcomer xAI and bought 100,000 Nvidia H100 GPUs, the costly chips used to process AI, spending north of $3 billion to train its model, Grok. At those prices, only techno-tycoons can afford to build these giant LLMs.
The incredible spending by companies such as OpenAI, Google, and xAI has created a lopsided ecosystem that's bottom heavy and top light. The LLMs trained on these huge GPU farms are usually also very expensive to inference, the process of entering a prompt and generating a response, which is embedded in every app using AI. It's as if everyone had 5G smartphones, but using data was too expensive for anyone to watch a TikTok video or surf social media. As a result, excellent LLMs with high inference costs have made it unaffordable to proliferate killer apps.
This lopsided ecosystem of ultra-rich tech moguls battling each other has enriched Nvidia while forcing application developers into a catch-22: either use a low-cost, low-performance model bound to disappoint users, or pay exorbitant inference costs and risk going bankrupt.
In 2025, a new approach will emerge that can change all that. It will return to what we've learned from previous technology revolutions, such as the PC era of Intel and Windows or the mobile era of Qualcomm and Android, where Moore's Law improved PCs and apps, and lower bandwidth costs improved mobile phones and apps, year after year.
But what about the high inference cost? A new law for AI inference is just around the corner. The cost of inference has fallen by a factor of 10 per year, pushed down by new AI algorithms, inference technologies, and better chips at lower prices.
As a reference point, if a third-party developer had used OpenAI's top-of-the-line models to build AI search in May 2023, the cost would have been about $0.75 per query, while Google's non-generative-AI search costs well under $0.01 per query, a difference of more than 75x. But by May 2024, the price of OpenAI's top model had come down to about $0.04 per query. At this unprecedented pace of roughly 10x-per-year price drops, application developers will be able to use ever higher-quality, lower-cost models, leading to a proliferation of AI apps in the next two years.
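The price trend above is simple compounding arithmetic. A minimal sketch (in Python, not part of the original column) projects per-query cost under the assumed 10x-per-year decline; the $0.75/query baseline comes from the article, while the function name is ours:

```python
# Illustrative sketch only: project per-query inference cost under the
# column's assumed 10x-per-year price decline. The May 2023 baseline of
# $0.75/query is taken from the article; everything else is hypothetical.
def projected_cost(initial_cost: float, years: float, annual_drop: float = 10.0) -> float:
    """Return the cost after `years`, assuming it divides by `annual_drop` each year."""
    return initial_cost / (annual_drop ** years)

print(projected_cost(0.75, 1))  # one year later: 0.075
print(projected_cost(0.75, 2))  # two years later: 0.0075
```

Note that the observed May 2024 price of about $0.04 per query fell even faster than this rule projects ($0.075), which underscores the article's point: the decline is at least an order of magnitude per year.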
I believe this will drive a different way to build an LLM company. Rather than focusing on the AGI arms race, founders will start to focus on building models that are almost as good as the top LLMs but lightweight, and thus ultra-fast and ultra-cheap. These models and apps, purpose-built for commercial applications using leaner models and innovative architectures, will cost a fraction as much to train and will achieve levels of performance good enough for consumers and enterprises. This approach will not lead to a Nobel Prize-winning AI, but it will be the catalyst for proliferating AI apps, leading to a healthy AI ecosystem.
For instance, I'm backing a team that's building a model, an inference engine, and an app all at the same time. This Silicon Valley-based AI startup trained a model almost as good as the best from OpenAI for $3 million, compared to the more than $100 million that Sam Altman said it cost to train OpenAI's GPT-4 [1]. The inference cost of this model applied to an AI search app such as BeaGo is only $0.001 per query, about 3 percent of GPT-4's price. The team also built and launched the AI search app with just five engineers working for two months.
How was that accomplished? Vertical and deep integration that optimized inference, model, and application development holistically.
On the path of AI progression, we have all witnessed the power of LLMs as a revolutionary technology. I am a firm believer that generative AI will disrupt the way we learn, work, live, and do business. The ecosystem must work together to get over the cost hurdle and adjust the formula, achieving an equilibrium that makes AI really work for our society.
[1] https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.