LeVanLoi'log, ⌚ 2024-12-28
***
Top AI Stories of 2024! Agents Rise, Prices Fall, Models Shrink, Video Takes Off, Acquisitions Morph
Author: The Batch @ DeepLearning.AI
Link: https://info.deeplearning.ai/top-ai-stories-of-2024-agents-rise-prices-fall-models-shrink-video-takes-off-acquisitions-morph-1
Dear friends,
Is AI progressing rapidly? Yes! But while the progress of underlying AI technology has indeed sped up over the past 2 years, the fastest acceleration is in applications.
Consider this: GPT-4 was released March 2023. Since then, models have become much faster, cheaper, sometimes smaller, more multimodal, and better at reasoning, and many more open weight versions are available — so progress has been fantastic! (Claims that AI is “hitting a wall” seem extremely ill-informed.) But more significantly, many applications that already were theoretically possible using the March 2023 version of GPT-4 — in areas such as customer service, question answering, and process automation — now have significant early momentum.
I’m confident 2025 will see even faster and more exciting advances than 2024 in both AI technology and applications. Looking back, the one thing that could have stopped AI was bad, anti-competitive regulation that would have put onerous burdens on developers, particularly of open models. So long as we remain vigilant and hold off these anti-innovation forces, we’ll keep up or even further accelerate progress.
I’m also seeing a widening gap between those at the cutting edge (which includes many readers of The Batch!) and those who have not yet tried out ChatGPT even once (yes, a lot of people are still in this group!). As technology changes around us, we all have to keep up to remain relevant and be able to make significant contributions. I’m committed to making sure DeepLearning.AI continues to help you learn the most useful and important AI technologies. If you’re making New Year’s resolutions, I hope you’ll include us in your learning plan!
AI is the most important technological change happening in the world right now. I’m thrilled to be working in this exciting sector alongside you, and I’m grateful for your efforts to learn about and apply it to better the lives of yourself and others.
Happy holidays!
Andrew
A Blizzard of Progress
What a year! AI made dramatic advances in 2024. Agentic systems improved their abilities to reason, use tools, and control desktop applications. Smaller models proliferated, many of them more capable and less expensive than their larger forebears. While some developments raised worries, far more sparked wonder and optimism. As in the waning days of earlier years, we invite you to pour a cup of hot cocoa and consider the high points of the last 12 months.
Agents Ascendant
The AI community laid the foundation for systems that can act by prompting large language models iteratively, leading to much higher performance across a range of applications.
What happened: AI gained a new buzzword — agentic — as researchers, tool vendors, and model builders equipped large language models (LLMs) to make choices and take actions to achieve goals. These developments set the stage for an upswell of agentic activity in the coming year and beyond.
Driving the story: Several tools emerged to help developers build agentic workflows.
- Microsoft primed the pump for agentic development tools in late 2023 with AutoGen, an open source conversational framework that orchestrates collaboration among multiple agents. (Learn how to take advantage of it in our short course “AI Agentic Design Patterns with AutoGen.”) In late 2024, part of the AutoGen team split off to build AG2 based on a fork of the code base.
- In October 2023, CrewAI released its open source Python framework for building and managing multi-agent systems. Agents can be assigned roles and goals, gain access to tools like web search, and collaborate with each other. (DeepLearning.AI’s short courses “Multi-Agent Systems with crewAI” and “Practical Multi AI-Agents and Advanced Use Cases with crewAI” can give you a fast start.)
- In January, LangChain, a provider of development tools, introduced LangGraph, which orchestrates agent behaviors using cyclical graphs. The framework enables LLM-driven agents to receive inputs, reason over them, decide on actions, use tools, evaluate the results, and repeat these steps to improve results. (Our short course “AI Agents in LangGraph” offers an introduction.)
- In September, Meta introduced Llama Stack for building agentic applications based on Llama models. Llama Stack provides memory, conversational skills, orchestration services, and ethical guardrails.
- Throughout the year, integrated development environments implemented agentic workflows to generate code. For instance, Devin and OpenHands accept natural-language instructions to generate prototype programs. Replit Agent, Vercel’s V0, and Bolt streamline projects by automatically writing code, fixing bugs, and managing dependencies.
- Meanwhile, a number of LLM makers supported agentic workflows by implementing tool use and function calling. Anthropic added computer use, enabling Claude 3.5 Sonnet to control users’ computers directly.
- Late in the year, OpenAI rolled out its o1 models and the processing-intensive o1 pro mode, which use agentic loops to work through prompts step by step. DeepSeek-R1 and Google Gemini 2.0 Flash Thinking Mode followed with similar agentic reasoning. In the final days of 2024, OpenAI announced o3 and o3-preview, which further extend o1’s agentic reasoning capabilities with impressive reported results.
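The loop that several of these tools orchestrate (receive input, reason, decide on an action, use a tool, evaluate, repeat) can be sketched in a few lines. This is only an illustration under stated assumptions: `stub_policy`, `TOOLS`, and `run_agent` are hypothetical names invented here, the "reasoning" step is a hard-coded rule standing in for an LLM call, and none of it reflects the actual API of LangGraph, AutoGen, CrewAI, or Llama Stack.

```python
from typing import Callable, Dict, Optional, Tuple


def add(a: float, b: float) -> float:
    """Hypothetical tool the agent is allowed to call."""
    return a + b


# Hypothetical tool registry: maps a tool name to a callable.
TOOLS: Dict[str, Callable[[float, float], float]] = {"add": add}


def stub_policy(state: dict) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Stand-in for an LLM call: choose the next tool call, or None to stop."""
    if state["total"] < state["target"]:
        return "add", (state["total"], state["step"])
    return None


def run_agent(target: float, step: float, max_iters: int = 10) -> dict:
    """Repeat reason -> act -> evaluate until the goal is met or the budget runs out."""
    state = {"total": 0.0, "target": target, "step": step, "iterations": 0}
    for _ in range(max_iters):
        action = stub_policy(state)      # reason over the state and decide
        if action is None:               # evaluation says the goal is reached
            break
        tool_name, args = action
        state["total"] = TOOLS[tool_name](*args)  # use the chosen tool
        state["iterations"] += 1
    return state
```

The `max_iters` cap is a design choice worth keeping in any real loop of this kind: because every iteration consumes tokens, an agent that never satisfies its own stopping test would otherwise run up an unbounded bill.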
Behind the news: Techniques for prompting LLMs in more sophisticated ways began to take off in 2022. They coalesced in moves toward agentic AI early this year. Foundational examples of this body of work include:
- Chain of Thought prompting, which asks LLMs to think step by step
- Self-consistency, which prompts a model to generate several responses and pick the one that’s most consistent with the others
- ReAct, which interleaves reasoning and action steps to accomplish a goal
- Self-Refine, which enables an agent to reflect on its own output
- Reflexion, which enables a model to act, evaluate, reflect, and repeat
- Test-time compute, which increases the amount of processing power allotted to inference
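To make one of these techniques concrete, here is a minimal sketch of self-consistency, assuming the simplest possible notion of "most consistent": a majority vote over the final answers of several sampled responses. The `sample` callable is a hypothetical stand-in for a model call with nonzero temperature; it is not part of any particular library.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Draw n responses for the same prompt and return the most frequent one."""
    answers = [sample(prompt) for _ in range(n)]
    # most_common(1) returns [(answer, count)] for the top answer;
    # ties are broken in favor of the answer that was sampled first.
    return Counter(answers).most_common(1)[0][0]


def make_canned_sampler(responses):
    """Hypothetical sampler that replays fixed responses instead of calling a model."""
    it = iter(responses)
    return lambda prompt: next(it)
```

For example, `self_consistent_answer(make_canned_sampler(["42", "41", "42", "42", "40"]), "What is 6 x 7?")` returns `"42"`, the answer that three of the five samples agree on. Spending five model calls instead of one is also a small instance of the test-time compute idea in the last bullet.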
Where things stand: The agentic era is upon us! Regardless of how well scaling laws continue to drive improved performance of foundation models, agentic workflows are making AI systems increasingly helpful, efficient, and personalized.
Prices Tumble
Fierce competition among model makers and cloud providers drove down the price of access to state-of-the-art models.
What happened: AI providers waged a price war to attract paying customers. A leading indicator: From March 2023 to November 2024, OpenAI cut the per-token prices of cloud access to its models by nearly 90 percent even as performance improved, input context windows expanded, and the models became capable of processing images as well as text.
Driving the story: Factors that pushed down prices include open source, more compute-efficient models, and excitement around agentic workflows that consume more tokens at inference. OpenAI’s GPT-4 Turbo set a baseline when it debuted in late 2023 at $10.00/$30.00 per million tokens of input/output. Top model makers slashed prices in turn: Google and OpenAI at the higher end of the market, companies in China at the lower end, and Amazon at both. Meanwhile, startups with specialized hardware offered open models at prices that dramatically undercut the giants.
- Competitive models with open weights helped drive prices down by enabling cloud providers to offer high-performance models without bearing the cost of developing or licensing them. Meta released Llama 3 70B in April, and various cloud providers offered it at an average price of $0.78/$0.95 per million input/output tokens. Llama 3.1 405B followed in July 2024; Microsoft Azure priced it at almost half the price of GPT-4 Turbo ($5.33/$16.00).
- Per-token prices for open weights models tumbled in China. In May, DeepSeek released DeepSeek V2 and soon dropped the price to $0.14/$0.28 per million tokens of input/output. Alibaba, Baidu, and ByteDance slashed prices for Qwen-Long ($0.06/$0.06), Ernie-Speed and Ernie-Lite (free), and Doubao ($0.11/$0.11) respectively.
- Makers of closed models outdid one another with lower and lower prices. In May, OpenAI introduced GPT-4o at $5.00/$15.00 per million tokens of input/output, half as much as GPT-4 Turbo. By August, GPT-4o cost $2.50/$10.00 and the newer GPT-4o mini cost $0.15/$0.60 (half as much for jobs with slower turnaround times).
- Google ultimately cut the price of Gemini 1.5 Pro to $1.25/$5.00 per million input/output tokens and slashed Gemini 1.5 Flash to $0.075/$0.30; both models cost twice as much for prompts longer than 128,000 tokens. As of this writing, Gemini 2.0 Flash is free to use as an experimental preview, and API prices have not been announced.
- In December, Amazon introduced the Nova family of LLMs. At launch, Nova Pro ($0.80/$3.20 per million tokens of input/output) cost much less than top models from OpenAI or Google, while Nova Lite ($0.06/$0.24) and Nova Micro ($0.035/$0.14) cost much less than GPT-4o mini. (Disclosure: Andrew Ng serves on Amazon’s board of directors.)
- Even as model providers cut their prices, startups including Cerebras, Groq, and SambaNova designed specialized chips that enabled them to serve open weights models faster and more cheaply. For example, SambaNova offered Llama 3.1 405B for $5.00/$10.00 per million tokens of input/output, processing a blazing 132 tokens per second. DeepInfra offered the same model at a slower speed for as little as $2.70/$2.70.
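To see what these per-million-token prices mean for a concrete bill, the arithmetic can be sketched as below. The prices are the ones quoted above for GPT-4 Turbo, GPT-4o (as of August), and GPT-4o mini; the workload of 2 million input tokens and 500,000 output tokens is a made-up assumption for illustration, and `job_cost` is a hypothetical helper, not a provider API.

```python
# (input price, output price) in US dollars per million tokens, as quoted above.
PRICES = {
    "GPT-4 Turbo": (10.00, 30.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one job at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for name in PRICES:
    print(f"{name}: ${job_cost(name, 2_000_000, 500_000):.2f}")
```

Under that assumed workload the same job costs about $35 on GPT-4 Turbo, $10 on GPT-4o, and $0.60 on GPT-4o mini, which is why token-hungry agentic workflows became far easier to justify over the course of the year.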
Yes, but: The trend toward more processing-intensive models is challenged but not dead. In September, OpenAI introduced token-hungry models with relatively hefty price tags: o1-preview ($15.00/$60.00 per million tokens of input/output) and o1-mini ($3.00/$12.00). In December, o1 arrived with a more accurate pro mode that’s available only to subscribers who are willing to pay $200 per month.
Behind the news: Prominent members of the AI community pushed against regulations that threatened to restrict open source models, which played an important role in bringing down prices. Opposition by developers helped to block California SB 1047, a proposed law that would have held developers of models above certain size limits liable for unintended harms caused by their models and required a “kill switch” that would enable developers to disable them — a problematic requirement for open weights models that anyone could modify and deploy. California Governor Gavin Newsom vetoed the bill in October.
Where things stand: Falling prices are a sign of a healthy tech ecosystem. It’s likely that in-demand models will always fetch relatively high prices, but the market is increasingly priced in pennies, not dollars, per million tokens.