DeepSeek V4 Review: Speed is Key in AI Models

DeepSeek V4 has launched with impressive efficiency for long-context processing, but it lacks native multimodal capabilities.

Introduction

DeepSeek V4 has finally arrived after much anticipation. Just hours ago, the preview version was released and open-sourced. Coincidentally, OpenAI launched GPT-5.5 on the same day, highlighting the competition between two major AI companies in the US and China.

DeepSeek V4 comes in two versions, Pro and Flash, both supporting an ultra-long context of 1 million tokens, with total parameter counts of 1.6 trillion (49 billion activated) and 284 billion (13 billion activated), respectively.


However, beyond the impressive numbers of “1.6 trillion parameters” or “1 million tokens context,” two figures in the technical documentation deserve more attention: 27% and 10%.

According to the V4 series introduction on Hugging Face, V4-Pro's per-token inference FLOPs in a 1-million-token context scenario are only 27% of V3.2's, and its KV cache is only 10% of V3.2's.

In simpler terms, in scenarios dealing with ultra-long materials, V4 not only handles more, it does so faster and at lower cost. This may be the most noteworthy aspect of the V4 update.

In the past six months, long context has become a common selling point among leading models. Claude, Qwen, Kimi, and GLM are all moving towards long texts, code repositories, and agent tasks, while DeepSeek focuses on the most expensive aspects of long text scenarios: computation and caching.

Unfortunately, V4 currently lacks native multimodal capabilities, which limits its performance in certain scenarios. Thus, the keyword for V4 is not the long-awaited “new species” in the industry, but rather a further step in “efficiency engineering.”

Faster, but Without Native Multimodality

In 2026, it is no longer surprising for large models to support long contexts. However, another question arises: can the model keep working efficiently when processing ultra-long texts and long task chains?

A model can easily answer questions when only looking at a few paragraphs of text; however, if it needs to review an entire code repository, dozens of contracts, or months of meeting records while continuously generating, retrieving, modifying code, and calling tools, the difficulty increases exponentially.

V4-Pro's per-token inference FLOPs being only 27% of V3.2's, and its KV cache only 10%, directly addresses this issue. The former measures the compute required to generate each token; the latter, the KV cache footprint, can be understood as the "working memory" the model must carry when processing long texts.
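To make the 10% figure concrete, here is a back-of-envelope estimate of how much memory a naive KV cache consumes at a 1-million-token context. Every hyperparameter below is invented for illustration; DeepSeek has not published V4's architecture in this form.

```python
# Back-of-envelope KV cache estimate for a dense transformer.
# All hyperparameters are illustrative assumptions, NOT DeepSeek V4's
# actual architecture.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2x for the separate K and V tensors, per layer, per cached token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model: 60 layers, 8 KV heads of dim 128, fp16 cache.
full = kv_cache_bytes(seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
print(f"naive cache at 1M tokens: {full / 1e9:.0f} GB")  # ~246 GB
print(f"at 10% of baseline:       {full * 0.10 / 1e9:.0f} GB")
```

Under these assumed numbers, a 10x cache reduction is the difference between needing a multi-GPU node just to hold one long session and fitting it on far more modest hardware.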


As the text grows longer, this working memory gets heavier; if the model has to carry the full burden at every step, it is hard to stay agile. Hence, speed is paramount.

Here, speed is not just about answering a few seconds faster in a chat window; it refers to operational efficiency in long text tasks. After ingesting 1 million tokens, can the model still run smoothly and support high-frequency calls?

This efficiency improvement is also reflected in the recently launched GPT-5.5, where many ChatGPT users have noted a significant increase in response speed with GPT-5.5-Thinking.

With the current popularity of agent workflows, this metric becomes even more crucial. System-level agent tools, including OpenClaw, often need to read files, check information, call tools, modify code, save intermediate states, and continue based on feedback.

The more realistic the task and the longer the context, the more the computational and caching burden snowballs. Many agent products today look futuristic, but once the costs are tallied, the economics can be disastrous. If V4 can indeed cut operating costs under long contexts, it will affect the cost structure of the entire agent toolchain.
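A toy calculation shows why the burden snowballs. Every value below (price, context size, growth per step) is a made-up assumption; the point is only the shape of the curve: each agent step re-pays for the entire accumulated context, so per-token efficiency gains rescale the whole bill.

```python
# Illustrative only: how an agent's context (and per-step cost) snowballs.
# Prices and token counts are invented for the arithmetic.

PRICE_PER_M_INPUT = 0.50   # hypothetical $ per million input tokens
context = 200_000          # tokens already in context (repo + docs)
total_cost = 0.0

for step in range(50):     # 50 tool-calling steps in one task
    total_cost += context / 1e6 * PRICE_PER_M_INPUT  # pay for full context
    context += 4_000       # each step appends tool output / new code

print(f"final context: {context:,} tokens, total cost: ${total_cost:.2f}")
# Every step re-reads the whole accumulated context, so anything that
# cuts per-token FLOPs or cache size directly rescales this bill.
```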

We at Letter AI also had a simple hands-on experience with DeepSeek V4 Pro, setting up a basic offline environment and running two tests closely related to everyday user scenarios.


First, we gave V4 Pro a set of materials covering MCP, structured output, tool invocation, edge models, and inference services, and asked it to write a technical analysis. This task primarily tested whether the model could organize a pile of concepts and terms into a clear engineering picture.

V4 Pro performed like a seasoned technical editor. It did not reiterate the materials point by point but grasped a main line: the competition among agents is not just about model parameters but also about how models can stably connect with external systems. In other words, models must not only “think” but also be able to read files, query databases, call tools, and write results back to business systems.

It understood structured output as “making the model express information in a way that machines can directly understand,” and MCP as “enabling the model to easily access external tools through standard interfaces,” which is closer to real products than merely explaining terms.
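For readers unfamiliar with the term, here is a minimal illustration of what "structured output" means in practice. The schema and field names are invented for this example: the model is constrained to emit JSON matching a caller-defined shape, so downstream systems can parse it directly instead of scraping prose.

```python
import json

# An illustrative, caller-defined schema (field names invented here):
lead_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "event":   {"type": "string"},
        "impact":  {"type": "string", "enum": ["high", "medium", "low"]},
    },
    "required": ["company", "event", "impact"],
}

# A conforming model response can be loaded and written straight into a
# database or business system, with no regex scraping of free-form prose.
response = '{"company": "DeepSeek", "event": "V4 preview released", "impact": "high"}'
record = json.loads(response)
assert set(lead_schema["required"]) <= record.keys()
print(record["company"], "->", record["impact"])
```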

The second test involved asking it to write a local command-line tool in Python to manage daily collected AI industry news leads. The prompt was simple, with just a few basic constraints: no internet connection, no API calls; it should allow adding, viewing, filtering, deduplicating, automatically scoring news value, and exporting a markdown daily report.

V4 Pro delivered a runnable little tool. Users can enter company, title, type, source, link, time, text, and verification status, and the program automatically computes a news-value score, sorting leads into "directly quotable," "needs further verification," and "not adopted for now." The exported markdown is also grouped by these tiers, retaining dimensions like company, title, type, score, and source.
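To be clear, what follows is not V4 Pro's actual output, but a much-simplified sketch of this kind of tool, with an invented scoring heuristic and thresholds, to show how the intent decomposes into structure, rules, and code:

```python
#!/usr/bin/env python3
"""Simplified sketch of an offline news-lead manager (illustrative only,
not V4 Pro's output). Leads live in a local JSON file; the scoring rule
and thresholds are invented for the example."""
import argparse, json, pathlib

DB = pathlib.Path("leads.json")

def load() -> list[dict]:
    return json.loads(DB.read_text()) if DB.exists() else []

def score(lead: dict) -> int:
    # Invented heuristic: verified, sourced, linked leads rank higher.
    return 3 * lead["verified"] + 2 * bool(lead["source"]) + bool(lead["link"])

def bucket(s: int) -> str:
    if s >= 5: return "Directly quotable"
    if s >= 3: return "Needs further verification"
    return "Not adopted for now"

def cmd_add(args):
    leads = load()
    # Dedupe on company + title before inserting.
    if any(l["company"] == args.company and l["title"] == args.title for l in leads):
        print("duplicate lead, skipped")
        return
    leads.append({"company": args.company, "title": args.title,
                  "source": args.source, "link": args.link,
                  "verified": args.verified})
    DB.write_text(json.dumps(leads, indent=2))

def cmd_export(args):
    # Group leads by score tier and print a markdown daily report.
    groups: dict[str, list[dict]] = {}
    for lead in load():
        groups.setdefault(bucket(score(lead)), []).append(lead)
    for name, leads in groups.items():
        print(f"## {name}")
        for l in leads:
            print(f"- **{l['company']}**: {l['title']} (score {score(l)})")

parser = argparse.ArgumentParser(description="offline AI news-lead manager")
sub = parser.add_subparsers(dest="cmd", required=True)
add = sub.add_parser("add")
add.add_argument("company"); add.add_argument("title")
add.add_argument("--source", default=""); add.add_argument("--link", default="")
add.add_argument("--verified", action="store_true")
add.set_defaults(func=cmd_add)
sub.add_parser("export").set_defaults(func=cmd_export)
args = parser.parse_args()
args.func(args)
```

Even at this toy scale, the pattern matches what the test describes: a fixed record structure, explicit scoring rules, and a grouped markdown export.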


This test illustrates that V4 Pro can break down a relatively complex intent into structure, rules, and executable code, which aligns with DeepSeek’s past user mindset.

On developer channels like OpenRouter, the DeepSeek V3 series has already proven its cost-effectiveness and built user inertia. OpenRouter data shows that the DeepSeek V3 series consumed over 7.27 trillion tokens in 2025, ranking fifth, behind only models like Claude Sonnet 4 and Gemini 2.0 Flash. Even today, the call volume of DeepSeek V3.2 remains near the top of the OpenRouter leaderboard.

This indicates that user recognition is not solely based on benchmarks but on whether a model is stable, cost-effective, and efficient in real workflows.

This perspective also applies to Claude. In comparisons between Claude Opus 4.6 and the GPT-5.4 series across various capability rankings, the conclusion does not always favor Claude; on some knowledge, reasoning, and speed metrics, GPT-5.4 performs better.

However, this has not stopped Claude from continuing to capture the developer and enterprise market. Anthropic disclosed in February this year that, based on revenue trends at the time, its annualized revenue had reached 14 billion dollars, having grown more than tenfold per year over the past three years.

Thus, to objectively assess a model’s capabilities, one must observe its actual engineering performance in real workflows.

Of course, V4 is not without its shortcomings. The biggest regret is its current lack of native multimodal capabilities. Even before its release, there were expectations in the community that V4 would not just be a text model. Some media had previously reported that DeepSeek V4 was planned to be a multimodal model capable of handling images, videos, and text generation.

The absence of multimodal capabilities does pose a practical problem: once a task involves visual understanding, chart analysis, or PPT, webpage, or software interfaces, it runs up against the limits of the model's capabilities.

Today’s productivity tasks are no longer just about “reading a segment of text.” Many users are dealing with images, tables, screenshots, PDFs, webpages, video conferences, and complex software interfaces. Without native multimodal capabilities, V4 can still serve as a powerful foundation for long tasks, but it is not yet a complete work entry point.

However, one can also read this from another angle: standing at the crossroads of financing and IPO, V4 is primarily there to lay the foundation for its parent company, not to erect the whole building.

DeepSeek at the Crossroads of Financing

Another context for V4’s release is the sudden influx of financing news regarding DeepSeek. Clearly, as a rare species in China’s AI industry, DeepSeek has never been short of funds.

In the past, one of DeepSeek's most recognizable traits was that it did not follow the typical AI-unicorn financing narrative. It has the backing of quantitative fund High-Flyer (Huanfang) and a figure like Liang Wenfeng, who has long maintained a mysterious and focused image in the industry.

However, the situation has begun to change recently. The latest reports indicate that DeepSeek is seeking financing at a valuation exceeding 20 billion dollars, with companies like Alibaba and Tencent reportedly in discussions for investment. The specific figures are still under negotiation, but the direction is clear: DeepSeek has reached a point where it is ready for the capital market.

And V4 is a crucial lever at this juncture. Its focus on efficiency targets the developer community's primary concern head-on: API call demand that is already predictable could be amplified further, driving more commercialization.

This is also the most challenging hurdle for DeepSeek moving forward. Proving a 20 billion dollar valuation requires not only a strong model but also the ability to transform that model into a stable business system.

Competitors have already started taking action on this front. Qwen, GLM, and Kimi are all moving towards agentic coding, tool invocation, and long task execution, while Claude has made enterprise knowledge work and code workflows its most important commercial focus.

Clearly, to leverage V4's capabilities, DeepSeek needs more product-level implementations. Agents cannot operate on the foundation model alone; they require a browser, file system, permission systems, enterprise software interfaces, plugin ecosystems, and product experiences. Even if V4 addresses foundational issues, how to build a user ecosystem for productivity scenarios is a question Liang Wenfeng and his team must consider going forward.

Thus, the most accurate positioning for V4 is not the "new species" of model people had imagined, but a model that raises the open-source task foundation to a new height.

In the past, DeepSeek demonstrated that Chinese companies can build strong models at lower cost. V4 must now prove that this low-cost route can continue to hold in the era of million-token contexts, agents, domestic computing power, and commercialization.

Currently, V4 has played the efficiency card. The question that remains for DeepSeek is whether this card can support a 20 billion dollar company’s commercial scale.
