Just recently, Insta360 announced that it achieved a revenue of 2.481 billion yuan in the first quarter of 2026, a year-on-year increase of 83.11%; however, the net profit attributable to shareholders of the listed company was 84.6202 million yuan, a year-on-year decrease of 52.02%. The surge in revenue and the sharp drop in profit are partly related to Insta360’s aggressive expansion, including a push into the drone market against DJI and an accelerated rollout of its wireless microphone product line.
On April 27, Insta360 partnered with TRAE, an AI programming product under ByteDance, to launch the “Vibe Coding Mic Air” set.
(Source: Insta360)
To be honest, when I first saw this product announcement, even after trying plenty of AI hardware, I was a bit confused. Some quick background: Vibe Coding is essentially a programming style where you tell the AI your requirements, then sit back and watch it write the code for you.
So how does Vibe Coding relate to a microphone? According to the introduction, this specialized Mic Air boasts “three major advantages”: high sampling rate and AI noise reduction ensure that the computer clearly hears the user’s commands, its compact and lightweight design allows users to speak softly while keeping the Mic Air close, and a 10-hour battery life supports long working hours.
At this point, does anyone recall the night in May 2018 when Luo Yonghao performed using TNT’s “voice command” at the Bird’s Nest?
“Shh! Please speak quietly, don’t disturb me while I use TNT.” If Luo Yonghao had access to the Mic Air eight years ago, would there have been no subsequent “Long Live Understanding”?
Voice Control for Work Computers Is a False Need
Although the Mic Air is marketed under the banner of “Vibe Coding,” it seems to me more like a conceptual peripheral created to forcefully enter the AI arena.
Using voice interaction for AI programming is not a whimsical demand from Insta360 and the TRAE team. In March of this year, Claude Code officially launched its voice mode, allowing users to speak by holding down the space bar and releasing it to complete input. For programmers, there is indeed a demand for voice-based Vibe Coding, but a specialized microphone is not necessarily required.
(Source: Anthropic Claude)
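The interaction the article describes, hold a key to record and release it to submit, can be sketched as a tiny state machine. This is a hedged toy sketch, not Claude Code's actual implementation: keyboard events are simulated as a plain list, and the speech-to-text step is stubbed out.

```python
# Toy push-to-talk loop: hold space to record, release to submit.
# Events are simulated; a real client would hook an OS-level event
# source and run speech-to-text on release.
from dataclasses import dataclass, field

@dataclass
class PushToTalk:
    recording: bool = False
    buffer: list = field(default_factory=list)
    submitted: list = field(default_factory=list)

    def handle(self, event: str, chunk: str = "") -> None:
        if event == "space_down":
            self.recording = True
            self.buffer.clear()
        elif event == "audio" and self.recording:
            self.buffer.append(chunk)
        elif event == "space_up" and self.recording:
            self.recording = False
            # A real client would run speech-to-text here before submitting.
            self.submitted.append(" ".join(self.buffer))

ptt = PushToTalk()
for ev, data in [("space_down", ""), ("audio", "refactor this"),
                 ("audio", "function"), ("space_up", "")]:
    ptt.handle(ev, data)
print(ptt.submitted)  # ['refactor this function']
```

Note that audio arriving while the key is up is simply dropped, which is what makes push-to-talk workable in a noisy room: the microphone only "counts" while the user is deliberately holding the key.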
First of all, as a “Vibe Coding microphone,” the Mic Air lacks any irreplaceable technological barriers: a 48kHz sampling rate, AI noise reduction, and long battery life are essentially basic features of professional lapel microphones and have no real connection to AI or Vibe Coding.
For developers, existing laptop array microphones or professional noise-canceling headphones are already sufficient to support voice-to-text needs in quiet indoor environments. Even if a laptop’s built-in microphone is of poor quality, a headset with an independent microphone or a pair of mainstream TWS or open-ear headphones can better address this issue. After all, a company that allows “voice Vibe Coding” in an open office must also permit the use of noise-canceling headphones during work.
Considering these factors, I believe that developers buying a microphone to specifically facilitate “Vibe Coding” is akin to purchasing an iPad solely for note-taking in preparation for graduate school: there is not much connection between the “goal” and the “means.”
(Source: TRAE)
On a positive note, the Insta360 Mic Air TRAE set will include access to the TRAE AI programming platform, and its price will remain consistent with that of a standard Mic Air set, unlike last year’s surge of AI concept peripherals that exploited the AI hype to sell at inflated prices. At least in this regard, Insta360 differs fundamentally from last year’s “AI mice.”
Given Insta360’s product layout, I do not believe that Insta360 needs to hype its AI capabilities as a peripheral brand through Vibe Coding. This collaboration between Mic Air and TRAE seems more like a “co-branding” event between the two parties.
In my view, while voice interaction is very convenient in everyday scenarios (including driving, walking, or lying down while using a phone), it is actually a very inefficient means of interaction in work scenarios.
From an information density perspective, speech contains many filler words, repetitions, pauses, and other non-informational components, giving it a lower information density than text (this concerns information density only, not input speed). For an inherently non-deterministic system like an AI model, lower information density amplifies distortion during processing.
(Source: Generated by ChatGPT)
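The information-density point can be made concrete with a toy measurement: strip a filler-word list from a transcript and compare the share of informative tokens against edited text. Both the filler list and the sample sentences below are invented for illustration; real transcripts vary by language and speaker.

```python
# Toy illustration of the information-density gap between a raw speech
# transcript and edited text. The filler list is made up for demonstration.
FILLERS = {"um", "uh", "like", "so", "basically", "actually"}

def informational_ratio(transcript: str) -> float:
    """Share of tokens that are not filler words."""
    tokens = transcript.lower().split()
    if not tokens:
        return 0.0
    informative = [t for t in tokens if t.strip(",.") not in FILLERS]
    return len(informative) / len(tokens)

spoken = "um so basically I want like a function that uh sorts the list"
typed = "write a function that sorts the list"

print(f"spoken: {informational_ratio(spoken):.2f}")  # spoken: 0.62
print(f"typed:  {informational_ratio(typed):.2f}")   # typed:  1.00
```

Even this crude tokenizer shows the gap: roughly a third of the spoken version carries no information, which is exactly the noise the model then has to interpret around.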
Moreover, voice input is poorly suited to the "long thinking" humans do. As I write this text, I constantly refine it, quickly reordering phrases and adding or deleting content. Revising a prompt of several hundred words before sending it to the AI takes little time, but articulating such lengthy requirements clearly out loud is genuinely hard. In WeChat chats, the built-in voice-to-text feature works well, yet even it lets me edit before sending.
Furthermore, voice input is an input method with strong “exclusivity.” In a small space where multiple developers are talking at once, even if everyone tries to control their volume, the sound energy will accumulate; combined with the tendency for people to raise their voices in background noise, the office will inevitably become “lively.”
Additionally, voice input faces differences in language logic as well as security risks. I believe that, at least in work scenarios, voice-based AI interaction will prove to be a false need.
Since voice interaction has many flaws in office settings, what kind of AI interaction do we actually need?
Multimodal + Agent is the Correct Answer for AI Interaction
In my view, current AI on PC has evolved to the stage of “multimodal input + Agent automatic execution.” The truly efficient interaction method at this point should be visual and pointer input (precise targeting) combined with proactive AI-predicted options.
Multimodal paired with AI Agents means that AI has transcended the limitations of the “text box” and can actively “perceive” and execute. Given this, we should not view AI interaction issues through the outdated model of “text window + voice input.”
In the Vibe Coding scenario, the most efficient action is not to speak prompts into a microphone but to select code with a cursor (or use an eye-tracking camera to capture visual focus). The AI Agent, upon receiving input, will actively infer the user’s “intent” and provide corresponding shortcut options, allowing users to click or voice-select the next step. Ultimately, what programmers need is not just a listening device, but an AI project manager. They articulate requirements to the AI project manager, who, based on their observations, perceptions, understanding, and predictive abilities, organizes the information into documents and directs the Agents to get to work.
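The "selection in, predicted options out" flow described above can be sketched in a few lines. The heuristics below are invented purely for illustration and do not reflect any real Agent product's logic.

```python
# Hedged sketch of "cursor selection -> predicted shortcut options".
# The intent rules are toy heuristics, not a real Agent's inference.
def predict_actions(selected_code: str) -> list[str]:
    """Guess likely next steps from a selected code span."""
    actions = ["Explain this code"]  # always a safe default
    if "def " in selected_code or "class " in selected_code:
        actions.append("Generate unit tests")
    if "TODO" in selected_code:
        actions.append("Implement the TODO")
    if len(selected_code.splitlines()) > 30:
        actions.append("Refactor into smaller functions")
    return actions

snippet = "def parse(row):\n    # TODO: handle missing fields\n    return row.split(',')"
print(predict_actions(snippet))
# ['Explain this code', 'Generate unit tests', 'Implement the TODO']
```

The point of the sketch is the direction of the interaction: the user supplies a precise target with the pointer, and the system, not the user, does the work of turning that target into candidate next steps.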
Last week, I experienced the Pura X Max’s companion AI, which employs the “AI predicts the next action” model, and the experience was indeed quite impressive.
(Source: 雷科技)
The widespread ridicule of TNT was not due to the issues with the touch selection + voice command model, but rather because the natural language understanding models of that time could not fulfill “accurate operational requirements” with “vague voice commands.” However, times have changed; rapidly evolving AI Agents may not even require users to speak to complete corresponding tasks.
As long as future AI Agents can actively respond and reduce the need for user input “precision,” both voice input and cursor clicks or shortcut selections can provide users with comprehensive services, albeit in different contexts.
It can only be said that the TNT released in 2018 indeed appeared in the “wrong era.”
DJI and Hollyland Compete in Wireless Microphones
In my view, Insta360’s launch of the “TRAE programming set” can also be seen as a charge by a “new peripheral brand” into productivity scenarios in the AI era.
In recent years, with the explosion of short videos and live streaming, the wireless lapel microphone market has experienced a frenzy of growth. Newcomers like DJI and Hollyland have seized significant market share from professional audio brands like Rode, Sony, and Sennheiser, thanks to superior sound pickup and noise reduction capabilities, longer battery life, more stable wireless transmission, smaller sizes, and lower prices.
Recently, after motorcycle racer Zhang Xue won a championship, she was interviewed by the media with a ring of microphones (mainly DJI Mics) around her neck, vividly illustrating the status of wireless microphones in the self-media era.
(Source: Chongqing Daily)
The DJI Mic series around Zhang Xue’s neck has almost defined the standard for “good wireless microphones”; purchasing a Pocket along with a set of DJI Mic has become the default equipment for new domestic video creators.
Last week, DJI launched the new DJI Mic mini series, enhancing the product’s visual design while maximizing its hardware strength, aiming to change the public’s stereotype of lapel microphones as “black boxes” and pursue the product’s “artistic value.” More importantly, the colorful design allows the product to blend with the speaker (interviewee), non-intrusively and without disrupting the visual focus.
(Source: 雷科技)
Hollyland’s product strategy differs somewhat from DJI’s: it takes an "industry-grade technology trickle-down" approach, leveraging its deep technical experience in film and broadcast to become a strong player in the lapel microphone market. DJI’s Wang Tao even named it directly as a competitor in a recent interview.
(Source: Hollyland)
However, the video shooting peripheral market has already become saturated, and every content creator with filming needs likely has two or three sets of microphones. Even I, who usually just film myself racing on the track, own three cameras and two sets of microphones. In the era of AI-generated videos, the growth rate of novice users with genuine filming needs is slowing down.
In other words, hardware competition in the lapel microphone sector has hit a ceiling: everyone has noise reduction, touchscreen receivers, and ultra-long battery life; the only factors left to compete on are looks, price, and stability.
Insta360’s “cross-industry” move to create an AI programming set is essentially an attempt to find a new market in “non-filming scenarios,” thereby avoiding the spotlight of DJI and Hollyland. The positioning of the “Vibe Coding microphone” brings it to the attention of programmers, a new user group in the industry.
The co-promotion model with TRAE can also create new differentiators beyond traditional hardware performance. Of course, once this model proves effective, other microphone brands will follow suit.
While voice interaction in office settings is indeed a false need, exploring new scenarios and markets in productivity is a necessary move for microphone manufacturers facing fierce competition.
Ultimately, the emergence of AI hardware like the Insta360 Mic Air TRAE limited set is a breakout move by brands that grew up in the video era and must now navigate the AI era. With a keener feel for internet trends and more focused product lines, cross-industry players like Insta360 and other new peripheral brands are bound to leave complacent giants like Logitech in the dust.
It is time for Chinese AI brands to define human-computer interaction in the AI era.