Perplexity Discloses Web Search Agent Post-Training Method; Qwen3.5-Based Model Outperforms GPT-5.4 on Accuracy and Cost

Gate News message, April 23 — Perplexity’s research team published a technical article detailing its post-training methodology for web search agents. The approach uses two open-source Qwen3.5 models (Qwen3.5-122B-A10B and Qwen3.5-397B-A17B) and employs a two-stage pipeline: supervised fine-tuning (SFT) to establish instruction-following and language consistency, followed by online reinforcement learning (RL) to optimize search accuracy and tool-use efficiency.

The RL phase uses the GRPO algorithm with two data sources. The first is a proprietary multi-hop, verifiable question-answer dataset built from internal seed queries; each question requires 2–4 hops of reasoning and passes multi-solver verification. The second is rubric-based general conversation data, which converts deployment requirements into objectively checkable atomic conditions so that behavior learned during SFT does not degrade.
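The article names GRPO but does not say which variant Perplexity used. As a rough sketch of the group-relative idea in its commonly published form, the snippet below scores each sampled answer against the mean and standard deviation of its own sampling group. The function name and the `eps` constant are illustrative assumptions and do not come from Perplexity's write-up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Hypothetical helper: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one question: two correct (reward 1), two wrong (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

When every rollout in a group earns the same reward, all advantages are zero, so that group contributes no learning signal under this formulation.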

Reward design employs gated aggregation: preference scores contribute only when baseline correctness is achieved (a question-answer match, or all rubric criteria met), which prevents high preference signals from masking factual errors. Efficiency penalties use within-group anchoring: tool calls and generation length that exceed the baseline set by the correct answers in the same group incur a smooth penalty.
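One possible reading of the gated aggregation and within-group anchoring described above is sketched below. Everything concrete in it is an assumption made for illustration: the `Rollout` fields, the coefficients, the exponential penalty shape, the use of the group mean as the anchor, and the choice to penalize only correct rollouts. The article does not publish Perplexity's actual formula.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    correct: bool      # baseline correctness gate (answer match or all rubric criteria met)
    preference: float  # preference score, assumed to lie in [0, 1]
    tool_calls: int
    tokens: int        # generation length


def smooth_excess(value, anchor, scale):
    """Zero at or below the anchor, rising smoothly toward 1 above it."""
    excess = max(0.0, value - anchor)
    return 1.0 - math.exp(-excess / scale)


def group_rewards(group, pref_weight=0.5, tool_coef=0.1, len_coef=0.1,
                  tool_scale=2.0, len_scale=500.0):
    correct = [r for r in group if r.correct]
    if not correct:
        return [0.0 for _ in group]  # no correct answer, so no anchor and no reward
    # Anchors come from the correct rollouts of this same group (assumed: their mean).
    tool_anchor = mean(r.tool_calls for r in correct)
    len_anchor = mean(r.tokens for r in correct)
    rewards = []
    for r in group:
        if not r.correct:
            rewards.append(0.0)  # gate closed: preference score is ignored
            continue
        reward = 1.0 + pref_weight * r.preference
        reward -= tool_coef * smooth_excess(r.tool_calls, tool_anchor, tool_scale)
        reward -= len_coef * smooth_excess(r.tokens, len_anchor, len_scale)
        rewards.append(reward)
    return rewards


group = [
    Rollout(correct=True, preference=0.2, tool_calls=1, tokens=300),
    Rollout(correct=True, preference=0.2, tool_calls=5, tokens=900),
    Rollout(correct=False, preference=1.0, tool_calls=1, tokens=200),
]
rewards = group_rewards(group)
print(rewards)
```

In this example the incorrect rollout receives nothing despite having the highest preference score, and of the two correct rollouts the one that uses fewer tool calls and fewer tokens ends up with the larger reward.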

Evaluation shows Qwen3.5-397B-SFT-RL achieves best-in-class performance across search benchmarks. On FRAMES, it reaches 57.3% accuracy with a single tool call, outperforming GPT-5.4 by 5.7 percentage points and Claude Sonnet 4.6 by 4.7 percentage points. Under a moderate budget (four tool calls), it achieves 73.9% accuracy at $0.02 per query, compared with GPT-5.4’s 67.8% accuracy at $0.085 per query and Sonnet 4.6’s 62.4% accuracy at $0.153 per query. Cost figures are based on each provider’s public API pricing and exclude caching optimizations.
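To put the four-tool-call figures side by side, the short calculation below derives two comparison numbers from the reported values: each model's per-query cost relative to the Qwen3.5-based model, and a cost per correct answer (cost divided by accuracy). The second metric is an illustrative derivation and is not reported in Perplexity's article.

```python
# (accuracy, USD per query) on FRAMES with four tool calls, as reported above
results = {
    "Qwen3.5-397B-SFT-RL": (0.739, 0.02),
    "GPT-5.4": (0.678, 0.085),
    "Claude Sonnet 4.6": (0.624, 0.153),
}

baseline_cost = results["Qwen3.5-397B-SFT-RL"][1]

summary = {}
for name, (accuracy, cost) in results.items():
    summary[name] = {
        "cost_ratio": cost / baseline_cost,   # per-query cost vs. the Qwen3.5-based model
        "usd_per_correct": cost / accuracy,   # derived metric, not from the article
    }

for name, row in summary.items():
    print(f"{name}: {row['cost_ratio']:.2f}x cost, ${row['usd_per_correct']:.3f} per correct answer")
```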
