Gate News message, April 23 — Perplexity’s research team published a technical article detailing its post-training methodology for web search agents. The approach uses two open-source Qwen3.5 models (Qwen3.5-122B-A10B and Qwen3.5-397B-A17B) and employs a two-stage pipeline: supervised fine-tuning (SFT) to establish instruction-following and language consistency, followed by online reinforcement learning (RL) to optimize search accuracy and tool-use efficiency.
The RL phase uses the GRPO algorithm with two data sources. The first is a proprietary multi-hop question-answer dataset with verifiable answers, built from internal seed queries that require 2–4 hops of reasoning and checked through multi-solver verification. The second is rubric-based general conversation data, which converts deployment requirements into objectively checkable atomic conditions so that behaviors learned during SFT do not degrade.
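For readers unfamiliar with GRPO, its core step is to score each sampled response relative to the other responses drawn for the same prompt. The sketch below shows that group-relative normalization in its commonly described form (reward minus group mean, divided by group standard deviation). The function name, the `eps` constant, and the example rewards are illustrative assumptions; the article does not say which GRPO variant or hyperparameters Perplexity used.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group.

    Assumption: standard GRPO-style normalization; `eps` is a hypothetical
    stabilizer for groups in which every rollout earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical rollouts for one query: two correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.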
Reward design uses gated aggregation: preference scores contribute only once baseline correctness is achieved (the answer matches for question-answer data, or all rubric criteria are met), so a high preference signal cannot mask a factual error. Efficiency penalties use within-group anchoring: tool calls and generation length that exceed the baseline set by correct answers in the same group receive a smooth penalty.
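The reward logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Perplexity's implementation: the weights, the `tanh` penalty shape, the use of the mean over correct rollouts as the anchor, and the choice to penalize only correct rollouts are all hypothetical, as are the function and field names.

```python
import math

def gated_reward(correct, preference, pref_weight=0.5):
    """Preference contributes only when the correctness gate is open."""
    if not correct:
        return 0.0
    return 1.0 + pref_weight * preference

def smooth_excess_penalty(value, anchor, scale=0.1):
    """Zero at or below the anchor, then a bounded, smoothly growing penalty.

    Assumption: tanh of the relative excess; the article only says "smooth".
    """
    excess = max(0.0, value - anchor) / max(anchor, 1.0)
    return scale * math.tanh(excess)

def score_group(rollouts):
    """Score all rollouts sampled for one prompt.

    Each rollout is a dict with keys: correct, preference, tool_calls, tokens.
    """
    correct = [r for r in rollouts if r["correct"]]
    if not correct:
        # No correct answer means no anchor, and the gate keeps every reward at zero.
        return [0.0 for _ in rollouts]
    # Assumption: the anchor is the mean over correct rollouts in the group.
    call_anchor = sum(r["tool_calls"] for r in correct) / len(correct)
    token_anchor = sum(r["tokens"] for r in correct) / len(correct)
    rewards = []
    for r in rollouts:
        reward = gated_reward(r["correct"], r["preference"])
        if r["correct"]:
            reward -= smooth_excess_penalty(r["tool_calls"], call_anchor)
            reward -= smooth_excess_penalty(r["tokens"], token_anchor)
        rewards.append(reward)
    return rewards

# Hypothetical group: an efficient correct answer, a wasteful correct answer,
# and a wrong answer with a high preference score.
group = [
    {"correct": True, "preference": 0.8, "tool_calls": 2, "tokens": 400},
    {"correct": True, "preference": 0.8, "tool_calls": 6, "tokens": 1200},
    {"correct": False, "preference": 1.0, "tool_calls": 1, "tokens": 200},
]
rewards = score_group(group)
```

In this example the wrong answer scores zero despite having the highest preference score, and the wasteful correct answer scores below the efficient one. Anchoring to the group's own correct rollouts means hard questions that genuinely need more tool calls are not penalized against a fixed global budget.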
Evaluation shows Qwen3.5-397B-SFT-RL achieving best-in-class performance across search benchmarks. On FRAMES, it reaches 57.3% accuracy with a single tool call, outperforming GPT-5.4 by 5.7 percentage points and Claude Sonnet 4.6 by 4.7 percentage points. Under a moderate budget of four tool calls, it achieves 73.9% accuracy at $0.02 per query, compared with 67.8% at $0.085 per query for GPT-5.4 and 62.4% at $0.153 per query for Sonnet 4.6. Cost figures are based on each provider’s public API pricing and exclude caching optimizations.
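To put the four-tool-call figures side by side, the short calculation below derives the accuracy gap and relative per-query cost from the numbers reported above. The dictionary layout and function name are illustrative; only the accuracy and cost values come from the article.

```python
# FRAMES results at four tool calls, as reported in the article.
results = {
    "Qwen3.5-397B-SFT-RL": {"accuracy": 73.9, "cost_usd": 0.02},
    "GPT-5.4": {"accuracy": 67.8, "cost_usd": 0.085},
    "Sonnet 4.6": {"accuracy": 62.4, "cost_usd": 0.153},
}

def compare(baseline, other):
    """Return the accuracy gap (percentage points) and cost ratio vs. baseline."""
    base, alt = results[baseline], results[other]
    return {
        "accuracy_gap_pp": round(base["accuracy"] - alt["accuracy"], 1),
        "cost_ratio": round(alt["cost_usd"] / base["cost_usd"], 2),
    }

vs_gpt = compare("Qwen3.5-397B-SFT-RL", "GPT-5.4")
vs_sonnet = compare("Qwen3.5-397B-SFT-RL", "Sonnet 4.6")
```

On these figures, GPT-5.4 costs about 4.25 times as much per query while trailing by 6.1 points, and Sonnet 4.6 costs about 7.65 times as much while trailing by 11.5 points.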