
Topic: AlphaGo and the F-35 -- 晨枫

A different view

You said: "Because the new strategy approach (policy net + value net) is fundamentally different from the traditional approach based entirely on Monte Carlo..."


The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves. These policies are used to narrow the search to a beam of high-probability actions, and to sample actions during rollouts. This approach has achieved strong amateur play. However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.


We use these neural networks to reduce the effective depth and breadth of the search tree: evaluating positions using a value network, and sampling actions using a policy network.

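So AlphaGo can be seen as a filtering and pruning optimization of this MCTS-based approach. From the MCTS point of view, the "policy and value networks" that AlphaGo uses are not fundamentally different from the "shallow policies or value functions based on a linear combination of input features" used by current Go programs.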
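
To make the point concrete, here is a rough sketch of a tree search in which the policy and the value function are just two plug-in callables. Everything in it is my own assumption for illustration: the toy stone-taking game stands in for Go, the selection rule is a simplified PUCT-style formula, and names such as `uniform_policy` and `linear_value` are made up. It is not AlphaGo's actual code.

```python
import math

# Toy game (an assumption for illustration, not Go): players alternately take
# 1-3 stones; whoever takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, prior):
        self.prior = prior      # probability assigned by the policy function
        self.visits = 0
        self.value_sum = 0.0    # summed values, seen by the player to move here
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def simulate(node, stones, policy, value, c_puct=1.5):
    """Run one simulation; return a value in [-1, 1] for the player to move."""
    if stones == 0:
        v = -1.0                # the opponent took the last stone
    elif not node.children:
        # Leaf: the policy supplies priors over moves (breadth), and the value
        # function scores the position instead of playing it out (depth).
        for move, p in policy(stones).items():
            node.children[move] = Node(p)
        v = value(stones)
    else:
        sqrt_n = math.sqrt(node.visits)
        def score(item):
            _, child = item
            explore = c_puct * child.prior * sqrt_n / (1 + child.visits)
            return -child.q() + explore   # child.q() is the opponent's view
        move, child = max(node.children.items(), key=score)
        v = -simulate(child, stones - move, policy, value, c_puct)
    node.visits += 1
    node.value_sum += v
    return v

def search(stones, policy, value, simulations=200):
    """Return the most-visited move at the root (stones must be > 0)."""
    root = Node(1.0)
    for _ in range(simulations):
        simulate(root, stones, policy, value)
    return max(root.children, key=lambda m: root.children[m].visits)

# Hypothetical evaluators. A trained network would be passed in the same way.
def uniform_policy(stones):
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def linear_value(stones, weight=1.0):
    # One hand-crafted feature times a weight: a "shallow" evaluator.
    feature = 1.0 if stones % 4 else -1.0
    return weight * feature

def uninformed_value(stones):
    return 0.0                  # knows nothing; the search does all the work

print(search(5, uniform_policy, linear_value))
```

The search loop never looks inside `policy` or `value`. Replacing `linear_value` with a deep network changes how good the estimates are, not the structure of the search, which is the sense in which I call it an optimization of the same MCTS framework.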





Copyright © cchere 西西河