Topic: AlphaGo and the F-35 -- 晨枫


老大河待整

A different view

You said: "Because the new strategy (policy net + value net) is fundamentally different from the traditional, purely Monte Carlo approach..."

In fact, today's strong Go programs (Zen, for example) are not "purely Monte Carlo" either. As the AlphaGo paper itself says:

The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves. These policies are used to narrow the search to a beam of high-probability actions, and to sample actions during rollouts. This approach has achieved strong amateur play. However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.

...

We use these neural networks to reduce the effective depth and breadth of the search tree: evaluating positions using a value network, and sampling actions using a policy network.

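So AlphaGo can be seen as an optimization of the selection and pruning steps inside this MCTS-based approach. From the MCTS point of view, the "policy and value networks" that AlphaGo uses are not fundamentally different from the "shallow policies or value functions based on a linear combination of input features" that current Go programs use: both supply move priors that narrow the search and position estimates that cut it short.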
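To make that concrete, here is a minimal sketch of what I mean, under loud assumptions: the game is a toy one-pile Nim rather than Go, the selection rule is a PUCT-style formula I picked for illustration, and every name (`mcts`, `Node`, `shallow_policy`, `strong_policy`, `c_puct`, and so on) is hypothetical. It is not AlphaGo's actual code. The point is only that the search loop never looks inside the policy or the value function; they are two callables you can swap.

```python
import math

# Toy game (an assumption for illustration, not Go): one-pile Nim.
# A state is the number of stones left; a move removes 1-3 stones;
# whoever takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, prior):
        self.prior = prior      # prior probability from the policy function
        self.visits = 0
        self.value_sum = 0.0    # seen from the player who moved INTO this node
        self.children = {}      # move -> Node

def mcts(stones, policy, value, simulations=400, c_puct=1.5):
    """Generic search loop. policy(stones) returns {move: prior};
    value(stones) returns an estimate in [-1, 1] for the player to move."""
    root = Node(1.0)
    for _ in range(simulations):
        node, s, path = root, stones, [root]
        # Selection: descend by a PUCT-style score until reaching a leaf.
        while node.children:
            total = math.sqrt(node.visits + 1)
            move, node = max(
                node.children.items(),
                key=lambda kv: (kv[1].value_sum / kv[1].visits if kv[1].visits else 0.0)
                + c_puct * kv[1].prior * total / (1 + kv[1].visits),
            )
            s -= move
            path.append(node)
        if s == 0:
            v = -1.0            # the player to move has nothing left to take: a loss
        else:
            v = value(s)        # evaluation by the plugged-in value function
            for m, p in policy(s).items():
                node.children[m] = Node(p)   # expansion with plugged-in priors
        # Backup: v is for the player to move at the leaf; flip the sign each ply.
        for n in reversed(path):
            v = -v
            n.visits += 1
            n.value_sum += v
    return max(root.children, key=lambda m: root.children[m].visits)

# "Shallow" stand-in: uniform priors, neutral value estimate.
def shallow_policy(stones):
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def shallow_value(stones):
    return 0.0

# Hypothetical stand-in for a trained network: it happens to know that
# leaving the opponent a multiple of 4 wins.
def strong_policy(stones):
    moves = legal_moves(stones)
    weights = {m: (5.0 if (stones - m) % 4 == 0 else 1.0) for m in moves}
    z = sum(weights.values())
    return {m: w / z for m, w in weights.items()}

def strong_value(stones):
    return -1.0 if stones % 4 == 0 else 1.0

print(mcts(5, shallow_policy, shallow_value))
print(mcts(5, strong_policy, strong_value))
```

With five stones both calls settle on taking one stone (leaving four). The better priors and value estimates only get the search there with far less wasted effort, which is the sense in which I call the networks an optimization of the same method rather than a different one.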

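You said: "So against an opponent like 'the Dog' (AlphaGo) that does not make mistakes, starting a ko fight may be a pseudo-tactic that gains the human nothing and only raises the human's own chance of making an error."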

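The question is, who can say for sure that AlphaGo does not make mistakes? In fact, Fan Hui won two of the five "informal" games he played against it. Google simply has not published those game records, so we do not know what mistakes AlphaGo made.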

