Monte Carlo Methods
1. Motivating example
2. The simplest MC-based RL algorithm
3. Use data more efficiently
4. MC without exploring starts

References

This article is a set of study notes; all of its content is drawn from the following video:
https://www.bilibili.com/video/BV1Pz5C6iE3X/?p6spm_id_from333.1007.top_right_bar_window_history.content.clickvd_source44ed90827c8f67247cab0ab288133c80