COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment, you are asked to implement Q-learning and SARSA methods for a taxi navigation problem. To run your experiments and test your code, you should use the Gym library, an open-source Python library for developing and comparing reinforcement learning algorithms. You can install Gym by running the following command in your command prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:

import gym

# "ansi" mode makes env.render() return the grid as a text string.
env = gym.make("Taxi-v3", render_mode="ansi").env
state, info = env.reset()  # Gym >= 0.26 returns (observation, info)
rendered_env = env.render()
print(rendered_env)
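Every Q-table in this assignment is indexed by the environment's discrete state. The sketch below illustrates how Taxi-v3 packs the taxi position, the passenger location and the destination into a single integer, which is why the state space has 500 entries and a Q-table has shape 500 × 6 (states × the actions in Table 1). It is our own re-implementation, assumed to match the `encode`/`decode` methods in Gym's `TaxiEnv` source; the names `encode_state`, `decode_state` and the constants are hypothetical.

```python
from __future__ import annotations

# Hypothetical constants describing the Taxi-v3 grid world.
N_ROWS, N_COLS = 5, 5
N_PASSENGER_LOCS = 5   # R, G, Y, B (0-3), or 4 = passenger inside the taxi
N_DESTINATIONS = 4     # R, G, Y, B (0-3)
N_STATES = N_ROWS * N_COLS * N_PASSENGER_LOCS * N_DESTINATIONS  # 500
N_ACTIONS = 6          # see Table 1


def encode_state(taxi_row: int, taxi_col: int, passenger_loc: int, destination: int) -> int:
    """Pack the four state components into one index (assumed Taxi-v3 layout)."""
    index = taxi_row
    index = index * N_COLS + taxi_col
    index = index * N_PASSENGER_LOCS + passenger_loc
    index = index * N_DESTINATIONS + destination
    return index


def decode_state(index: int) -> tuple[int, int, int, int]:
    """Inverse of encode_state: recover (taxi_row, taxi_col, passenger_loc, destination)."""
    index, destination = divmod(index, N_DESTINATIONS)
    index, passenger_loc = divmod(index, N_PASSENGER_LOCS)
    taxi_row, taxi_col = divmod(index, N_COLS)
    return taxi_row, taxi_col, passenger_loc, destination


print(N_STATES)                  # 500
print(encode_state(3, 1, 2, 0))  # 328
print(decode_state(328))         # (3, 1, 2, 0)
```

For example, assuming Gym's ordering R, G, Y, B = 0, 1, 2, 3, a taxi at row 3, column 1 with the passenger waiting at Y and the destination at R corresponds to state 328.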
There are three rendering modes: “human”, “rgb_array”, and “ansi”. The “human” mode visualizes the environment for human viewing: the output is a graphical window that displays the current state of the environment (see Fig. 1). The “rgb_array” mode provides the environment’s state as an RGB image: the output is a NumPy array representing that image. The “ansi” mode provides a text-based representation of the environment’s state: the output is a string that draws the current state using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose either the “human” or the “ansi” presentation mode, but for simplicity we recommend “ansi”. The environment has six discrete, deterministic actions, listed in Table 1.
For this assignment, you need to implement the Q-learning and SARSA algorithms for the taxi navigation environment. The main objective is for the agent (taxi) to learn how to navigate the grid world and deliver the passenger in the minimum possible number of steps. To accomplish the learning task, you should empirically determine the hyperparameters of your algorithm, e.g., the learning rate α, exploration parameters (such as ε or T), and the discount factor γ. Your agent should be penalized -1 for each step it takes, receive a +20 reward for delivering the passenger, and incur a -10 penalty for executing the “pickup” and “drop-off” actions illegally. You should try different exploration parameters to find the best balance between exploration and exploitation.
Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym library. Gold represents the taxi location, blue is the pickup location, and purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action                Action number
Move South            0
Move North            1
Move East             2
Move West             3
Pickup Passenger      4
Drop off Passenger    5
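Q-learning and SARSA differ only in the target of their temporal-difference update. With learning rate α and discount γ, Q-learning bootstraps from r + γ max_a' Q(s', a'), the value of the best next action, while SARSA bootstraps from r + γ Q(s', a'), where a' is the action its exploration policy actually chose in s'. The NumPy sketch below shows both updates together with two common exploration rules: ε-greedy and Boltzmann (softmax) selection with temperature T. The function names, and the choice not to bootstrap from a terminal state, are assumptions of this sketch rather than a required interface.

```python
import numpy as np


def epsilon_greedy(q_table: np.ndarray, state: int, epsilon: float,
                   rng: np.random.Generator) -> int:
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))  # ties go to the lowest action index


def boltzmann(q_table: np.ndarray, state: int, temperature: float,
              rng: np.random.Generator) -> int:
    """Sample an action with probability proportional to exp(Q(s, a) / T)."""
    prefs = q_table[state] / temperature
    prefs = prefs - prefs.max()  # shift by the max to avoid overflow in exp
    probs = np.exp(prefs)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def q_learning_update(q_table, state, action, reward, next_state,
                      terminated, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action in next_state."""
    target = reward if terminated else reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])


def sarsa_update(q_table, state, action, reward, next_state, next_action,
                 terminated, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually chosen next."""
    target = reward if terminated else reward + gamma * q_table[next_state, next_action]
    q_table[state, action] += alpha * (target - q_table[state, action])


# Usage on a zero-initialised 500 x 6 table (Taxi-v3 sizes):
rng = np.random.default_rng(0)
q = np.zeros((500, 6))
a = epsilon_greedy(q, state=0, epsilon=0.1, rng=rng)
q_learning_update(q, state=0, action=a, reward=-1.0, next_state=5,
                  terminated=False, alpha=0.1, gamma=0.9)
print(q[0, a])  # -0.1
```

Because the SARSA target needs a', a SARSA loop chooses the next action with its exploration policy before updating and then executes that same action, whereas Q-learning can choose its next action after the update.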
As an outcome, you should plot the accumulated reward per episode and the number of steps taken by the agent in each episode, over at least 1000 learning episodes, for both the Q-learning and SARSA algorithms. Examples of these plots are shown in Figures 3–6. Note that the provided plots are only examples; your plots will not look exactly like them, since your learning parameters will differ.
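Per-episode rewards and step counts are noisy, especially early in training, so one optional way to make the learning trend visible is to draw a moving average over the raw curve. The sketch below assumes you have collected the per-episode totals in two lists during training (the argument names are hypothetical) and uses Matplotlib, imported only inside the plotting function; the 50-episode window is an arbitrary choice.

```python
from __future__ import annotations


def moving_average(values: list[float], window: int = 50) -> list[float]:
    """Trailing mean over the last `window` points (fewer points at the start)."""
    averaged = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the point leaving the window
        averaged.append(running_sum / min(i + 1, window))
    return averaged


def plot_learning_curves(episode_rewards: list[float], episode_steps: list[int],
                         title: str, window: int = 50):
    """Reward and step curves side by side for one algorithm (needs Matplotlib)."""
    import matplotlib.pyplot as plt  # local import keeps moving_average dependency-free

    episodes = range(1, len(episode_rewards) + 1)
    fig, (ax_reward, ax_steps) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, data, label in ((ax_reward, episode_rewards, "Accumulated reward"),
                            (ax_steps, episode_steps, "Steps")):
        ax.plot(episodes, data, alpha=0.3, label="per episode")
        ax.plot(episodes, moving_average(data, window), label=f"{window}-episode mean")
        ax.set_xlabel("Episode")
        ax.set_ylabel(label)
        ax.legend()
    fig.suptitle(title)
    fig.tight_layout()
    return fig


# Usage (hypothetical lists collected during training):
# fig = plot_learning_curves(q_rewards, q_steps, "Q-learning")
# fig.savefig("q_learning_curves.png")
```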
After training your algorithm, you should save your Q-values. Using your saved Q-tables, both your Q-learning and SARSA agents will be tested on at least 100 random grid-world scenarios with the same characteristics as the taxi environment, using the greedy action selection method. Your Q-table is therefore not updated during testing.
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
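The format of the saved Q-tables is up to you. One simple option, assumed here, is NumPy's `.npy` format via `np.save` and `np.load`; note that `np.save` appends `.npy` to a file name that lacks it. The helper names are hypothetical. At test time the action is simply the argmax of the loaded row, and nothing is written back to the table.

```python
import numpy as np


def save_q_table(q_table: np.ndarray, path: str) -> None:
    """Write a Q-table to disk; NumPy appends .npy if the name lacks it."""
    np.save(path, q_table)


def load_q_table(path: str) -> np.ndarray:
    """Read a Q-table written by save_q_table (pass the full .npy name)."""
    return np.load(path)


def greedy_action(q_table: np.ndarray, state: int) -> int:
    """Test-time policy: highest-valued action, no exploration, no updates."""
    return int(np.argmax(q_table[state]))


# Usage (placeholder file names):
# save_q_table(q_table_qlearning, "q_table_qlearning.npy")
# q_table = load_q_table("q_table_qlearning.npy")
# action = greedy_action(q_table, state)
```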
Your code should be able to visualize the trained agent for both the Q-learning and SARSA algorithms. This means you should render the “Taxi-v3” environment (you can use the “ansi” mode) and run your trained agent from a random position. You should show the steps your agent takes and how the reward changes from one state to the next. An example of the visualized agent is shown in Fig. 7, where only the first six steps of the taxi are displayed.
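A minimal sketch of such a visualization, assuming Gym 0.26 or later (where `reset` returns `(observation, info)` and `step` returns a five-tuple) and an environment created with `render_mode="ansi"`, so that `env.render()` returns the grid as a string. `show_greedy_episode` and `ACTION_NAMES` are hypothetical names; the action labels follow Table 1.

```python
import numpy as np

# Action labels in the order of Table 1 (list index = action number).
ACTION_NAMES = ["Move South", "Move North", "Move East", "Move West",
                "Pickup Passenger", "Drop off Passenger"]


def show_greedy_episode(env, q_table: np.ndarray, max_steps: int = 100, seed=None):
    """Run one greedy episode, printing the grid, action and reward at each step.

    Assumes the Gym >= 0.26 API and an env created with render_mode="ansi".
    Returns (steps taken, accumulated reward).
    """
    state, _ = env.reset(seed=seed)  # random start position unless a seed is given
    print(env.render())
    total_reward, step = 0.0, 0
    while step < max_steps:
        action = int(np.argmax(q_table[state]))  # greedy: no exploration
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        step += 1
        print(f"Step {step}: {ACTION_NAMES[action]}, "
              f"reward {reward}, accumulated reward {total_reward}")
        print(env.render())
        if terminated or truncated:
            break
    return step, total_reward


# Usage (assumes `import gym` and a trained Q-table `q_table`):
# env = gym.make("Taxi-v3", render_mode="ansi").env
# show_greedy_episode(env, q_table)
```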
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by your tutor together with you during a discussion in the Week 10 tutorial session. The assignment has a total of 25 marks. The discussion is mandatory: we will not mark any assignment that has not been discussed with a tutor.
Before your discussion session, you should prepare the necessary code for this purpose by loading your Q-table and the “Taxi-v3” environment. You should be able to calculate the average number of steps per episode and the average accumulated reward (with a maximum of 100 steps per episode) over the test episodes, using the greedy action selection method.
Figure 7: The first six steps of a trained agent (taxi) based on the Q-learning algorithm.
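The two test metrics can be computed with a loop like the one below: a sketch assuming the Gym 0.26+ API, with the 100-step cap enforced by the loop itself so that it also holds for an unwrapped environment. `evaluate_greedy` is a hypothetical name, and the Q-table is only read, never updated.

```python
from __future__ import annotations

import numpy as np


def evaluate_greedy(env, q_table: np.ndarray, n_episodes: int = 100,
                    max_steps: int = 100) -> tuple[float, float]:
    """Average steps and accumulated reward of the greedy policy over n_episodes."""
    steps_per_episode, rewards_per_episode = [], []
    for _ in range(n_episodes):
        state, _ = env.reset()  # new random scenario for each test episode
        total_reward, steps = 0.0, 0
        while steps < max_steps:
            action = int(np.argmax(q_table[state]))  # greedy, table left unchanged
            state, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            steps += 1
            if terminated or truncated:
                break
        steps_per_episode.append(steps)
        rewards_per_episode.append(total_reward)
    return float(np.mean(steps_per_episode)), float(np.mean(rewards_per_episode))


# Usage (assumes `import gym`; the file name is a placeholder):
# env = gym.make("Taxi-v3").env
# q_table = np.load("q_table_qlearning.npy")
# avg_steps, avg_reward = evaluate_greedy(env, q_table)
# print(f"Average steps: {avg_steps:.2f}, average reward: {avg_reward:.2f}")
```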
You are expected to propose and build your algorithms for the taxi navigation task. You will receive marks for each of the subsections shown in Table 2. Beyond what was described in the previous section, you are welcome to include any other outcomes that highlight particular aspects of your work when testing and discussing your code with your tutor.
For both the Q-learning and SARSA algorithms, your tutor will consider the average accumulated reward and the average number of steps over the test episodes in the environment, with a maximum of 100 steps per episode. For your Q-learning algorithm, the agent should take at most 13 steps per episode on average and obtain an average accumulated reward of at least 7. Results worse than these will receive 0 marks for that specific section. For your SARSA algorithm, the agent should take at most 15 steps per episode on average and obtain an average accumulated reward of at least 5. Results worse than these will receive 0 marks for that specific section.
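The two thresholds are linked through the reward structure. Assuming, as in Gym's Taxi-v3, that the successful drop-off earns +20 in place of the −1 step cost and each illegal pickup or drop-off earns −10 in place of it, an episode of n steps that ends with a delivery and contains k illegal actions has return R as below; averaging over test episodes that all end in a delivery gives the second line.

```latex
\[
R = 20 - 10k - (n - 1 - k) = 21 - n - 9k
\]
\[
\bar{n} = 13,\ \bar{k} = 0 \;\Rightarrow\; \bar{R} = 8 \ge 7,
\qquad
\bar{n} = 15,\ \bar{k} = 0 \;\Rightarrow\; \bar{R} = 6 \ge 5
\]
```

Under the same assumption, an episode cut off at 100 steps without a delivery contributes −100 − 9k instead, so even a few failed test episodes pull the average reward well below the thresholds.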
Finally, you will receive 1 mark for code readability for each task, and your tutor will also give you up to 5 marks for each task depending on your level of code understanding: 5 (outstanding), 4 (great), 3 (fair), 2 (low), 1 (deficient), 0 (no answer).
Table 2: Marks for each task.
Task                                                                                    Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning algorithm              2 marks
  Accumulated rewards and steps per episode plots for SARSA algorithm                   2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for Q-learning algorithm    2.5 marks
  Average accumulated rewards and average steps per episode for SARSA algorithm         2.5 marks
  Visualizing the trained agent for Q-learning algorithm                                2 marks
  Visualizing the trained agent for SARSA algorithm                                     2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                                             1 mark
  Code readability for SARSA algorithm                                                  1 mark
  Code understanding and discussion for Q-learning algorithm                            5 marks
  Code understanding and discussion for SARSA algorithm                                 5 marks
Total                                                                                   25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment solution via Moodle as a single .zip file containing three files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning and SARSA (you can choose the format for the Q-tables). Your Q-table files will be loaded during your discussion session to run the test episodes, so your Python code must also include a script that performs these tests. Additionally, your code should include short text descriptions to help markers understand it. Please be mindful that providing clean, easy-to-read code is part of your assignment.
Please put your full name and your zID at the top of the file as a comment. You can submit as many times as you like before the deadline; later submissions overwrite earlier ones. After submitting, it is good practice to take a screenshot of your submission for future reference.
Late submission penalty: UNSW has a standard late submission penalty of 5% of your mark per day, capped at five days from the assessment deadline; after that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday 24 July 2024, 11:55 PM. Please use the forum on Moodle to ask questions related to the project; we will prioritise questions asked in the forum. However, you should not share your code there, to avoid making it public and enabling plagiarism. If your question requires sharing code, use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, replies may take up to one or two business days, so last-minute questions might not be answered in time.
For any questions regarding the discussion sessions, please contact your tutor directly. Tutors' email addresses are available in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.