

Date: 2024-07-06



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi
navigation problem. To run your experiments and test your code, you should
make use of the Gym library, an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:

import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state = env.reset()  # in Gym 0.26 and later this is an (observation, info) tuple
rendered_env = env.render()
print(rendered_env)
In order to render the environment, there are three modes known as
“human”, “rgb_array”, and “ansi”. The “human” mode visualizes the
environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a numpy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
the Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in the Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action                Action number
Move South            0
Move North            1
Move East             2
Move West             3
Pickup Passenger      4
Drop off Passenger    5
For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective for this
assignment is for the agent (taxi) to learn how to navigate the grid-world
and deliver the passenger in as few steps as possible. To accomplish
the learning task, you should empirically determine hyperparameters, e.g.,
the learning rate α, exploration parameters (such as ε or T), and discount
factor γ for your algorithm. Your agent should receive a reward of -1 for each
step it takes, +20 for delivering the passenger, and -10 for executing a
“pickup” or “drop-off” action illegally. You should
try different exploration parameters to find the best value for exploration
and exploitation balance.
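The two update rules and an ε-greedy exploration strategy named in this section can be sketched as follows. This is a minimal illustration rather than the required solution: the function names, the list-of-lists Q-table layout, and the example values of α, γ, and the rewards are assumptions chosen for the sketch.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so equal Q-values do not always favour action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Tiny illustration with 2 states and 2 actions (values are arbitrary).
Q = [[0.0, 0.0], [2.0, 4.0]]
q_learning_update(Q, s=0, a=1, r=-1, s_next=1, done=False, alpha=0.5, gamma=0.9)
sarsa_update(Q, s=0, a=0, r=-1, s_next=1, a_next=0, done=False, alpha=0.5, gamma=0.9)
```

The only difference between the two rules is the bootstrap term: Q-learning uses the maximum Q-value of the next state, while SARSA uses the Q-value of the action the exploration policy actually selected.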
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
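Per-episode curves over 1000 or more episodes are usually noisy, so one option is to smooth them before plotting. The helper below is only a sketch: the name `moving_average`, the window size, and the sample values are assumptions, and the commented lines assume matplotlib is installed and that `episode_rewards` is a list filled during training.

```python
def moving_average(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


# Made-up accumulated rewards for six episodes, smoothed over three episodes.
smoothed = moving_average([-200, -150, -90, -40, 5, 8], window=3)

# With matplotlib available, the curve could then be drawn like this:
# import matplotlib.pyplot as plt
# plt.plot(moving_average(episode_rewards, 50))
# plt.xlabel("Episode"); plt.ylabel("Accumulated reward"); plt.show()
```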
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment for
both the Q-learning and SARSA algorithms using the greedy action selection
method. Therefore, your Q-table will not be updated during testing for the
new steps.
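Saving and reloading a Q-table, followed by greedy action selection with no updates, might look like the sketch below. The storage format is left open by the assignment, so the use of `pickle`, the helper names, and the file name are all assumptions made for illustration. If the Q-table is a NumPy array, `numpy.save` and `numpy.load` would be an equally reasonable choice.

```python
import os
import pickle
import tempfile


def save_q_table(q_table, path):
    """Write the Q-table to disk so it can be reloaded for testing."""
    with open(path, "wb") as f:
        pickle.dump(q_table, f)


def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    with open(path, "rb") as f:
        return pickle.load(f)


def greedy_action(q_table, state):
    """Greedy selection: index of the largest Q-value; nothing is updated."""
    row = q_table[state]
    return max(range(len(row)), key=lambda a: row[a])


# Round trip with a tiny placeholder table (2 states, 6 actions).
q_table = [[0.0, 1.5, -2.0, 0.3, 0.0, -10.0], [4.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
path = os.path.join(tempfile.mkdtemp(), "q_learning_table.pkl")  # hypothetical name
save_q_table(q_table, path)
restored = load_q_table(path)
```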
Your code should be able to visualize the trained agent for both the
Q-learning and SARSA algorithms. This means you should render the “Taxi-v3”
environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
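One way to present the agent's steps is a short rollout function that prints each rendered frame together with the action and the reward. The sketch below is an assumption-laden illustration: `show_rollout` is a hypothetical helper, and it assumes the Gym 0.26+ interface in which `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. The action names follow Table 1.

```python
ACTION_NAMES = ["South", "North", "East", "West", "Pickup", "Drop off"]  # Table 1


def show_rollout(env, q_table, max_steps=100):
    """Run one greedy episode, printing each frame, action and reward."""
    state, _info = env.reset()
    steps = 0
    total_reward = 0
    while steps < max_steps:
        row = q_table[state]
        action = max(range(len(row)), key=lambda a: row[a])  # greedy choice
        state, reward, terminated, truncated, _info = env.step(action)
        steps += 1
        total_reward += reward
        print(env.render())
        print(f"Step {steps}: action={ACTION_NAMES[action]}, "
              f"reward={reward}, accumulated reward={total_reward}")
        if terminated or truncated:
            break
    return steps, total_reward
```

With Gym installed, this could be called on an environment created with `gym.make("Taxi-v3", render_mode="ansi")` and a Q-table loaded from disk.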
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Figure 7: The first six steps of a trained agent (taxi) based on the Q-learning
algorithm.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).
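The two averages described here could be computed with a loop like the following. This is a sketch under stated assumptions: `evaluate` is a hypothetical helper, the environment is assumed to follow the Gym 0.26+ interface (`reset()` returning `(observation, info)`, `step()` returning five values), and the Q-table is assumed to be indexable as `q_table[state][action]`.

```python
def evaluate(env, q_table, episodes=100, max_steps=100):
    """Return (average steps, average accumulated reward) under a greedy policy."""
    total_steps = 0
    total_reward = 0
    for _ in range(episodes):
        state, _info = env.reset()
        for _ in range(max_steps):
            row = q_table[state]
            # Greedy action selection; the Q-table is never modified here.
            action = max(range(len(row)), key=lambda a: row[a])
            state, reward, terminated, truncated, _info = env.step(action)
            total_steps += 1
            total_reward += reward
            if terminated or truncated:
                break
    return total_steps / episodes, total_reward / episodes
```

Episodes that never reach the destination stop after `max_steps` steps, so a poorly trained agent shows up as an average close to 100 steps.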
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as shown
in Table 2. Beyond what has been mentioned in the previous section, you are
free to include any other outcome that highlights particular aspects
when testing and discussing your code with your tutor.
For both Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken over the test
episodes in the environment, with a maximum of 100 steps for each episode. For
your Q-learning algorithm, the agent should take at most 13 steps per episode
on average and obtain an average accumulated reward of at least 7. Numbers
worse than that will result in a score of 0 marks for that specific section.
For your SARSA algorithm, the agent should take at most 15 steps per
episode on average and obtain an average accumulated reward of at least 5.
Numbers worse than that will result in a score of 0 marks for that specific
section.
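The thresholds above can be checked mechanically once the test averages are known. The snippet is a small sketch; the dictionary keys and the function name `meets_threshold` are assumptions, while the numbers are the ones quoted in the text.

```python
# Thresholds quoted in the text: (max average steps, min average reward).
THRESHOLDS = {"q_learning": (13, 7), "sarsa": (15, 5)}


def meets_threshold(algorithm, avg_steps, avg_reward):
    """True if the averages satisfy both limits for the given algorithm."""
    max_steps, min_reward = THRESHOLDS[algorithm]
    return avg_steps <= max_steps and avg_reward >= min_reward
```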
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.
Task                                                                                    Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning algorithm.             2 marks
  Accumulated rewards and steps per episode plots for SARSA algorithm.                  2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for Q-learning algorithm.   2.5 marks
  Average accumulated rewards and average steps per episode for SARSA algorithm.        2.5 marks
  Visualizing the trained agent for Q-learning algorithm.                               2 marks
  Visualizing the trained agent for SARSA algorithm.                                    2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                                             1 mark
  Code readability for SARSA algorithm                                                  1 mark
  Code understanding and discussion for Q-learning algorithm                            5 marks
  Code understanding and discussion for SARSA algorithm                                 5 marks
Total marks                                                                             25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. The submission consists of a single .zip file containing
three files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning
and SARSA (you can choose the format for the Q-tables). Remember that the files
containing your Q-tables will be loaded during your discussion session to run
the test episodes. Therefore, you should also provide a script in your Python
code at submission to perform these tests. Additionally, your code should
include short text descriptions to help markers better understand your code.
Please be mindful that providing clean and easy-to-read code is a part of
your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline –
later submissions overwrite earlier ones. After submitting your file, it is
good practice to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission
penalty of 5% per day from your mark, capped at five days from the
assessment deadline. After that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code there,
to avoid making it public and enabling plagiarism. If your question requires
sharing code, use the course email cs9414@cse.unsw.edu.au as an alternative.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply. Therefore, last-minute questions might
not be answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. Your tutor’s email address is listed in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.