IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanation, and commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of
Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you will use only the Google Books 1-gram data; the "bigram" field in the
format below therefore holds a single word. Please download the two datasets from References
[1] and [2]. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall,
in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared.
Assume the data set contains all the 1-grams in the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
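The averaging rule in (c) can be illustrated outside Pig with a small sketch. The Python below only demonstrates the arithmetic and is not the required Pig script; the function name `average_per_year` and the in-memory list of rows are assumptions made for this example.

```python
from collections import defaultdict

def average_per_year(rows):
    """Return {word: summed match_count / number of distinct years it appears in}.

    Each row is (word, year, match_count, volume_count), mirroring the
    TAB-separated format above. Hypothetical helper, for illustration only.
    """
    totals = defaultdict(int)  # word -> summed match_count
    years = defaultdict(set)   # word -> distinct years in which the word appears
    for word, year, match_count, _volume_count in rows:
        totals[word] += match_count
        years[word].add(year)
    return {word: totals[word] / len(years[word]) for word in totals}

rows = [
    ("circumvallate", 1978, 335, 91),
    ("circumvallate", 1979, 261, 95),
]
print(average_per_year(rows))  # {'circumvallate': 298.0}
```

The denominator is the count of years actually present for the word (2 here), not the length of the period covered by the data set.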
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year along with their corresponding average values sorted in descending order. If
multiple bigrams have the same average value, write down any one of them (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
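As a rough illustration of the selection step in (d), the sketch below picks the entries with the largest averages from an in-memory mapping. It is written in Python purely to show the expected shape of the result and is not a substitute for the Pig script; the name `top_n` and the sample values are made up for this example.

```python
import heapq

def top_n(averages, n=20):
    """Return up to n (word, average) pairs, sorted by average in descending order.

    Ties are broken arbitrarily, which the question allows.
    """
    return heapq.nlargest(n, averages.items(), key=lambda kv: kv[1])

# Made-up averages, for illustration only.
sample = {"alpha": 3.0, "beta": 5.0, "gamma": 1.0}
print(top_n(sample, n=2))  # [('beta', 5.0), ('alpha', 3.0)]
```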
Hints:
● This problem is very similar to the word counting example shown in the lecture notes
of Pig. You can use the code there and just make some minor changes to perform
this task.
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
compare the performance between Pig and Hive in terms of overall run-time and
explain your observation.
Hints:
● Hive stores its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole data set. Once your Hive commands/scripts
work as desired, you can then run them on the complete data set.
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in
the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors,
actions, or general patterns, has drawn a lot of attention in the machine learning field. In
this homework, you will implement a similar-user detection algorithm for an online movie
rating system. Basically, users who give similar scores to the same movies may have common
tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
\[
\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)
\]
where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
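The definition in (**) can be checked on small hand-made inputs before writing any Pig code. The Python sketch below is only an illustration of the formula, including the empty-union case; the function name `similarity` and the movieIDs are assumptions made for this example.

```python
def similarity(movies_a, movies_b):
    """Similarity from (**): |A ∩ B| / |A ∪ B|, defined as 0 when the union is empty."""
    a, b = set(movies_a), set(movies_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical movieIDs, made up for this example.
print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5  (2 shared movies out of 4 in total)
print(similarity(set(), set()))          # 0.0  (empty union)
```

Since |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|, the same value can be obtained from per-user movie counts plus the number of movies watched by both users, which may be more convenient in a distributed setting.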
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the
number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: