Posted: 2024-03-11



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every student MUST include the following statement, together with his/her signature, in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you will use only the Google Books 1-gram data (the record format refers to each 1-gram as a "bigram"). Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, from 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Note: The denominator is the number of years in which the word has appeared. Assume the dataset contains all the 1-grams from the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is (335 + 261) / 2 = 298, instead of (335 + 261) / 100 = 5.96.
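The averaging rule in part (c) can be checked on the sample records with a small Python sketch. This only illustrates the arithmetic and is not the Pig script the question asks for; the function name `average_per_year` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

def average_per_year(records):
    """Average match_count per word over the years in which the word appears.

    records: iterable of (word, year, match_count, volume_count) tuples,
    mirroring the TAB-separated fields of the dataset (hypothetical layout).
    """
    totals = defaultdict(int)   # word -> sum of match_count
    years = defaultdict(set)    # word -> distinct years seen
    for word, year, match_count, _volume_count in records:
        totals[word] += match_count
        years[word].add(year)
    # Divide by the number of years the word appeared, not by a fixed 100.
    return {word: totals[word] / len(years[word]) for word in totals}

records = [
    ("circumvallate", 1978, 335, 91),
    ("circumvallate", 1979, 261, 95),
]
print(average_per_year(records))  # {'circumvallate': 298.0}
```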
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes on Pig. You can use the code there and just make some minor changes to perform this task.
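To make the expected output of part (d) concrete, the selection step can be sketched in Python on a toy dictionary of averages. The values below are invented for illustration, `top_n_by_average` is a hypothetical helper, and the actual submission still has to be a Pig script that writes to HDFS.

```python
import heapq

def top_n_by_average(averages, n=20):
    """Return the n (word, average) pairs with the largest averages,
    sorted in descending order of average. Ties are broken arbitrarily,
    which the question explicitly allows."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])

# Toy input, not taken from the real dataset.
averages = {"alpha": 12.5, "beta": 298.0, "gamma": 7.0, "delta": 12.5}
for word, avg in top_n_by_average(averages, n=3):
    print(word, avg)
```

When fewer than n words exist, all of them are returned in descending order.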
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole dataset. Once your Hive commands/scripts work as desired, you can then run them on the complete dataset.
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
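As a quick sanity check of definition (**), the measure can be evaluated directly on Python sets. This is a minimal sketch for small inputs, with a hypothetical function name and made-up movie IDs; it is not the Pig program required by the question.

```python
def similarity(movies_a, movies_b):
    """Similarity (**) between two users, given the sets of movie IDs
    each has watched: |intersection| / |union|, or 0 if the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)

# Users share movies 2 and 3 out of four distinct movies overall.
print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
print(similarity(set(), set()))          # 0.0
```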
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
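One possible way to think about part (a), sketched here in Python rather than Pig, is to group users by movie and then count every pair of users that appears in the same group. The function name, the toy ratings, and the choice to order each pair by user ID are assumptions of this sketch, not requirements stated in the assignment.

```python
from collections import defaultdict
from itertools import combinations

def common_movie_counts(ratings):
    """ratings: iterable of (user_id, movie_id) pairs.
    Returns {(user_a, user_b): number of movies both watched}, user_a < user_b."""
    users_by_movie = defaultdict(set)
    for user_id, movie_id in ratings:
        users_by_movie[movie_id].add(user_id)

    counts = defaultdict(int)
    for users in users_by_movie.values():
        # Every pair of users who watched this movie shares one more movie.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return dict(counts)

# Toy data: users 1 and 2 both watched movies 10 and 20; user 3 watched 20.
ratings = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 20)]
counts = common_movie_counts(ratings)
top_pairs = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:10]
for (user_a, user_b), n in top_pairs:
    print(f"{user_a}, {user_b}, {n}")
```

Pairs with no movie in common never appear in the result, which keeps the output far smaller than the full set of user pairs.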
(b) [20 marks] By modifying/extending part of your code from part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in datasets [3] and [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e.
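The hint is cut off in the source. The standard inclusion-exclusion identity for two sets is |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|; assuming that is the form intended, the union size can be derived from per-user movie counts and the pairwise common counts, without materializing the union. The Python sketch below illustrates this on toy numbers; the function name and input layout are hypothetical, and it is not a Pig solution.

```python
from collections import defaultdict
import heapq

def top_k_similar(movies_per_user, common_counts, k=3):
    """movies_per_user: {user: number of movies the user watched}
    common_counts: {(user_a, user_b): number of movies both watched}
    Returns {user: [(similarity, other_user), ...]} with at most k entries,
    sorted by descending similarity."""
    neighbours = defaultdict(list)
    for (a, b), common in common_counts.items():
        # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
        union = movies_per_user[a] + movies_per_user[b] - common
        sim = common / union if union else 0.0
        neighbours[a].append((sim, b))
        neighbours[b].append((sim, a))
    return {user: heapq.nlargest(k, cands) for user, cands in neighbours.items()}

# Toy input: user 1 and user 2 watched the same two movies; user 3 watched one of them.
movies_per_user = {1: 2, 2: 2, 3: 1}
common_counts = {(1, 2): 2, (1, 3): 1, (2, 3): 1}
print(top_k_similar(movies_per_user, common_counts, k=3))
```

Users who share no movie with anyone have similarity 0 with every other user and do not appear in this sketch's output; how to report them is left open by the assignment text.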