This Singapore assignment is a report on Big Data.
Assignment Deliverables and Conditions
• Final documentation must be word processed. A maximum of 1500 words is recommended. Submit your document as a Word/PDF file named NAME.pdf, where NAME is your name in camel case.
• Late submissions will be penalized 5 marks out of 100 for each day late. If you have a genuine reason for needing to submit late, you can request an extension from the lecturer.
• Citation of facts is mandatory. Obtain your facts from credible sources and list them in the references / bibliography. Avoid ‘dumping of data’. Instead, make the facts that you discuss relevant to your case/project.
Part A (20 marks)
The idea of Big Data is not new. Companies in many fields have been using Big Data for different goals, one of which is market research. This gives them leverage by providing insights and helping them solve issues arising in the market. In relation to Big Data, answer the following questions:
1. Explain the 7 V’s of Big Data in your own words and with examples (with diagram).
2. Describe all FOUR (4) types of table joins with examples (include a table, SQL code and a diagram).
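As a starting point for Question 2, the sketch below runs four joins with Python's built-in sqlite3 module. It assumes the four types meant are inner, left, right and full outer joins, which the brief does not state. The tables customers and orders and their rows are hypothetical. Because older SQLite builds lack RIGHT JOIN and FULL OUTER JOIN, the right join is written here as a LEFT JOIN with the tables swapped, and the full join as a UNION of both directions.

```python
import sqlite3

# In-memory database with two small hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 'book'), (3, 'pen'), (4, 'lamp');
""")

QUERIES = {
    # Only rows that have a match in both tables.
    "inner": """SELECT c.name, o.item FROM customers c
                INNER JOIN orders o ON c.id = o.customer_id""",
    # Every customer, with NULL where there is no order.
    "left": """SELECT c.name, o.item FROM customers c
               LEFT JOIN orders o ON c.id = o.customer_id""",
    # Every order, with NULL where there is no customer
    # (a right join expressed as a left join with the tables swapped).
    "right": """SELECT c.name, o.item FROM orders o
                LEFT JOIN customers c ON c.id = o.customer_id""",
    # Every row from both sides. UNION also removes duplicate rows,
    # which is harmless for this data but not always equivalent.
    "full": """SELECT c.name, o.item FROM customers c
               LEFT JOIN orders o ON c.id = o.customer_id
               UNION
               SELECT c.name, o.item FROM orders o
               LEFT JOIN customers c ON c.id = o.customer_id""",
}


def run_join(kind):
    """Run one of the join queries and return its rows as a set of tuples."""
    return set(conn.execute(QUERIES[kind]).fetchall())


results = {kind: run_join(kind) for kind in QUERIES}
```

Comparing results["inner"] with results["full"] shows which rows each side contributes on its own: Ben has no order, and the lamp order has no customer.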
Part B (20 marks)
Big Data is only useful when trends and patterns can be detected in the data. Machine learning algorithms can help with this process. Many tools on the market today offer hands-on features for running machine learning algorithms. With finding the right trends and patterns in mind, answer the following questions:
1. Choose one machine learning tool of your liking and give details of the following:
i. The history of the tool.
ii. The history of the developer.
2. Go to https://www.kaggle.com/jsphyg/weather-dataset-rattle-package?select=weatherAUS.csv and register. Then:
i. Download the dataset that is named “weatherAUS.csv”.
ii. Observe the dataset and you will be able to see a column named “RainTomorrow”. This
column will be your label/target.
iii. Import this dataset into the tool that you chose in Question 1.
iv. Clean the data, e.g. by removing rows that contain NA values or by replacing NA values.
v. Choose a machine learning algorithm to be used in the tool and explain why you chose
it.
vi. Use the tool to find the optimal features in the dataset for predicting the label/target.
vii. Choose an evaluation technique and explain why you chose this technique.
viii. Perform an evaluation on your method.
*Show every step with a diagram and an explanation.
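The clean-up in step iv and the evaluation in step viii might look like the following if a scripting language were the chosen tool. This is only a sketch using Python's standard library, not a required approach. It assumes missing values appear as the string "NA" and that the label takes the values "Yes" and "No". The choice of median filling and of confusion-matrix metrics is illustrative, and model training and feature selection (steps v and vi) are not covered.

```python
import csv
from statistics import median

NA = "NA"  # assumed missing-value marker


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def clean_rows(rows, target, numeric_cols):
    """Drop rows with a missing label, then fill numeric gaps with the median."""
    kept = [dict(row) for row in rows if row[target] != NA]
    for col in numeric_cols:
        known = [float(row[col]) for row in kept if row[col] != NA]
        fill = median(known) if known else 0.0
        for row in kept:
            row[col] = fill if row[col] == NA else float(row[col])
    return kept


def evaluate(y_true, y_pred, positive="Yes"):
    """Return confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}
```

A typical call would be clean_rows(load_rows("weatherAUS.csv"), "RainTomorrow", numeric_cols), where numeric_cols is a list of numeric column names taken from the file's header. Reporting precision and recall alongside accuracy is useful when one label value is much rarer than the other, since accuracy alone can then look high for a model that rarely predicts the rare class.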