7022DATSCI Big Data Analysis

7022DATSCI—Mini-projects
Master of Sensors Data and Management
Big Data Analysis

Project: Big Data Analysis

The aim of the Big Data Analysis project is to apply a machine learning method in a practical setting. In each of the following projects you are asked to...

  1. Work on a practical machine learning project.
  2. Present your work in a presentation.

You will work on your projects in groups of 3-5 students. The following list contains suggestions for project topics. Additional topics might become available and you can also suggest alternative topics:

  • “3, 6, 8, 9?”—recognising hand-written digits with principal component analysis

Apply principal component analysis for recognising handwritten digits as explained in (Lu, 2017) (but without the pre-processing using Histograms of Oriented Gradients (HOG)) to the MNIST data set. http://yann.lecun.com/exdb/mnist/

  • Googling food webs—the PageRank of extinction

Implement the variant of the PageRank algorithm described in (Allesina and Pascual, 2009) and reproduce the study for some of the food webs from this article. Note that some of the food webs are available in R by installing the cheddar library.

  • MCMC for code cracking

A highly original application of Markov chain Monte Carlo (MCMC) was presented by (Diaconis, 2009) and extended by (Chen and Rosenthal, 2012). Implement and test the approach by reproducing the example described in (Diaconis, 2009).

References

Allesina, S., Pascual, M., 2009. Googling food webs: Can an eigenvector measure species’ importance for coextinctions? PLOS Computational Biology 5 (9), 1–6.

URL https://doi.org/10.1371/journal.pcbi.1000494

Chen, J., Rosenthal, J., 2012. Decrypting classical cipher text using Markov chain Monte Carlo. Statistics and Computing 22, 397–413.

URL https://doi.org/10.1007/s11222-011-9232-5

Diaconis, P., 2009. The Markov Chain Monte Carlo Revolution. Bulletin of the American Mathematical Society 46 (2), 179–205.

Lu, W., 2017. Handwritten digits recognition using PCA of histogram of oriented gradient. In: 2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). pp. 1–5.

What you should hand in

  1. Each group: A PowerPoint presentation with 10 minutes of recorded audio (25%).
  2. Every student: A one-page summary of your mini-project (25%).
  3. Every student: A text file containing your commented source code (50%).

Important! All group members will receive the same mark for the PowerPoint presentation. The one-page summary and the code demonstration will be marked individually.

| Presentation/One-page summary | Partial mark |
|---|---|
| Introduction: Brief description of your application. Motivation: Which challenge are you going to address? | 5% |
| Implementation: What are the challenges of implementing the algorithm? Explain how you implemented the method. | 15% |
| Results: What have you found out about your data set? Show how your machine learning method addresses the challenge described in the Introduction. | 10% |
| Discussion: Brief summary of the analysis of the data. Critically reflect how well the challenge described in the Introduction was solved by your machine learning approach. | 10% |
| Formal marks: Visual presentation, delivery of the talk, time keeping | 10% |
| Total | 50% |

| Source code (submitted to Canvas and demonstration) | Partial mark |
|---|---|
| Completeness of the implementation | 20% |
| Demonstration | 10% |
| Clarity of the code | 10% |
| Quality of comments | 10% |
| Total | 50% |

STEP BY STEP PROJECT ANSWER WITH EXPLANATION for Recognizing Hand-Written Digits with Principal Component Analysis

Below is an outline for the PowerPoint presentation and a one-page summary for the mini-project on "Recognizing Hand-Written Digits with Principal Component Analysis." Keep in mind that this is a simplified example, and you'll need to expand on each section with detailed content, visuals, and explanations.

PowerPoint Presentation Outline:

Slide 1: Title Slide

  • Project Title: Recognizing Hand-Written Digits with PCA
  • Group Members: [List Names]

Slide 2: Introduction

  • Brief Description: Applying PCA for Handwritten Digit Recognition
  • Motivation: Addressing the challenge of efficient digit recognition using PCA.

Slide 3: Implementation Challenges

  • Challenges Faced: Dimensionality of MNIST data, Eigenvalue computation, Interpretability.
  • Approach: Explain how PCA can reduce dimensionality while retaining essential information.

Slide 4: Method Implementation

  • Overview: Describe how PCA is applied step by step.
  • Algorithm: Highlight the mathematical process of PCA.
  • Code Snippet: Include a concise code snippet for PCA implementation.

Slide 5: Results

  • Data Set: Briefly introduce the MNIST dataset.
  • Findings: Showcase the performance of PCA in reducing dimensionality.
  • Visuals: Display graphs or charts comparing the original and reduced data.

Slide 6: Machine Learning Approach

  • Challenge Addressed: Discuss how PCA addresses the challenge of efficient digit recognition.
  • Interpretability: Explain how the reduced dimensions still capture digit features.

Slide 7: Discussion

  • Analysis Summary: Briefly summarize the analysis of the data.
  • Successes: Discuss the effectiveness of PCA in reducing dimensionality.
  • Limitations: Mention any shortcomings or areas of improvement.

Slide 8: Code Walkthrough

  • Demonstrate the key parts of the PCA code.
  • Explain relevant functions and their role in the implementation.

Slide 9: Conclusion

  • Recap the main points of the project.
  • Highlight the significance of using PCA for digit recognition.

Slide 10: Q&A

  • Open the floor for questions from the audience.

One-Page Summary:

Title: Recognizing Hand-Written Digits with PCA

Introduction: Our project involves the application of Principal Component Analysis (PCA) for recognizing hand-written digits. The primary motivation is to address the challenge of efficient digit recognition by reducing the dimensionality of the MNIST dataset while retaining crucial information.

Implementation Challenges: Implementing PCA posed challenges due to the high dimensionality of the MNIST data, the computation of eigenvalues, and the need for maintaining interpretability while reducing dimensions.

Method Implementation: We implemented PCA by following these steps: data preprocessing (centring each pixel feature), covariance matrix computation, eigenvalue decomposition, selection of the principal components with the largest eigenvalues, projection of the data onto those components, and reconstruction of the data from the projection. The algorithm was coded in Python using libraries like NumPy.
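The PCA steps named in this paragraph can be sketched in NumPy as follows. This is a minimal illustration rather than the project's actual code: the function names `fit_pca`, `project` and `reconstruct` are hypothetical, and a small synthetic matrix stands in for the flattened MNIST images, which would have to be loaded separately.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the feature mean, the top principal directions and their eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # centre each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]

def project(X, mean, components):
    """Coordinates of the samples in the principal-component basis."""
    return (X - mean) @ components

def reconstruct(Z, mean, components):
    """Map reduced coordinates back to the original feature space."""
    return Z @ components.T + mean

# Synthetic stand-in for image vectors (assumption): 3 latent factors in 20 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

mean, components, eigvals = fit_pca(X, n_components=3)
Z = project(X, mean, components)
X_hat = reconstruct(Z, mean, components)
reconstruction_error = np.mean((X - X_hat) ** 2)
```

For real MNIST data each row of `X` would be one 28 x 28 image flattened to 784 values. The covariance matrix is then 784 x 784, which `np.linalg.eigh` handles without difficulty.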

Results: Upon applying PCA to the MNIST dataset, we observed a significant reduction in dimensionality while maintaining digit information. Visualizations showed that principal components capture essential digit features effectively.
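One common way to quantify how much information a given number of components retains is the cumulative explained variance. The sketch below is illustrative only: the helper `components_for_variance`, the 95% target and the synthetic data are assumptions, not results from the project.

```python
import numpy as np

def components_for_variance(X, target=0.95):
    """Smallest number of principal components whose variance share reaches `target`."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred data are proportional to the
    # eigenvalues of the covariance matrix, already sorted in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    variances = s ** 2 / (X.shape[0] - 1)
    cumulative = np.cumsum(variances) / variances.sum()
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative

# Synthetic data (assumption): five features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([10.0, 5.0, 1.0, 0.1, 0.1])

k, cumulative = components_for_variance(X, target=0.95)
```

Plotting `cumulative` against the number of components gives the kind of chart the Results slide asks for.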

Machine Learning Approach: PCA supports efficient digit recognition by reducing the number of dimensions while preserving most of the variance in the data. PCA on its own does not assign labels, so a classifier applied to the projected data performs the actual recognition. The leading components can be displayed as images, which helps in understanding which pixel patterns drive the recognition process.
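To turn the reduced representation into digit predictions, the projected data can be passed to a simple classifier. The sketch below uses a nearest-centroid rule, which is an assumption chosen for brevity and not necessarily the classifier used in (Lu, 2017). All function names are hypothetical, and two synthetic classes stand in for digit images.

```python
import numpy as np

def pca_basis(X, k):
    """Feature mean and the first k principal directions (as columns)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def fit_centroids(Z, y):
    """Mean position of each class in the reduced space."""
    labels = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict(Z, labels, centroids):
    """Assign each sample the label of its nearest class centroid."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]

# Synthetic stand-in for two digit classes (assumption): 50-dimensional samples.
rng = np.random.default_rng(1)
def sample(n):
    a = rng.normal(size=(n, 50))            # class 0 centred at the origin
    b = rng.normal(size=(n, 50)) + 3.0      # class 1 shifted in every feature
    return np.vstack([a, b]), np.array([0] * n + [1] * n)

X_train, y_train = sample(100)
X_test, y_test = sample(50)

mean, basis = pca_basis(X_train, k=2)        # fit PCA on training data only
labels, centroids = fit_centroids((X_train - mean) @ basis, y_train)
y_pred = predict((X_test - mean) @ basis, labels, centroids)
accuracy = (y_pred == y_test).mean()
```

The PCA basis is estimated on the training set only and then reused for the test set, so the test accuracy is not influenced by the test data.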

Discussion: Our analysis demonstrated that PCA effectively reduces the dimensionality of the MNIST dataset. While PCA is successful in addressing dimensionality challenges, it might not capture all intricate digit details. Balancing dimensionality reduction and information preservation remains a consideration.

Code Walkthrough: During the code walkthrough, we explained key functions such as covariance matrix calculation, eigenvalue decomposition, and dimension reduction. We highlighted how these functions contribute to the PCA implementation.

Conclusion: Our project showcases the power of PCA in reducing dimensionality for hand-written digit recognition. By maintaining interpretability and minimizing information loss, PCA offers a valuable approach to address the challenge of efficient digit recognition.

This is a general outline for your presentation and summary. You'll need to add relevant content, detailed explanations, visuals, and code snippets to each section.

STEP BY STEP PROJECT ANSWER WITH EXPLANATION for Googling Food Webs - PageRank of Extinction

Here's an outline for the PowerPoint presentation and a one-page summary for the mini-project on "Googling Food Webs - PageRank of Extinction."

PowerPoint Presentation Outline:

Slide 1: Title Slide

  • Project Title: PageRank of Extinction in Food Webs
  • Group Members: [List Names]

Slide 2: Introduction

  • Brief Description: Implementation of PageRank Algorithm for Food Web Analysis
  • Motivation: Addressing the challenge of measuring species' importance for coextinctions using PageRank.

Slide 3: Implementation Challenges

  • Challenges Faced: Understanding the PageRank algorithm, Adaptation to food webs, Data preparation.
  • Approach: Describe how we adapted the PageRank algorithm for food web analysis.

Slide 4: Method Implementation

  • Overview: Explain the key concepts of the PageRank algorithm.
  • Algorithm Variant: Describe the variant from (Allesina and Pascual, 2009) for measuring species' importance.
  • Code Snippet: Include a concise code snippet for implementing the algorithm.

Slide 5: Data Source and Preparation

  • Data Source: Introduction to food webs from (Allesina and Pascual, 2009).
  • Data Cleaning: Describe any preprocessing steps required for the data.

Slide 6: Results

  • Data Analysis: Summarize the results obtained from applying the PageRank variant to food webs.
  • Importance Metrics: Present the importance scores of species based on PageRank of extinction.

Slide 7: Machine Learning Approach

  • Challenge Addressed: Explain how the PageRank variant addresses the challenge of measuring species' importance.
  • Interpretability: Discuss how the PageRank scores reflect species' significance for coextinctions.

Slide 8: Discussion

  • Analysis Summary: Briefly discuss the findings from the food web analysis.
  • Interpretation: Reflect on the strengths and limitations of the PageRank approach for this context.

Slide 9: Code Walkthrough

  • Guide through the implementation of the PageRank variant for food webs.
  • Explain key functions and their role in the algorithm.

Slide 10: Conclusion

  • Recap the main points of the project.
  • Emphasize the relevance of the PageRank approach in understanding species' importance for coextinctions.

One-Page Summary:

Title: PageRank of Extinction in Food Webs

Introduction: Our project involves the implementation of a variant of the PageRank algorithm to measure species' importance for coextinctions in food webs. The motivation stems from the challenge of quantifying the significance of species in the context of ecosystem stability and coextinctions.

Implementation Challenges: Our primary challenge was understanding the PageRank algorithm and adapting it to the unique context of food webs. Additionally, we needed to ensure accurate data preparation and alignment with the variant proposed by (Allesina and Pascual, 2009).

Method Implementation: We implemented a variant of the PageRank algorithm to analyze food webs. This involved adapting the core PageRank principles to the coextinction scenario, as outlined by (Allesina and Pascual, 2009). Our implementation was carried out using R and leveraged the cheddar library for available food web data.
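Although the summary says R was used, the core idea can be sketched in a few lines of Python. This is a simplified reading of the approach and should be checked against (Allesina and Pascual, 2009) before use: it assumes a "root" node that receives a link from every species and points to the primary producers, and it assumes that each consumer passes its importance equally to its resources. The function name `extinction_rank` and the three-species chain are hypothetical.

```python
import numpy as np

def extinction_rank(flows, n_iter=1000, tol=1e-12):
    """Rank species by how much they support others.

    flows[i, j] = 1 if energy flows from resource i to consumer j (j eats i).
    """
    n = flows.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = flows
    producers = flows.sum(axis=0) == 0      # species with no resources
    A[n, :n] = producers                    # root -> primary producers
    A[:n, n] = 1                            # every species -> root
    # Column-normalise: each consumer splits its importance among its resources.
    Q = A / A.sum(axis=0, keepdims=True)
    r = np.full(n + 1, 1.0 / (n + 1))
    for _ in range(n_iter):                 # power iteration for the dominant eigenvector
        r_new = Q @ r
        r_new /= r_new.sum()
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r[:n]                            # drop the root node's score

# Hypothetical food chain: plant (0) -> herbivore (1) -> carnivore (2).
flows = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])
scores = extinction_rank(flows)
ranking = np.argsort(scores)[::-1]          # most important species first
```

In this toy chain the plant receives the highest score, because removing it would take both other species with it. Because the root node connects every species, no damping factor is used in this sketch.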

Data Source and Preparation: We utilized food web data from (Allesina and Pascual, 2009) for our analysis. After data collection, we performed necessary data cleaning and preprocessing to ensure the accurate application of the PageRank variant.

Results: The application of the PageRank variant to food webs yielded insights into species' importance for coextinctions. The obtained PageRank scores provided a quantitative measure of species' influence on overall ecosystem stability.

Machine Learning Approach: The PageRank variant effectively addresses the challenge of quantifying species' importance in the context of coextinctions. By incorporating species interactions and dependencies, the algorithm provides a holistic understanding of ecosystem dynamics.

Discussion: Our analysis highlighted the significance of the PageRank approach in assessing species' importance in food webs. However, limitations may arise from assumptions inherent to the algorithm and the quality of input data.

Code Walkthrough: During the code walkthrough, we explained the adaptation of the PageRank algorithm for food web analysis. We detailed key functions and their roles in the algorithm's implementation.

Conclusion: Our project demonstrates the utility of the PageRank variant for measuring species' importance in the face of coextinctions. This approach offers valuable insights into ecosystem stability and contributes to our understanding of species dynamics in food webs.

Remember, this outline is a sample assignment starting point, and you'll need to add relevant content, explanations, data visualizations, and code explanations to each section.

STEP BY STEP PROJECT ANSWER WITH EXPLANATION for MCMC for Code Cracking

PowerPoint Presentation Outline:

Slide 1: Title Slide

  • Project Title: MCMC for Code Cracking
  • Group Members: [List Names]

Slide 2: Introduction

  • Brief Description: Application of MCMC for Code Cracking
  • Motivation: Addressing the challenge of deciphering classical cipher text using MCMC.

Slide 3: Implementation Challenges

  • Challenges Faced: Understanding MCMC, Adapting the approach, Code optimization.
  • Approach: Describe how we implemented and tested the MCMC approach for code cracking.

Slide 4: Method Implementation

  • Overview: Explain the key principles of the MCMC approach for code cracking.
  • Algorithm: Describe the MCMC procedure outlined by (Diaconis, 2009) and extended by (Chen and Rosenthal, 2012).
  • Code Snippet: Include a concise code snippet for implementing the MCMC algorithm.

Slide 5: Data and Experiment Setup

  • Data Source: Briefly introduce the classical cipher text used in the experiment.
  • Setup: Describe the experimental parameters and the initial conditions for MCMC.

Slide 6: Results

  • Decryption Analysis: Summarize the results obtained from the MCMC code cracking approach.
  • Success Metrics: Present the effectiveness of the MCMC approach in deciphering the code.

Slide 7: Machine Learning Approach

  • Challenge Addressed: Explain how MCMC addresses the challenge of code cracking.
  • Probabilistic Interpretation: Discuss how the MCMC process uncovers the probable original text.

Slide 8: Discussion

  • Analysis Summary: Reflect on the findings from the code cracking experiment.
  • Interpretation: Discuss the implications of successful decryption using MCMC.

Slide 9: Code Walkthrough

  • Walk through the implementation of the MCMC algorithm for code cracking.
  • Explain key functions and their roles in the decryption process.

Slide 10: Conclusion

  • Recap the main points of the project.
  • Highlight the novelty of using MCMC for code cracking and its potential applications.

One-Page Summary:

Title: MCMC for Code Cracking

Introduction:

Our project involves the implementation and testing of a Markov chain Monte Carlo (MCMC) approach for deciphering classical cipher text. The motivation is to explore the application of MCMC in solving cryptographic challenges, as presented by (Diaconis, 2009) and extended by (Chen and Rosenthal, 2012).

Implementation Challenges:

Our primary challenges included comprehending the intricate details of MCMC, adapting the MCMC procedure for code cracking, and optimizing the code for efficiency.

Method Implementation:

We implemented the MCMC approach for code cracking based on the principles outlined by (Diaconis, 2009) and extended by (Chen and Rosenthal, 2012). Our implementation involved adapting MCMC to decipher a classical cipher text using Python.
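A minimal Metropolis sampler for a substitution cipher, in the spirit of the example in (Diaconis, 2009), might look like the sketch below. It scores a candidate key by the bigram plausibility of the decoded text and proposes new keys by swapping two symbols. The alphabet, the add-one smoothing, the tiny reference text and all function names are assumptions for illustration. A realistic run would estimate the bigram model from a large corpus and use many more steps.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase + " "

def bigram_log_probs(reference):
    """Log transition probabilities between consecutive characters (add-one smoothing)."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(reference, reference[1:]):
        if a in counts and b in counts:
            counts[a][b] += 1
    return {a: {b: math.log(c / sum(row.values())) for b, c in row.items()}
            for a, row in counts.items()}

def decode(text, key):
    """Apply a key that maps each cipher symbol to a plaintext symbol."""
    return "".join(key[c] for c in text)

def log_plausibility(text, key, logp):
    """Sum of log bigram probabilities of the decoded text."""
    plain = decode(text, key)
    return sum(logp[a][b] for a, b in zip(plain, plain[1:]))

def metropolis(cipher, logp, n_steps, seed=0):
    rng = random.Random(seed)
    key = dict(zip(ALPHABET, ALPHABET))          # initial guess: identity mapping
    score = log_plausibility(cipher, key, logp)
    best_key, best_score = dict(key), score
    for _ in range(n_steps):
        a, b = rng.sample(ALPHABET, 2)           # propose a random transposition
        key[a], key[b] = key[b], key[a]
        new_score = log_plausibility(cipher, key, logp)
        # Accept uphill moves always, downhill moves with probability exp(difference).
        if new_score >= score or rng.random() < math.exp(new_score - score):
            score = new_score
            if score > best_score:
                best_key, best_score = dict(key), score
        else:
            key[a], key[b] = key[b], key[a]      # reject: undo the swap
    return best_key, best_score

# Toy experiment (assumption): a short message and a very small reference text.
reference = "the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 20
secret = "the cat sat on the mat and the dog"
shuffled = list(ALPHABET)
random.Random(1).shuffle(shuffled)
encrypt = dict(zip(ALPHABET, shuffled))
cipher = "".join(encrypt[c] for c in secret)

logp = bigram_log_probs(reference)
start_score = log_plausibility(cipher, dict(zip(ALPHABET, ALPHABET)), logp)
best_key, best_score = metropolis(cipher, logp, n_steps=2000)
candidate = decode(cipher, best_key)
```

With a message this short the sampler is not guaranteed to recover the plaintext exactly. Comparing `candidate` with `secret` across several seeds and step counts is a natural experiment for the Results section.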

Data and Experiment Setup:

The experiment utilized classical cipher text as the input. We set up the experiment by initializing the MCMC process with appropriate parameters, including the initial conditions.

Results:

Applying the MCMC approach to the code cracking challenge yielded promising results. The MCMC algorithm successfully decrypted the classical cipher text, highlighting its potential effectiveness in deciphering codes.

Machine Learning Approach:

MCMC effectively addresses the challenge of code cracking by leveraging probabilistic sampling. The iterative nature of MCMC allows it to explore possible decryption keys and unveil the most probable original text.

Discussion:

Our analysis demonstrated the utility of MCMC in solving cryptographic challenges. Successful decryption using MCMC emphasizes its potential applications in deciphering various types of codes.

Code Walkthrough:

During the code walkthrough, we explained the key steps of the MCMC algorithm for code cracking. We highlighted functions and procedures central to the decryption process.

Conclusion:

Our project showcases the innovative application of MCMC for code cracking. The success of the experiment underscores the adaptability of MCMC to cryptographic challenges and its potential impact on code decryption.

Remember, this outline is a starting point, and you'll need to add relevant content, detailed explanations, code snippets, and visualizations to each section.
