COMPSCI 752 : Big Data Management


2021 Semester One (1213) (15 POINTS)

Course Prescription

Big data modelling and management in distributed and heterogeneous environments. Sample topics include: representation languages for data exchange and integration (XML and RDF), languages for describing the semantics of big data (DTDs, XML Schema, RDF Schema, OWL, description logics), query languages for big data (XPath, XQuery, SPARQL), data integration (mediation via global-as-view and local-as-view), large-scale search (keyword queries, inverted index, PageRank), distributed computing (Hadoop, MapReduce, Pig), and big data and blockchain technology (Spark, cryptocurrency). Recommended preparation: COMPSCI 351 or equivalent.

Course Overview

Many companies must manage large volumes of diverse data in order to stay competitive. The deep diversity of modern-day data requires data scientists to master many technologies that rely on new principles to represent, describe, and access data. The course will provide insight into the rich landscape of big data. The main aim of the course is to prepare students for big data modelling and large-scale data management in distributed and heterogeneous environments.
On the one hand, learning the principles of big data management will prepare students for a career as data scientists, regardless of continual changes in technology. In particular, modelling, querying, and integrating big data are necessary skills for getting data ready for analytical purposes. For example, MapReduce implementations of algorithms such as PageRank illustrate how to rank billions of Web pages efficiently.
On the other hand, investigating current big data technologies will demonstrate what is and is not possible today, and also highlight opportunities for future work. For instance, Spark offers an integrated technology framework for preparing and analysing big data, while the disruptive blockchain technology exemplifies a highly fault-tolerant distributed computing system whose application potential we are only beginning to understand.

Course Requirements

Prerequisite: Approval of the Academic Head or nominee

Capabilities Developed in this Course

Capability 1: Disciplinary Knowledge and Practice
Capability 2: Critical Thinking
Capability 3: Solution Seeking
Capability 4: Communication and Engagement
Capability 5: Independence and Integrity
Capability 6: Social and Environmental Responsibilities

Learning Outcomes

By the end of this course, students will be able to:
  1. Apply state-of-the-art representation formalisms for big data, including the eXtensible Markup Language (XML), the Resource Description Framework (RDF), the JavaScript Object Notation (JSON), the Property Graph Model, and NoSQL (Capability 1, 2 and 3)
  2. Model big data with schema languages, including Document Type Definitions (DTDs), XML Schema, JSON Schema, and RDF Schema (Capability 1, 2 and 3)
  3. Access big data with query languages, including XPath, XQuery, SPARQL, Cypher, and Spark SQL (Capability 1, 2 and 3)
  4. Integrate big data with ontologies, including the Web Ontology Language (OWL), Description Logics, and Knowledge Graphs (Capability 1, 2 and 3)
  5. Understand and critically evaluate how to manage and analyse big data, including techniques for searching, indexing and processing such as PageRank, MapReduce, Spark and Blockchain technology (Capability 1, 2 and 3)
  6. Present, as a group, slides to fellow students and teachers on their joint understanding of state-of-the-art knowledge on a big data topic (Capability 1, 2 and 4)
  7. Communicate their individual understanding of state-of-the-art knowledge on a big data research topic in the form of a written report, including the fair use of this knowledge in business and society (Capability 1, 2, 3, 4, 5 and 6)


Assessment Type    Percentage    Classification
Assignments        10%           Individual Coursework
Presentation       15%           Group Coursework
Reports            15%           Group Coursework
Final Exam         60%           Individual Coursework

Assessment Type    Learning Outcome Addressed
                   1  2  3  4  5  6  7
Final Exam

Special Requirements


Workload Expectations

This course is a standard 15-point course, and students are expected to spend 10 hours per week on each 15-point course in which they are enrolled.

For this course, you can expect 3 hours of lectures, a one-hour tutorial for every topic, 3 hours of reading and thinking about the content, and 3 hours of work on assignments and/or test preparation.

Delivery Mode

Campus Experience

Attendance is expected at scheduled activities, including tutorials.

Lectures and other learning activities, including tutorials, will be available as recordings.

Attendance on campus is required for the exam.

The activities for the course are scheduled as a standard weekly timetable delivery.

This course will also be available to remote students who cannot access the campus due to Covid-19.

Learning Resources

All materials are on Canvas.

Student Feedback

During the course, Class Representatives in each class can take feedback to the staff responsible for the course and to staff-student consultative committees.

At the end of the course students will be invited to give feedback on the course and teaching through a tool called SET or Qualtrics. The lecturers and course co-ordinators will consider all feedback.

Your feedback helps to improve the course and its delivery for all students.

Digital Resources

Course materials are made available in a learning and collaboration tool called Canvas, which also includes reading lists and lecture recordings (where available).

Please remember that the recording of any class on a personal device requires the permission of the instructor.

Academic Integrity

The University of Auckland will not tolerate cheating, or assisting others to cheat, and views cheating in coursework as a serious academic offence. The work that a student submits for grading must be the student's own work, reflecting their learning. Where work from other sources is used, it must be properly acknowledged and referenced. This requirement also applies to sources on the internet. A student's assessed work may be reviewed against online source material using computerised detection mechanisms.


The content and delivery of content in this course are protected by copyright. Material belonging to others may have been used in this course and copied by and solely for the educational purposes of the University under license.

You may copy the course content for the purposes of private study or research, but you may not upload onto any third party site, make a further copy or sell, alter or further reproduce or distribute any part of the course content to another person.

Inclusive Learning

All students are asked to discuss any impairment-related requirements privately, face-to-face and/or in written form, with the course coordinator, lecturer or tutor.

Student Disability Services also provides support for students with a wide range of impairments, both visible and invisible, to succeed and excel at the University. For more information and contact details, please visit the Student Disability Services’ website.

Special Circumstances

If your ability to complete assessed coursework is affected by illness or other personal circumstances outside of your control, contact a member of teaching staff as soon as possible before the assessment is due.

If your personal circumstances significantly affect your performance in, or preparation for, an exam or eligible written test, refer to the University’s aegrotat or compassionate consideration page.

This should be done as soon as possible and no later than seven days after the affected test or exam date.

Learning Continuity

In the event of an unexpected disruption, we undertake to maintain the continuity and standard of teaching and learning in all your courses throughout the year. If there are unexpected disruptions, the University has contingency plans to ensure that access to your course continues and that your assessment is fair and not compromised. Some adjustments may need to be made in emergencies. You will be kept fully informed by your course co-ordinator, and if disruption occurs you should refer to the University website for information about how to proceed.

Delivery will be affected as follows under each of the Covid-19 Alert Levels 1 to 4:

Level 1: Delivered normally as specified in the delivery mode.

Level 2: You will not be required to attend in person. All teaching and assessment will have a remote option. The following activities will also have an on-campus/in-person option: lectures, tutorials, office hours.

Level 3 / 4: All teaching activities and assessments are delivered remotely.

Student Charter and Responsibilities

The Student Charter assumes and acknowledges that students are active participants in the learning process and that they have responsibilities to the institution and the international community of scholars. The University expects that students will act at all times in a way that demonstrates respect for the rights of other students and staff so that the learning environment is both safe and productive. For further information, visit the Student Charter.


Elements of this outline may be subject to change. The latest information about the course will be available for enrolled students in Canvas.

In this course, you may be asked to submit your coursework assessments digitally. The University reserves the right to conduct scheduled tests and examinations for this course online or through the use of computers or other electronic devices. Where tests or examinations are conducted online, remote invigilation arrangements may be used. The final decision on the completion mode for a test or examination, and remote invigilation arrangements where applicable, will be advised to students at least 10 days prior to the scheduled date of the assessment or, in the case of an examination, when the examination timetable is published.

Published on 16/12/2020 02:21 p.m.