COMPSCI 752
Big Data Management
Assignment 3 / Semester 1, 2022
Data Semantics and Knowledge Graph
Student ID:
Name:
Instructions:
There are 2 exercises worth 5 marks in total. Each exercise has several questions,
each with its own mark allocation. Good luck.
Submission:
Please submit your work as a single PDF file on Canvas by 5pm, Fri 6 May 2022.
Embed the required Python script in the PDF file.
Penalty Dates
The assignment will not be accepted after the last penalty date unless there are special
circumstances (e.g., sickness with certificate). Penalties are calculated as a percentage
of the assignment mark, as follows:
• By 5pm, Fri 6 May 2022 – No penalty
• By 5pm, Sat 7 May 2022 – 25% penalty
• By 5pm, Sun 8 May 2022 – 50% penalty
1 Querying data through RDFS [1.5 marks]
Suppose that our Tbox T and Abox A are defined as follows:
Tbox T:
RegisteredIn rdfs:domain Student (1)
RegisteredIn rdfs:range Program (2)
HasPrograms rdfs:range Program (3)
Design rdfs:subPropertyOf HasPrograms (4)
LedBy rdfs:domain Dept (5)
LedBy rdfs:range Professor (6)
Abox A:
Design(Stats, DataScience) (7)
Design(CS, InfoSys) (8)
RegisteredIn(Alice, DataScience) (9)
RegisteredIn(Peter, InfoSys) (10)
RegisteredIn(Mary, DataScience) (11)
LedBy(CS, Giovanni) (12)
We consider the following conjunctive query:
q(x) :- Student(x), RegisteredIn(x, y), HasPrograms(z, y), Dept(z)
Questions:
1. What is the answer of q(x) when evaluated on Abox A only? Explain the answer.
[0.5 marks]
2. What is the answer of q(x) when evaluated on both Tbox and Abox, ⟨T, A⟩?
Explain the answer. [1 mark]
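As a reminder of the mechanism behind these two questions, the following is a minimal sketch of RDFS saturation followed by conjunctive query evaluation. It runs on a deliberately different toy knowledge base (the names `lectures`, `teaches`, `Lecturer`, `Ann` and `Databases` are hypothetical and do not appear in the exercise), and the helper functions `saturate` and `answers` are illustrative, not part of any library. It covers only the domain, range and subPropertyOf rules.

```python
def saturate(tbox, abox):
    """Apply RDFS domain, range and subPropertyOf rules until a fixpoint.

    Facts are tuples: (Class, x) for concepts, (Property, s, o) for roles.
    Tbox entries are (property, kind, target) with kind in
    {"domain", "range", "subPropertyOf"}.
    """
    facts = set(abox)
    while True:
        derived = set()
        for fact in facts:
            if len(fact) != 3:
                continue  # only role assertions trigger these rules
            prop, s, o = fact
            for p, kind, target in tbox:
                if p != prop:
                    continue
                if kind == "domain":
                    derived.add((target, s))
                elif kind == "range":
                    derived.add((target, o))
                elif kind == "subPropertyOf":
                    derived.add((target, s, o))
        if derived <= facts:
            return facts  # nothing new: fixpoint reached
        facts |= derived


def answers(atoms, head_var, facts):
    """Evaluate a conjunctive query; variables are strings starting with '?'."""
    def match(remaining, binding):
        if not remaining:
            yield binding
            return
        pred, *args = remaining[0]
        for fact in facts:
            if fact[0] != pred or len(fact) != len(remaining[0]):
                continue
            b = dict(binding)
            ok = True
            for a, v in zip(args, fact[1:]):
                if a.startswith("?"):
                    if b.setdefault(a, v) != v:
                        ok = False  # variable already bound to another value
                        break
                elif a != v:
                    ok = False  # constant mismatch
                    break
            if ok:
                yield from match(remaining[1:], b)
    return {b[head_var] for b in match(list(atoms), {})}


# Hypothetical toy knowledge base (NOT the one in this exercise).
toy_tbox = [("teaches", "domain", "Lecturer"),
            ("lectures", "subPropertyOf", "teaches")]
toy_abox = {("lectures", "Ann", "Databases")}

# q(x) :- Lecturer(x), teaches(x, y)
toy_query = [("Lecturer", "?x"), ("teaches", "?x", "?y")]

on_abox_only = answers(toy_query, "?x", toy_abox)
on_saturated = answers(toy_query, "?x", saturate(toy_tbox, toy_abox))
```

On the toy data, the query has no answer over the Abox alone because neither `Lecturer(Ann)` nor `teaches(Ann, Databases)` is stated explicitly; both become available only after saturation with the Tbox.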
2 Knowledge graph [3.5 marks]
We will build a knowledge graph based on the profile text from Ninh’s homepage.
"Prior to joining University of Auckland in December 2018, Ninh worked in
Copenhagen for 7 years at the University of Copenhagen and IT University of
Copenhagen. He received his PhD at IT University of Copenhagen under the
supervision of Professor Rasmus Pagh in 2014. After that, he spent 4 years
in postdoctoral positions in Copenhagen. He was the recipient of the best
paper awards in WWW Conference 2014 and PKDD 2020. AMiner has recognized
him as the 2022 AI 2000 Most Influential Scholar Honorable Mention in Data
Mining (Rising Star) for his outstanding and vibrant contributions to this
field between 2012 and 2021."
1. Unsupervised method: Assume that nouns are entities and verbs form
relations. Using NLP techniques (e.g. the nltk package), write a small Python script
to parse the above text into entities and relationships. [1 mark]
Construct a knowledge graph based on the parsing result. [0.5 marks]
2. Supervised method: Use a pre-trained model (e.g. https://spacy.io/models/en)
to parse the above text for entities and verbs. [1 mark]
Assume that verbs form relations, and construct a knowledge graph based on the
parsing result. [0.5 marks]
3. If we treat some specific nouns as verbs (i.e. as relations), e.g. supervision,
award, contribution, how do the knowledge graphs constructed above change? [0.5 marks]
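One possible starting point for the questions above is sketched below. It assumes Penn Treebank style tags (`NN*` for nouns, `VB*` for verbs), merges adjacent nouns into one entity, and links each relation word to the nearest entity on either side. This linking heuristic, the function names `extract_triples` and `tag_with_nltk`, and the `noun_relations` parameter are illustrative assumptions, not a prescribed solution. The nltk helper imports nltk lazily and needs the tokenizer and tagger resources to be downloaded first with `nltk.download`.

```python
def extract_triples(tagged, noun_relations=frozenset()):
    """Turn (word, POS tag) pairs into (subject, relation, object) triples.

    Words listed in noun_relations are treated as relations even if they
    are tagged as nouns (compare question 3).
    """
    items = []          # sequence of ("ENT", text) or ("REL", text)
    prev_noun = False   # was the previous token part of an entity?
    for word, tag in tagged:
        if word.lower() in noun_relations or tag.startswith("VB"):
            items.append(("REL", word.lower()))
            prev_noun = False
        elif tag.startswith("NN"):
            if prev_noun:
                # adjacent nouns form one multi-word entity
                items[-1] = ("ENT", items[-1][1] + " " + word)
            else:
                items.append(("ENT", word))
            prev_noun = True
        else:
            prev_noun = False

    triples = []
    for i, (kind, text) in enumerate(items):
        if kind != "REL":
            continue
        # nearest entity before and after the relation word
        left = next((t for k, t in reversed(items[:i]) if k == "ENT"), None)
        right = next((t for k, t in items[i + 1:] if k == "ENT"), None)
        if left and right:
            triples.append((left, text, right))
    return triples


def tag_with_nltk(text):
    """POS-tag raw text with nltk (requires nltk and its downloaded resources)."""
    import nltk  # imported lazily so the rest of the sketch runs without nltk
    return nltk.pos_tag(nltk.word_tokenize(text))


# Hand-tagged fragment of the profile text, so the sketch runs without nltk.
sample = [("Ninh", "NNP"), ("received", "VBD"), ("supervision", "NN"),
          ("of", "IN"), ("Professor", "NNP"), ("Rasmus", "NNP"), ("Pagh", "NNP")]

plain_graph = extract_triples(sample)
noun_rel_graph = extract_triples(sample, noun_relations={"supervision"})
```

The resulting triples can be drawn directly as labelled edges. Passing `tag_with_nltk(text)` as the first argument applies the same heuristic to the full profile text; tags produced by another tagger can be used in the same way as long as they follow the same tag scheme.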