
Project Description

Smart devices, such as smartphones and smartwatches, collect vast amounts of sensor data, including GPS, Bluetooth, Wi-Fi, and accelerometer readings; this kind of data is commonly treated as big data. While big data has been used extensively to predict various aspects of human behavior, for instance in Human Activity Recognition, Health Monitoring, and Autonomous Vehicles, it has a major limitation: it is not thick. That is, it does not carry information about the context within which it was generated, and is therefore often used with ambiguous meanings.

To ensure data is self-explanatory and meaningful, it is necessary to explicitly represent its context: what was done by a person, how, why, and in which overall situation. Context becomes necessary when the same dataset is used for multiple prediction tasks, since the same sensor value may convey different meanings in different contextual scenarios. Furthermore, context is crucial for understanding human behavior, particularly in social or personal life, which is always context-sensitive. To address the problem of big data de-contextualization, we turn to the notion of Big-Thick Data: big data complemented with thick data, that is, observational data about context.

Project Goal

This project aims to integrate various data, e.g., sensors, answers to machine questions, and self-reports, into Big-Thick Data, i.e., highly contextualized data, by modeling Observation Contexts. We first define the two main components of an Observation Context, the Personal Context and the Reference Context, which encode the world from a person’s subjective view and from the objective, all-encompassing view of a third-party observer, respectively. The observation context is then built by composing (a part of) the objective reference context with (parts of) one or more subjective personal contexts, based on shared identifying information, e.g., names, identifiers, or spatio-temporal coordinates. The unification process is flexible and depends on the specific purpose, thus allowing for different types of questions about the users’ behavior and the surrounding world, as well as for multiple different answers to the same question.

Image Credits: Eduardo Magrani

The Solution

We integrate Big-Thick Data as highly contextualized data by modeling Observation Contexts. Observation Contexts are based on two main components: one or more users’ Personal Contexts and a Reference Context. A personal context encodes a person’s subjective view of the world, e.g., where she is, what she is doing, who she is with, and her mood. We model personal contexts, in time, as Personal Big-Thick Data, obtained by integrating Personal Big Data, e.g., sensor or social media data, with user-provided descriptions of the current situation, e.g., human answers to machine questions and human self-reports. A reference context provides the user-independent, objective, all-encompassing view of a third-party observer. It keeps track of the environment within which users operate, defined in terms of a reference observation period and a reference location, e.g., one day and the city of Trento. The reference context can be built out of any type of spatio-temporal (big) data, e.g., OpenStreetMap (OSM) or the Italian Spatial Data portal.
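To make the two components concrete, the following is a minimal sketch of what a personal context and a reference context might look like as data structures. All field names here are illustrative assumptions for exposition, not the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class PersonalContext:
    """A person's subjective view of the world at one point in time."""
    user_id: str
    timestamp: str            # ISO-8601 observation time
    location: tuple           # (latitude, longitude), e.g. from GPS sensor data
    activity: str             # e.g. a self-reported answer such as "studying"
    companions: list = field(default_factory=list)  # who the user is with
    mood: str = ""            # self-reported mood, if any

@dataclass
class ReferenceContext:
    """User-independent view of the environment, fixed by a reference
    observation period and a reference location."""
    period: tuple             # (start, end) of the reference observation period
    location_name: str        # e.g. "Trento"
    entities: list = field(default_factory=list)  # places/events, e.g. from OSM
```

A sequence of `PersonalContext` records over time would then correspond to one user's Personal Big-Thick Data, while a single `ReferenceContext` holds the shared environment they can all be unified against.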

The observation context is built by composing (a part of) the objective reference context with (parts of) one or more subjective personal contexts, based on shared identifying information, e.g., names, identifiers, or spatio-temporal coordinates. We call this process Context Unification. Context unification is a flexible process that depends on the specific purpose, thus allowing for different types of questions about the users’ behavior and the world around them, as well as for multiple different answers to the same question.
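The unification step above can be sketched in code. The sketch below matches each personal-context record to the nearest reference entity by spatio-temporal coordinates; the matching rule (nearest entity within a fixed radius) and all record fields are illustrative assumptions, not the project's actual algorithm or schema.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def unify(personal_records, reference_entities, radius_m=50):
    """Attach the nearest reference entity (within radius_m) to each
    personal-context record, yielding unified observation-context records."""
    unified = []
    for rec in personal_records:
        near = [e for e in reference_entities
                if haversine_m(rec["location"], e["location"]) <= radius_m]
        best = min(near,
                   key=lambda e: haversine_m(rec["location"], e["location"]),
                   default=None)
        unified.append({**rec, "reference_entity": best})
    return unified

# Example: a GPS fix plus a self-reported activity gets contextualized with
# a (hypothetical) nearby reference entity.
personal = [{"user": "u1", "location": (46.0664, 11.1506), "activity": "studying"}]
reference = [{"name": "University Library", "location": (46.0665, 11.1507)},
             {"name": "Duomo di Trento", "location": (46.0670, 11.1210)}]
result = unify(personal, reference)
```

Because the matching rule is a parameter of the process, swapping it (e.g., joining on names or identifiers instead of coordinates) yields a different unification suited to a different question, which is exactly the flexibility described above.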

Case Study: SmartUnitn-2 and OpenStreetMap Integration

The case study integrates the SmartUnitn-Two (SU2) dataset with the OpenStreetMap (OSM) dataset from Geofabrik into the SU2OSM Big-Thick Data. The SU2 dataset describes sequences of personal contexts from 158 students of the University of Trento over four weeks. The OSM dataset provides a reference context for Trentino. The integrated SU2OSM dataset unifies the personal contexts with the reference context, thus representing the personal daily lives of these students.

Project Community Webpage:
Datasets Used:
Output Datasets:

Acknowledgements

The first and most important thanks goes to Matteo Busso for inventing the term big-thick data. The work described in this project was made possible thanks to the amazing group of colleagues in the KnowDive Group: Mayukh Bagchi, Simone Bocca, Andrea Bontempelli, Ali Hamza, Leonardo Havier Malcotti, Ivan Kayongo, Alessio Zamboni, Haonan Zhao, and others. This research has received funding from the European Union’s Horizon 2020 FET Proactive project “WeNet – The Internet of us”, grant agreement No. 823783.

Publications and Research

Key Paper:
Giunchiglia, F., & Li, X. (2024). Big-Thick Data generation via reference and personal context unification. ECAI 2024: 26th European Conference on Artificial Intelligence.
Available at: https://arxiv.org/abs/2409.05883

Project Team