BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//jEvents 2.0 for Joomla//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:STANDARD
DTSTART:20181104T010000
RDATE:20190310T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20191103T010000
RDATE:20200308T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20201101T010000
RDATE:20210314T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20211107T010000
RDATE:20220313T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20221106T010000
RDATE:20230312T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20231105T010000
RDATE:20240310T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20241103T010000
RDATE:20250309T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20251102T010000
RDATE:20260308T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20261101T010000
RDATE:20270314T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20271107T010000
RDATE:20280312T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20281105T010000
RDATE:20290311T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20291104T010000
RDATE:20300310T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20301103T010000
RDATE:20310309T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20311102T010000
RDATE:20320314T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20321107T010000
RDATE:20330313T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20331106T010000
RDATE:20340312T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:STANDARD
DTSTART:20341105T010000
RDATE:20350311T030000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:America/New_York EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20180905T130000
RDATE:20181104T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20190310T030000
RDATE:20191103T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20200308T030000
RDATE:20201101T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20210314T030000
RDATE:20211107T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20220313T030000
RDATE:20221106T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20230312T030000
RDATE:20231105T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20240310T030000
RDATE:20241103T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20250309T030000
RDATE:20251102T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20260308T030000
RDATE:20261101T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20270314T030000
RDATE:20271107T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20280312T030000
RDATE:20281105T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20290311T030000
RDATE:20291104T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20300310T030000
RDATE:20301103T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20310309T030000
RDATE:20311102T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20320314T030000
RDATE:20321107T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20330313T030000
RDATE:20331106T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20340312T030000
RDATE:20341105T010000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:America/New_York EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:6f86ef513757dd05ab1461e14d37b84a
CATEGORIES:PhD Defense
CREATED:20190919T135613
SUMMARY:Searching Heterogeneous Personal Data
LOCATION:CoRE A (301)
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:
Abstract:
Personal data is now pervasive, as digital devices capture every part of our lives. Users are constantly collecting and saving more data, either actively in files, emails, social media interactions, and similar sources, or passively through GPS tracking of mobile devices and records of financial transactions. Unlike traditional information seeking, which focuses on discovering new information, search over personal data usually aims to retrieve information that users know exists in their own dataset, even though they rarely remember exactly where it is stored. Attempting to retrieve and cross-reference personal information leads to a tedious process of accessing each relevant data source individually and linking their information by hand. In this scenario, traditional search is often inefficient, so search tools must be able to access heterogeneous, decentralized data flexibly and accurately, taking into account the additional knowledge the user is likely to have about the target information.
In this dissertation, we introduce a set of techniques that allow users to easily access their own data. We start by presenting a unified and intuitive multidimensional data model built on a combination of dimensions that naturally summarize various aspects of the data collection: who, when, where, what, why, and how. We then design frequency-based scoring models that leverage the correlations between users (who), time (when), location (where), data topics (what), and provenance (how) to improve search over personal data. Since the proposed scoring model needs to generalize well across user-specific datasets, we extend the static scoring function with a learning-to-rank approach based on the state-of-the-art LambdaMART algorithm. Because no pre-existing personal training data is available, we build our own training sets by combining known-item query generation techniques with an unsupervised ranking model (field-based BM25).
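The abstract names field-based BM25 (BM25F) as the unsupervised ranker used to bootstrap training data. Below is a minimal Python sketch of one common BM25F formulation. It assumes each item is split into fields. The field names (who, what), boosts, b values, and k1 are hypothetical and are not the dissertation's configuration.

```python
import math
from collections import Counter


def bm25f_scores(docs, query, weights, b, k1=1.2):
    """Score each document against `query` with a simple BM25F variant.

    docs:    list of dicts mapping a field name to its list of tokens
    query:   list of query tokens
    weights: field name -> boost w_f (hypothetical values in the example)
    b:       field name -> length-normalization strength b_f in [0, 1]
    k1:      term-frequency saturation parameter
    """
    n = len(docs)
    fields = list(weights)

    # Average token length of each field over the collection (1.0 if always empty).
    avg_len = {}
    for f in fields:
        total = sum(len(d.get(f, [])) for d in docs)
        avg_len[f] = total / n if total else 1.0

    # Document frequency: number of documents containing the term in any field.
    df = Counter()
    for d in docs:
        seen = set()
        for f in fields:
            seen.update(d.get(f, []))
        df.update(seen)

    scores = []
    for d in docs:
        tf = {f: Counter(d.get(f, [])) for f in fields}
        score = 0.0
        for t in dict.fromkeys(query):  # unique query terms, original order
            # Merge per-field counts into one boosted, length-normalized frequency.
            pseudo_tf = 0.0
            for f in fields:
                if tf[f][t]:
                    norm = (1 - b[f]) + b[f] * len(d[f]) / avg_len[f]
                    pseudo_tf += weights[f] * tf[f][t] / norm
            if pseudo_tf == 0.0:
                continue
            # Non-negative idf, the form used by Lucene's BM25.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Saturate once over the combined frequency, not once per field.
            score += idf * pseudo_tf / (k1 + pseudo_tf)
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical items described along two of the model's dimensions.
    docs = [
        {"who": ["alice"], "what": ["budget", "report", "draft"]},
        {"who": ["bob"], "what": ["vacation", "photos"]},
        {"who": ["alice", "bob"], "what": ["budget", "meeting"]},
    ]
    weights = {"who": 2.0, "what": 1.0}
    b = {"who": 0.5, "what": 0.75}
    print(bm25f_scores(docs, ["alice", "budget"], weights, b))
```

BM25F differs from summing independent per-field BM25 scores because it combines the field frequencies before applying saturation. In the example, the second item scores zero because it matches neither query term.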
To validate the data and scoring models, we implemented tools for data extraction, classification, entity recognition, and topic modeling. A thorough qualitative evaluation over a publicly available email collection and a personal digital data trace collection from a real user shows that our approach significantly improves search accuracy compared with traditional personal search tools such as Apple's Spotlight and Apache Solr, and with techniques such as TF-IDF, BM25, and field-based BM25.
DTSTAMP:20240329T045422Z
DTSTART;TZID=America/New_York:20190906T130000
SEQUENCE:0
TRANSP:OPAQUE
END:VEVENT
END:VCALENDAR