BEGIN:VCALENDAR VERSION:2.0 PRODID:-//jEvents 2.0 for Joomla//EN CALSCALE:GREGORIAN METHOD:PUBLISH BEGIN:VTIMEZONE TZID:America/New_York BEGIN:STANDARD DTSTART:20180108T110000 RDATE:20180311T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20181104T010000 RDATE:20190310T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20191103T010000 RDATE:20200308T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20201101T010000 RDATE:20210314T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20211107T010000 RDATE:20220313T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20221106T010000 RDATE:20230312T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20231105T010000 RDATE:20240310T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20241103T010000 RDATE:20250309T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20251102T010000 RDATE:20260308T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20261101T010000 RDATE:20270314T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20271107T010000 RDATE:20280312T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20281105T010000 RDATE:20290311T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20291104T010000 RDATE:20300310T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20301103T010000 RDATE:20310309T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20311102T010000 RDATE:20320314T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20321107T010000 RDATE:20330313T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20331106T010000 RDATE:20340312T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:STANDARD DTSTART:20341105T010000 RDATE:20350311T030000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 TZNAME:America/New_York EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20180311T030000 RDATE:20181104T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20190310T030000 RDATE:20191103T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20200308T030000 RDATE:20201101T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20210314T030000 RDATE:20211107T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20220313T030000 RDATE:20221106T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20230312T030000 RDATE:20231105T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20240310T030000 RDATE:20241103T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20250309T030000 RDATE:20251102T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20260308T030000 RDATE:20261101T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20270314T030000 RDATE:20271107T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20280312T030000 RDATE:20281105T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20290311T030000 RDATE:20291104T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20300310T030000 RDATE:20301103T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20310309T030000 RDATE:20311102T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20320314T030000 RDATE:20321107T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20330313T030000 RDATE:20331106T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT BEGIN:DAYLIGHT DTSTART:20340312T030000 RDATE:20341105T010000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 TZNAME:America/New_York EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:2669ae744d365341350118b869a32195 CATEGORIES:Pre-Defense CREATED:20190823T084023 SUMMARY:Searching Heterogeneous Personal Data LOCATION:CoRE A 301 DESCRIPTION;ENCODING=QUOTED-PRINTABLE:
Abstract :
Person al data is now pervasive, as digital devices are capturing every part of ou r lives. Users are constantly collecting and saving more data, either activ ely in files, emails, social media interactions, etc., or passively by GPS tracking of mobile devices, or records of financial transactions. Unlike tr aditional information seeking, which focuses on discovering new information , search on personal data is usually focused on retrieving information that users know exists in their own data set, even though most of the time they do not have a perfect recollection of where it is stored. Attempting to re trieve and cross-reference personal information leads to a tedious process of individually accessing all the relevant sources of data and manually lin king their information.In this scenario, traditional searchesare often inef ficient, making it critical for search tools to be capable of accessing het erogeneous and decentralized data in a flexible and accurate way by taking into consideration the additional knowledge the user is likely to have abou t the target information.
In this work, we propose a set of techniques that allow users t o easily access their own data. We start by proposing a unified and intuiti ve multidimensional data model following a combination of dimensions that n aturally summarize various aspects of the data collection: who, when, where , what, why, how. We then proceed by proposing frequency-based scoring mode ls that leverage the correlation between users (who), time (when), location (where), data topics (what), and provenance (how) to improve search over p ersonal data. Since the scoring model proposed needs to generalize well ove r user-specific data sets, we extend our static scoring function by adoptin g a learning-to-rank approach using the state of the art LambdaMART algorit hm. Due to the lack of pre-existing personal training data, a combination o f known-item query generation techniques and an unsupervised ranking model (field-based BM25) is used to build our own training sets.
To validate the data and scori ng models, we implemented tools for data extraction, classification, entity recognition, and topic modeling. A thorough qualitative evaluation over re al user data from a variety of data sources such as Facebook, Dropbox, and Gmail show that our approach significantly improves search accuracy when co mpared with traditional personal search tools such as Appleās Spotlight and Apache Solr, and techniques like TF-IDF and BM25.
DTSTAMP:20240329T065721Z DTSTART;TZID=America/New_York:20190109T110000 SEQUENCE:0 TRANSP:OPAQUE END:VEVENT END:VCALENDAR