This thesis studies efficient and private approximations to function problems on massive datasets.
Increasingly, the number of applications generating, collecting and computing on vast quantities of digital data is becoming widespread. As non-trivial analyses of these datasets typically require expensive computations, the search for efficient solutions has been the predominant theme and concern of
several different areas of applied and theoretical Computer Science research.
Another important and growing concern is the privacy implications of computing on data considered sensitive; specifically regarding distributed computations among mutually distrustful yet collaborating parties. Although substantial work on both fronts exists independently, these issues have rarely been
addressed simultaneously.
In this thesis, we tackle both challenges concurrently. We present a general feasibility result that gives sufficient conditions for a deterministic function $f$ to admit an efficient private approximation $g$. A common theme among this and our other results is proving functional privacy, an inherent and
necessary property of any protocol computing a private approximation. Our approaches to ensuring functional privacy while keeping resources at a minimal might be of independent interest and prove useful to the design of other private approximations. The specific protocols presented in this thesis are:
- Efficient private approximation protocols for the $L_p$ distance, $p\in(0,2]$, of two $n$-dimensional real-valued vectors using polylogarithmic communication in $n$. Our first protocol, for $p=2$, improves upon a previous result by reducing a quadratic computation to a quasi-linear one. Our more
generic result for $p\in(0,2]$, however, applies also to the case where the input is presented as a data stream. The latter is the first known private approximation protocol for any streaming problem. We believe the techniques used can spur other private streaming results.
- Efficient private approximation protocols for a variety of hard counting problems, including:
- the #P-complete problem #DNF, or estimating the number of satisfiable assignments to a Boolean formula in disjunctive normal form,
- every problem within two subclasses of #P similar to Ron Fagin's logic-based characterization of NP), and
- few fixed parameter tractable (FPT) problems.
- Efficient protocols for the 1-out-of-$n$ single-server Computationally-Private Information Retrieval (CPIR) problem for $\ell$-bit strings and its symmetric variant. Unlike previous results, we achieve simultaneous polylogarithmic storage, communication, and local computation by relaxing accuracy requirements, a novel approach.