CS 553 Questions on Readings on Computation (Part 2)

A. Pig Latin: A Not-So-Foreign Language for Data Processing.
http://infolab.stanford.edu/~olston/publications/sigmod08.pdf

1. Why might the computer language in the paper be named Pig Latin? What might the computer language share with the original Pig Latin?
2. What is the data model used by Pig Latin, and how is it similar to and different from MapReduce/Hadoop and Spark?
3. Suppose you are given the task of analyzing web-site logs to figure out what percentage of traffic is generated by robots, and the total size of the logs is about 100 TB. Of the three compute environments we read about (MapReduce, Spark, and Pig Latin), which would you prefer to use, and why?

B. Xen and the Art of Virtualization.
http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf

1. What is the difference between paravirtualization and full machine virtualization?
2. Why might virtualization at the whole-machine level make it easier to manage multiple users than using existing OS user-management systems?
3. How does Xen appear to the guest OSes (domains), and how do guest OSes communicate with Xen?

Optional Questions:

A.
4. How does the nested data model differ from that of a traditional DB?
5. Why would a Pig Latin user prefer to use COGROUP rather than JOIN? Give an example.
6. In Figure 4, explain what is shown in the right-hand panel of the Pig Pen debugger, and why it is needed.

B.
4. In Figure 4, how does Xen scale when running multiple web servers?
5. What is the performance 'anomaly' in Figure 5, and how could that happen?