CS530 final project instructions

The final proposal is due 11/23. The final paper is due 12/16. Unless you have already discussed your project (ideas) with me, you must schedule a 30-minute meeting with me to do so: email me the times you are available during these windows: Thursday 11/10 between 10am and 3pm, and Monday 11/14 after 4pm.

Have you noticed that the main job of an academic is to write? Your final project is to write a final paper.

The format of a basic final paper is as follows.

  1. Describe an application domain that catches your fancy: campus buses, phone calls to customer service, shoes, whatever. Keep in mind that your reader probably knows very little about your domain to start with.
  2. Formally specify a problem in the domain that involves uncertainty. For example, if you'd like to tackle the problem of shoe classification, you need to specify the variables and their values (size? color? shape?) and the classes (men's? women's? children's?). Or, if you'd like to tackle the problem of choosing the right shoe for a given occasion, you also need to specify what the inputs and outputs are. What makes for a good decision?
  3. What have other people done about this problem or in this domain? Which aspects of the problem are considered easy, and which are considered hard, by the general opinion of previous researchers? Libraries (including librarians) and Web search are your friends. In choosing your problem (and formulating your solution), stick to the one-miracle principle: don't try to do multiple things that you expect to be hard at the same time. It is acceptable to not have a miracle in mind; this is a final project for a course, not for a research career!
  4. Dealing with uncertainty often involves learning a model of the world from training examples. If you need them, where do you plan to get your training examples from? Keep in mind that unlabeled data is much easier to come by than labeled data. For example, the Web is full of text, but where do you find text classified into positive and negative sentiments? (Bo Pang and Lillian Lee found one place: the Internet Movie Database.)
  5. How do you evaluate a solution to your problem? For example, the performance of a classification algorithm can be measured by the percentage of unseen examples classified correctly, but would you distinguish between different kinds of errors? How would you evaluate a shoe selector, given that multiple pairs of shoes can be appropriate for the same occasion? Your performance criterion should obviously depend on the intended application.
  6. What methods do you plan to use? You should design one or more methods that bear on a question beyond your immediate results. For example, you might be curious whether the curvature of the sole is a useful attribute for shoe classification, not just whether your algorithm classified 62% of the shoes in your final test set correctly. To investigate the former question, you should design two methods that are identical except that one ignores the curvature of the sole. Often you need to reimplement a previously documented method and reproduce its performance before seeing whether a tweak improves performance.
  7. What are your results? Evaluate your methods as planned in Section 5. But also, pick a couple of representative examples and show how they work (or don't), to give your reader and yourself a sense of what's going on (such as what the machine learned from the training data). What kinds of input data do your methods tend to get right and wrong?
  8. Compare your work to the previous work described in Section 3. What did you learn? That is, what answers do your results suggest to the questions beyond immediate results in Section 6? Conclude with a summary of your achievement and some directions for future work.
Your proposal (due 11/23) should consist of Sections 1–6. The proposal should be detailed enough to let another student in this class (try to) reproduce Section 7, if not also Section 8.

I expect your proposal to be very roughly 3000 words, and your final paper to be very roughly 5000 words. Shorter is better! The target audience for both is your fellow students. Please submit both pieces of writing in PDF, PostScript, plain text, or HTML format, not in a Microsoft Word format.

The actual progress of your final project may move both forwards and backwards through this outline, but this outline is the eventual goal. Do not just recapitulate the history of your project in your final paper. Report (only) what others can learn from, including mistakes to avoid, but not necessarily as a sequence of activities you performed. If you need to, make up a story about what you did that fits the lessons you want to report.

So far I've described a basic final project. If you have the interest and background, you can do a more theoretical or foundational project, as long as you plan it so that you can write up either a "success" or a "failure". Talk to me as soon as possible about this.