CS530 final project instructions
The final proposal is due 11/23.
The final paper is due 12/16.
Unless you have already discussed your project (ideas) with me, you
must schedule a 30-minute meeting with me to do so: email me your
available times on Thursday 11/10 between 10am and 3pm, and on Monday
11/14 after 4pm.
Have you noticed that the main job of an academic is to write?
Your final project is to write a final paper.
The format of a basic final paper is as follows.
- Describe an application domain that catches your fancy: campus
buses, phone calls to customer service, shoes, whatever. Keep in
mind that your reader probably knows very little about your domain to
start with.
- Formally specify a problem in the domain that involves
uncertainty. For example, if you'd like to tackle the problem of
shoe classification, you need to specify the variables and their
values (size? color? shape?) and the classes (men's? women's?
children's?). Or, if you'd like to tackle the problem of choosing
the right shoe for a given occasion, you also need to specify what
the inputs and outputs are. What makes for a good decision?
- What have other people done about this problem or in this
domain? Which aspects of the problem are considered easy, and which
are considered hard, by the general opinion of previous researchers?
Libraries (including librarians) and Web search are your friends.
In choosing your problem (and formulating your solution), stick to
the one-miracle principle: don't try to do multiple things
that you expect to be hard at the same time. It is acceptable to
not have a miracle in mind—this is a final project for a
course, not for a research career!
- Dealing with uncertainty often involves learning a model of
the world from training examples. If you need them, where do you
plan to get your training examples from?
  - Section 3 may have helped you identify a publicly
    accessible database that you can use, or people you can contact
    to ask for their data (do so early, and cc me on your message).
  - You may also be able to gather data quickly yourself, if
    it's not too hard (but anything involving hardware is probably
    too hard; see "one-miracle principle" above, and your one
    miracle should not be gathering data).
  - It may also be appropriate to simply use imagined
    (generated) data. This option has the advantage that you know
    and control the ground truth. It is more appropriate when your
    domain and problem are more general or abstract (for example,
    classification of 3D shapes rather than of shoes), when real
    data is more difficult or time-consuming to acquire, or when no
    previous work has used real data.
Keep in mind that unlabeled data is much easier to come by than
labeled data. For example, the Web is full of text, but where do
you find text classified into positive and negative sentiments? (Bo
Pang and Lillian Lee found one place: the Internet Movie Database.)
- How do you evaluate a solution to your problem? For example,
the performance of a classification algorithm can be measured by
the percentage of unseen examples classified correctly, but would
you distinguish between different kinds of errors? How would you
evaluate a shoe selector, given that multiple pairs of shoes can be
appropriate for the same occasion? Your performance criterion should
obviously depend on the intended application.
- What methods do you plan to use? You should design one or more
methods to bear on a question beyond your immediate results. For
example, you might be curious whether the curvature of the sole
is a useful attribute for shoe classification, not just whether
your algorithm classified 62% of the shoes in your final test set
correctly. To investigate the former question, you should design
two methods that are identical except one ignores the curvature of
the sole. Often you need to reimplement a previously documented
method and reproduce its performance before seeing if a tweak
improves performance.
- What are your results? Evaluate your methods as planned in
Section 5. But also, pick a couple of representative examples and
show how your methods handle them (or don't), to give your reader and yourself a
sense of what's going on (such as what the machine learned from the
training data). What kinds of input data do your methods tend to get
right and wrong?
- Compare your work to previous work in Section 3. What did you
learn? That is, what answers do your results suggest to the questions
beyond immediate results in Section 6? Conclude with a summary of
your achievement and some directions for future work.
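
To make the points above about generated data, evaluation, and comparing
two methods that differ in a single attribute more concrete, here is a
minimal sketch in Python. Everything in it is an assumption made up for
illustration: the class means in TRUE_MEANS, the two features (size and
sole curvature, in arbitrary units), the noise level, and the choice of
a nearest-centroid classifier. It is not a recommended method, only one
possible shape for such an experiment.

```python
import random
from collections import Counter

CLASSES = ["men's", "women's", "children's"]

# Hypothetical ground truth: mean (size, sole_curvature) for each class.
# With generated data we choose these numbers ourselves.
TRUE_MEANS = {"men's": (10.0, 2.0), "women's": (8.0, 6.0), "children's": (3.0, 4.0)}

def generate(n, rng):
    """Draw n labeled examples from the made-up distribution above."""
    data = []
    for _ in range(n):
        label = rng.choice(CLASSES)
        size_mean, curve_mean = TRUE_MEANS[label]
        features = (rng.gauss(size_mean, 1.5), rng.gauss(curve_mean, 1.5))
        data.append((features, label))
    return data

def train(data, used):
    """Nearest-centroid model: per-class mean of the feature indices in `used`."""
    sums = {c: [0.0] * len(used) for c in CLASSES}
    counts = Counter()
    for features, label in data:
        counts[label] += 1
        for j, i in enumerate(used):
            sums[label][j] += features[i]
    return {c: [s / counts[c] for s in sums[c]] for c in CLASSES if counts[c]}

def predict(model, features, used):
    """Return the class whose centroid is closest on the used features."""
    def dist2(c):
        return sum((features[i] - m) ** 2 for i, m in zip(used, model[c]))
    return min(model, key=dist2)

def evaluate(model, test, used):
    """Return (accuracy, confusion), where confusion counts (true, predicted) pairs."""
    confusion = Counter()
    for features, label in test:
        confusion[(label, predict(model, features, used))] += 1
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(test), confusion

rng = random.Random(0)  # fixed seed so the experiment is repeatable
train_set = generate(300, rng)
test_set = generate(300, rng)  # unseen examples, never used for training

# Two methods that are identical except that one ignores sole curvature.
results = {}
for name, used in [("size only", (0,)), ("size + curvature", (0, 1))]:
    model = train(train_set, used)
    results[name] = evaluate(model, test_set, used)
    accuracy, confusion = results[name]
    print(f"{name}: accuracy {accuracy:.2f}")
    for (true, pred), n in sorted(confusion.items()):
        if true != pred:
            print(f"  {true} misclassified as {pred}: {n}")
```

The confusion counts separate the different kinds of errors, which a
single accuracy figure hides. Comparing the two runs bears on whether
sole curvature is a useful attribute, at least under these invented
assumptions.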
Your proposal (due 11/23) should consist of Sections 1–6.
The proposal should be detailed enough to let another student in this
class (try to) reproduce Section 7, if not also Section 8.
I expect your proposal to be very roughly around 3000 words, and your
final paper to be very roughly around 5000 words. Shorter is better!
The target audience for both is your fellow students. Please submit
both pieces of writing in PDF, PostScript, plain text, or HTML format,
not a Microsoft Word format.
The actual progress of your final project may move both forwards and
backwards through this outline, but this outline is the eventual goal.
Do not just recapitulate the history of your project in your final
paper. Report (only) what others can learn from, including mistakes to
avoid, but not necessarily as a sequence of activities you performed. If
you need to, make up a story about what you did that fits the lessons
you want to report.
So far I've described a basic final project. If you have the interest
and background, you can do a more theoretical or foundational project,
as long as you plan to be able to write up either a "success" or a
"failure". Talk to me as soon as possible about this.