Assignments emailed to the instructor or TAs will
be ignored--they will NOT be accepted for grading.
We will only grade submissions in Sakai.
If your program does not compile, you will not get any credit.
Most compilation errors occur for two reasons:
You will write an application to build a Document Object Model (DOM) for a given HTML file, and process it with a set of given input commands that will transform the tree.
The Document Object Model (DOM) is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. (This is quoted from the W3C Document Object Model specification web page.)
When you access a web page, the browser builds a DOM tree structure for the HTML content in the page.
Consider the following HTML example:
<html> <body> <center>The <em>quick</em> brown fox</center> </body> </html>which is rendered by a browser as:
The DOM tree for this HTML content would look like this:The quick brown fox
html
|
body
|
center
|
--------------
| | |
The em brown fox
|
quick
Note: There is a space
character after "The", and before "brown" so the corresponding nodes of
the tree actually store "The " and " brown fox".
You can see the DOM tree for any HTML document in the browser. In Chrome, go to View --> Developer --> Inspect Elements. In Firefox, go to Tools -> Web Developer -> Inspector.
Open the sample.html file for the above example in your browser, and look at its DOM. You will notice that under the root html node, in addition to the body node, there is also a head node (with nothing under it), which corresponds to the head tag. The head tag is not used in the example above, and is automatically supplied by the browser. The HTML documents in this assignment will not have a head tag. You will also notice the trailing space character included in "The " and the leading space character included in " brown fox".
Web developers can build a "dynamic" HTML page by embedding, say, Javascript code in the page, that can manipulate the DOM tree. For example, such code can show or hide individual columns in an HTML table upon user request.
In the example above, you could have Javascript code within the page make calls to the DOM tree, to replace the em tag with the b tag, and remove the center tag, for the following result:
html
|
body
|
--------------
| | |
The b brown fox
|
quick
The browser keeps track, and renders the tree onto the web page like this:
The quick brown fox
Bring up try.html in your browser.
This page has javascript code embedded in HTML,
that replaces em with b or vice versa
when you click a button. It
does this by changing the DOM tree representation of the page.
You can view the web page's HTML source using the keyboard shortcut option-command-u (Mac) or ctrl-u (Windows)
| html | top level |
| body | second level |
| p | paragraph |
| em | emphasis (italics) |
| b | boldface |
| table | table |
| tr | row (within table) |
| td | column (within tr) |
| ol | ordered (numbered) list |
| ul | unnumbered list |
| li | list item (within ol or ul) |
Moreover, the format of the HTML document itself (supplied through an input file) will be restricted to the following:
There will not be anything else in a tag, not even spaces.
So <p> is good, but < p> is not (space before p), and <p id="007"> is not (tag has an attribute named id)
Following are some sample HTML pages. Click on a link to see the page, and view the page source to see the underlying HTML:
| ex1.html | p, em, and table tags |
| ex2.html | ol and ul tags |
| ex3.html | Nested ol and ul tags |
As an example, consider the following input HTML:
<html> <body> The <b> quick </b> brown fox </body> </html>(The third line is actually "The ", i.e. trailing blank space, and the "brown fox" line has a leading space before "brown".)
The DOM tree implementation for this HTML would look like this:
----------
|html| |\|
------|---
|
V
----------
|body| |\|
------|---
|
V
---------- ------- ----------------
|The |\|-|-> |b| |-|-> | brown fox|\|\|
---------- ---|--- ----------------
|
V
-----------
|quick|\|\|
-----------
Note: Tree nodes containing tags
do NOT include angle brackets with the tags.
So, if a node stores the tag b,
it stores the string "b", NOT the string "<em>".
Also, closing tags (e.g. "/em") are NOT be stored in the tree.
If your tree violates these requirements, it will be
considered incorrect!
Download the attached domtree_project.zip file to your computer. DO NOT unzip it. Instead, follow the instructions on the Eclipse page under the section "Importing a Zipped Project into Eclipse" to get the entire project into your Eclipse workspace.
You will see a project called DOM Tree with the following classes:
There are also a number of sample test files directly under the project folder (see the Examples section that follows.)
Note: You are not required to use the included Stack class. But if you do use a stack, you MUST use this class (not your own).
You will implement the following methods in the Tree class:
Only sensible replacements will be asked, you need not do any checking. (For instance, ol cannot be replaced with em, so you will not be asked to do this.)
Note: Text nodes should NOT be merged if the removal of a tag results in an unbroken string of text.
For instance, removing tag b in this:
---------- ------- ----------------
|The |\|-|-> |b| |-|-> | brown fox|\|\|
---------- ---|--- ----------------
|
V
-----------
|quick|\|\|
-----------
should give this:
---------- ----------- ----------------
|The |\|-|-> |quick|\|-|-> | brown fox|\|\|
---------- ----------- ----------------
In other words, the three nodes above should NOT be merged into a
single text node.
You may assume that the row will not already have been boldfaced. Also note that the first row is row number 1 (not zero), the second row is row number 2, etc.
A taggable word is a sequence of alphabetic letters (nothing else) that is not a part of another word, and may have one of the following punctuations AS THE LAST CHARACTER: period '.', comma ',', question mark '?', exclamation point '!', colon ':' and semicolon ';'.
The taggable word may appear within a longer piece of text and may occur multiple times in the containing text.
In the following example:
What goes around comes around!the word "around" is tagged with em:
What goes <em>around</em> comes <em>around!</em>
Before tagging:
------------------------------------
|What goes around comes around!|\|\|
------------------------------------
After tagging:
---------------- -------- ------------- ---------
|What goes |\|-|-> |em| |-|-> | comes |\|-|-> |em| |\|
---------------- ----|--- ------------- ----|----
| |
V V
------------ -------------
|around|\|\| |around!|\|\|
------------ -------------
Notice that the punctuation goes with the word when tagged. Also note that
the blank space between "goes" and "around" will now trail "goes" in the
DOM tree node, and the spaces between "around" and "comes"
in "around comes around!" will now lead and trail "comes" in the DOM tree node.
As another example, suppose you were asked to add a bold tag around "cow" in the following
cases:
This would be the result after tagging, for each (the last two are case insensitive matches):
But you would NOT tag "cow" in these words:
Here are examples of applying transformations to some HTML pages, and the resulting HTML (look inside the source files as well):
| Input | Transformation | Result |
|---|---|---|
| ex1.html | replace em with b | ex1tr1.html |
| ex1.html | delete em | ex1tr2.html |
| ex2.html | delete ol | ex2tr1.html |
| ex3.html | delete ol | ex3tr1.html |
| ex1.html | boldface 2 | ex1tr3.html |
| ex2.html | add em around item | ex2tr2.html |
Following this, it goes into a loop, and presents a menu of actions from which you can pick whether you print the tree structure, or the HTML generated from the tree, or do one of the transformations.
Here's a sample run with the tree built out of the ex1.html input file:
Enter HTML file name => ex1.html
Choose action: (p)rint Tree, (h)tml, (r)eplace tag, (b)oldface row, (d)elete tag, (a)dd tag, or (q)uit? => p
html
|----body
|----p
|----A line of non-tagged text.
|----p
|----A
|----em
|----new
|----paragraph.
|----table
|----tr
|----td
|----em
|----R1C1
|----td
|----R1C2
|----tr
|----td
|----R2C1
|----td
|----R2C2
Choose action: (p)rint Tree, (h)tml, (r)eplace tag, (b)oldface row, (d)elete tag, (a)dd tag, or (q)uit? => h <html> <body> <p> A line of non-tagged text. </p> <p> A <em> new </em> paragraph. </p> <table> <tr> <td> <em> R1C1 </em> </td> <td> R1C2 </td> </tr> <tr> <td> R2C1 </td> <td> R2C2 </td> </tr> </table> </body> </html> Choose action: (p)rint Tree, (h)tml, (r)eplace tag, (b)oldface row, (d)elete tag, (a)dd tag, or (q)uit? => q
(If you submit a .class or .zip, or any other file, you will get no credit even if you have a working version on your computer. We will only grade whatever is in Sakai.)
All input files will be correctly formatted, so you don't have to do any error checking for input format.
Every transformation that is applied will be on a legitimate tree structure, where the transformation should work without failure.
Your build method will be graded first. After that, all other methods will be graded using our correct build implementation.
Each of the non-build methods will be graded using a tree built from scratch. This means every test case is treated separately, so that it does not depend on any previous test case's transformation of the tree.
Since every method you will implement results in building or modifying the DOM tree structure, we will grade by examing the tree structure that results at the end of each of your methods. Starting with root instance field, the grading process will traverse your tree structure and compare it with the expected structure. (As stated earlier in the Running the Program section, we will NOT be looking at the HTML printout of the tree.)
To check the tree structure as you go, you may either use the print() method, or you may want to use the Eclipse debugger to navigate through the tree nodes. There is comprehensive information on how to use the debugger in Eclipse under Help -> Help Contents -> Java development user guide -> Tasks -> Running and Debugging, and Help -> Help Contents -> Java development user guide -> References -> Views -> Variables View. Essentially, you need to know how to set breakpoints and examine variable values when the program stops at a breakpoint. (And of course, there are YouTube videos on using the debugger in Eclipse.)