Weka Intro
----------
by Yihua Wu

Weka is a collection of machine learning algorithms for solving real-world data mining problems. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License.

1. Installation

Weka is installed on the CS grad machines in /grad/s3/weka323. If you use csh, run the following before using it:

    setenv WEKAHOME /grad/s3/weka323

If you use ksh or bash, run:

    export WEKAHOME=/grad/s3/weka323

You then need to include "$WEKAHOME/weka.jar" in the Java classpath.

Weka is also available on the research machines under /farm/unsupported; to use it there, change "/grad/s3" to "/farm/unsupported/lib" in the preceding instructions.

You can also download the software to use on your own computer from the webpage at http://www.cs.waikato.ac.nz/~ml/weka/index.html .

2. How to use Weka

The "weka323" directory contains several help files. "README", "README_Experiment_Gui" and "Tutorial.pdf" give all the detailed information on how to run Weka. Here are some highlights:

2.1 Data File

The machine-learning software cannot be run without data. Under "weka323" there is a sub-directory named "data", in which all data files are saved as "*.arff". When you run Weka, you can either use the data files included with the software or define your own. For the data format, please refer to the second part of the README.

2.2 Running Command

Double click the weka.jar icon, or from a command line type:

    java -jar weka.jar

(Please read the README for details.)

* Select "Preprocess" to open a data file;
* Select "Classify" to choose a classifier (Decision Tree, Neural Network, etc.);
* Press "Start", and check the output in the "Classifier output" window.
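
Section 2.1 mentions that you can define your own "*.arff" data file. As a rough illustration of what such a file contains, here is a minimal sketch in plain Java (it does not use weka.jar). The class name ArffSketch, the method toArff, and the tiny "weather" dataset are made up for this example; the README remains the authoritative description of the format.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka: builds the text of a small ARFF file.
final class ArffSketch {

    // relation:       name written after "@relation"
    // attributeDecls: one entry per attribute, e.g. "temperature numeric"
    //                 or "play {yes, no}" for a nominal attribute
    // rows:           comma-separated values, one string per instance
    static String toArff(String relation, List<String> attributeDecls, List<String> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String decl : attributeDecls) {
            sb.append("@attribute ").append(decl).append('\n');
        }
        sb.append("\n@data\n");
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed toy dataset, for illustration only.
        String arff = toArff(
            "weather",
            Arrays.asList(
                "outlook {sunny, overcast, rainy}",
                "temperature numeric",
                "play {yes, no}"),
            Arrays.asList("sunny,85,no", "overcast,83,yes", "rainy,70,yes"));
        System.out.print(arff);
    }
}
```

Redirecting the program's output to a file ending in ".arff" should give something you can try opening under "Preprocess". By default the "Classify" panel usually treats the last attribute (here "play") as the class to predict.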