Decision trees are a popular way to build and visualize predictive models. They are a non-parametric supervised learning method used for classification and regression. The method splits a dataset into progressively smaller subsets while incrementally growing the associated decision tree. The final result is a tree with decision nodes and leaf nodes. The core algorithm for building decision trees, ID3 by J. R. Quinlan, performs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses entropy and information gain to construct a decision tree. Decision trees can handle categorical, numerical, and missing data. This talk will thoroughly cover the basics, advantages, and disadvantages of decision trees.
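As a rough illustration of the entropy and information gain calculation that ID3 relies on, here is a minimal Python sketch. The function names, the toy rows, and the attribute names ("outlook", "windy") are hypothetical and chosen only for illustration; the sketch assumes categorical attributes and shows a single greedy split choice, not a full tree builder.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    # Group the class labels by the value each row takes for the attribute.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets after the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Greedy step: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]

chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

In this toy data "outlook" separates the two classes perfectly while "windy" tells us nothing, so the greedy step selects "outlook". A full ID3 implementation would repeat this step recursively on each resulting subset until the subsets are pure or no attributes remain.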
At the University of Toronto, Dr Saed Sayad established the first data mining research group in 2000 and has been teaching a popular graduate data mining course since 2001. He has published over 20 research papers in data mining and two books. The first book, “An Introduction to Data Science”, is the first online interactive data science book, with more than 1 million page views in 2017. His second book, “Real Time Data Mining”, focuses on an approach to the analysis of big data in real time. For the past 20 years, Dr Sayad has been working with several Fortune 500 and start-up companies such as Pitney Bowes (eCommerce), AdTheorent (programmatic advertising), American Express (web analytics and personalization), Macy’s (web analytics and collaborative filtering), First American (prepayment and risk analysis), Xerox (sensor data analysis), Rogers Communication (load forecasting), CIBC (small loan default prediction), and Royal Bank (fraud detection).