The course covers recent trends and advances in Data Analytics. The focus is on analyzing massive structured and unstructured data sets using a Hadoop-based Big Data platform. Topics covered include text analytics, sentiment analysis, social network mining, streaming data mining, recommender systems, time-series analysis, kernel-based learning, and advanced visualization. The course makes heavy use of analytics software such as R and KNIME. Students participate in multiple data analytics competitions hosted on Kaggle.com or at venues such as KDD and PAKDD. The course also prepares students for several company-specific certifications in Data Science. In addition to lectures and classroom discussion, the course makes extensive use of online material available on Udacity, Coursera, YouTube, and similar platforms.


Outline
  • Linear Regression
  • Logistic Regression
  • Unstructured Data Analytics – Hadoop
  • Hadoop Ecosystem
  • Text Analytics
  • Social Networks/Link Analysis
  • Data Stream Mining
  • Data Visualization
  • Latest Research Papers and Case Studies on Data Analytics

Software
  • R Programming (R is the primary tool for this course, with KNIME used in a supporting role)

Books