COM 448 Cloud Big Data Systems and Analytics


 

Gediz University, Computer Engineering Department
Spring 2015
Tuesday: 13:00 - 14:45, A-Z04

 
  Instructor: Halûk Gümüşkaya  
  Office: D107  
  Office Hours: Mon 15:00 – 17:00, Tue 16:00 – 17:00  
  Phone: 0232-355 0000 - 2305  
  e-mail: haluk.gumuskaya@gediz.edu.tr  
   
Pages:

• Course Description
• Prerequisites
• Lecture Schedule
• Regulations and Policies
• Textbooks
• Tools and Platforms
• Grading

  Course Description   (3-0-3)

Data deluge; Computing Model: Clouds, Data Centers, Virtualization; Research Model: 4th Paradigm; Data Science Process: DIKW; Recommender Systems; Algorithms: User-based Nearest-Neighbor Collaborative Filtering, Vector Space Formulation of Recommender Systems, Item-based Collaborative Filtering, k Nearest Neighbors and High Dimensional Spaces; Basic Principles of Parallel Computing; Cloud Computing Technologies for Big Data Applications and Analytics: Apache Data Analysis Open Stack, MapReduce, Hadoop, Web Search, Text Mining and their Technologies, K-means and MapReduce Parallelism, PageRank, NoSQL, BigTable, HBase, Indexing Technologies, Pig and Hive, Pig PageRank, Pig K-means, Building a Search Engine; Internet of Things and Sensors.

   Prerequisites

    None (Catalog), but recommended courses:

• COM 440 Distributed Systems
• COM 444 Cloud Computing

   Lecture Schedule

This is the tentative lecture schedule. Please check this page at least once a week during the semester.

   Textbooks

   Cloud Computing

• Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, K. Hwang, G. Fox and J. Dongarra, Morgan Kaufmann Publishers, 2012.

   Data Science and Data Processing Platforms

• The Fourth Paradigm: Data-Intensive Scientific Discovery, T. Hey, S. Tansley and K. Tolle (Editors), Microsoft Research, 2009. (You can download the book from its web site.)
• Python for Data Analysis, W. McKinney, O'Reilly, 2013.
• Machine Learning in Action, P. Harrington, Manning Publications, 2012.
• Hadoop: The Definitive Guide, T. White, O'Reilly, 2012.
• Mahout in Action, S. Owen, R. Anil, T. Dunning, E. Friedman, Manning Publications, 2012.

  Tools and Platforms

• FutureSystems - Indiana University clusters, our project portal address, and all projects
• NumPy, SciPy, Matplotlib - Powerful tools that every data scientist who uses Python must know
• Canopy - An IDE for Python
• Plotviz - A data visualization tool developed at Indiana University for displaying point distributions in 3D
• Virtualization software: Oracle VM VirtualBox
• Hadoop Ecosystem - Cloud software tools to develop and run data-intensive applications
• Java development environments

  Grading

   
30 % : Project
15 % : Homework
10 % : Attendance, Discussion, Contribution
20 % : Midterm Exam
25 % : Final Exam
 

     
