Big Data Seminar

Lucas Drumond, Josif Grabocka
Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim, Germany

October 22, 2014

Lucas Drumond, Josif Grabocka, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany October 22, 2014 1 / 17

What is Big Data?

Some definitions:

- “A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” (http://en.wikipedia.org/wiki/Big_data)
- “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” (www.gartner.com/it-glossary/big-data/)

What is Big Data?

Big Data is about:

- Storing and accessing large amounts of (unstructured) data
- Processing high-volume data streams
- Making sense of the data
- Predictive technologies

Where to find Big Data?

Facebook:

- 1.28 billion users (1.23 billion monthly active in January 2014)
- Size of user data stored by Facebook: 300 petabytes
- Average amount of data that Facebook takes in daily: 600 terabytes
- Size of Facebook’s Graph Search database: 700 terabytes

Source: http://allfacebook.com/orcfile_b130817

Where to find Big Data?

Google:

- 3.3 billion searches per day (on average) [1]
- 30 trillion unique URLs identified on the Web [1]
- 20 billion sites crawled a day [1]
- In 2008 Google processed more than 20 petabytes of data per day [2]

[1] http://searchengineland.com/google-search-press-129925
[2] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113.

Where to find Big Data?

Twitter:

- Average number of tweets per day: 58 million [1]
- Number of Twitter search engine queries every day: 2.1 billion [1]
- Total number of active registered Twitter users: 645,750,000 [1]

[1] http://www.statisticbrain.com/twitter-statistics/

Where to find Big Data?

- The Ensembl database contains the genomes of humans and 50 other species
- “only” 250 GB [1]

[1] http://www.ensembl.org/

Where to find Big Data?

- The Large Hadron Collider has collected data from over 300 trillion proton-proton collisions
- Approx. 25 petabytes per year

Overview

- Part III: Machine Learning Algorithms
- Part II: Large Scale Computational Models
- Part I: Distributed Database, Distributed File System

The rules of selecting a paper:

1. Students visit the course website and select a paper from the Literature section (deadline: 29.10).
2. Students send their selection by e-mail to [email protected] and [email protected]
   - Deadline: 29.10
   - First come, first served
   - Send three preferred papers to avoid allocation clashes
3. The instructors create a schedule for the talks and notify the students. The first talk is scheduled for 12.11.
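The "first come, first served" rule combined with three ranked preferences can be read as a simple greedy assignment. The sketch below is only an illustration of that reading, not the instructors' actual procedure: the function name `allocate_papers`, the student identifiers, and the shortened paper names are all hypothetical, and it assumes requests are processed strictly in order of arrival.

```python
from typing import Optional


def allocate_papers(
    requests: list[tuple[str, list[str]]],
) -> dict[str, Optional[str]]:
    """Give each student the first still-free paper among their preferences.

    `requests` must be ordered by arrival time (earliest request first).
    Each entry is (student, ranked list of preferred papers).
    A student whose preferences are all taken is mapped to None.
    """
    taken: set[str] = set()
    assignment: dict[str, Optional[str]] = {}
    for student, preferences in requests:
        # Pick the highest-ranked preference nobody has claimed yet.
        choice = next((p for p in preferences if p not in taken), None)
        if choice is not None:
            taken.add(choice)
        assignment[student] = choice
    return assignment


# Hypothetical requests, listed in order of arrival.
requests = [
    ("student_a", ["PowerGraph", "TurboGraph", "Hogwild"]),
    ("student_b", ["PowerGraph", "Hogwild", "Knowledge Vault"]),
    ("student_c", ["PowerGraph", "Hogwild", "TurboGraph"]),
]
result = allocate_papers(requests)
```

With a single preference each, the second and third students in this example would be left without a paper; with three ranked preferences all of them receive one, which is the point of sending three.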

Papers list: Part I

Author | Title | Year
Ahmed, N.K. et al. | Graph Sample and Hold: A Framework for Big-Graph Analytics | 2014
Dean, T. et al. | Fast, Accurate Detection of 100,000 Object Classes on a Single Machine | 2013
Dong, X. et al. | Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion | 2014
Gonzalez, J.E. et al. | PowerGraph: Distributed Graph-parallel Computation on Natural Graphs | 2012
Han, W.-S. et al. | TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC | 2013
Liu, C. et al. | Distributed Nonnegative Matrix Factorization for Web-scale Dyadic Data Analysis on MapReduce | 2010

http://www.ismll.uni-hildesheim.de/lehre/semBI-14w/index_en.html

Papers list: Part II

Author | Title | Year
Ottaviano, G., Venturini, R. | Partitioned Elias-Fano Indexes | 2014
Rakthanmanon, T. et al. | Searching and Mining Trillions of Time Series Subsequences Under Dynamic Time Warping | 2012
Recht, B. et al. | Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent | 2011
Yu, H.-F. et al. | Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems | 2012

http://www.ismll.uni-hildesheim.de/lehre/semBI-14w/index_en.html

Regulations of the presentations:

- Depending on the number of students, there will be one or two seminar presentations per scheduled lecture slot.
- Each presentation lasts 50 minutes, including 10 minutes of questions and discussion.
- All students should attend the other students' talks.

Advice on the presentation

- Understand and describe the underlying theoretical foundations of the methodologies (learning algorithms, equations)
- Describe the methods in your own words and avoid reading out the content of the paper
- Think analytically and describe the advantages and disadvantages of the paper
- If applicable, propose ideas and improvements at the end

Seminar Report

- Every presenter should prepare a report on the paper they presented.
- The report should include a description of the method and its strengths and weaknesses.
- The overall tone of the report should be an analysis of the work, not a repetition of the paper.
- Additional ideas, experiments or illustrations will be rewarded.

Structure of the Seminar Report

- Content should not exceed 30 pages
- Submission deadline: 2 weeks before the term break (28.01.2015)
- To be submitted (to Lucas Drumond, C36Spl):
  - 3 printed and bound copies
  - 1 CD with the report, source code and all relevant materials

Any Questions?
