Spark Use Case - Social Media Analysis | AcadGild

Jun 23, 2016

In this Spark post, we will work through a case study that calculates the average number of friends by age for users of a social media website.

Let’s begin by considering a sample of four records.

Column 1: User ID

Column 2: User Name

Column 3: Age of the User

Column 4: Number of Friends of the User

You can download the input file from here.

The new RDD, my_lines, is created by calling the textFile function on the Spark Context with our source data. Each line of the comma-separated source file becomes one entry in the RDD.

To see the first 10 records of my_lines, we call the take action on it.

Next, we transform my_lines into a new RDD named my_rdd by calling map on it with the parse_line function, which performs the mapping. Every record of my_lines is passed to parse_line one at a time and parsed.

Each line of my_lines is first split on commas. The two required fields, the age of the user and the number of friends that user has, are extracted from the third and fourth fields respectively, then returned and stored in the new key-value RDD named my_rdd.
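
The parsing step can be sketched as follows. This is a minimal illustration, not the original script: the sample records are hypothetical, and a list comprehension stands in for the PySpark call my_lines.map(parse_line) so the snippet runs without a Spark installation.

```python
def parse_line(line):
    """Split one comma-separated record and return (age, num_friends)."""
    fields = line.split(",")
    age = int(fields[2])          # third column: age of the user
    num_friends = int(fields[3])  # fourth column: number of friends
    return (age, num_friends)

# Hypothetical sample records in the four-column layout described above.
sample_lines = ["0,Will,33,385", "1,Jean,33,2", "2,Hugh,55,221", "3,Deanna,40,465"]

# Plain-Python stand-in for: my_rdd = my_lines.map(parse_line)
parsed = [parse_line(line) for line in sample_lines]
print(parsed)
```

Converting both fields to int at this point keeps the later sums and divisions numeric.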

To see the first 10 records of my_rdd, we call the take action on it.

The results are key-value pairs with the age of an individual user as the key and that user's number of friends as the value.

We have simplified the complex script below by breaking it into multiple statements.

We take the RDD my_rdd, which contains key-value pairs with the age of an individual user as the key and that user's number of friends as the value, and call mapValues on it.

This transforms every value in the key-value pairs of the RDD above. Each value is passed to the mapping function, which returns a tuple pairing that user's number of friends with the count 1. The age key is left unchanged.
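
A small sketch of this value transformation, under stated assumptions: the input pairs are hypothetical, the helper name with_count is illustrative, and a list comprehension stands in for the PySpark call my_rdd.mapValues(lambda v: (v, 1)).

```python
# Hypothetical (age, num_friends) pairs standing in for my_rdd.
pairs = [(33, 385), (33, 2), (55, 221), (40, 465)]

def with_count(num_friends):
    """Pair a friend count with 1 so that user counts can be summed later."""
    return (num_friends, 1)

# Plain-Python stand-in for: x = my_rdd.mapValues(lambda v: (v, 1))
# The key (age) is untouched; only the value is transformed.
x = [(age, with_count(friends)) for age, friends in pairs]
print(x)
```

Carrying the 1 alongside each friend count lets a single later aggregation produce both the total and the number of users per age.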

The first 10 records of the new RDD, x, can be displayed by calling take on it.

This step sums, for each age, the total number of friends and the number of users of that age. The result has the age as the key and a (total friends, user count) pair as the value.

This is done by applying the reduceByKey transformation to the x RDD.
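
The aggregation can be sketched like this. The helper reduce_by_key is a hypothetical plain-Python stand-in that mimics what RDD.reduceByKey does on a single machine, and the input data is made up for illustration. In PySpark the equivalent call would be x.reduceByKey(add_pairs).

```python
def add_pairs(a, b):
    """Add two (total_friends, user_count) tuples element-wise."""
    return (a[0] + b[0], a[1] + b[1])

def reduce_by_key(records, func):
    """Hypothetical local stand-in for reduceByKey: merge values sharing a key."""
    merged = {}
    for key, value in records:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

# Hypothetical contents of the x RDD: (age, (num_friends, 1)).
x = [(33, (385, 1)), (33, (2, 1)), (55, (221, 1)), (40, (465, 1))]

# In PySpark: totals_Age = x.reduceByKey(add_pairs)
totals_age = reduce_by_key(x, add_pairs)
print(totals_age)
```

The combining function is associative and commutative, which is what reduceByKey needs because Spark may merge values in any order across partitions.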

The first 10 records of the new RDD, totals_Age, can be displayed by calling take on it.

This step calculates the average number of friends for every age group. A lambda function divides the first element of each value in the previous RDD, i.e. the total number of friends for that age group, by the second element, i.e. the number of users in that age group.
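
As a sketch of the averaging step, assuming hypothetical totals and an illustrative helper named average: in PySpark the same function would be passed to mapValues on the totals RDD, while here a dict comprehension stands in for it.

```python
def average(total_and_count):
    """Divide total friends by user count for one age group."""
    total_friends, user_count = total_and_count
    return total_friends / user_count

# Hypothetical per-age totals: age -> (total_friends, user_count).
totals_age = {33: (387, 2), 55: (221, 1), 40: (465, 1)}

# In PySpark this would be a mapValues call with the same function.
averages_age = {age: average(value) for age, value in totals_age.items()}
print(averages_age)
```

Under Python 3 the `/` operator returns a float. Under Python 2 it truncates when both operands are integers, so one operand would need to be converted with float() to keep the fractional part.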

The results of the averages_Age RDD are collected into my_results, which is a plain Python list rather than an RDD.

The final results are displayed with a Python for loop that prints each age as the key and the average number of friends for that age group as the value.
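
A minimal sketch of the display loop, assuming my_results is a list of (age, average) tuples, which is the shape collect() returns for a key-value RDD. The values are hypothetical, and sorting by age and the output format are additions for readability that the post does not describe.

```python
# Hypothetical collected results: a list of (age, average_friends) tuples.
my_results = [(55, 221.0), (33, 193.5), (40, 465.0)]

def format_results(results):
    """Return one 'age<TAB>average' line per age group, sorted by age."""
    return ["%d\t%.2f" % (age, avg) for age, avg in sorted(results)]

for line in format_results(my_results):
    print(line)
```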

We hope this post has been helpful in understanding this Spark use case using Python. If you have any queries, feel free to comment below and we will get back to you at the earliest.

For more resources on Big Data and other technologies, keep visiting acadgild.com

SATYAM

Satyam Kumar is a Big Data professional working at AcadGild, with rich experience in Big Data technologies such as Hadoop, Spark, and NoSQL. He codes in programming languages such as Java and Python and has been responsible for developing various projects and blogs related to the Hadoop ecosystem and Spark.

AcadGild was founded with the vision of "Learn. Do. Earn". We provide skill development courses based on current industry needs. What sets us apart is the earning opportunities we provide after successful completion of a course. We also provide live mentoring and 24x7 support. Our mentors are industry thought leaders in their respective fields.
