Pranking with Ranking - CIS @ UPenn

Pranking with Ranking
Koby Crammer and Yoram Singer

Presented by: Soham Dan

Content and some figures borrowed from [Crammer, Koby, and Yoram Singer. "Pranking with Ranking." NIPS, 2002] and talk slides.

Introduction

Problem
- Input: a sequence of instance-rank pairs (x^1, y^1), ..., (x^T, y^T).
- Output: a model (essentially a rank-prediction rule) that assigns a rank to each instance.
- Goal: make the predicted rank as close as possible to the true rank.
- Note: the ranks need not be unique!

Similarity with
- Classification problems: assign one of k possible labels to a new instance.
- Regression problems: the set of k labels is structured, since there is a total order relation between the labels.

Natural settings to rank/rate instances: information retrieval, collaborative filtering.

Problem

Figure 1: Movie rating prediction (example: the Netflix challenge)

Possible Solutions

- Cast as a regression or classification problem.
- Reduce a total order into a set of preferences over pairs. Drawback: the sample size blows up from n to O(n^2). Also, there is no easy adaptation to online settings.
- PRank algorithm: directly maintains a totally ordered set by projecting instances onto the reals, associating ranks with distinct sub-intervals of the reals, and adapting the support of each sub-interval while learning.
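The quadratic blowup of the pair-wise reduction is easy to see on a toy example. The sketch below uses hypothetical data (five instances with illustrative ranks); ties produce no preference pair:

```python
from itertools import combinations

# Hypothetical ranked sample: (instance_id, rank).
sample = [("a", 1), ("b", 2), ("c", 2), ("d", 3), ("e", 5)]

# Pair-wise reduction: one preference pair per pair of instances whose
# ranks differ; the tie between "b" and "c" yields no pair.
pairs = [(x, y) for (x, rx), (y, ry) in combinations(sample, 2) if rx != ry]

n = len(sample)
print(n * (n - 1) // 2)  # 10: the O(n^2) worst case for n = 5
print(len(pairs))        # 9: one pair lost to the single tie
```

A sequence of n instances thus turns into up to n(n-1)/2 training pairs, which is what makes the reduction expensive for large streams.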

Problem Setup

- Input stream: a sequence of instance-rank pairs (x^1, y^1), ..., (x^T, y^T), where each instance x^t ∈ R^n. The corresponding rank y^t belongs to Y, a finite set with a total order relation (structured). W.l.o.g. Y = {1, 2, ..., k} with ≺ as the order relation: 1 ≺ 2 ≺ ... ≺ k.
- Ranking rule H: a mapping from instances to ranks, R^n → Y. The family of ranking rules considered here is parameterized by a vector w ∈ R^n and k thresholds b_1 ≤ b_2 ≤ ... ≤ b_k = ∞.
- Given a ranking rule defined by w and b, the predicted rank ŷ^t on a new instance x is H(x) = min_{r ∈ {1, 2, ..., k}} {r : w·x − b_r < 0}.
- The algorithm makes a mistake on instance x^t if ŷ^t ≠ y^t, and the loss on that input is |ŷ^t − y^t|. The loss after T rounds is Σ_{t=1}^T |ŷ^t − y^t|.
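The prediction rule above can be sketched directly in plain Python. The weight vector, thresholds, and test instances below are illustrative values, not from the paper; the implicit b_k = +∞ guarantees that some rank always qualifies:

```python
def predict_rank(w, b, x):
    """PRank prediction rule: the smallest r with w.x - b_r < 0.
    b holds the k-1 finite thresholds; b_k = +inf is appended."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    thresholds = list(b) + [float("inf")]  # b_k = +inf
    for r, br in enumerate(thresholds, start=1):
        if score - br < 0:
            return r

# Hypothetical rule: k - 1 = 3 thresholds, so k = 4 ranks.
w = [1.0, -0.5]
b = [-1.0, 0.0, 2.0]
print(predict_rank(w, b, [0.5, 1.0]))   # score 0.0 -> rank 3
print(predict_rank(w, b, [-5.0, 0.0]))  # score -5.0 -> rank 1
```

The score w·x lands in one of the k sub-intervals cut out by the thresholds, and the interval's index is the predicted rank.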

Perceptron Recap

Overview of Algorithm

Online algorithm. In each round the ranking algorithm:
- Gets an input instance
- Outputs a rank as its prediction
- Receives the correct rank value
- If there is an error:
  - Computes the loss
  - Updates the rank-prediction rule

Two further properties:
- Conservative (mistake-driven) algorithm: the ranking rule is updated only on rounds on which a ranking mistake was made.
- No statistical assumptions on the data: the algorithm should do well irrespective of the specific sequence of inputs and target labels.

Algorithm Illustration

Algorithm

Figure 2: The PRank algorithm
- The rank y is expanded into k − 1 virtual variables y_1, ..., y_{k−1}, where y_r = +1 if y > r (i.e., w·x should exceed b_r) and y_r = −1 otherwise.
- On mistakes, b and w·x are moved towards each other.
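The update step of Figure 2 can be sketched as follows. It follows the paper's pseudocode; the starting rule and the toy instance in the usage lines are hypothetical:

```python
def prank_update(w, b, x, y):
    """One PRank update on (x, y), in place. b holds the k-1 finite
    thresholds. Only thresholds that disagree with the expanded rank
    labels are moved, together with w, towards each other."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    k = len(b) + 1
    # Expand the true rank into virtual labels: y_r = +1 iff y > r.
    y_r = [1 if y > r else -1 for r in range(1, k)]
    # tau_r = y_r on thresholds ordered wrongly w.r.t. the score, else 0.
    tau = [yr if (score - br) * yr <= 0 else 0 for yr, br in zip(y_r, b)]
    for i in range(len(w)):
        w[i] += sum(tau) * x[i]       # move w.x towards the thresholds
    for r in range(len(b)):
        b[r] -= tau[r]                # move the thresholds towards w.x

# Hypothetical round: zero initial rule, k = 4 ranks, true rank 4.
w, b = [0.0, 0.0], [0.0, 0.0, 0.0]
prank_update(w, b, [1.0, 2.0], 4)     # all y_r = +1, all thresholds move
print(w, b)  # [3.0, 6.0] [-1.0, -1.0, -1.0]
```

Note that after the update the thresholds are still sorted, which is exactly what the order-preservation lemma below guarantees in general.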

Analysis

1. Lemma: Order Preservation
2. Theorem: Mistake Bound

Lemma: Order Preservation

Can the thresholds get out of order after an update? NO.

Let w^t and b^t be the current ranking rule, where b_1^t ≤ ... ≤ b_{k−1}^t, and let (x^t, y^t) be the instance-rank pair fed to PRank on round t. Denote by w^{t+1} and b^{t+1} the resulting ranking rule after the update of PRank. Then b_1^{t+1} ≤ ... ≤ b_{k−1}^{t+1}.

Proof sketch:
- b_r^t is an integer for all r and t, since for all r we initialize b_r^1 = 0, and b_r^{t+1} − b_r^t ∈ {−1, 0, +1}.
- Proof by induction: showing b_{r+1}^{t+1} ≥ b_r^{t+1} is equivalent to proving
  b_{r+1}^t − b_r^t ≥ y_{r+1}^t [[(w^t·x^t − b_{r+1}^t) y_{r+1}^t ≤ 0]] − y_r^t [[(w^t·x^t − b_r^t) y_r^t ≤ 0]],
  where [[·]] equals 1 if its argument holds and 0 otherwise.


Theorem: Mistake Bound

Let (x^1, y^1), ..., (x^T, y^T) be an input sequence for PRank, where x^t ∈ R^n and y^t ∈ {1, ..., k}. Denote R^2 = max_t ||x^t||^2. Assume that there is a ranking rule v* = (w*, b*) with b_1* ≤ ... ≤ b_{k−1}*, of unit norm, that classifies the entire sequence correctly with margin γ = min_{r,t} (w*·x^t − b_r*) y_r^t > 0. Then the rank loss of the algorithm, Σ_{t=1}^T |ŷ^t − y^t|, is at most (k − 1)(R^2 + 1)/γ^2.

Proof of Theorem
- The update is w^{t+1} = w^t + (Σ_r τ_r^t) x^t and b_r^{t+1} = b_r^t − τ_r^t.
- Let n^t = |ŷ^t − y^t| be the difference between the true rank and the predicted rank. Clearly, n^t = Σ_r |τ_r^t|.
- To prove the theorem we bound Σ_t n^t from above, by bounding ||v^{T+1}||^2 (where v^t = (w^t, b^t)) from above and below.
- Lower bound: v*·v^{t+1} = v*·v^t + Σ_{r=1}^{k−1} τ_r^t (w*·x^t − b_r*). Since Σ_{r=1}^{k−1} τ_r^t (w*·x^t − b_r*) ≥ n^t γ, it follows that v*·v^{T+1} ≥ γ Σ_t n^t, and hence ||v^{T+1}||^2 ≥ γ^2 (Σ_t n^t)^2.
- Upper bound: ||v^{t+1}||^2 = ||w^t||^2 + ||b^t||^2 + 2 Σ_r τ_r^t (w^t·x^t − b_r^t) + (Σ_r τ_r^t)^2 ||x^t||^2 + Σ_r (τ_r^t)^2. Since (Σ_r τ_r^t)^2 ≤ (n^t)^2 and Σ_r (τ_r^t)^2 = n^t, and since Σ_r τ_r^t (w^t·x^t − b_r^t) = Σ_r [[(w^t·x^t − b_r^t) y_r^t ≤ 0]] (w^t·x^t − b_r^t) y_r^t ≤ 0, using ||x^t||^2 ≤ R^2 gives ||v^{t+1}||^2 ≤ ||v^t||^2 + (n^t)^2 R^2 + n^t.
- Combining the two bounds: Σ_t n^t ≤ (R^2 [Σ_t (n^t)^2]/[Σ_t n^t] + 1)/γ^2. Since n^t ≤ k − 1, we have Σ_t (n^t)^2 ≤ (k − 1) Σ_t n^t, and therefore Σ_t n^t ≤ (k − 1)(R^2 + 1)/γ^2.
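The bound can be sanity-checked numerically. The sketch below is a synthetic illustration, not an experiment from the paper: it builds a stream that a fixed target rule (w*, b*) ranks correctly with margin at least γ, runs conservative PRank from scratch, and checks the accumulated rank loss against (k−1)(R²+1)/γ'², where γ' rescales the margin so the target rule has unit norm as the theorem requires. All data, the target rule, and the margin value are hypothetical:

```python
import random

def predict(w, b, x):
    # PRank prediction rule: smallest r with w.x - b_r < 0 (b_k = +inf).
    s = sum(wi * xi for wi, xi in zip(w, x))
    for r, br in enumerate(b + [float("inf")], start=1):
        if s - br < 0:
            return r

random.seed(0)
k = 4
w_star = [1.0, 0.0]            # hypothetical target direction
b_star = [-1.0, 0.0, 1.0]      # k - 1 ordered target thresholds
gamma = 0.2

# Keep only instances whose target score is >= gamma from every threshold,
# so (w*, b*) ranks the whole stream correctly with margin >= gamma.
stream = []
while len(stream) < 500:
    x = [random.uniform(-2.0, 2.0), random.uniform(-2.0, 2.0)]
    s = x[0]                   # w_star . x
    if all(abs(s - br) >= gamma for br in b_star):
        stream.append((x, predict(w_star, b_star, x)))

R2 = max(sum(xi * xi for xi in x) for x, _ in stream)
# Rescale the margin for the unit-norm version of v* = (w*, b*).
v_norm2 = sum(v * v for v in w_star) + sum(v * v for v in b_star)
bound = (k - 1) * (R2 + 1) * v_norm2 / gamma ** 2

# Run conservative PRank, accumulating the rank loss on mistakes.
w, b = [0.0, 0.0], [0.0, 0.0, 0.0]
loss = 0
for x, y in stream:
    y_hat = predict(w, b, x)
    if y_hat != y:             # mistake-driven: update only on errors
        loss += abs(y_hat - y)
        s = sum(wi * xi for wi, xi in zip(w, x))
        y_r = [1 if y > r else -1 for r in range(1, k)]
        tau = [yr if (s - br) * yr <= 0 else 0 for yr, br in zip(y_r, b)]
        w = [wi + sum(tau) * xi for wi, xi in zip(w, x)]
        b = [br - tr for br, tr in zip(b, tau)]

print(loss <= bound)           # True: guaranteed by the theorem
print(b == sorted(b))          # True: guaranteed by the lemma
```

In practice the loss stays far below the bound on such an easy 2-D stream; the theorem only promises the worst case.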

Experiments

Models:
- Multi-class generalization of the perceptron (MCP): kn parameters (under-constrained)
- Widrow-Hoff algorithm for online regression (WH): n parameters (over-constrained)
- PRank: n + k − 1 parameters (accurately constrained)

Datasets:
- Synthetic dataset
- EachMovie dataset, used for collaborative-filtering tasks
- Evaluation in a batch setting: PRank outperforms multi-class SVM and SVR

Figure 4: Time-averaged ranking-loss comparison of MCP, WH, and PRank on the synthetic dataset and the EachMovie-100 and EachMovie-200 datasets, respectively.

Key takeaways

1. The ranking problem is a structured prediction task because of the total order between the different ratings.
2. PRank is an online algorithm for the ranking problem, based on projections and conservative updates of the projection direction and the threshold values.
3. Experiments indicate that this algorithm performs better than regression and classification models on ranking tasks.

Further Reading

Types of ranking algorithms:
- Point-wise approaches: PRanking
- Pair-wise approaches: RankSVM, RankNet, RankBoost
- List-wise approaches: SVM-MAP, AdaRank, SoftRank

References:
- Liu, Tie-Yan. "Learning to Rank for Information Retrieval." Foundations and Trends in Information Retrieval 3(3) (2009): 225-331.
- Agarwal, Shivani, and Partha Niyogi. "Generalization Bounds for Ranking Algorithms via Algorithmic Stability." Journal of Machine Learning Research 10 (2009): 441-474.
