python - What is feature hashing (hashing-trick)? - Stack Overflow


What is feature hashing (hashing-trick)?

I know feature hashing (the hashing trick) is used to reduce dimensionality and handle the sparsity of bit vectors, but I don't understand how it really works. Can anyone explain this to me? Is there any Python library available to do feature hashing? Thank you.

python hash vector machine-learning

edited Dec 30 '11 at 23:18 by maxy (2,391 · 15 · 18)
asked Dec 29 '11 at 20:29 by Maggie (1,886 · 5 · 27 · 48)

Are you looking for something like this? shogun-toolbox.org – S.Lott Dec 29 '11 at 20:32

3 Answers

For example, "the quick brown fox" transforms to:

h(the) mod 5 = 0
h(quick) mod 5 = 1
h(brown) mod 5 = 1
h(fox) mod 5 = 3
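
A minimal Python sketch of the same idea, added here for illustration (not part of the original answer; the function name hash_trick and the use of hashlib.md5 are assumptions, chosen only to get a deterministic hash across runs):

import hashlib

def hash_trick(tokens, n_buckets=5):
    # Fixed-length count vector: the vocabulary can grow arbitrarily large,
    # but every word is folded into one of n_buckets positions.
    vec = [0] * n_buckets
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % n_buckets] += 1   # collisions simply add into the same bucket
    return vec

print(hash_trick("the quick brown fox".split()))
# Which bucket each word lands in depends on the hash function used, but the
# vector length stays at n_buckets regardless of the size of the vocabulary.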

Use the index rather than the text value; this saves space.

To summarize some of the applications:

- dimensionality reduction for high-dimensional feature vectors (e.g. text in an email-classification task, collaborative filtering on spam)
- sparsification
- bag-of-words on the fly
- cross-product features
- multi-task learning

References:

Original papers:
1. Weinberger, K., Dasgupta, A., Langford, J., Smola, A., & Attenberg, J. (2009). Feature Hashing for Large Scale Multitask Learning.
2. Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., Strehl, A., & Vishwanathan, V. (2009). Hash Kernels.

Related:
- "What is the hashing trick?" (Quora)
- Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity Search in High Dimensions via Hashing.

Implementation:
- Langford, J., Li, L., & Strehl, A. (2007). Vowpal Wabbit online learning project (Technical Report). http://hunch.net/?p=309

edited Feb 24 '17 at 5:36
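
Regarding the "is there a Python library" part of the question: scikit-learn ships hashing-trick implementations. The sketch below uses HashingVectorizer for text; it is an addition to the thread, and the parameter names reflect recent scikit-learn versions:

from sklearn.feature_extraction.text import HashingVectorizer

# n_features fixes the output dimensionality up front; alternate_sign uses a
# second hash to pick +1/-1 so that collisions tend to cancel out.
vectorizer = HashingVectorizer(n_features=2**10, alternate_sign=True)
X = vectorizer.transform(["the quick brown fox"])

print(X.shape)  # (1, 1024): one row, 1024 hashed feature columns
print(X.nnz)    # number of buckets actually hit by this document

For dictionaries of arbitrary (non-text) features there is also sklearn.feature_extraction.FeatureHasher.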

answered Nov 7 '15 at 10:08 by CodeFarmer (1,428 · 14 · 22)

Can you comment on the impact of feature hashing on the learned model, since there will be hash collisions? Yes, I know they're improbable and minimal, etc., but collisions will occur; what is the impact of these collisions on the learned model? Any pointers to research that looks into this question are appreciated. One thing is clear: the learned model from hashed features is NOT guaranteed to be the same model you get from the original un-hashed features. How do they differ, and to what degree? – Kai May 21 '16 at 17:03

@Kai I've added the original paper on this topic. The error bound was analysed, as were empirical results. Please take a look. – CodeFarmer Feb 24 '17 at 5:40

much appreciated, thank you. – Kai Feb 24 '17 at 18:08
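
On the collision question discussed above, the Feature Hashing for Large Scale Multitask Learning paper cited in the answer analyses a signed variant, where a second hash chooses the sign of each update so that collisions cancel in expectation, adding variance rather than systematic bias. A rough sketch, added here for illustration and not part of the original thread:

import hashlib

def signed_hash_trick(tokens, n_buckets=1024):
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        idx = int(digest, 16) % n_buckets
        # The paper uses an independent sign hash; for brevity the sign is
        # derived here from the last hex digit of the same digest.
        sign = 1.0 if int(digest[-1], 16) % 2 == 0 else -1.0
        vec[idx] += sign
    return vec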
