HowTo-AzureML-create n-gram features using Feature Hashing for text data


Govind Kanshi
I help create reliable, pragmatic software solutions using the dainty words like Cloud and Data. I work at Azure Cosmos DB team.
Aug 11, 2014

AzureML has Vowpal Wabbit's famous hashing trick embedded in it. It allows low-cost, low-impact hashing of the features, given a hashing bit size and the number of n-grams.

Steps

1. Upload the text file or read it. I just took a bunch of text from a news site and loaded it as a text file.
2. Use the Feature Hashing module: specify the hashing bit size and the n-grams.

The output is the hashed features as columns, which can be inspected by visualizing the module output. Most of the values will be zero, so the result is sparse.
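To get a feel for what the module is doing, here is a minimal Python sketch of the hashing trick. It is not AzureML's or Vowpal Wabbit's implementation: MD5 is only a stand-in for a stable hash, and the function name `hash_features` and the value `bit_size=10` are assumptions made for illustration. The real module's hash function and column layout may differ.

```python
import hashlib

def hash_features(tokens, bit_size=10):
    """Map a list of tokens to a count vector with 2**bit_size columns."""
    num_columns = 1 << bit_size
    vector = [0] * num_columns
    for token in tokens:
        # MD5 is an assumed stand-in: any stable hash works for the sketch.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        column = int.from_bytes(digest[:4], "little") % num_columns
        vector[column] += 1  # colliding tokens simply share a column
    return vector

tokens = "I like Star trek movie".split()
row = hash_features(tokens, bit_size=10)

print(len(row))                    # number of feature columns
print(sum(1 for v in row if v))    # how many columns are non-zero
```

With only five tokens spread over 2**10 columns, almost every column stays at zero, which is why the output is mostly sparse. A larger bit size gives more columns and fewer collisions at the cost of a wider table.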

What is an n-gram? N-grams are contiguous sequences of n items from a given sequence. Given “I like Star trek movie”, the 2-gram output would be: I like, like Star, Star trek, …

I could not find a way to “print” the values behind those n-gram columns in AzureML yet. Vowpal Wabbit has the “--invert_hash” option to print them out.

Background:
http://alex.smola.org/papers/2009/Weinbergeretal09.pdf

http://www.cse.wustl.edu/~kilian/papers/ceas2009-paper-11.pdf
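The n-gram definition above, and the idea of reading hashed columns back, can be sketched in a few lines of Python. This is only an illustration under assumptions: the helpers `ngrams` and `column_of` are hypothetical names, MD5 stands in for whatever hash the real tools use, and the lookup table is a simple way to remember which n-gram went to which column, not how Vowpal Wabbit implements its option.

```python
import hashlib
from collections import defaultdict

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def column_of(feature, bit_size):
    """Assumed stand-in hash (MD5) reduced to 2**bit_size columns."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % (1 << bit_size)

tokens = "I like Star trek movie".split()
bigrams = ngrams(tokens, 2)
print(bigrams)  # ['I like', 'like Star', 'Star trek', 'trek movie']

# Remember which n-grams landed in which column so columns can be read back.
lookup = defaultdict(list)
for gram in bigrams:
    lookup[column_of(gram, bit_size=10)].append(gram)

for column, grams in sorted(lookup.items()):
    print(column, grams)
```

Hashing is one-way, so the column number alone cannot tell you the n-gram; keeping a table like `lookup` while hashing is what makes the columns readable afterwards.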
