Govind Kanshi. I help create reliable, pragmatic software solutions using dainty words like Cloud and Data. I work on the Azure Cosmos DB team. Aug 11, 2014
HowTo-AzureML: create n-gram features using Feature Hashing for text data

AzureML has Vowpal Wabbit's well-known hashing trick built in. It gives you low-cost, low-impact hashing of text features; you only specify the hashing bit size and the number of n-grams.

Steps
1. Upload the text file or read it. I just took a bunch of text from a news site and loaded it as a text file.
2. Use the Feature Hashing module: specify the hashing bit size and the n-grams.
The output is the hashed features as columns (visualized as below). Most of these columns will be sparse.
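To show roughly what the module does with the bit size and n-gram settings, here is a minimal Python sketch. It is not AzureML's code: the function name `hash_features` and its parameters are made up for illustration, `zlib.crc32` stands in for the MurmurHash that Vowpal Wabbit uses as far as I know, and I am assuming that an n-gram setting of 2 means "1-grams and 2-grams".

```python
import zlib
from collections import Counter

def hash_features(text, bit_size=10, max_n=2):
    """Hash every 1..max_n-gram of `text` into one of 2**bit_size columns.

    Returns a sparse dict {column_index: count}; columns not listed are zero.
    Hypothetical helper for illustration, not an AzureML or VW API.
    """
    tokens = text.lower().split()
    n_buckets = 1 << bit_size  # bit_size=10 gives 1024 feature columns
    counts = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            gram = " ".join(tokens[start:start + n])
            # crc32 is a stand-in hash chosen because it is in the stdlib
            index = zlib.crc32(gram.encode("utf-8")) % n_buckets
            counts[index] += 1
    return dict(counts)

features = hash_features("I like Star Trek movie", bit_size=10, max_n=2)
print(len(features), "non-zero columns out of", 1 << 10)
```

Five words give at most nine non-zero columns out of 1024, which is why the output looks so sparse. A smaller bit size saves space but makes it more likely that two different n-grams land in the same column.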
What is an n-gram? N-grams are contiguous sequences of n items from a given sequence. Given “I like Star Trek movie”, the 2-gram output would be: I like, like Star, Star Trek, …

I could not find a way to “print” the values of those n-gram columns in AzureML yet. Vowpal Wabbit has the “--invert_hash” option to print them out.

Background: http://alex.smola.org/papers/2009/Weinbergeretal09.pdf
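The 2-gram example, and the idea of mapping hashed columns back to readable n-grams, can be sketched in a few lines of Python. This is only an illustration of the concept, not what Vowpal Wabbit or AzureML do internally: `ngrams` and `inverse_hash_table` are hypothetical helper names, and `zlib.crc32` is a stand-in for the real hash function.

```python
import zlib
from collections import defaultdict

def ngrams(tokens, n):
    """Return the contiguous n-token sequences of `tokens`, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def inverse_hash_table(grams, bit_size=10):
    """Remember which n-grams were hashed into which column index.

    Hypothetical helper: it records the mapping while hashing, since a
    hash cannot be reversed afterwards.
    """
    table = defaultdict(set)
    for gram in grams:
        index = zlib.crc32(gram.encode("utf-8")) % (1 << bit_size)
        table[index].add(gram)
    return {index: sorted(names) for index, names in table.items()}

bigrams = ngrams("I like Star Trek movie".split(), 2)
print(bigrams)  # ['I like', 'like Star', 'Star Trek', 'Trek movie']

table = inverse_hash_table(bigrams)
for index, names in sorted(table.items()):
    print(index, names)
```

If two n-grams collide, they show up under the same column index, which is handy for judging whether the chosen bit size is too small.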