Computer Science > Computation and Language

Title: Hash2Vec, Feature Hashing for Word Embeddings
Authors: Luis Argerich, Joaquín Torré Zaffaroni, Matías J. Cano
(Submitted on 31 Aug 2016)

Abstract: In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks such as document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which does not require training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. To the best of our knowledge, this is the first application of feature hashing to the word embeddings problem, and the results indicate that it is a scalable technique with practical value for NLP applications.

Comments: ASAI 2016, 45 JAIIO
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Journal reference: 45 JAIIO - ASAI 2016 - ISSN: 2451-7585 - Pages 33-40
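To illustrate the idea behind the abstract, the following is a minimal sketch (not the authors' exact algorithm) of feature hashing applied to word embeddings: each word's context words are hashed into a fixed-dimensional vector with a signed hash, so embeddings are built in a single pass over the corpus with no training. The function names, the context window, and the use of MD5 as the hash are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict

def signed_hash(token, dim):
    # Illustrative choice: MD5 for a stable hash; one part of the
    # digest picks the bucket, another picks the sign, which keeps
    # hash collisions roughly unbiased in expectation.
    h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
    return h % dim, 1 if (h // dim) % 2 == 0 else -1

def hash_embeddings(corpus, dim=100, window=2):
    """Build fixed-size word vectors by hashing context words.

    corpus: iterable of tokenized sentences (lists of strings).
    Runs in a single pass, linear in the size of the data.
    """
    vecs = defaultdict(lambda: [0.0] * dim)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                idx, sign = signed_hash(sentence[j], dim)
                # Accumulate the signed context count; a distance
                # weight like 1/|i-j| could also be used here.
                vecs[word][idx] += sign
    return dict(vecs)
```

Words that occur in similar contexts accumulate similar signed counts and therefore end up with similar vectors, which is how this scheme can capture semantic similarity without any training step.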
Cite as: arXiv:1608.08940 [cs.CL] (or arXiv:1608.08940v1 [cs.CL] for this version)
Submission history: [v1] Wed, 31 Aug 2016 17:01:09 GMT (596kb) From: Luis Argerich