SOC 553 Introduction to Text Mining and Statistical Natural Language Processing

Syllabus

The syllabus below describes a recent offering of the course, but it may not be completely up to date. For current details about this course, please contact the course coordinator. Course coordinators are listed on the course listings for undergraduate and graduate courses.

Text Books

Required

Sholom M. Weiss, Nitin Indurkhya, and Tong Zhang, Fundamentals of Predictive Text Mining, Springer, 2010, ISBN 978-1-84996-225-4

Recommended

Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN 978-0-262-13360-9
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008, ISBN 978-0-521-86571-5
Steven Pinker, Words and Rules, Perennial/HarperCollins, 2000, ISBN 978-0-06-095840-4

Week-by-Week Schedule

Columns: Week, Topics Covered, Reading, Assignments

Week 1
Topics: Overview, Problem Types, Text vs. Data Mining
Reading: Chapter 1, Appendix A
Assignments: Respond to the following Questions and Exercises in 1.11: 1-4. Install the software. Read the manuals (tmsk.pdf, riktext.pdf) and learn to use the software by week 4.

Week 2
Topics: Collect, Standardize, Tokenize, Generate Vectors, Term Frequency-Inverse Document Frequency (tf-idf)
Reading: Sections 2.1-2.5
Assignments: Assignment 1: Create a term-document spreadsheet "by hand" using the algorithms in Figures 2.3, 2.4, 2.5, and 2.7 for the assignment documents.

Week 3
Topics: Sentence Boundaries, Part-of-Speech Tagging, Word Sense Disambiguation, Full Sentence Parsing
Reading: Sections 2.6-2.12

Week 4
Topics: Application of software to extract results of Chapter 2 topics

Week 5
Topics: Classification: Nearest Neighbor, Decision Rules/Trees
Reading: Chapter 3 through Section 3.4.4
Assignments: Respond to the following Questions and Exercises in 3.9: 5-6

Week 6
Topics: Classification: Probabilistic, Weighted Scores, Evaluation
Reading: Sections 3.4.5-3.6
Assignments: Respond to the following Questions and Exercises in 3.9: 8-9, 12

Week 7
Topics: Midterm
Reading: Chapters 1-3

Week 8
Topics: Information Retrieval
Reading: Chapter 4
Assignments: Respond to the following Questions and Exercises in 4.1: 1-4

Week 9
Topics: Document Collection Structure: Similarity, Clustering, Evaluation
Reading: Chapter 5
Assignments: Respond to the following Questions and Exercises in 5.8: 11-13

Week 10
Topics: Information Retrieval and Extraction
Reading: Chapter 6
Assignments: Respond to the following Questions and Exercises in 6.8: 3-6

Assignment 2: Apply the algorithm from Figure 2.8 "by hand" to the results of Assignment 1. Also generate parse trees for these sentences. Finish learning the software and respond to the following Questions and Exercises in 2.15: 1-6.

Week 11
Topics: Mixed Text and Data from Databases, WWW, and Other Hybrid Sources
Reading: Chapter 7
Assignments: Respond to the following Questions and Exercises in 7.8: 5-7

Week 12
Topics: Applications
Reading: Chapter 8
Assignments: Research Project: find a report on an application not listed in the text and describe it in the same way as the text's descriptions, including problem, solution overview, methods and procedures, and deployment.

Week 13
Topics: Advanced Topics: Summarization, Active Learning
Reading: Chapter 9
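
As an illustration of the Week 2 topics (tokenize, generate vectors, tf-idf), the following is a minimal Python sketch of building a term-document matrix and weighting it with tf-idf. It is not the textbook's algorithms from Figures 2.3-2.7 and not the course software. The tokenizer, the function names (`tokenize`, `term_document_matrix`, `tfidf`), the sample documents, and the choice of idf = log(N / df) are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the count of vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows


def tfidf(rows):
    """Weight raw counts by idf = log(N / df), one common tf-idf variant (assumed here)."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # df[j] = number of documents in which term j appears at least once
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_terms)]
    return [[r[j] * math.log(n_docs / df[j]) for j in range(n_terms)] for r in rows]


# Hypothetical sample documents, not the course's assignment documents.
docs = ["Text mining finds patterns in text.", "Data mining finds patterns in data."]
vocab, rows = term_document_matrix(docs)
weights = tfidf(rows)
```

In this sketch, terms that occur in every document (such as `mining` and `finds`) receive weight 0, while `text` and `data` keep a positive weight only in the document that contains them.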
