OpenIE - The Stanford Natural Language Processing Group


Stanford Open Information Extraction

About

Open information extraction (open IE) refers to the extraction of relation tuples, typically binary relations, from plain text. The central difference from conventional information extraction is that the schema for these relations does not need to be specified in advance; typically the relation name is just the text linking two arguments. For example, "Barack Obama was born in Hawaii" would create a triple (Barack Obama; was born in; Hawaii), corresponding to the open domain relation was-born-in(Barack-Obama, Hawaii).

This software is a Java implementation of an open IE system as described in the paper:

Gabor Angeli, Melvin Johnson Premkumar, and Christopher D. Manning. Leveraging Linguistic Structure For Open Domain Information Extraction (http://nlp.stanford.edu/pubs/2015angeliopenie.pdf). In Proceedings of the Association of Computational Linguistics (ACL), 2015.
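To make the triple notation concrete, here is a minimal sketch of how one extraction could be held and printed in both forms used in the example. The class OpenTripleSketch and its methods are hypothetical names invented for this illustration; they are not part of the Stanford OpenIE or CoreNLP API.

```java
/** Illustrative only: a plain holder for one open IE triple (not a CoreNLP class). */
final class OpenTripleSketch {
    final String subject;
    final String relation;
    final String object;

    OpenTripleSketch(String subject, String relation, String object) {
        this.subject = subject;
        this.relation = relation;
        this.object = object;
    }

    /** Renders the triple in the (subject; relation; object) notation. */
    String asTuple() {
        return "(" + subject + "; " + relation + "; " + object + ")";
    }

    /** Renders the relation form, joining the words of each part with hyphens. */
    String asRelation() {
        return hyphenate(relation) + "(" + hyphenate(subject) + ", " + hyphenate(object) + ")";
    }

    private static String hyphenate(String phrase) {
        return phrase.trim().replaceAll("\\s+", "-");
    }

    public static void main(String[] args) {
        OpenTripleSketch t = new OpenTripleSketch("Barack Obama", "was born in", "Hawaii");
        System.out.println(t.asTuple());    // (Barack Obama; was born in; Hawaii)
        System.out.println(t.asRelation()); // was-born-in(Barack-Obama, Hawaii)
    }
}
```

Note that the relation name is simply the linking text; no schema of allowed relations is declared anywhere.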

The system first splits each sentence into a set of entailed clauses. Each clause is then maximally shortened, producing a set of entailed shorter sentence fragments. These fragments are then segmented into OpenIE triples, and output by the system. An illustration of the process is given for an example sentence below:

The system was originally written by Gabor Angeli and Melvin Johnson Premkumar. It requires Java 1.8 or later, and generally needs around 50 MB of memory in addition to the memory used by the part-of-speech tagger and dependency parser (and the optional named entity recognizer). To be safe, we recommend running Java with around 1 GB of memory, or 2 GB if using NER (e.g., java -mx1g).

The system is licensed under the GNU General Public License (http://www.gnu.org/licenses/gpl-2.0.html) (v2 or later). Source is included. The package includes components for command-line invocation and a Java API.

The code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software (http://www.gnu.org/licenses/gpl-faq.html#GPLInProprietarySystem), commercial licensing (http://otlportal.stanford.edu/techfinder/technology/ID=26062) is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding.

Download

Stanford OpenIE is a part of Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.html). Download a copy of CoreNLP, and you are ready to go!

Usage

Once downloaded, the code can be invoked either programmatically or from the command line, either directly via its own class or by running StanfordCoreNLP with the openie annotator. The OpenIE program provides some useful OpenIE triple output formats and can be invoked with the following command. This will read lines from standard input and produce relation triples in a tab-separated format: (confidence; subject; relation; object).

    java -mx1g -cp "*" edu.stanford.nlp.naturalli.OpenIE

To process files, simply pass them in as arguments to the program. For example:

    java -mx1g -cp "*" edu.stanford.nlp.naturalli.OpenIE /path/to/file1 /path/to/file2
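If the default output is consumed by another program, each line can be split on tabs into the four columns named above. The following is a minimal sketch under that assumption; the class OpenIEOutputSketch, its helper methods, and the sample lines are hypothetical and were not produced by running the tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative parser for the default tab-separated output (not part of CoreNLP). */
final class OpenIEOutputSketch {

    /** One parsed output row: confidence, subject, relation, object. */
    static final class Row {
        final double confidence;
        final String subject;
        final String relation;
        final String object;

        Row(double confidence, String subject, String relation, String object) {
            this.confidence = confidence;
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    /** Splits one line into exactly four tab-separated columns. */
    static Row parseLine(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != 4) {
            throw new IllegalArgumentException("expected 4 tab-separated columns, got " + cols.length);
        }
        return new Row(Double.parseDouble(cols[0]), cols[1], cols[2], cols[3]);
    }

    /** Parses all lines and keeps only rows at or above the given confidence. */
    static List<Row> keepConfident(List<String> lines, double minConfidence) {
        List<Row> kept = new ArrayList<>();
        for (String line : lines) {
            Row row = parseLine(line);
            if (row.confidence >= minConfidence) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the (confidence; subject; relation; object) layout.
        Row row = parseLine("1.0\tBarack Obama\twas born in\tHawaii");
        System.out.println(row.subject + " | " + row.relation + " | " + row.object);
    }
}
```

The sketch rejects lines with any other column count rather than guessing, since the -format flag changes the layout.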

In addition, there are a number of flags you can set to tweak the behavior of the program. Each entry below gives the flag, its argument (if any), and a description.

-format {reverb, ollie, default}
    Change the output format of the program. Default will produce tab-separated columns for confidence, the subject, relation, and the object of a relation. ReVerb will output a TSV in the ReVerb format (https://github.com/knowitall/reverb/blob/master/README.md#command-line-interface). Ollie will output relations in the default format returned by Ollie (https://github.com/knowitall/ollie/blob/master/README.md).

-filelist /path/to/filelist
    A path to a file, which contains files to annotate. Each file should be on its own line. If this option is set, only these files are annotated and the files passed via bare arguments are ignored.

-threads integer
    The number of threads to run on. By default, this is the number of threads on the system.

-max_entailments_per_clause integer
    The maximum number of entailments to produce for each clause extracted in the sentence. The larger this value is, the slower the system will run, but the more relations it can potentially extract. Setting this below 100 is not recommended; setting it above 1000 is likewise not recommended.

-resolve_coref boolean
    If true, resolve pronouns to their canonical antecedent. This option requires additional CoreNLP annotators not included in the distribution, and therefore only works if used with the CoreNLP OpenIE annotator (http://stanfordnlp.github.io/CoreNLP/openie.html), or invoked via the command line from the CoreNLP jar.

-ignore_affinity boolean
    Ignore the affinity model for prepositional attachments.

-affinity_probability_cap double
    The affinity value above which confidence of the extraction is taken as 1.0. Default is 1/3.

-triple.strict boolean
    If true (the default), extract triples only if they consume the entire fragment. This is useful for ensuring that only logically warranted triples are extracted, but puts more burden on the entailment system to find minimal phrases (see -max_entailments_per_clause).

-triple.all_nominals boolean
    If true, extract nominal relations always and not only when a named entity tag warrants it. This greatly overproduces such triples, but can be useful in certain situations.

-splitter.model /path/to/model.ser.gz
    [rare] You can override the default location of the clause splitting model with this option.

-splitter.nomodel
    [rare] Run without a clause splitting model; that is, split on every clause.

-splitter.disable
    [rare] Don't split clauses at all, and only extract relations centered around the root verb.

-affinity_model /path/to/model_dir
    [rare] A custom location to read the affinity models from.
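When launching the tool from another Java program, the flags listed above can be assembled into an argument list. The sketch below only builds the list and does not run anything. It assumes each flag is passed as a separate token followed by its argument (for example, -format reverb); the class OpenIECommandSketch and its method are hypothetical helpers for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative builder for an OpenIE command line (not part of CoreNLP). */
final class OpenIECommandSketch {

    private static final List<String> FORMATS = Arrays.asList("reverb", "ollie", "default");

    /** Builds the argument list: JVM options, main class, flags, then input files. */
    static List<String> buildCommand(String format, int threads, List<String> files) {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("unknown format: " + format);
        }
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be positive");
        }
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-mx1g", "-cp", "*", "edu.stanford.nlp.naturalli.OpenIE"));
        // Assumption of this sketch: a flag and its argument are separate tokens.
        cmd.add("-format");
        cmd.add(format);
        cmd.add("-threads");
        cmd.add(Integer.toString(threads));
        cmd.addAll(files);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("reverb", 4, Arrays.asList("/path/to/file1"));
        System.out.println(String.join(" ", cmd));
    }
}
```

A list built this way could be handed to java.lang.ProcessBuilder, which passes the "*" classpath entry through without shell expansion.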

The code can also be invoked programmatically, using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.html). For this, simply include the annotators natlog and openie in the annotators property, and add any of the flags described above to the properties file prepended with the string "openie." Note that openie depends on the annotators "tokenize,ssplit,pos,depparse". An example working code snippet is provided below. This snippet will annotate the text "Obama was born in Hawaii. He is our president," and print out each extraction from the document to the console.

    package edu.stanford.nlp.naturalli;

    import edu.stanford.nlp.ie.util.RelationTriple;
    import edu.stanford.nlp.io.IOUtils;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;
    import edu.stanford.nlp.util.PropertiesUtils;

    import java.util.Collection;
    import java.util.List;
    import java.util.Properties;

    /**
     * A demo illustrating how to call the OpenIE system programmatically.
     * You can call this code with:
     *
     *   java -mx1g -cp stanford-openie.jar:stanford-openie-models.jar edu.stanford.nlp.naturalli.OpenIEDemo
     *
     */
    public class OpenIEDemo {

      private OpenIEDemo() {} // static main

      public static void main(String[] args) throws Exception {
        // Create the Stanford CoreNLP pipeline
        Properties props = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie"
        );
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate an example document.
        String text;
        if (args.length > 0) {
          text = IOUtils.slurpFile(args[0]);
        } else {
          text = "Obama was born in Hawaii. He is our president.";
        }
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);

        // Loop over sentences in the document
        int sentNo = 0;
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
          System.out.println("Sentence #" + ++sentNo + ": " + sentence.get(CoreAnnotations.TextAnnotation.class));

          // Print SemanticGraph
          System.out.println(sentence.get(SemanticGraphCoreAnnotations.EnhancedDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));

          // Get the OpenIE triples for the sentence
          Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);

          // Print the triples
          for (RelationTriple triple : triples) {
            System.out.println(triple.confidence + "\t" +
                triple.subjectLemmaGloss() + "\t" +
                triple.relationLemmaGloss() + "\t" +
                triple.objectLemmaGloss());
          }

          // Alternately, to only run e.g., the clause splitter:
          List<SentenceFragment> clauses = new OpenIE(props).clausesInSentence(sentence);
          for (SentenceFragment clause : clauses) {
            System.out.println(clause.parseTree.toString(SemanticGraph.OutputFormat.LIST));
          }
          System.out.println();
        }
      }

    }
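The "openie." prefix rule described above can be sketched with plain java.util.Properties, without loading CoreNLP. The class OpenIEPropertiesSketch and its method are hypothetical names for this illustration, and the flag values chosen are arbitrary examples rather than recommended settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Illustrative helper that turns command-line style flags into pipeline properties. */
final class OpenIEPropertiesSketch {

    /** Sets the annotators property and copies each flag under the "openie." prefix. */
    static Properties build(Map<String, String> openieFlags) {
        Properties props = new Properties();
        // Same annotator list as the demo: natlog and openie plus their prerequisites.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
        for (Map.Entry<String, String> flag : openieFlags.entrySet()) {
            String name = flag.getKey();
            if (name.startsWith("-")) {
                name = name.substring(1); // drop the leading dash used on the command line
            }
            props.setProperty("openie." + name, flag.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        flags.put("-triple.strict", "false");
        flags.put("-max_entailments_per_clause", "500");
        Properties props = build(flags);
        System.out.println(props.getProperty("openie.triple.strict"));
        System.out.println(props.getProperty("openie.max_entailments_per_clause"));
    }
}
```

The resulting Properties object would then be passed to the StanfordCoreNLP constructor in the same way as the props variable in the demo.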

Source: src/edu/stanford/nlp/naturalli/OpenIEDemo.java (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/naturalli/OpenIEDemo.java)

Frequently Asked Questions

See Support.

Support

We recommend asking questions on StackOverflow, using the stanford-nlp tag (http://stackoverflow.com/questions/tagged/stanford-nlp).

In addition, we have 3 mailing lists that you can write to, all of which are shared with other JavaNLP tools (with the exclusion of the parser). Each address is at @lists.stanford.edu :

1. java-nlp-user This is the best list to post to in order to send feature requests, make announcements, or for discussion among JavaNLP users. (Please ask support questions on Stack Overflow (http://stackoverflow.com) using the stanford-nlp tag.) You have to subscribe to be able to use this list. Join the list via this webpage (https://mailman.stanford.edu/mailman/listinfo/java-nlp-user) or by emailing [email protected] . (Leave the subject and message body empty.) You can also look at the list archives (https://mailman.stanford.edu/pipermail/java-nlp-user/).

2. java-nlp-announce This list will be used only to announce new versions of Stanford JavaNLP tools. So it will be very low volume (expect 1-3 messages a year). Join the list via this webpage (https://mailman.stanford.edu/mailman/listinfo/java-nlp-announce) or by emailing [email protected] . (Leave the subject and message body empty.)

3. java-nlp-support This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user . You cannot join java-nlp-support , but you can mail questions to [email protected] .

Release History

Version: 3.6.0
Date: 2015-12-09
Description: First release
Resources: code (/projects/naturalli/stanford-openie-3.6.0.jar) / models (/projects/naturalli/stanford-openie-models-3.6.0.jar) / source (/projects/naturalli/stanford-openie-src-3.6.0.jar)

Outdated instructions

Below are the original download instructions. They are now outdated. Since the release of CoreNLP v. 3.6.0, you can get the latest version by downloading CoreNLP!

Download

To run the code, you need both the OpenIE code jar and the models jar in your classpath. Both of these must be included for the system to work. These can be downloaded below:

Download the original OpenIE Code (/projects/naturalli/stanford-openie.jar) [5.5 MB]
Download the original OpenIE Models (/projects/naturalli/stanford-openie-models.jar) [58 MB]

The source code for the system can be downloaded from the link below:

Download the original OpenIE Source Code (/projects/naturalli/stanford-openie-src.jar) [3.5 MB]

Lastly, if you have already downloaded models for the dependency parser and part-of-speech tagger, you can download only the OpenIE models from the link below. This is not recommended unless you are sure you know what you are doing.

Download only the original OpenIE-specific Models (/projects/naturalli/stanford-openie-only-models.jar) [20 MB]
