Stanford Open Information Extraction
About
Open information extraction (open IE) refers to the extraction of relation tuples, typically binary relations, from plain text. The central difference from other information extraction is that the schema for these relations does not need to be specified in advance; typically the relation name is just the text linking two arguments. For example, "Barack Obama was born in Hawaii" would create a triple (Barack Obama; was born in; Hawaii), corresponding to the open domain relation was-born-in(Barack-Obama, Hawaii).

This software is a Java implementation of an open IE system as described in the paper:

Gabor Angeli, Melvin Johnson Premkumar, and Christopher D. Manning. Leveraging Linguistic Structure For Open Domain Information Extraction (http://nlp.stanford.edu/pubs/2015angeliopenie.pdf). In Proceedings of the Association for Computational Linguistics (ACL), 2015.
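The two notations used in the example above, the triple and the open-domain relation, can be sketched in a few lines of Java. This is only an illustration: `OpenTriple` and its methods are hypothetical names invented here, not part of the CoreNLP API (which has its own `RelationTriple` class), and the hyphenation rule is simply read off the example.

```java
/** Hypothetical value type for illustration; not the CoreNLP RelationTriple class. */
final class OpenTriple {
  final String subject;
  final String relation;
  final String object;

  OpenTriple(String subject, String relation, String object) {
    this.subject = subject;
    this.relation = relation;
    this.object = object;
  }

  /** Joins whitespace-separated tokens with hyphens: "was born in" becomes "was-born-in". */
  private static String hyphenate(String phrase) {
    return phrase.trim().replaceAll("\\s+", "-");
  }

  /** Renders the triple notation, e.g. (Barack Obama; was born in; Hawaii). */
  String toTripleString() {
    return "(" + subject + "; " + relation + "; " + object + ")";
  }

  /** Renders the open-domain relation notation, e.g. was-born-in(Barack-Obama, Hawaii). */
  String toRelationString() {
    return hyphenate(relation) + "(" + hyphenate(subject) + ", " + hyphenate(object) + ")";
  }

  public static void main(String[] args) {
    OpenTriple t = new OpenTriple("Barack Obama", "was born in", "Hawaii");
    System.out.println(t.toTripleString());
    System.out.println(t.toRelationString());
  }
}
```

Note that the relation name is taken directly from the sentence text; no schema of allowed relations is consulted anywhere.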
The system first splits each sentence into a set of entailed clauses. Each clause is then maximally shortened, producing a set of entailed shorter sentence fragments. These fragments are then segmented into OpenIE triples, and output by the system. An illustration of the process is given for an example sentence below:
The system was originally written by Gabor Angeli and Melvin Johnson Premkumar. It requires Java 1.8+ to be installed, and generally requires around 50 MB of memory in addition to the memory used by the part-of-speech tagger and dependency parser (and optional named entity recognizer). We recommend running java with around 1 GB of memory (2 GB if using NER) to be safe (i.e., java -mx1g).

The system is licensed under the GNU General Public License (http://www.gnu.org/licenses/gpl-2.0.html) (v2 or later). Source is included. The package includes components for command-line invocation, and a Java API.

The code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software (http://www.gnu.org/licenses/gpl-faq.html#GPLInProprietarySystem), commercial licensing (http://otlportal.stanford.edu/techfinder/technology/ID=26062) is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding.
Download
Stanford OpenIE is a part of Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.html). Download a copy of CoreNLP, and you are ready to go!
Usage
Once downloaded, the code can be invoked either programmatically or from the command line, either directly via its own class or through running StanfordCoreNLP with the openie annotator. The OpenIE program provides some useful OpenIE triple output formats and can be invoked with the following command. This will read lines from standard input, and produce relation triples in a tab separated format: (confidence; subject; relation; object).

    java -mx1g -cp "*" edu.stanford.nlp.naturalli.OpenIE
To process files, simply pass them in as arguments to the program. For example:

    java -mx1g -cp "*" edu.stanford.nlp.naturalli.OpenIE /path/to/file1 /path/to/file2
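If you consume the default output from another program, each line can be split on tabs into the four columns described above. The following is a minimal sketch under stated assumptions: `TripleLineParser` and `Row` are hypothetical names, the sample line in `main` is made up for illustration rather than captured from a real run, and the confidence column is assumed to parse as a Java double.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading the default tab-separated output format. */
final class TripleLineParser {

  /** Hypothetical holder for one parsed row: confidence, subject, relation, object. */
  static final class Row {
    final double confidence;
    final String subject;
    final String relation;
    final String object;

    Row(double confidence, String subject, String relation, String object) {
      this.confidence = confidence;
      this.subject = subject;
      this.relation = relation;
      this.object = object;
    }
  }

  /** Parses one line of the form confidence TAB subject TAB relation TAB object. */
  static Row parseLine(String line) {
    String[] cols = line.split("\t", -1);
    if (cols.length != 4) {
      throw new IllegalArgumentException(
          "Expected 4 tab-separated columns, got " + cols.length);
    }
    return new Row(Double.parseDouble(cols[0]), cols[1], cols[2], cols[3]);
  }

  /** Parses every non-blank line, preserving input order. */
  static List<Row> parseAll(Iterable<String> lines) {
    List<Row> rows = new ArrayList<>();
    for (String line : lines) {
      if (!line.trim().isEmpty()) {
        rows.add(parseLine(line));
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    // Made-up sample line, for illustration only.
    Row row = parseLine("1.0\tObama\twas born in\tHawaii");
    System.out.println(row.subject + " | " + row.relation + " | " + row.object);
  }
}
```

This parser only applies to the default format; the reverb and ollie formats selected with -format have different layouts.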
In addition, there are a number of flags you can set to tweak the behavior of the program.

| Flag | Argument | Description |
|------|----------|-------------|
| -format | {reverb, ollie, default} | Change the output format of the program. Default will produce tab separated columns for confidence, the subject, relation, and the object of a relation. ReVerb will output a TSV in the ReVerb format (https://github.com/knowitall/reverb/blob/master/README.md#command-line-interface). Ollie will output relations in the default format returned by Ollie (https://github.com/knowitall/ollie/blob/master/README.md). |
| -filelist | /path/to/filelist | A path to a file, which contains files to annotate. Each file should be on its own line. If this option is set, only these files are annotated and the files passed via bare arguments are ignored. |
| -threads | integer | The number of threads to run on. By default, this is the number of threads on the system. |
| -max_entailments_per_clause | integer | The maximum number of entailments to produce for each clause extracted in the sentence. The larger this value is, the slower the system will run, but the more relations it can potentially extract. Setting this below 100 is not recommended; setting it above 1000 is likewise not recommended. |
| -resolve_coref | boolean | If true, resolve pronouns to their canonical antecedent. This option requires additional CoreNLP annotators not included in the distribution, and therefore only works if used with the CoreNLP OpenIE annotator (http://stanfordnlp.github.io/CoreNLP/openie.html), or invoked via the command line from the CoreNLP jar. |
| -ignore_affinity | boolean | Ignore the affinity model for prepositional attachments. |
| -affinity_probability_cap | double | The affinity value above which confidence of the extraction is taken as 1.0. Default is 1/3. |
| -triple.strict | boolean | If true (the default), extract triples only if they consume the entire fragment. This is useful for ensuring that only logically warranted triples are extracted, but puts more burden on the entailment system to find minimal phrases (see -max_entailments_per_clause). |
| -triple.all_nominals | boolean | If true, extract nominal relations always and not only when a named entity tag warrants it. This greatly overproduces such triples, but can be useful in certain situations. |
| -splitter.model | /path/to/model.ser.gz | [rare] You can override the default location of the clause splitting model with this option. |
| -splitter.nomodel | | [rare] Run without a clause splitting model; that is, split on every clause. |
| -splitter.disable | | [rare] Don't split clauses at all, and only extract relations centered around the root verb. |
| -affinity_model | /path/to/model_dir | [rare] A custom location to read the affinity models from. |
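As a sketch of how the -filelist, -format, and -threads flags combine with the command shown earlier, the Java snippet below writes a file list (one path per line) and assembles the corresponding command line. `OpenIECommandSketch` and its methods are hypothetical names for illustration; the paths, the reverb format choice, and the thread count of 4 are placeholder values. The snippet only prints the command; launching it is left as a comment because it assumes you run from a directory containing the CoreNLP jars.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that prepares an OpenIE command-line invocation. */
final class OpenIECommandSketch {

  /** Writes one input path per line, which is the layout -filelist expects. */
  static Path writeFileList(Path target, List<String> inputPaths) throws IOException {
    return Files.write(target, inputPaths, StandardCharsets.UTF_8);
  }

  /** Assembles the documented java invocation followed by three of the flags. */
  static List<String> buildCommand(Path fileList, String format, int threads) {
    List<String> cmd = new ArrayList<>(Arrays.asList(
        "java", "-mx1g", "-cp", "*", "edu.stanford.nlp.naturalli.OpenIE"));
    cmd.add("-filelist");
    cmd.add(fileList.toString());
    cmd.add("-format");
    cmd.add(format);
    cmd.add("-threads");
    cmd.add(Integer.toString(threads));
    return cmd;
  }

  public static void main(String[] args) throws IOException {
    Path list = Files.createTempFile("openie-files", ".txt");
    writeFileList(list, Arrays.asList("/path/to/file1", "/path/to/file2"));
    List<String> cmd = buildCommand(list, "reverb", 4);
    System.out.println(String.join(" ", cmd));
    // To actually launch it (assumes the working directory holds the CoreNLP jars):
    // new ProcessBuilder(cmd).inheritIO().start();
  }
}
```

Because -filelist is set, any files passed as bare arguments would be ignored, so the sketch does not add any.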
The code can also be invoked programmatically, using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.html). For this, simply include the annotators natlog and openie in the annotators property, and add any of the flags described above to the properties file prepended with the string "openie." Note that openie depends on the annotators "tokenize,ssplit,pos,depparse". An example working code snippet is provided below. This snippet will annotate the text "Obama was born in Hawaii. He is our president." and print out each extraction from the document to the console.

    package edu.stanford.nlp.naturalli;

    import edu.stanford.nlp.ie.util.RelationTriple;
    import edu.stanford.nlp.io.IOUtils;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;
    import edu.stanford.nlp.util.PropertiesUtils;

    import java.util.Collection;
    import java.util.List;
    import java.util.Properties;

    /**
     * A demo illustrating how to call the OpenIE system programmatically.
     * You can call this code with:
     *
     * <pre>
     *   java -mx1g -cp stanford-openie.jar:stanford-openie-models.jar edu.stanford.nlp.naturalli.OpenIEDemo
     * </pre>
     *
     */
    public class OpenIEDemo {

      private OpenIEDemo() {} // static main

      public static void main(String[] args) throws Exception {
        // Create the Stanford CoreNLP pipeline
        Properties props = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie"
        );
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate an example document.
        String text;
        if (args.length > 0) {
          text = IOUtils.slurpFile(args[0]);
        } else {
          text = "Obama was born in Hawaii. He is our president.";
        }
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);

        // Loop over sentences in the document
        int sentNo = 0;
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
          System.out.println("Sentence #" + ++sentNo + ": " + sentence.get(CoreAnnotations.TextAnnotation.class));

          // Print SemanticGraph
          System.out.println(sentence.get(SemanticGraphCoreAnnotations.EnhancedDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));

          // Get the OpenIE triples for the sentence
          Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);

          // Print the triples
          for (RelationTriple triple : triples) {
            System.out.println(triple.confidence + "\t" +
                triple.subjectLemmaGloss() + "\t" +
                triple.relationLemmaGloss() + "\t" +
                triple.objectLemmaGloss());
          }

          // Alternately, to only run e.g., the clause splitter:
          List<SentenceFragment> clauses = new OpenIE(props).clausesInSentence(sentence);
          for (SentenceFragment clause : clauses) {
            System.out.println(clause.parseTree.toString(SemanticGraph.OutputFormat.LIST));
          }
          System.out.println();
        }
      }

    }
Source: src/edu/stanford/nlp/naturalli/OpenIEDemo.java (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/naturalli/OpenIEDemo.java)
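To make the "openie." prefix rule concrete, here is a minimal sketch that builds a properties object using only java.util.Properties, so it runs without CoreNLP on the classpath. `OpenIEPropertiesSketch` is a hypothetical name, and the values 500 and true are illustrative choices (500 falls inside the 100 to 1000 range recommended for -max_entailments_per_clause), not recommendations from the authors.

```java
import java.util.Properties;

/** Hypothetical helper showing how command-line flags map to CoreNLP properties. */
final class OpenIEPropertiesSketch {

  /** The string prepended to each command-line flag name. */
  static final String PREFIX = "openie.";

  static Properties buildProperties() {
    Properties props = new Properties();
    // Same annotator list as the demo above.
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
    // The flag -max_entailments_per_clause becomes openie.max_entailments_per_clause.
    props.setProperty(PREFIX + "max_entailments_per_clause", "500");
    // The flag -triple.strict becomes openie.triple.strict.
    props.setProperty(PREFIX + "triple.strict", "true");
    return props;
  }

  public static void main(String[] args) {
    Properties props = buildProperties();
    for (String name : props.stringPropertyNames()) {
      System.out.println(name + " = " + props.getProperty(name));
    }
    // With CoreNLP on the classpath, these would be passed on as in the demo:
    // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
  }
}
```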
Frequently Asked Questions
See Support.
Support
We recommend asking questions on StackOverflow, using the stanford-nlp tag (http://stackoverflow.com/questions/tagged/stanford-nlp). In addition, we have three mailing lists that you can write to, all of which are shared with other JavaNLP tools (with the exception of the parser). Each address is at @lists.stanford.edu:

1. java-nlp-user: This is the best list to post to in order to send feature requests, make announcements, or for discussion among JavaNLP users. (Please ask support questions on Stack Overflow (http://stackoverflow.com) using the stanford-nlp tag.) You have to subscribe to be able to use this list. Join the list via this webpage (https://mailman.stanford.edu/mailman/listinfo/java-nlp-user) or by emailing [email protected]. (Leave the subject and message body empty.) You can also look at the list archives (https://mailman.stanford.edu/pipermail/java-nlp-user/).
2. java-nlp-announce: This list will be used only to announce new versions of Stanford JavaNLP tools, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage (https://mailman.stanford.edu/mailman/listinfo/java-nlp-announce) or by emailing [email protected]. (Leave the subject and message body empty.)
3. java-nlp-support: This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user. You cannot join java-nlp-support, but you can mail questions to [email protected].
Release History

| Version | Date | Description | Resources |
|---------|------|-------------|-----------|
| 3.6.0 | 2015-12-09 | First release | code (/projects/naturalli/stanford-openie-3.6.0.jar) / models (/projects/naturalli/stanford-openie-models-3.6.0.jar) / source (/projects/naturalli/stanfordopenie-src-3.6.0.jar) |
Outdated instructions
Below are the original download instructions. They are now outdated. Since the release of CoreNLP v. 3.6.0, you can get the latest version by downloading CoreNLP!
Download
To run the code, you need both the OpenIE code jar and the models jar in your classpath. Both of these must be included for the system to work. These can be downloaded below:

Download the original OpenIE Code (/projects/naturalli/stanford-openie.jar) [5.5 MB]
Download the original OpenIE Models (/projects/naturalli/stanford-openie-models.jar) [58 MB]

The source code for the system can be downloaded from the link below:

Download the original OpenIE Source Code (/projects/naturalli/stanford-openie-src.jar) [3.5 MB]

Lastly, if you have already downloaded models for the dependency parser and part-of-speech tagger, you can download only the OpenIE models from the link below. This is not recommended unless you are sure you know what you are doing.

Download only the original OpenIE-specific Models (/projects/naturalli/stanford-openie-only-models.jar) [20 MB]