In this tutorial, we demonstrate how to do basic entity extraction in Datafari Community. This post is inspired by https://lucidworks.com/2013/06/27/poor-mans-entity-extraction-with-solr/
Note that for Datafari Enterprise, all the configuration is already done. You just need to add your custom rules in a dedicated UI. For more advanced functionality, Datafari Enterprise also lets you use SolrTextTagger and third-party semantic entity extractors.
We want to extract 3 entities in our dataset (files from the Enron dataset in this example):
- Phone number
- If the document is a resume
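To make the idea concrete, here is a minimal sketch of rule-based extraction for the two entities listed above. It is plain Python, not Datafari or Solr configuration, and it assumes the rules are simple regular expressions. The phone pattern (US-style numbers), the resume keywords, and the names `PHONE_RE`, `RESUME_RE` and `extract_entities` are all hypothetical choices made for illustration.

```python
import re

# Assumption: US-style phone numbers such as 713-853-1234 or (713) 853-1234.
# The lookarounds prevent matching inside longer digit runs.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
)

# Assumption: a document mentioning one of these keywords is flagged as a resume.
RESUME_RE = re.compile(r"\b(?:resume|curriculum vitae|cv)\b", re.IGNORECASE)


def extract_entities(text: str) -> dict:
    """Return the phone numbers found in `text` and a resume flag."""
    phones = PHONE_RE.findall(text)
    return {
        # Deduplicate while keeping the order of first appearance.
        "phone": list(dict.fromkeys(phones)),
        "is_resume": bool(RESUME_RE.search(text)),
    }


if __name__ == "__main__":
    sample = "Call me at 713-853-1234 or (713) 853-5678. Resume attached."
    print(extract_entities(sample))
```

A keyword test like this one is deliberately crude: it will flag any document that merely mentions a resume, so the rules need tuning on real data.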
NOTE: This is the English version. For the French version, please scroll down.
UPDATE 08/08/16: post updated for Datafari v3
UPDATE 01/04/16: beware that Docker Toolbox 1.9.1 has a bug that affects Cassandra (which is a component of Datafari). Update your Docker to 1.10+
This time, we’ll talk about the release of Datafari on Docker.
If you don’t know it yet, Docker is a containerisation mechanism that relies on isolation features of the Linux kernel, which makes it lighter and faster than widespread virtualisation technologies such as VMware. As its name suggests, you can “dock” applications in an isolated manner, and each one runs as a standalone system on your OS.
Although we recommend installing Datafari on a dedicated system for production use, running Datafari on Docker lets you install it quickly without affecting the configuration and packages already in place on your system. Just download the Docker image, and Docker takes care of the rest.