Tutorial – Deploying SolrCloud 7 on Amazon EC2

Posted on 2 January 2018 by admin

UPDATE: This tutorial is based on Solr 7. If you want to use Solr 8, we strongly recommend following our more recent blog entry on setting up SolrCloud 8 on Amazon EC2.

In this tutorial, we set up a SolrCloud cluster on Amazon EC2, using Solr 7.1 and ZooKeeper 3.4.10 on Debian 9 instances.
We explain, step by step, how to reach this objective.

We will install a set of 3 machines, each hosting 3 shard replicas, which gives us 9 replicas in total: 3 shards, each with a replication factor of 3.
We will also install a ZooKeeper ensemble of 3 machines.
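To make the layout concrete, here is a minimal sketch that only builds the Collections API CREATE request for 3 shards with a replication factor of 3 on 3 nodes; it does not contact Solr. The host name `solr1`, the collection name `mycollection` and the configset name `myconf` are hypothetical placeholders. `maxShardsPerNode` defaults to 1 in Solr 6 and 7, so it has to be raised to 3 for each node to hold 3 replicas.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=3, replication_factor=3,
                          max_shards_per_node=3):
    """Build a Collections API CREATE request URL for a 3 x 3 layout.

    3 shards * replicationFactor 3 = 9 replicas; spread over 3 nodes,
    that is 3 replicas per node, hence maxShardsPerNode=3 (default is 1).
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host, collection and configset names.
url = create_collection_url("http://solr1:8983/solr", "mycollection", "myconf")
print(url)
```

The resulting URL could then be sent with curl or a browser once the cluster and its configset are in place.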

This architecture tolerates the failure of one or two machines, depending on whether we are indexing or querying:

Indexing: one machine can fail without impacting the cluster, because a ZooKeeper ensemble of 3 machines keeps its quorum (a majority of 2) with one machine down. Updates are still broadcast to the machines that are running.
Querying: two machines can fail without impacting the cluster. Since each machine hosts a replica of each of the 3 shards, the remaining machine holds the whole index and can still process search queries; the only drawback is a slower response time due to the higher load on that machine.
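The failure-tolerance reasoning above can be checked with a small sketch. It assumes that each of the 3 nodes holds one replica of every shard, and, following the text, it models indexing tolerance as the ZooKeeper quorum rule (a strict majority of the ensemble must be up). Node and shard names are hypothetical.

```python
from itertools import combinations

NODES = ["solr1", "solr2", "solr3"]        # hypothetical host names
SHARDS = ["shard1", "shard2", "shard3"]

# Assumed layout: every node holds one replica of every shard,
# i.e. 3 shards x replicationFactor 3 = 9 replicas over 3 nodes.
PLACEMENT = {node: set(SHARDS) for node in NODES}


def zk_tolerated_failures(ensemble_size):
    """ZooKeeper keeps working while a strict majority of servers is up."""
    return (ensemble_size - 1) // 2


def query_possible(failed_nodes):
    """A query can be answered if every shard still has a live replica."""
    live_shards = set()
    for node, shards in PLACEMENT.items():
        if node not in failed_nodes:
            live_shards |= shards
    return live_shards == set(SHARDS)


def query_tolerated_failures():
    """Largest k such that ANY k failed nodes leave every shard reachable."""
    best = 0
    for k in range(len(NODES) + 1):
        if all(query_possible(set(c)) for c in combinations(NODES, k)):
            best = k
        else:
            break
    return best


print("indexing tolerates", zk_tolerated_failures(3), "failure(s)")
print("querying tolerates", query_tolerated_failures(), "failure(s)")
```

With a different placement (for instance, a shard whose replicas all sit on two nodes), `query_tolerated_failures()` would drop, which is why spreading one replica of each shard per node matters here.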

Tutorial – Deploying SolrCloud 6 on Amazon EC2

Posted on 12 June 2017 by admin

UPDATE: This tutorial is based on Solr 6. If you want to use Solr 8, we strongly recommend following our more recent blog entry on setting up SolrCloud 8 on Amazon EC2.

In this tutorial, we set up a SolrCloud cluster on Amazon EC2, using Solr 6.6.0 and ZooKeeper 3.4.6 on Debian 8 instances.
We explain, step by step, how to reach this objective.

We will install a set of 3 machines, each hosting 3 shard replicas, which gives us 9 replicas in total: 3 shards, each with a replication factor of 3.
We will also install a ZooKeeper ensemble of 3 machines.
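For the 3-machine ZooKeeper ensemble, a minimal sketch of the configuration is shown below. It generates a `zoo.cfg` for replicated mode using the timing values from the replicated-mode example in the ZooKeeper Getting Started guide, plus the connection string that Solr nodes would receive (for instance through the `-z` option of `bin/solr start -c`). The host names `zk1` to `zk3` and the `dataDir` path are hypothetical.

```python
def zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Build the text of a zoo.cfg for a replicated ZooKeeper ensemble."""
    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
        "initLimit=5",
        "syncLimit=2",
    ]
    # server.N=host:peerPort:leaderElectionPort
    # N must match the number stored in the "myid" file in dataDir on that host.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def zk_connect_string(hosts, client_port=2181):
    """Comma-separated host:port list that Solr uses to reach the ensemble."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


zk_hosts = ["zk1", "zk2", "zk3"]  # hypothetical host names
print(zoo_cfg(zk_hosts))
print(zk_connect_string(zk_hosts))
```

The same `zoo.cfg` is deployed on all three machines; only the `myid` file (containing 1, 2 or 3) differs from one machine to another.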

This architecture tolerates the failure of one or two machines, depending on whether we are indexing or querying:

Indexing: one machine can fail without impacting the cluster, because a ZooKeeper ensemble of 3 machines keeps its quorum (a majority of 2) with one machine down. Updates are still broadcast to the machines that are running.
Querying: two machines can fail without impacting the cluster. Since each machine hosts a replica of each of the 3 shards, the remaining machine holds the whole index and can still process search queries; the only drawback is a slower response time due to the higher load on that machine.

Generating big data sets for search engines

Posted on 27 January 2016 by Julien Massiera

NOTE: This is the English version. You will find the French version further down in this article.

When offering our search expertise, we are often asked to run performance evaluations on large datasets, for instance during proofs of concept. For a recent customer request, to save time and avoid using sensitive customer data, we used log-synth, a random data generator developed by Ted Dunning. Here we describe how to use log-synth to generate a 100,000-line dataset.

The first step, which we don’t document here, consists of downloading log-synth, unzipping it and building it with Maven.

Enterprise Search Europe in London – Open source focus

Posted on 1 July 2015 by admin

NOTE: this post has a French version at the bottom of this page.

Enterprise Search Europe is the largest European event dedicated to enterprise search. Looking at this year’s agenda, it seems that open source will be given particular prominence. As in recent years, several case studies are dedicated to open source, but this time the keynote will also focus on it: Charlie Hull, CEO and cofounder of Flax and an expert in open source enterprise search, will share his thoughts on the future of search and the link between search and big data. Other open source tracks include a migration from Exalead to Apache Solr (the talk will be given by France Labs, yeeepieeeee) and a round table on open source implementation. You can find more details on the ESEU 2015 programme page.

Mailing list Solr FR

Posted on 23 February 2015 by admin

NOTE: For the English version, please look further down.

We have created a French-speaking Solr mailing list, so that developers who feel more comfortable in French than in English can discuss Solr in the language of Molière. Join us soon on the French-language Solr mailing list!

Datafari on Docker

Posted on 28 January 2015 by admin

NOTE: This is the English version. For the French version, please scroll down.

UPDATE 08/08/16: post updated for Datafari v3

UPDATE 01/04/16: beware, Docker Toolbox 1.9.1 has a bug that affects Cassandra (one of Datafari’s components). Update Docker to version 1.10 or later.
https://github.com/docker/docker/issues/18180

This time, we’ll talk about the release of Datafari on Docker.

If you don’t know it yet, Docker is a containerization technology: it relies on features of the Linux kernel to isolate applications instead of emulating a full machine, which makes it lighter and faster than widespread virtualization technologies such as VMware. As its name suggests, you can “dock” applications in isolation, and each one works as a standalone system on your OS.

In a production environment, we recommend installing Datafari on a dedicated system. Docker, however, lets you install Datafari quickly without impacting the configuration and packages already in place on your system. Just download the Docker image, and Docker takes care of the rest.

ManifoldCF 1.8 and 2.0 have been released

Posted on 5 January 2015 by admin

NOTE: This is the English version. For the French version, please scroll down.

For those of you who use or follow ManifoldCF (a connector framework from the Apache Software Foundation), its team has just released (26 Dec. 2014) ManifoldCF 1.8 and 2.0. Yes, that’s two releases at the same time.

Schemaless Solr

Posted on 28 September 2014 by admin

NOTE: French version at the bottom of this page.

We often read on the web that Elasticsearch is really cool because it is schemaless, and Solr is not. Although Elasticsearch is cool for many reasons, we want to remind you that Solr has also been schemaless since July 2013 (Solr 4.4).

As a reminder, schemaless means that, without any manual editing of the Solr schema, Solr automatically recognizes some data types in the data it receives for indexing. Those types are: Boolean, Integer, Long, Float, Double, and Date.
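To illustrate the idea of type guessing, here is a deliberately simplified sketch: each raw value is tested against candidate types in a fixed order, and the first one that parses wins. This is not Solr’s implementation (Solr does this with update processors configured in `solrconfig.xml`). The rules below are our own assumptions: Integer versus Long is decided by the 32-bit range, every decimal number is reported as Double (no Float/Double distinction), and only ISO-8601 UTC timestamps are recognized as dates.

```python
from datetime import datetime

INT_MIN, INT_MAX = -2**31, 2**31 - 1


def guess_type(value):
    """Return a guessed field type for a raw string value (simplified)."""
    v = value.strip()
    if v.lower() in ("true", "false"):
        return "Boolean"
    try:
        n = int(v)
        # Assumed rule: 32-bit range -> Integer, otherwise Long.
        return "Integer" if INT_MIN <= n <= INT_MAX else "Long"
    except ValueError:
        pass
    try:
        float(v)
        return "Double"  # assumed rule: no Float/Double distinction here
    except ValueError:
        pass
    try:
        # Assumed format: ISO-8601 UTC timestamp such as 2014-09-28T10:00:00Z
        datetime.strptime(v, "%Y-%m-%dT%H:%M:%SZ")
        return "Date"
    except ValueError:
        pass
    return "String"  # fallback when no other type matches


for raw in ["true", "42", "3000000000", "3.14", "2014-09-28T10:00:00Z", "hello"]:
    print(raw, "->", guess_type(raw))
```

The order of the checks matters: testing integers before decimals keeps "42" from being reported as a floating-point value.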

That’s pretty convenient for quick prototyping. Still, as for Elasticsearch, …

France Labs Enterprise Search Blog

blog on Enterprise Search, Solr, Datafari, ManifoldCF