Schemaless Solr

NOTE: French version at the bottom of this page.

You can often read on the web that Elasticsearch is really cool because it is schemaless, whereas Solr is not. Although Elasticsearch is cool for many reasons, we want to remind you that Solr has also offered a schemaless mode since July 2013 (Solr 4.4).

As a reminder of what schemaless means: without you manually editing the schema, Solr can recognize some data types automatically when it receives data to index. Those types are Boolean, Integer, Long, Float, Double, and Date.
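
To make the idea concrete, here is a minimal Python sketch of this kind of type guessing. It is only an illustration and not Solr's actual implementation: the function name `guess_type`, the order of the checks, the single date format and the `String` fallback are all assumptions made for this example. A decimal value cannot be told apart as Float or Double from its text alone, so the sketch reports every decimal as Double.

```python
from datetime import datetime


def guess_type(value: str) -> str:
    """Guess a field type from a raw string value (illustrative only)."""
    text = value.strip()

    # Booleans first, so that "true" is not treated as plain text.
    if text.lower() in ("true", "false"):
        return "Boolean"

    # Whole numbers: Integer if the value fits in 32 bits, otherwise Long.
    try:
        number = int(text)
    except ValueError:
        pass
    else:
        return "Integer" if -2**31 <= number < 2**31 else "Long"

    # Decimal numbers: reported as Double (assumption, see the text above).
    try:
        float(text)
    except ValueError:
        pass
    else:
        return "Double"

    # Dates: only one ISO-8601 style format is tried in this sketch.
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return "String"  # assumed fallback when nothing else matches
    return "Date"


for sample in ["true", "42", "3000000000", "3.14", "2013-07-23T00:00:00Z", "hello"]:
    print(sample, "->", guess_type(sample))
```

The order of the checks matters: the most specific types are tried first, and plain text is only the fallback.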

That’s pretty convenient for quick prototyping. Still, as for Elasticsearch, …

Tutorial for setting up SolrCloud on Amazon EC2

UPDATE: This tutorial is based on Solr 4. If you want to use Solr 6, we strongly recommend following our more recent blog entry on setting up SolrCloud 6 on Amazon EC2.

NOTE: There is a French version of this tutorial, which you’ll find in the second half of this blog entry.

In this tutorial, we’ll install a SolrCloud cluster on Amazon EC2, using Solr 4.9, Tomcat 7 and ZooKeeper 3.4.6 on Debian 7 instances.
We’ll set up 3 machines hosting 3 shards with 2 replicas per shard, for a total of 9 Solr cores (3 shards × 3 copies each, counting the leader).
We’ll also install a ZooKeeper ensemble of 3 machines.
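
The total of 9 works out if "2 replicas per shard" is read as one leader plus two additional copies, i.e. 3 copies of each of the 3 shards. The short Python sketch below only illustrates that arithmetic. The machine names and the round-robin placement are assumptions made for this example, and in a real SolrCloud cluster leaders are elected rather than assigned like this.

```python
NUM_SHARDS = 3
COPIES_PER_SHARD = 3  # assumption: 1 leader + 2 replicas per shard
MACHINES = ["machine1", "machine2", "machine3"]  # hypothetical host names

# Spread the copies of each shard round-robin over the machines,
# so that every machine ends up holding one copy of every shard.
layout = {machine: [] for machine in MACHINES}
for shard in range(1, NUM_SHARDS + 1):
    for copy in range(COPIES_PER_SHARD):
        machine = MACHINES[(shard - 1 + copy) % len(MACHINES)]
        role = "leader" if copy == 0 else "replica"
        layout[machine].append(f"shard{shard}_{role}")

total_cores = sum(len(cores) for cores in layout.values())

for machine, cores in layout.items():
    print(machine, cores)
print("total cores:", total_cores)
```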

Tutorial for combining ManifoldCF and Elasticsearch for files search

With the arrival of ManifoldCF 1.0 (now already at v1.6.1), the open source community is looking for tutorials on combining it with Elasticsearch. That’s the intent of this tutorial, which will walk you through the different steps required to make it work.

First, we’ll recap the installation process of ManifoldCF (we’ll call it MCF from here on). Second, we’ll install Elasticsearch with the attachment plugin so that it can index rich documents. Third, we’ll configure MCF so that it crawls a Windows file share and indexes the documents in Elasticsearch. In this tutorial, when I specify an installation directory such as apache-manifoldcf-1.6.1, you have to complete it with the absolute path of the installation directory.

Binary version of SolrMeter for Solr 4

NOTE: English version on top, French version below.

We have noticed that with Solr 4, SolrMeter’s UI has a problem evaluating the cache hit ratio. Digging a bit, we found that the problem is due to a type change between Solr 3 and Solr 4: SolrMeter expects a string, whereas Solr 4 sends back a float. More precisely, Solr 4 does that within its request handler MBean, in the cache subcategory.
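
To illustrate this kind of type mismatch, here is a small Python sketch of a reader that accepts the hit ratio either as a string or as a number. It is not SolrMeter’s code (SolrMeter is a Java application) and it is not the patch mentioned in this post. The function name `read_hit_ratio` is hypothetical.

```python
def read_hit_ratio(raw):
    """Return a cache hit ratio as a float, whatever type it arrived as."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool):
        raise TypeError("hit ratio cannot be a boolean")
    # Newer style in this sketch: the value is already numeric.
    if isinstance(raw, (int, float)):
        return float(raw)
    # Older style in this sketch: the value arrives as text such as "0.75".
    if isinstance(raw, str):
        return float(raw.strip())
    raise TypeError(f"unsupported hit ratio type: {type(raw).__name__}")


print(read_hit_ratio("0.75"))
print(read_hit_ratio(0.75))
```

A client written this way keeps working whether the server reports the statistic as text or as a number.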

We’re now using a patch available for this bug, created by Javier Mendez; see his contribution on this Google group.

Still, there is no binary version of SolrMeter, hence this blog post.

Hadoop 0.20 and refactoring of the Yahoo sample code on reversed index

There are several MapReduce snippets to test and learn about Hadoop.
One of these samples is the reversed index, i.e. for each word we want to know which files it appears in. Thus the output file should look like this:
hello        test.txt
formation    formation.txt test.txt
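
As an illustration of the idea, independent of Hadoop, here is a minimal Python sketch that builds such an index with a map step and a reduce step. It is not the Yahoo sample and not our rewritten Hadoop code. The function names and the contents of the two files are assumptions chosen so that the result matches the output above.

```python
from collections import defaultdict


def map_phase(filename, text):
    """Emit one (word, filename) pair for every word in the file."""
    for word in text.lower().split():
        yield word, filename


def reduce_phase(pairs):
    """Group the file names by word, dropping duplicates."""
    index = defaultdict(set)
    for word, filename in pairs:
        index[word].add(filename)
    return {word: sorted(files) for word, files in index.items()}


# Hypothetical input files and contents.
documents = {
    "test.txt": "hello formation",
    "formation.txt": "formation",
}

pairs = [pair for name, text in documents.items() for pair in map_phase(name, text)]
index = reduce_phase(pairs)

for word in sorted(index):
    print(word, " ".join(index[word]))
```

In a real MapReduce job the framework does the grouping by key between the two phases. Here the reduce function does it itself to keep the sketch self-contained.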

This example is mentioned on the Yahoo developer network, but it doesn’t work as is on version 0.20 of Hadoop.
We decided to rewrite parts of the code in order to make it compatible. This is what you will find in this blog article.

Integrating Solr with SPIP

Note: French version available in the second half of this blog entry.

Note: don’t hesitate to test our new open source packaged solution Datafari, which combines Apache ManifoldCF, Apache Solr and AjaxFranceLabs 🙂

SPIP is a well-known open source platform. We wanted to share with you how to integrate a Solr server into the user interface of a SPIP server. The scenario is the following: you already have a SPIP-based website, and you want a nice search functionality based on the latest Solr, to benefit from all its cool features. You have set up a Solr server and crawled your SPIP content, but now you want your Solr search available in your SPIP website. This is what we present in this tutorial.

Constellio dev environment

Disclaimer: This blog is not really new, as it’s just the migration of the technical content of our website – see further down for the French version.

This tutorial explains how to start Constellio in a development environment. The first part shows how to download, set up and start Constellio in Eclipse with the default database (Derby). The second part shows how to install MySQL and configure Constellio to use this database.

Backup Constellio Collections and Connectors

Disclaimer: This blog is not really new, as it’s just the migration of the technical content of our website – see further down for the French version.

This video tutorial explains how to back up and then restore Collections and Connectors in Constellio.

Create a plugin for Constellio

Disclaimer: This blog is not really new, as it’s just the migration of the technical content of our website – see further down for the French version.

France Labs, the European partner of Doculibre on the Constellio solution, gives you this video explaining how to create a plugin for Constellio. Constellio is currently the most complete open source enterprise search solution available.

Active Directory

Disclaimer: This blog is not really new, as it’s just the migration of the technical content of our website – see further down for the French version.

NOTE: If you are interested in using AD with Solr, you may want to look at our Datafari software (still in alpha), which combines Apache ManifoldCF with Solr and so eases this kind of integration. The code is available on Google Code:

In enterprise environments, search often needs a security layer that standard web search does not. To help with this, we are releasing a small piece of code that allows Constellio 1.2 (and probably 1.3, although we didn’t test it) to connect to an Active Directory server and check credentials at authentication time. Here is how it works: