Bye bye DIH – Hello Datafari

Easily replacing DIH with ManifoldCF, using Datafari

So you were using DIH with your Solr, and you are worried that it may no longer be actively maintained? And you are having difficulty finding a replacement or an alternative? We propose here a replacement that relies on Apache ManifoldCF and Datafari, projects that have been actively maintained and updated for several years now.

Datafari is an open source Enterprise Search solution that, among other things, embeds Apache ManifoldCF and Apache Solr. As such, by installing it you are just a few scripts away from having a fully functional DB crawler that fetches the data and sends it to Apache Solr. Which is exactly what DIH was doing! As a bonus, ManifoldCF can do much more, as it offers plenty of connectors for different sources, and a graphical interface to configure your crawling (SLAs, time windows, data processing…).
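To make the "fetch from a database, send to Solr" idea concrete, here is a minimal Python sketch of that flow. It is only an illustration of the concept, not how ManifoldCF or Datafari work internally. The table, the collection name `demo`, the Solr URL and the two helper functions are all hypothetical, and the sketch assumes a Solr `/update` handler that accepts a JSON array of documents. It builds the request without sending it, so it runs without a Solr instance.

```python
import json
import sqlite3


def rows_to_solr_docs(conn, query):
    """Run a SQL query and turn each row into a dict keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def build_update_request(solr_base_url, collection, docs):
    """Build the URL and JSON body for a Solr update request (not sent here)."""
    url = f"{solr_base_url}/{collection}/update?commit=true"
    body = json.dumps(docs)
    return url, body


# Hypothetical in-memory database standing in for the real source DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("1", "Hello Datafari"), ("2", "Bye bye DIH")],
)

docs = rows_to_solr_docs(conn, "SELECT id, title FROM articles ORDER BY id")

# Assumed local Solr URL and collection name, for illustration only.
url, body = build_update_request("http://localhost:8983/solr", "demo", docs)
```

Sending `body` to `url` with a `Content-Type: application/json` header would be the remaining step. A crawler such as ManifoldCF adds what this sketch leaves out: scheduling, incremental updates and error handling.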

So hop in, and take a look at our DIH replacement tutorial on the Datafari wiki.

Integrating Solr with SPIP

Note: a French version is available in the second half of this blog entry.

Second note: don’t hesitate to test Datafari, our new open source packaged solution, which combines Apache ManifoldCF, Apache Solr and AjaxFranceLabs 🙂

SPIP is a well-known open source platform. We wanted to share with you how to integrate a Solr server into the user interface of a SPIP server. The scenario is the following: you already have a SPIP-based web site, and you want a nice search functionality based on the latest Solr, to benefit from all its cool features. You have set up Solr and crawled your SPIP content, but now you want Solr search in your SPIP website. This is what we present in this tutorial.


Potential security risk if you use Solr together with an internet facing CMS

We recently stumbled upon a detailed article by Nicolas Grégoire on a Solr attack using SSRF. To summarise: if you think you are safe because your Solr is hidden behind another system and only an HTTP server faces the web, you may have problems you did not think about.

While reading this article, I was thinking about use cases involving CMS systems that have user management and are accessible from the web. They are a good fit for such attacks. The good news is that Solr 4.6 fixes this vulnerability. The bad news is that you need to migrate quickly if you want to sleep well 😉

Tutorial for combining ManifoldCF and Solr for files search

NOTE: If you are interested in using ManifoldCF with Solr, you may want to look at our Datafari software, which combines Apache ManifoldCF with Solr and so eases this kind of integration. The code is available on GitHub: https://github.com/francelabs/datafari

With the arrival of ManifoldCF 1.0 (now already at v2.5), the open source community is looking for tutorials on combining it with Solr 4. That’s the intent of this tutorial, which will walk you through the different steps required to make it work.

First, we’ll recap the installation process of ManifoldCF (we’ll call it MCF from now on), and of Solr. Second, we’ll configure both tools so that they can interact with each other. Third, we’ll configure MCF so that it crawls a Windows file share. In this tutorial, when I specify an installation directory such as solr-4.1.0, you have to prefix it with the absolute path of the installation directory.

Searching everything = Talend + Constellio + Solr

We are preparing a series of blog entries for January/February 2013 on combining Talend, Constellio and Solr, using the power of Talend to make many more connectors available to Constellio. We have not yet had time to work on a pure Talend + Solr solution, which would leverage ManifoldCF, so our entries will be about using the Google Connector Manager used in Constellio 1.3.

Don’t hesitate to let us know if this blog series sounds exciting to you.