{"id":369,"date":"2016-01-27T11:06:24","date_gmt":"2016-01-27T10:06:24","guid":{"rendered":"http:\/\/www.francelabs.com\/blog\/?p=369"},"modified":"2016-01-27T11:07:17","modified_gmt":"2016-01-27T10:07:17","slug":"generating-big-data-sets-for-search-engines","status":"publish","type":"post","link":"https:\/\/www.francelabs.com\/blog\/generating-big-data-sets-for-search-engines\/","title":{"rendered":"Generating big data sets for search engines"},"content":{"rendered":"<p>When providing our search expertise, we are often asked to run performance evaluations on large data sets, for instance in Proofs of Concept. For a recent customer request, in order to save time and to avoid using sensitive customer data, we used log-synth, a <a href=\"https:\/\/github.com\/tdunning\/log-synth\" target=\"_blank\">random data generator<\/a> developed by Ted Dunning. We describe here how to use log-synth to generate a 100,000-line data set.<\/p>\n<p>The first step, which we don&#8217;t document here, consists of downloading log-synth, unzipping it and building it with Maven.<\/p>\n<p><!--more--><\/p>\n<p>The second step is creating a schema that describes how log-synth must generate each line. 
In our case, the goal is to generate log lines with the following format:<\/p>\n<p><code>{\"uuid\":\"41775b31-5435-4579-9803-99d78eb0512d\",\"server\":\"FL-01\",\"date\":\"2015-07-14\",\"nb_files\":53,\"status\":\"RUNNING\"}<\/code><br \/>\nOf course, for each of these attributes, the values are picked randomly from a predefined set of values.<br \/>\nWe thus create the schema-francelabs.json file:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">&#x5B;\r\n\t{&quot;name&quot;: &quot;uuid&quot;,&quot;class&quot;: &quot;uuid&quot;},\r\n\t{&quot;name&quot;:&quot;server&quot;, &quot;class&quot;:&quot;string&quot;, &quot;dist&quot;:{&quot;FL-01&quot;:1, &quot;FL-02&quot;:1, &quot;FL-03&quot;:1, &quot;FL-04&quot;:1, &quot;FL-05&quot;:1, &quot;FL-06&quot;:1, &quot;FL-07&quot;:1}},\r\n\t{&quot;name&quot;: &quot;date&quot;, &quot;class&quot;: &quot;date&quot;, &quot;format&quot;: &quot;yyyy-MM-dd&quot;, &quot;start&quot;:&quot;2015-01-01&quot;, &quot;end&quot;:&quot;2015-12-31&quot;},\r\n\t{&quot;name&quot;: &quot;nb_files&quot;,&quot;class&quot;: &quot;int&quot;,&quot;min&quot;: 1,&quot;max&quot;: 100},\r\n\t{&quot;name&quot;: &quot;status&quot;, &quot;class&quot;:&quot;string&quot;, &quot;dist&quot;:{&quot;RUNNING&quot;:1, &quot;OK&quot;:1, &quot;ERROR&quot;:0.05}}\r\n]<\/pre>\n<p>For each attribute, we define its name with the &#8220;name&#8221; tag and its type with the &#8220;class&#8221; tag. 
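<\/p>\n<p>Note that the numbers in a &#8220;dist&#8221; map are relative weights, not probabilities: a value&#8217;s probability is its weight divided by the sum of all the weights. The short Python sketch below (not part of log-synth; the names are ours) illustrates this kind of weighted sampling for the &#8220;status&#8221; attribute:<\/p>

```python
import random

# Relative weights, as written in the schema's "dist" field.
dist = {"RUNNING": 1, "OK": 1, "ERROR": 0.05}

# random.choices() draws each value with probability weight / sum(weights),
# which is how such a weight map is commonly interpreted.
values, weights = zip(*dist.items())
sample = random.choices(values, weights=weights, k=100_000)

# "ERROR" should show up in roughly 0.05 / 2.05, i.e. about 2.4% of lines.
error_rate = sample.count("ERROR") / len(sample)
print(f"{error_rate:.3f}")
```

<p>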
All the <a href=\"https:\/\/github.com\/tdunning\/log-synth\/blob\/master\/README.md\" target=\"_blank\">types supported by log-synth<\/a>, as well as how to use them, are listed and detailed in the log-synth documentation on GitHub.<\/p>\n<p>Our schema is made of 5 attributes:<\/p>\n<ul>\n<li>&#8220;uuid&#8221;: holds a UUID generated by log-synth<\/li>\n<li>&#8220;server&#8221;: holds a value randomly picked from the set [&#8220;FL-01&#8221;, &#8220;FL-02&#8221;, &#8220;FL-03&#8221;, &#8220;FL-04&#8221;, &#8220;FL-05&#8221;, &#8220;FL-06&#8221;, &#8220;FL-07&#8221;]. Each value has the same weight, so they all have the same probability of being selected for each newly generated line<\/li>\n<li>&#8220;date&#8221;: holds a date formatted as &#8220;yyyy-MM-dd&#8221;, randomly picked between 2015-01-01 and 2015-12-31<\/li>\n<li>&#8220;nb_files&#8221;: holds an integer randomly picked between 1 and 100<\/li>\n<li>&#8220;status&#8221;: holds a value randomly picked from the set [&#8220;RUNNING&#8221;, &#8220;OK&#8221;, &#8220;ERROR&#8221;], knowing that the value &#8220;ERROR&#8221; has a very low probability of being selected, as its weight is much smaller than the other two (0.05 versus 1)<\/li>\n<\/ul>\n<p>The last step is executing log-synth, specifying the data schema and the number of lines it must generate:<\/p>\n<p><code>log-synth -count 100000 -schema schema-francelabs.json -format JSON -output output\/<\/code><\/p>\n<p>The &#8220;count&#8221; parameter sets the number of lines to generate, &#8220;format&#8221; sets the output format (JSON in our case) and &#8220;output&#8221; sets the output folder (output in our case).<\/p>\n<p>And voil\u00e0, here is the generated result. For reference, on a machine with a CPU with 4 physical cores and 16\u00a0GB of RAM, it took approximately 
1.5 seconds\u00a0:<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">{&quot;uuid&quot;:&quot;cfaa6bdc-825e-41ac-82c8-0cb162c0e3f1&quot;,&quot;server&quot;:&quot;FL-07&quot;,&quot;date&quot;:&quot;2015-12-20&quot;,&quot;nb_files&quot;:67,&quot;status&quot;:&quot;ERROR&quot;}\r\n{&quot;uuid&quot;:&quot;8c8ef3d4-bc81-4ef7-ba91-15661d881c55&quot;,&quot;server&quot;:&quot;FL-04&quot;,&quot;date&quot;:&quot;2015-05-06&quot;,&quot;nb_files&quot;:18,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;8d78cc4b-72a5-4ac2-ab3f-7e81f4dbc4e7&quot;,&quot;server&quot;:&quot;FL-04&quot;,&quot;date&quot;:&quot;2015-11-09&quot;,&quot;nb_files&quot;:64,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;dc38bec9-0ffa-41b3-ae0a-5bbb65633358&quot;,&quot;server&quot;:&quot;FL-04&quot;,&quot;date&quot;:&quot;2015-04-23&quot;,&quot;nb_files&quot;:86,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;95ac609e-ec8a-4ed0-ac63-2fd6cc4ccaaf&quot;,&quot;server&quot;:&quot;FL-06&quot;,&quot;date&quot;:&quot;2015-03-23&quot;,&quot;nb_files&quot;:35,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;3bf2d44b-1044-42cb-9e30-eddfe46419bd&quot;,&quot;server&quot;:&quot;FL-07&quot;,&quot;date&quot;:&quot;2015-05-23&quot;,&quot;nb_files&quot;:34,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;53838295-ba7c-4f2a-a14a-2397d41fbcde&quot;,&quot;server&quot;:&quot;FL-06&quot;,&quot;date&quot;:&quot;2015-01-09&quot;,&quot;nb_files&quot;:50,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;2ccef5fe-ca99-4d97-9e23-6b5c5ebb30d0&quot;,&quot;server&quot;:&quot;FL-03&quot;,&quot;date&quot;:&quot;2015-02-01&quot;,&quot;nb_files&quot;:77,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;c1516d8d-cee7-432f-9809-11edf27d15c0&quot;,&quot;server&quot;:&quot;FL-01&quot;,&quot;date&quot;:&quot;2015-06-05&quot;,&quot;nb_files&quot;:61,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;103cd433-deee-426a-8
3ca-38e7368628e8&quot;,&quot;server&quot;:&quot;FL-03&quot;,&quot;date&quot;:&quot;2015-01-22&quot;,&quot;nb_files&quot;:80,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;2c57202e-b4da-4e20-a625-38b42ce4c84f&quot;,&quot;server&quot;:&quot;FL-03&quot;,&quot;date&quot;:&quot;2015-02-06&quot;,&quot;nb_files&quot;:32,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;6f40c234-1645-4cdb-8080-ad7498fdf784&quot;,&quot;server&quot;:&quot;FL-01&quot;,&quot;date&quot;:&quot;2015-01-09&quot;,&quot;nb_files&quot;:33,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;e6424e56-ddff-45ca-8062-001ac76ae574&quot;,&quot;server&quot;:&quot;FL-04&quot;,&quot;date&quot;:&quot;2015-11-10&quot;,&quot;nb_files&quot;:93,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;1f09f8cf-b785-4814-98bf-71847259b2a6&quot;,&quot;server&quot;:&quot;FL-01&quot;,&quot;date&quot;:&quot;2015-12-03&quot;,&quot;nb_files&quot;:68,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;eea96f45-79b8-4c5f-b114-3f9bcab3fc81&quot;,&quot;server&quot;:&quot;FL-03&quot;,&quot;date&quot;:&quot;2015-12-06&quot;,&quot;nb_files&quot;:47,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;86671321-b640-4336-95d5-7ca28a954d6f&quot;,&quot;server&quot;:&quot;FL-06&quot;,&quot;date&quot;:&quot;2015-04-27&quot;,&quot;nb_files&quot;:84,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;e8ee3409-7083-411a-be1e-2f22f2c852ee&quot;,&quot;server&quot;:&quot;FL-01&quot;,&quot;date&quot;:&quot;2015-12-25&quot;,&quot;nb_files&quot;:69,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;7b17b1a5-fe04-4a09-936b-5d43e2da71fb&quot;,&quot;server&quot;:&quot;FL-04&quot;,&quot;date&quot;:&quot;2015-02-26&quot;,&quot;nb_files&quot;:19,&quot;status&quot;:&quot;RUNNING&quot;}\r\n{&quot;uuid&quot;:&quot;b46df22d-2efa-4452-9d9c-507a53ea4f54&quot;,&quot;server&quot;:&quot;FL-02&quot;,&quot;date&quot;:&quot;2015-12-11&quot;,&quot;nb_files&quot;:
28,&quot;status&quot;:&quot;OK&quot;}\r\n{&quot;uuid&quot;:&quot;3d866f7d-bcfa-43f8-824e-b6c38fb4f47f&quot;,&quot;server&quot;:&quot;FL-03&quot;,&quot;date&quot;:&quot;2015-11-03&quot;,&quot;nb_files&quot;:33,&quot;status&quot;:&quot;RUNNING&quot;}\r\n...<\/pre>\n<p>The generated file can then easily be inserted into an Elasticsearch or Solr index with a simple curl command. One can also use Logstash with Elasticsearch for an on-the-fly or a more structured insertion.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When providing our search expertise, we are often asked to run performance evaluations on large data sets, for instance in Proofs of Concept. For &hellip; <a href=\"https:\/\/www.francelabs.com\/blog\/generating-big-data-sets-for-search-engines\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-369","post","type-post","status-publish","format-standard","hentry","category-search"],"_links":{"self":[{"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/posts\/369","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/comments?post=369"}],"version-history":[{"count":11,"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/posts\/369\/revisions"}],"predecessor-version":[{"id":384,"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/posts\/369\/revisions\/384"}],"wp:attachment":[{"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/media?parent=369"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/categories?post=369"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.francelabs.com\/blog\/wp-json\/wp\/v2\/tags?post=369"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}