While waiting for Constellio V2.0, we thought you might be interested in seeing how to activate early binding in Constellio 1.3.
As a reminder, there are two ways to manage security for document search: early binding and late binding. By security management, we mean that an authenticated user of the search engine gets back, in answer to a search request, only the results he is actually allowed to see.
Early binding is the recommended way as it provides the fastest response time. It consists of storing the ACL (Access Control List) of each indexed document in the index itself, as an additional field of the Lucene index. When someone runs a search, his username is appended to the search query, and the results are filtered on that field. The pro is that it only adds to the search time the time it takes to filter on a field (a very small overhead). The con is that the document ACLs are only synchronised when the documents are recrawled and reindexed. So if you schedule a crawl every night, your indexed ACLs will only be updated every night, hence a potential one-day discrepancy. Still, this is the recommended way for standard scenarios, as most enterprise needs don’t require a to-the-minute update of file ACLs.
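To make the idea concrete, here is a minimal sketch of early binding in plain Python. It is not Constellio or Lucene code: the in-memory index, the allowed_users field name and the early_binding_search function are all assumptions made up for the illustration. The point is only that the ACL travels with the indexed document, so the security check is a field filter applied during the search itself.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    doc_id: str
    text: str
    # ACL copied into the index at crawl time (hypothetical field name).
    allowed_users: set = field(default_factory=set)


def early_binding_search(index, term, username):
    """Return ids of documents matching `term` that `username` may see."""
    return [
        doc.doc_id
        for doc in index
        if term.lower() in doc.text.lower()  # the actual search
        and username in doc.allowed_users    # the ACL field filter
    ]


# A toy index; the ACLs are as fresh as the last crawl, no fresher.
index = [
    IndexedDoc("a", "Quarterly budget report", {"alice", "bob"}),
    IndexedDoc("b", "Budget forecast draft", {"bob"}),
    IndexedDoc("c", "Holiday planning", {"alice"}),
]

print(early_binding_search(index, "budget", "alice"))
```

Note that nothing outside the index is consulted at query time, which is why the overhead is small and also why a change of rights is only visible after the next reindexing.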
Late binding is the solution if you are one of the happy few requiring perfect respect of the file ACLs. In late binding mode, ACLs are not stored in the index. Whenever a user does a search, Constellio queries Solr without any user info. For each document returned, Constellio then checks the ACL for the current user, and it builds the final list of authorised results from those per-document answers. Pros: you are guaranteed that what the user sees at time t is what he’s allowed to see at time t. Cons: imagine your query returns 10,000 relevant documents, and that it takes 10 ms for Constellio to check the ACL of each document. You now have a lower limit of 10,000 x 10 ms = 100 seconds query response time…
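The same toy approach can illustrate late binding and its cost. In the sketch below, late_binding_search, check_acl, live_acl and min_response_seconds are hypothetical names, and the checks are assumed to run one after another; a real deployment could batch or parallelise them. The lower bound on response time is then simply the number of hits multiplied by the time of one ACL check.

```python
def late_binding_search(hits, username, check_acl):
    """Filter `hits` by calling `check_acl(doc_id, username)` once per hit."""
    return [doc_id for doc_id in hits if check_acl(doc_id, username)]


def min_response_seconds(num_hits, ms_per_check):
    """Lower bound on response time when ACL checks run sequentially."""
    return num_hits * ms_per_check / 1000


# Stand-in for the live ACLs held by the source system (assumption).
live_acl = {"a": {"alice", "bob"}, "b": {"bob"}}


def check_acl(doc_id, username):
    # In real life this would be a call to the repository, hence the latency.
    return username in live_acl.get(doc_id, set())


# The search engine returns hits without knowing who is asking...
print(late_binding_search(["a", "b"], "alice", check_acl))
# ...and every hit costs one ACL check.
print(min_response_seconds(10_000, 10))
```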
This introduction being made, be aware that as of Constellio 1.3, late binding is the default mode. There is still a way to activate early binding. Here is how to do it. In the admin console, go to Collections Management and pick your collection. Then, in the left menu, go to Policy ACL. You will see on the right that the following checkbox is ticked by default: “Use only early binding if a policy ACL is present:”.
This basically means that if you want early binding, you’ll need to upload your own policy ACL. Constellio expects a txt document, which it will interpret once uploaded.
In order to do that: Create a txt document. In it, put for instance what is in brackets:
Once imported, this will be interpreted by Constellio as the following: Index Field := doc_uniqueKey, Regular expression:=.*.*, Users:=admin, Groups:=null
This is interpreted as: if the regex in “Regular expression” matches content present in the field named in “Index Field”, then the user(s) and/or group(s) mentioned in Users and Groups are allowed to see the corresponding records.
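The rule logic described above can be sketched in a few lines of Python. This is our reading of the behaviour, not Constellio's actual code: PolicyRule and is_allowed are hypothetical names, the record is a plain dictionary, and we assume the regex is searched anywhere in the field value.

```python
import re
from dataclasses import dataclass


@dataclass
class PolicyRule:
    index_field: str
    regex: str
    users: frozenset
    groups: frozenset = frozenset()  # empty set stands for "Groups := null"


def is_allowed(rule, record, username, user_groups=()):
    """True if the rule applies to `record` and grants access to the user."""
    value = record.get(rule.index_field)
    if value is None or re.search(rule.regex, value) is None:
        return False  # the rule does not apply to this record
    return username in rule.users or bool(rule.groups & set(user_groups))


# The example rule: every doc_uniqueKey matches ".*.*", only admin is listed.
rule = PolicyRule("doc_uniqueKey", ".*.*", frozenset({"admin"}))
record = {"doc_uniqueKey": "file:///share/report.pdf"}

print(is_allowed(rule, record, "admin"))
print(is_allowed(rule, record, "guest"))
```

With a regex as permissive as .*.*, the rule applies to every record, so in this example admin sees everything and nobody else sees anything.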
Note that this text file must be uploaded before indexing. It is not a big deal if you forgot, as Constellio can reindex without requiring a recrawl.
So yes, you understood correctly: this early binding method does not take into account the ACLs retrieved at indexing time, and you have to configure everything in the txt file. It is definitely doable to automate everything and to take ACLs into account at crawling time, in order to have real early binding per document. We did it for customers, but it would take too much time to detail it here.
By default, the Google file connector sends back in its metadata a doc_user and a doc_group, which represent the ACLs fetched for each document.
On the other hand, Constellio has the same fields in its index, but it does not do the mapping by default. So the aim of the game is to activate this mapping, but also to handle synchronisation between Constellio users/groups and your file system users.
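Just to give the flavour of the mapping and synchronisation part, here is a rough sketch. Only the doc_user and doc_group field names come from the connector; the map_acl_fields function, the two lookup tables and the account names are invented for the example, and this is not how Constellio is configured in practice.

```python
def map_acl_fields(connector_meta, user_map, group_map):
    """Translate connector ACL metadata into Constellio-side names.

    Accounts with no known Constellio equivalent are dropped, so an
    unmapped account never grants access by accident.
    """
    users = [user_map[u] for u in connector_meta.get("doc_user", []) if u in user_map]
    groups = [group_map[g] for g in connector_meta.get("doc_group", []) if g in group_map]
    return {"doc_user": users, "doc_group": groups}


# Hypothetical metadata as a connector might return it for one document.
meta = {
    "doc_user": ["DOMAIN\\alice", "DOMAIN\\zed"],
    "doc_group": ["DOMAIN\\finance"],
}
# Hypothetical synchronisation tables: file system account -> Constellio name.
user_map = {"DOMAIN\\alice": "alice"}
group_map = {"DOMAIN\\finance": "finance"}

print(map_acl_fields(meta, user_map, group_map))
```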