Within the field of Identity and Access Management, many of the details seem quite pedantic. However, examining them carefully can make the difference between a good deployment and a bad one. Questions about what sort of directory to use and what to put in it fall squarely into that category.
A directory is essentially a database for holding identity information. Typically it would have the following attributes:
- Arranged in a hierarchical fashion
- Accessible using LDAP – the lightweight directory access protocol
- Optimized for read access and data that does not change frequently
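To make the hierarchical arrangement concrete, here is a minimal sketch of a directory tree addressed by LDAP-style distinguished names. It is an in-memory stand-in, not an LDAP client: the `Entry` class, the helper functions, and the `dc=example,dc=com` names are all hypothetical, and the name splitting is deliberately simplified (it ignores escaped commas).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One node in the tree (hypothetical structure, for illustration only)."""
    rdn: str                                        # relative name, e.g. "ou=People"
    attributes: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, "Entry"] = field(default_factory=dict)


def split_dn(dn: str) -> List[str]:
    """Split a distinguished name into its parts, root first.

    Simplification: escaped commas inside values are not handled.
    """
    return [part.strip() for part in reversed(dn.split(","))]


def add_entry(root: Entry, dn: str, attributes: Dict[str, str]) -> Entry:
    """Store attributes at `dn`, creating any missing parent entries."""
    node = root
    for rdn in split_dn(dn):
        node = node.children.setdefault(rdn.lower(), Entry(rdn))
    node.attributes.update(attributes)
    return node


def lookup(root: Entry, dn: str) -> Optional[Entry]:
    """Walk down from the root; return None if any level is missing."""
    node: Optional[Entry] = root
    for rdn in split_dn(dn):
        node = node.children.get(rdn.lower())
        if node is None:
            return None
    return node


root = Entry("")
add_entry(root, "uid=jdoe,ou=People,dc=example,dc=com", {"cn": "Jane Doe"})
found = lookup(root, "uid=jdoe,ou=People,dc=example,dc=com")
```

Reads are a simple walk down the tree, which is part of why this shape suits data that is looked up far more often than it is changed.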
Some questions to think about before deciding on a directory product include the following:
- What data will be stored in the directory?
- What will be the source of truth for that data?
- Will the directory need to synchronize data with other directories, databases or applications?
- What is the required availability of the directory?
- How will the directory interface with your authentication and authorization protocols?
In general, one key assumption you should make is that the answers to these questions may change during the directory’s lifetime. So your choices should take into account extensibility and open standards, to maximize your ability to adapt to changing requirements.
What data do you need in the directory? Zowee – you’d think that would be an easy one.
My first piece of advice is: do not collect everything. Determine your mandatories – the absolute minimum. Add to that any easy-to-collect information that carries no privacy concerns. After that you can look at more complex information.
Name – Not as easy as it seems. A person’s legal name might not be the same as their preferred name. Names might change with marriage or divorce or for other reasons – do you need to keep prior names? What requirement is there to match a legal name? What will you collect for initials, middle names, nicknames/aliases, name suffixes/prefixes and formal titles?
Personal Business Information vs Personal Information – Are you collecting information relating to the identity within your business or their personal life or both? For instance do you need their work address or their home address?
Address Information – How can this person be found? Physical addresses, e-mail addresses. Do you also want their social network addresses (Twitter, Facebook)? Phone Numbers – how many and which ones (work, home, cell, fax, etc.)? What standards will you have for correctness of this data?
Organizational Data – Org charts and reporting structures. Work Titles.
Identity Proofing Data – Do you record the data collected during identity proofing or during enrollment for an application? Driver’s license, health care number, SIN, etc.
Audit and Change tracking data – What will you store regarding changing data? Do you only keep the most recent iteration? Will you keep audit information of who made a change, when, what and why in the record or elsewhere?
Authentication Data – Credential Name (user id), password, password questions and answers, certificates and private keys, etc.
Authorization data – group memberships, roles, projects, ownership of resources.
Key data for synchronization – see the next section
Application Data – Will the applications making use of your directory need to store any other application specific information?
Once you know what data you need, you need to perform some exercises with every data element:
- How will it be collected?
- Will your directory own the data?
- What data standards for consistency and correctness will you apply to the data? For some this might mean inventing a standard – such as credential naming. For others adapting a public standard – such as for telephone number formats.
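As a sketch of what applying a data standard might look like, the snippet below normalizes telephone numbers to an E.164-style string and generates credential names under an invented convention (first initial plus surname, with a numeric suffix on collision). Both function names, the default country code of 1, and the naming convention are assumptions made for illustration; extensions and national formatting rules are not handled.

```python
import re
from typing import Set


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Return an E.164-style number such as +14035550123.

    Assumption: numbers without a leading "+" are 10-digit national numbers
    (optionally prefixed with the default country code).
    """
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no digits in phone number: {raw!r}")
    if raw.strip().startswith("+"):
        return "+" + digits
    if len(digits) == 10:
        return "+" + default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    raise ValueError(f"cannot normalize phone number: {raw!r}")


def make_user_id(first: str, last: str, taken: Set[str]) -> str:
    """Build a credential name: first initial + surname, made unique."""
    base = re.sub(r"[^a-z]", "", (first[:1] + last).lower())
    if not base:
        raise ValueError("name contains no usable letters")
    candidate, n = base, 1
    while candidate in taken:
        n += 1
        candidate = f"{base}{n}"
    return candidate


print(normalize_phone("(403) 555-0123"))                # +14035550123
print(make_user_id("Jane", "O'Brien", {"jobrien"}))     # jobrien2
```

Whatever standard you pick, applying it at the point of collection is usually cheaper than cleaning the data afterwards.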
Sources of Truth and Synchronization
You are fairly arrogant if you assume that your directory will be the source of truth for the data you collect. By the source of truth, I mean the ultimate authority within your organization for a piece of data: if you want to know what someone’s name is (for example), which system has the final say? Even in a medium-sized organization this data might be duplicated in any number of sources – your directory being only one.
A good rule of thumb is to have the source of truth be the system where the data is originally entered. Alternative ideas with value include putting the data in a system maintained by experts regarding that data (greatest chance of correctness), and putting the data in the system used by your service desk (the SD will have frequent interactions with the customers).
Especially if you are putting your directory into an existing organization, there is an excellent possibility that it won’t fit any of these criteria for being a good source of truth. I’d advise against trying to make it the source of truth. It may grow into that if it becomes a key resource in your organization, but to begin with leverage the data where it sits.
Two of your primary objectives in deploying the directory are to minimize manual duplication of data and to accumulate all the data that might be needed for authentication and authorization in one location. So for data where you are not the source of truth, you need to determine a way to import that data into your directory. For data where you are the source of truth, you will want to distribute that to other systems in your organization.
In fact, you can also distribute data for which you are not the source of truth, provided the receiving systems understand the constraints of the data they are getting. If you are going to the effort of staying current with the various sources of truth, the other systems can benefit from your centralization of the data. If these other systems will want to modify the data, though, you need to be very careful.
Beyond a normal directory, there are two other directory types that might make sense if you are doing data synchronization with a variety of sources of truth and feeding data to other systems:
1) Meta-directories – These are directories designed for data synchronization rather than customer access. Generally they will feed another directory. These can be very useful if your sources of truth are complicated. For instance, say for first name data your sources of truth are: HR system for employees, purchasing system for consultants, web registration system for customers, and IT system if the credential is not yet in one of the others and for special credential types.
2) Virtual Directory – Rather than collecting all the data, your directory just records where the source of truth is located and looks the data up on demand. This is the recommended practice for complicated environments, but the approach is still fairly new.
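The first-name example given for meta-directories amounts to a precedence rule: try each source of truth in order and fall back to the IT system last. A minimal sketch of that rule follows. The source names, the record layout, and the `resolve_attribute` function are hypothetical, and a real meta-directory product would express this in its own configuration rather than in code like this.

```python
from typing import Dict, List, Optional, Tuple

# Assumed ordering, taken from the first-name example: the IT system is the fallback.
PRECEDENCE: List[str] = ["hr", "purchasing", "web_registration", "it"]

# Each source maps a person key to that source's record for the person.
Sources = Dict[str, Dict[str, Dict[str, str]]]


def resolve_attribute(
    person_key: str,
    attribute: str,
    sources: Sources,
    precedence: List[str] = PRECEDENCE,
) -> Optional[Tuple[str, str]]:
    """Return (value, source_name) from the first source that has the attribute."""
    for name in precedence:
        record = sources.get(name, {}).get(person_key)
        if record and record.get(attribute):
            return record[attribute], name
    return None


sources: Sources = {
    "hr": {"p-001": {"first_name": "Jane"}},
    "purchasing": {"p-002": {"first_name": "Ravi"}},
    "it": {"p-001": {"first_name": "J."}, "p-003": {"first_name": "Svc"}},
}

print(resolve_attribute("p-001", "first_name", sources))  # ('Jane', 'hr')
print(resolve_attribute("p-003", "first_name", sources))  # ('Svc', 'it')
```

Returning the source name alongside the value keeps the provenance visible, which matters once you start distributing the data onward.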
In synchronizing data the most important consideration (there are many others) is deciding on your key field. This is a piece of information that is present in both systems and can be counted on to be the same in each and to never change. The optimum solution is having MBUNs created for each system you are synchronizing with. An MBUN is a Meaningless But Useful Number: basically a piece of data created for the sole purpose of being a key field between your two systems.
MBUNs may not be practical though. The foreign system (or even your own) may not be able to generate MBUNs and maintain their uniqueness. If that won’t work you will need to choose a piece of data already in the foreign system. Besides being static and unique, it must also be a field that either you or the other system is a source of truth for. Obvious fields might be bad choices – names or credential fields, for instance, do change over time. SINs and Health Care Numbers might have privacy concerns that prohibit their being used.
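Here is a minimal sketch of key-field synchronization using an MBUN. It assumes a random UUID is an acceptable MBUN and that the foreign system can store it alongside its own record. The record layout, the `sync_from_source` function, and the idea of passing in the set of fields the foreign system owns are all illustrative assumptions.

```python
import uuid
from typing import Dict, Iterable, List, Set


def new_mbun() -> str:
    """Generate a meaningless but unique key (here: a random UUID in hex)."""
    return uuid.uuid4().hex


def sync_from_source(
    directory: Dict[str, Dict[str, str]],
    source_records: Iterable[Dict[str, str]],
    owned_fields: Set[str],
) -> Dict[str, int]:
    """Import the fields the foreign system owns, matching entries on the MBUN.

    Only `owned_fields` are overwritten; everything else in the directory
    entry (for example the user id) is left alone.
    """
    summary = {"created": 0, "updated": 0, "unchanged": 0}
    for record in source_records:
        key = record["mbun"]
        incoming = {f: record[f] for f in owned_fields if f in record}
        entry = directory.get(key)
        if entry is None:
            directory[key] = dict(incoming)
            summary["created"] += 1
        elif any(entry.get(f) != v for f, v in incoming.items()):
            entry.update(incoming)
            summary["updated"] += 1
        else:
            summary["unchanged"] += 1
    return summary


# A surname change in HR still lands on the right entry because the key never changes.
directory = {"a1": {"surname": "Smith", "given_name": "Jane", "user_id": "jsmith"}}
hr_records: List[Dict[str, str]] = [
    {"mbun": "a1", "surname": "Jones", "given_name": "Jane"},
    {"mbun": "b2", "surname": "Lee", "given_name": "Sam"},
]
summary = sync_from_source(directory, hr_records, {"surname", "given_name"})
```

Had the match been made on surname instead, the renamed person would have shown up as a brand new entry, which is exactly the failure a static key avoids.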
Assuming you’ve answered all these questions, you are on your way to choosing your directory technology, performing your initial load and starting to design the processes necessary to maintain the data over time.
I’ve spent some time on criteria that are more specific to IAM concerns, but remember that other considerations are important too. Your fault-tolerance needs, access control over the identity data itself and other requirements might drive your choices as much as the ones I have listed.