Having spent the last 15 years cosseted in the big corporate world, blissfully unaware of just how complex managing identity and access control actually is, I can best describe my journey of the last month as enlightening. Emboldened by our mission and by the idea that, through the advent of cloud technologies, we were free from having to worry about such tedious things as user management, we embarked on our righteous path.
We started, as many people do and will, by registering our domains and setting up Office 365. We then spent some time setting up users, groups and applications, and started to integrate our new cloud-based application.
A couple of weeks in, we had:
- An Office 365 install and Skype for Business
- An AWS account
- A Jekyll-based public website using Elastic Beanstalk for load* and resilience
- A Slack account
- A JIRA account (I know - very old school)
- A GitHub account with private and public repos
- Our new cloud-based application secured behind Azure AD using an OAuth 2.0 flow
- and many more...
Now we were most pleased with ourselves. Such a large amount of work done so quickly!!
HOWEVER - we noticed an issue. There were too many usernames and passwords, and whilst we are good security citizens and keep all of these in a password manager (shout out to LastPass), we began to hanker for the good old corporate days of single sign-on for all applications.
Fresh from our success of getting everything up and running so quickly, we figured how hard can it be……
A month later, VERY is the only answer that I can write down without the need for some censoring :)
On a more serious note, we have some requirements for an identity provider (IdP):
- We would like to model our organisation correctly. For example, it would be nice to have a group for all developers which contains a sub-group for web developers, one for API developers, one for core system developers and one for devops.
- The system should allow us to apply role based access controls based on membership of these groups.
- We would like to follow the principle of least privilege.
- We are DevOps-driven, so everything we use has to have a CLI or an API that we can script against.
- Our IdP needs to support the following
- SAML (aws cli)
- OAuth1.1 (for legacy applications)
- Ideally we would have a good UI that's intuitive to use.
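To make the first two requirements concrete, here is a minimal sketch of role-based access resolved through nested groups. It is illustrative only: the group names, users and role names are hypothetical, and no particular IdP models things exactly this way.

```python
from typing import Dict, Optional, Set

# Hypothetical group tree: parent group -> direct sub-groups.
SUBGROUPS: Dict[str, Set[str]] = {
    "developers": {"web-developers", "api-developers", "core-developers", "devops"},
}

# Hypothetical direct memberships: group -> users placed directly in it.
MEMBERS: Dict[str, Set[str]] = {
    "developers": {"carol"},
    "web-developers": {"alice"},
    "devops": {"bob"},
}

# Hypothetical role grants: role -> groups that receive it.
ROLE_GRANTS: Dict[str, Set[str]] = {
    "read-source": {"developers"},
    "deploy": {"devops"},
}


def effective_members(group: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Return every user in `group`, including users of nested sub-groups."""
    seen = set() if seen is None else seen
    if group in seen:  # guard against accidental cycles in the group tree
        return set()
    seen.add(group)
    users = set(MEMBERS.get(group, set()))
    for child in SUBGROUPS.get(group, set()):
        users |= effective_members(child, seen)
    return users


def roles_for(user: str) -> Set[str]:
    """Return the roles a user holds through direct or nested membership."""
    return {
        role
        for role, groups in ROLE_GRANTS.items()
        if any(user in effective_members(group) for group in groups)
    }
```

With this shape, a role granted to the parent group reaches everyone in the sub-groups, while a role granted to a sub-group stays there, which is the behaviour we want from nested groups.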
AWS access to the console and CLI
When it comes to AWS, we did what I imagine most people do in their first few weeks.
- We created some IAM users. We started with the principle of least privilege in mind, but after 2 frustrating days of missing-permission errors we made everyone an administrator, resolving to fix that once we knew what we were doing (we looked briefly at roles / policies and thought - huh, that's going to take a bit of work!!)
- We started a Linux box
- We started a Windows box
- We found userdata scripts and had a play
- We started an ECS cluster
- We started an Elastic Beanstalk environment
And now we were ready to have a go for real, so back to the least privilege principle and up the policy curve we must go. AWS proved to be one of the trickier things to fit into this model. We found https://d0.awsstatic.com/aws-answers/AWS_Multi_Account_Security_Strategy.pdf and http://docs.aws.amazon.com/organizations/latest/userguide/orgs_getting-started_concepts.html and followed them for our setup. We did the following:
- Create a naming strategy for your accounts. We came up with a three letter alias for our company and then names that followed
<company alias>-<account purpose>-<environment>. An example for us would be
- Create distribution groups in our Exchange server for each of the accounts we wanted to add, and add our administrator group as a member.
- Sign up to a new account with the distribution group email.
- Create an alias on the account (in IAM) so that it shows up nicely when you view it as an option on the web or on the console.
- Sign up in the root account to consolidate billing and invite the new accounts to join you.
- Create OUs (if applicable) to ensure policy is correctly applied to groups of accounts. For us this is important for the accounts of each individual developer (each developer created their own sandbox account and signed up to consolidated billing).
With all this done, it was time to find ourselves a good SAML IdP and get it set up to deal with many roles in each of these accounts.
We started with our trusted Azure AD but quickly ran out of runway. We followed this tutorial, but early on a few glaring omissions came into view:
- We would need an application for each account / role combination
- There was no CLI option. We tried writing some code to get back a SAML response from Azure AD but, despite following this and this and this, nothing worked for us.
- There was no practical way for us to manage who had access to which of the AWS applications based on roles. It would have entailed writing a custom entitlements engine that generated dynamic groups and roles, OR creating an AD somewhere and federating this to Azure AD, at which point we could sensibly manage RBAC.
We then went looking around and came across three likely candidates for an IdP which could satisfy our requirements
After a few hours of checking the feature lists and pricing we ended up deciding to implement Okta and OneLogin.
We used our AWS setup (including the CLI) as our example of a complex case. For OneLogin, we started with the OneLogin tutorial
And for Okta we used this pdf guide
Both were surprisingly painless to implement and worked straight away, but there is a crucial difference in approach. Okta gives you access to multiple accounts, but you have to enable cross-account role access because it is in effect creating shadow roles for each of your accounts in your master account and then enabling you to switch roles. OneLogin, on the other hand, allows you to move directly into any account/role combination you have been assigned to, BUT the price for this is that you have to create a policy in each of your accounts granting OneLogin the right to list all your roles in those accounts. This may not be a big deal, but it is an element of control you have given away.
Now to the CLI, and this is where the fun begins. Okta have helpfully written a CLI tool, but when you have a look, it needs Java and the base command (while it can be aliased away) is far from comforting. The default command is the following:
java -classpath oktaawscli.jar:$HOME/Projects/okta-aws-cli-assume-role/lib/aws-java-sdk-1.10.74.jar:$HOME/Projects/okta-aws-cli-assume-role/lib/okta-sdk.jar com.okta.tools.awscli
For OneLogin, you get pointed to their API and you're on your own. You will also notice that in order to access this, you need an API access/secret key along with the username/password of the user who already has the privileges you are aiming to get a SAML response for.
By now you should be getting the feeling that there are no easy options here. After quite a bit of deliberation we went with the OneLogin solution and accepted the security compromises it brings (having an API access/secret key to access the API, and granting its AWS account list access to yours). We wrote a script (~50 lines) in bash which uses their API to generate SAML responses and passes them to the aforementioned AWS CLI tools. If you would like, we can share that on GitHub - please just ask.
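For a flavour of the AWS half of that hand-off, here is a minimal Python sketch (not our bash script, and it makes no calls to OneLogin). It assumes you already hold the base64-encoded SAML response, reads the available role/provider ARN pairs from the standard AWS role attribute, and builds the argument list for aws sts assume-role-with-saml. The function names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Attribute name AWS expects the IdP to use for role entitlements.
ROLE_ATTRIBUTE = "https://aws.amazon.com/SAML/Attributes/Role"
# XML namespace of SAML 2.0 assertions, in ElementTree's {uri}tag form.
SAML = "{urn:oasis:names:tc:SAML:2.0:assertion}"


def roles_in_assertion(saml_b64: str) -> List[Tuple[str, str]]:
    """Return (role_arn, principal_arn) pairs found in a SAML response."""
    root = ET.fromstring(base64.b64decode(saml_b64))
    pairs = []
    for attribute in root.iter(SAML + "Attribute"):
        if attribute.get("Name") != ROLE_ATTRIBUTE:
            continue
        for value in attribute.iter(SAML + "AttributeValue"):
            arns = [part.strip() for part in (value.text or "").split(",")]
            # IdPs differ on the order of the two ARNs, so match by ARN type.
            role = next((a for a in arns if ":role/" in a), None)
            principal = next((a for a in arns if ":saml-provider/" in a), None)
            if role and principal:
                pairs.append((role, principal))
    return pairs


def assume_role_command(saml_b64: str, role_arn: str, principal_arn: str) -> List[str]:
    """Build the argv for the AWS CLI call; run it with subprocess if desired."""
    return [
        "aws", "sts", "assume-role-with-saml",
        "--role-arn", role_arn,
        "--principal-arn", principal_arn,
        "--saml-assertion", saml_b64,
    ]
```

The resulting list can be passed to subprocess.run, and the temporary credentials in the JSON output can then be written to a profile of your choosing.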
Concourse supports generic oauth2 providers with its teams feature, so it would be simple enough to integrate with OneLogin if it had a public oauth2 workflow... It does not, although we are assured that it is an upcoming feature. So that's great, but we need it now.
At first we thought oauth2_proxy would be a good fit for this: a new provider and it'd all be sorted. Alas, not so. oauth2_proxy is, as it says, a proxy. It doesn't do the authentication; it proxies it from other oauth2 providers. So although we had hooked it up to Azure AD in another project, it could not even provide a usable framework within which to build a OneLogin oauth2 provider.
In coming to that conclusion, though, much of the work to authenticate had already been done... and it was in golang. So why change? A quick google later, it turned out there are a number of frameworks in golang for providing oauth2 server functionality.
A brief scan of the READMEs left me pretty confident we could implement what we wanted using OSIN pretty simply. So a little playing around, and voila... the OneLogin oauth2 server was born. It is a particularly simple implementation of an oauth2 server, and uses 2 OneLogin API tokens: one for reading user roles, and the other to authenticate. It uses an in-memory cache for remembering tokens, so any restart will dump everything and require fresh logins, but since we aren't using this for anything customer facing, it's an acceptable risk/effort tradeoff. Ultimately, the server offers a login page which accepts a username/password (styled to look just like a OneLogin login page), attempts to get a session token using those credentials, and then completely forgets the credentials immediately upon receipt of the response from OneLogin. If the login failed, it lets you know and reprompts for credentials. If not, the oauth2 workflow continues.
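To illustrate the in-memory token cache tradeoff, here is a minimal sketch in Python (our server is written in golang on top of OSIN, so this is not its code). The class name, the one-hour lifetime and the injectable clock are all assumptions made for the example.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryTokenStore:
    """Keeps issued tokens in a plain dict, so a process restart forgets them all."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._tokens: Dict[str, Tuple[str, float]] = {}  # token -> (user, expiry)

    def issue(self, username: str) -> str:
        """Create a random token for a user who has just authenticated."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (username, self._clock() + self._ttl)
        return token

    def lookup(self, token: str) -> Optional[str]:
        """Return the username for a live token, or None if unknown or expired."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        username, expires_at = entry
        if self._clock() >= expires_at:
            del self._tokens[token]  # drop expired tokens lazily on lookup
            return None
        return username
```

Note that only the token and username are held; the password never enters the store, which matches the "forget the credentials immediately" behaviour described above. A fresh store knows nothing about tokens issued before a restart, hence the extra logins.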
It's not yet open source, but it will be as soon as we add some tests to it. At present it serves our needs, but there's not much longevity in this implementation, since OneLogin themselves will be creating an oauth2 workflow soon anyway.
Managing users on RBAC and policies
I am going to leave this one up to you. Apologies but if I speak to this one in any detail then you'll be able to tell a little too much about our setup which is still fledgling.
Along this journey we discovered a number of things:
- Azure Active Directory is not an Active Directory. The enlightened amongst you will be well aware of this, but for us it came as quite the shock not to have nested groups (and no, I don't believe dynamic groups are an appropriate substitute).
- The AWS CLI is a funny beast. We found a nice command, aws sts assume-role-with-saml, which looked promising until you discover that it needs the base64-encoded SAML response as a parameter. A CLI command that requires a base64-encoded response from a service producing a verbose markup language really isn't a CLI. AWS - have you heard of OAuth 2.0?
- Slack doesn't support mixed mode (some users using SSO and others public).
- Intercom, which is a great tool, doesn't support any SAML-based or OAuth-based federation for identity.
- Everyone believes that SSO is an enterprise feature and the charges go through the roof when you require it.
We are still at the start of our security journey, but we figured there must be quite a few people out there experiencing the same challenges as us. We would love to hear from you about where we have missed things, and to see which decisions you made differently from ours.
*I know what you're thinking - nobody will look at your website and based on our current stats, you would be correct!!