Using multiple OAuth2.0 providers with a single instance of JupyterHub

When you create your very own LUSID tenant via our Early Access Program, we want it to be as easy as possible for you to start using LUSID.

To make this a reality, we give you a web interface, an interactive OpenAPI specification, an Excel add-in and a hosted personal Jupyter notebook server using JupyterHub. The JupyterHub environment enables you to call the LUSID APIs via the LUSID Python Software Development Kit (SDK) without the need to install any software locally.

You can access your personal Jupyter notebook server and make a call to LUSID in seconds. You can see an example of using an account in our demo tenant below:

[Image: GetHoldingsFrictionless]

To make this experience as frictionless as possible, we let you access your JupyterHub notebooks using your existing logged-in session, without the need to re-authenticate.

The purpose of this blog post is to detail how we achieved this.

Authentication Credentials

After you create your LUSID tenant, authentication for all of your users is backed by your very own Okta tenant. We use Okta, a specialist identity provider which we think has the most robust set of APIs on the market. Supported by Okta and using LUSID's identity and access management APIs, you have complete control over who can access your LUSID tenant and what permissions they have.

Users must authenticate through Okta before calling LUSID APIs or using the LUSID website. To provide seamless access to your JupyterHub notebooks, we also wanted to use the same Okta authentication mechanism. One of the main challenges we had to overcome was that each LUSID client organisation has a different Okta tenant that JupyterHub needs to authenticate against.

The problem is that, by default, JupyterHub supports only a single set of OAuth2.0 details, as you can see in the excerpt below.

“…for common authentication services. Note that in each case, you need to get the authentication credential information before you can configure the helm chart for authentication.” - Excerpt from Zero To JupyterHub: Authenticating with OAuth2

At FINBOURNE we use Kubernetes to host all of our services, so JupyterHub was deployed via a Helm chart. This meant that the authentication details used by JupyterHub, including the Okta authorisation URL, had to be specified at deployment time in a config.yaml file.

As every LUSID client has a different Okta authorisation URL, this left us in quite a pickle. Our options were either (A) to have a separate JupyterHub deployment for each client or (B) to find some way to use a single hub with multiple Okta tenants.

[Figure: JupyterHub options]

Due to the complexity and wasted compute resources of running a hub for every client, we decided to go for Option B and investigated how we could use a single JupyterHub instance with multiple Okta tenants.

The Escape Hatch to the Rescue

While researching this, we found that in addition to providing hard-coded configuration details in the JupyterHub Helm chart's config.yaml file, you can also specify custom Python code using what is known as the JupyterHub "escape hatch" (the hub.extraConfig setting).

There were two modifications that we had to make using this escape hatch:

1) Dynamically fetch the appropriate authorisation URL for each LUSID client at runtime before initiating the appropriate OAuth2.0 flow.

2) Implement an implicit id token flow rather than the default authorisation code flow. Even if we could dynamically retrieve the authorisation URL, there was no way to dynamically get the client secret that the authorisation code flow requires (sent using basic authentication) when exchanging the code for tokens.
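To make the difference between the two flows concrete, here is a minimal standard-library sketch of the authorisation request each flow sends. In the authorisation code flow the returned code still has to be exchanged at the token endpoint, which is where the client secret is needed. The implicit flow instead asks for the id token directly (response_type=id_token), has Okta POST it back as form fields (response_mode=form_post) and includes a nonce that can later be compared with the token's nonce claim. The function name, endpoint, client id and redirect URI below are hypothetical placeholders.

```python
from urllib.parse import urlencode
import uuid


def authorize_url(authorization_endpoint, client_id, redirect_uri, state, implicit=True):
    """Build an OpenID Connect authorisation request URL (illustrative sketch).

    implicit=True requests an id_token directly, delivered as a form POST to
    redirect_uri, and adds a fresh nonce. implicit=False requests an
    authorisation code, which would later be exchanged (with the client secret)
    at the token endpoint. Returns (url, nonce); nonce is None for the code flow.
    """
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid",
        "state": state,
    }
    nonce = None
    if implicit:
        nonce = str(uuid.uuid4())
        params.update(response_type="id_token", response_mode="form_post", nonce=nonce)
    else:
        params["response_type"] = "code"
    return authorization_endpoint + "?" + urlencode(params), nonce


# Hypothetical values for illustration only
implicit_url, nonce = authorize_url(
    "https://example.okta.com/oauth2/default/v1/authorize",
    "my-client-id",
    "https://myhedgefund.lusid.com/jupyter/hub/oauth_callback",
    "random-state",
)
code_url, code_nonce = authorize_url(
    "https://example.okta.com/oauth2/default/v1/authorize",
    "my-client-id",
    "https://myhedgefund.lusid.com/jupyter/hub/oauth_callback",
    "random-state",
    implicit=False,
)
```

Only the implicit request carries a nonce; the code-flow request relies on the later server-side code exchange instead.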

To make these modifications, we extended three authentication classes from JupyterHub's OAuthenticator package:

a) OktaLoginHandler, which extends the oauthenticator.generic.GenericLoginHandler class, dynamically fetches the authorisation URL and initiates an implicit id token flow rather than an authorisation code flow.

b) OktaCallbackHandler, which extends the oauthenticator.oauth2.OAuthCallbackHandler class, accepts a POST callback request and verifies that an id token has been returned rather than an authorisation code.

c) OktaEnvAuthenticator, which extends the oauthenticator.generic.GenericOAuthenticator class, uses the two extended handler classes above and decodes and validates the id token using the appropriate Okta public keys.

We then configured the Hub to use the extended OktaEnvAuthenticator class for authentication.

c.JupyterHub.authenticator_class = OktaEnvAuthenticator

You can see how we implemented each class in more detail below.

a) OktaLoginHandler - Dynamic Authorisation URL and Id Token Implicit Flow

One of the endpoints on the LUSID identity API mentioned earlier is a GET request to /api/authentication/information, which returns the Okta tenant details. This endpoint is hosted behind the same domain as the client (for example, myhedgefund.lusid.com/identity/api/authentication/information in the case of myhedgefund.lusid.com), so we were able to use it to get the authorisation URL dynamically.

We therefore extended the GenericLoginHandler to make a request to this endpoint before initiating an implicit id token flow using the authorisation URL returned.
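The lookup itself is small. The sketch below isolates it as two plain functions so it can be read without a running hub. The response shape is inferred from how the handler reads it (an issuerUrl field plus a links list of objects with relation and href); the sample payload, including its Okta URLs and the token_endpoint link, is hypothetical.

```python
def identity_info_url(protocol, host):
    """URL of the Identity API endpoint, served from the same domain as the tenant."""
    return "{}://{}/identity/api/authentication/information".format(protocol, host)


def parse_auth_info(auth_info):
    """Return (issuer_url, authorization_endpoint) from the authentication information response."""
    # Index the links by their 'relation' so each one can be looked up by name
    links = {link["relation"]: link for link in auth_info["links"]}
    return auth_info["issuerUrl"], links["authorization_endpoint"]["href"]


# Hypothetical response body, shaped like the fields the login handler reads
sample = {
    "issuerUrl": "https://example.okta.com/oauth2/default",
    "links": [
        {"relation": "authorization_endpoint",
         "href": "https://example.okta.com/oauth2/default/v1/authorize"},
        {"relation": "token_endpoint",
         "href": "https://example.okta.com/oauth2/default/v1/token"},
    ],
}

issuer_url, authorize_endpoint = parse_auth_info(sample)
```

Because the URL is derived from the incoming request's host, each tenant's domain naturally resolves to that tenant's own Okta details.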


class OktaLoginHandler(GenericLoginHandler):
    """
    This class extends the GenericLoginHandler. It handles all traffic to the login uri
    """

    def check_code(self):
        """
        As we are fetching an id_token instead of a code no need to check the OAuth code exists
        """
        pass

    def get(self):
        """
        Override get method to get the appropriate URLs for the tenant. This is required as each
        user may have a different tenant e.g. my-investmentbank.lusid.com, my-hedgefund.lusid.com
        """

        # Get the protocol and host from the request to *://*.com/jupyter
        protocol = self.request.protocol
        host = self.request.host

        # The API path of the Identity API to get authentication information
        path = "identity/api/authentication/information"

        # Construct the URL to get the authorization info for the tenant from the Identity API
        auth_info_url = "{proto}://{host}/{path}".format(
            proto=protocol,
            host=host,
            path=path)

        # Get the authorization info from the Identity API
        auth_info_raw = json.load(urllib.request.urlopen(auth_info_url))

        # Get and set the Okta issuer_url for the tenant for the authenticator method in
        # OktaEnvAuthenticator to use
        issuer_url = auth_info_raw['issuerUrl']

        # self.authenticator here refers to an instance of the OktaEnvAuthenticator class
        self.authenticator.issuer_url = issuer_url

        # Set the scopes for the authorization request
        self.authenticator.scope = ["openid"]

        # Index the response from the Identity API to make it easier to work with
        auth_info_mapped = {link['relation']: link for link in auth_info_raw["links"]}

        # Set a nonce to prevent replay attacks, this is added to the id_token (JSON Web Token)
        # returned from Okta
        nonce = str(uuid.uuid4())
        self.authenticator.nonce = nonce

        # Set the authorize url using the response_mode of form_post and passing in the nonce
        self._OAUTH_AUTHORIZE_URL = auth_info_mapped['authorization_endpoint'][
            'href'] + '?response_mode=form_post&nonce={}'.format(nonce)

        # Get the redirect uri (this is guessed from the protocol, host and base url)
        redirect_uri = self.authenticator.get_callback_url(self)

        # Get and set the state
        state = self.get_state()
        self.set_state_cookie(state)

        # Make the authorization request
        self.authorize_redirect(
            redirect_uri=redirect_uri,
            client_id=self.authenticator.client_id,
            scope=self.authenticator.scope,
            extra_params={'state': state},
            response_type='id_token')

b) OktaCallbackHandler - Handling the Implicit Id Token Flow

After the login handler has initiated the implicit OAuth2.0 flow, the callback handler is responsible for processing the callback from Okta. In this case we expect a POST request back from Okta (because the login handler asked for response_mode=form_post) rather than the GET request you would see in the authorisation code flow. We therefore extended the OAuthCallbackHandler to handle the POST request and to check that an id token, rather than a code, was returned.
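With response_mode=form_post, Okta returns the result as an application/x-www-form-urlencoded POST body rather than as query parameters. A minimal standard-library sketch of the three checks the callback handler performs (an OAuth error, a missing id token, a state mismatch) might look like the following. Here the expected state is passed in as an argument, whereas the real handler compares it with the state cookie set by the login handler; CallbackError and validate_form_post are hypothetical stand-ins for the handler's methods and the HTTP errors it raises.

```python
from urllib.parse import parse_qs


class CallbackError(Exception):
    """Hypothetical stand-in for the HTTP 400/403 errors raised by the real handler."""


def validate_form_post(body, expected_state):
    """Validate a form-encoded callback body and return the id_token it carries."""
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    # 1. OAuth-standard error reported by the authorisation server
    if "error" in fields:
        raise CallbackError("{}: {}".format(fields["error"], fields.get("error_description", "")))
    # 2. The implicit flow must return an id_token rather than a code
    if "id_token" not in fields:
        raise CallbackError("OAuth callback made without an id_token")
    # 3. The state must match the value generated by the login handler
    if fields.get("state") != expected_state:
        raise CallbackError("OAuth state mismatch")
    return fields["id_token"]


id_token = validate_form_post("id_token=header.payload.signature&state=s1", "s1")
```

A body carrying a code instead of an id token, or a different state, raises CallbackError.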


class OktaCallbackHandler(OAuthCallbackHandler):
    """
    An extended/modified version of the OAuthCallbackHandler class in oauthenticator's oauth2.py file.

    This handler handles all traffic at the callback uri.
    """

    @gen.coroutine
    def post(self):
        """
        Added in a post method to handle the "form_post" response type from Okta so that an id_token can be
        returned
        """
        # Checks that the appropriate arguments are returned in the response
        self.check_arguments()
        # Logs the user in
        user = yield self.login_user()
        if user is None:
            # todo: custom error page?
            raise web.HTTPError(403)
        self.redirect(self.get_next_url(user))

    def check_id_token(self):
        """
        Check the OAuth id_token

        Added a check to ensure that an identity token is returned
        """
        if not self.get_body_argument("id_token", False):
            raise web.HTTPError(400, "OAuth callback made without an id_token")

    def check_arguments(self):
        """Validate the arguments of the redirect

        Edited this to check for an id_token rather than an authorisation code

        Default:
        - check for oauth-standard error, error_description arguments
        - check that there's an id_token
        - check that state matches
        """
        self.check_error()
        self.check_id_token()
        self.check_state()

c) OktaEnvAuthenticator - Tying it All Together

Finally, to bring it all together, we extended the GenericOAuthenticator to use these two handlers and to decode and validate the id token returned in the callback, using the appropriate public keys from Okta.

This completed the user's authentication and gave us the unique user id to use for the user's single-user server.
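Okta can publish several signing keys at its keys endpoint, so the authenticator first reads the kid (key id) from the token's header and picks the matching key. The header is read without verification only to choose a key; the signature is then verified with that key by jwt.decode. The sketch below shows the same lookup using only the standard library in place of jwt.get_unverified_header. The function names are hypothetical, and the key set in the example is made up and omits the real RSA key material.

```python
import base64
import json


def unverified_header(token):
    """Decode the JOSE header (first dot-separated segment) of a compact JWT without verifying it."""
    segment = token.split(".")[0]
    # JWT segments are base64url-encoded with the '=' padding stripped; restore it
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def select_signing_key(token, jwks):
    """Return the JWK whose 'kid' matches the token header, or None if no key matches."""
    keys_by_kid = {key["kid"]: key for key in jwks["keys"]}
    return keys_by_kid.get(unverified_header(token).get("kid"))


def b64url_json(obj):
    """Helper for the example: base64url-encode a JSON object without padding."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Hypothetical key set and token; real keys also carry 'n' and 'e' (RSA modulus and exponent)
jwks = {"keys": [{"kid": "key-1", "kty": "RSA"}, {"kid": "key-2", "kty": "RSA"}]}
token = b64url_json({"alg": "RS256", "kid": "key-2"}) + "." + b64url_json({"sub": "user"}) + ".signature"
```

If no key matches, the authenticator below rejects the request with a 403 rather than trying other keys.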


class OktaEnvAuthenticator(GenericOAuthenticator):
    """
    This class extends the GenericOAuthenticator and is used to handle generating the user details after
    authorization
    """

    # Sets the login service which changes the words on the button for users to click
    login_service = "Okta"
    # Set the login and callback handlers to our custom classes
    login_handler = OktaLoginHandler
    callback_handler = OktaCallbackHandler

    enable_auth_state = True

    @gen.coroutine
    def authenticate(self, handler, data=None):
        """
        The authenticate method returns the username for the user. The handler here is the callback handler.
        """

        # Get the id_token from the callback handler
        id_token = handler.get_body_argument("id_token")
        host = handler.request.host

        # Construct the URL of Okta's keys endpoint, which publishes the public keys in JWK (JSON Web Key) format
        keys_info_url = "{proto_host}/{path}".format(
            proto_host=self.issuer_url,
            path="v1/keys")

        # Get the public keys from the authorisation server
        keys_info_raw = json.load(urllib.request.urlopen(keys_info_url))

        # Key each key by its id
        keys_info_mapped = {key['kid']: key for key in keys_info_raw['keys']}

        # Get the key used to sign the id token
        signed_kid = jwt.get_unverified_header(id_token)['kid']

        # Get the appropriate public key and convert the JWK into a public key object
        if signed_kid in keys_info_mapped.keys():
            signed_key = keys_info_mapped[signed_kid]
            public_key = jwt.algorithms.RSAAlgorithm.from_jwk(json.dumps(signed_key))
        else:
            raise web.HTTPError(403)

        # Decode and verify that the id_token has a valid signature.
        # PyJWT 2.x requires the accepted signing algorithms to be listed explicitly.
        try:
            id_token_decoded = jwt.decode(
                id_token, public_key, algorithms=["RS256"], audience="example-audience")
        except jwt.InvalidTokenError:
            raise web.HTTPError(403)

        # Get the user id
        user_id = id_token_decoded['example-userid']

        # Check that the nonce matches the one sent in the authorization request
        nonce_check = id_token_decoded.get('nonce')
        handler.log.info('Nonce in id_token: %r', nonce_check)
        handler.log.info('Expected nonce: %r', self.nonce)
        if nonce_check != self.nonce:
            raise web.HTTPError(403)

        # Return the username
        return {
            'name': user_id,
            'auth_state': {
                'lusid_base_url': 'https://{}/api'.format(host)
            }
        }
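Signature verification alone is not enough; the claims inside the token also need checking. In the authenticator above, jwt.decode checks the audience (because audience is passed) and, in PyJWT, the exp expiry claim by default, while the nonce is compared by hand. The sketch below spells those checks out in plain Python on an already-verified claims dictionary, and adds an issuer check, which PyJWT performs only when issuer is passed to jwt.decode. The function name and sample values are hypothetical; this illustrates the checks rather than replacing a JWT library.

```python
import time


def check_id_token_claims(claims, audience, issuer, expected_nonce, now=None, leeway=0):
    """Return the names of failed checks for verified id_token claims (empty list = acceptable)."""
    now = time.time() if now is None else now
    failed = []
    # 'aud' may be a single string or a list of strings
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        failed.append("aud")
    # The token must come from the tenant's own Okta issuer
    if claims.get("iss") != issuer:
        failed.append("iss")
    # 'exp' is a Unix timestamp after which the token must be rejected
    if "exp" not in claims or claims["exp"] + leeway < now:
        failed.append("exp")
    # The nonce ties the token to the login request that produced it (replay protection)
    if claims.get("nonce") != expected_nonce:
        failed.append("nonce")
    return failed


# Hypothetical claims, as they might look after signature verification
claims = {
    "aud": "example-audience",
    "iss": "https://example.okta.com/oauth2/default",
    "exp": 1000,
    "nonce": "n-123",
}
```

Returning every failed check, rather than stopping at the first, makes it easier to log why a login was refused.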

With these three extended classes, our final configuration in the Helm chart's config.yaml file was as follows.


hub:
    extraConfig:
        myConfig: |

            from oauthenticator.generic import GenericOAuthenticator, GenericLoginHandler
            from oauthenticator.oauth2 import OAuthCallbackHandler
            from tornado import gen, web
            import json
            import urllib.request
            import jwt
            import cryptography.hazmat.backends.openssl.rsa
            import uuid

            class OktaCallbackHandler(OAuthCallbackHandler): ...

            class OktaLoginHandler(GenericLoginHandler): ...

            class OktaEnvAuthenticator(GenericOAuthenticator): ...

            # Set the authenticator class to our custom class
            c.JupyterHub.authenticator_class = OktaEnvAuthenticator

The final flow can be seen in the diagram below.

[Diagram: JupyterHub flow]