Authentication on Hue using SAML SSO and Azure Part 1
As I've mentioned before, this blog will contain some niche information, much of it driven by my experience on professional and personal projects. This is a prime example: I found it difficult to find resources for it online and had to go through a lot of trial and error to get it over the line. The goal of this post is to save someone else the heartache I suffered through to accomplish it.
Some high-level information for people who haven’t arrived here via desperate Google searches for this exact solution:
- HUE is a UI used to query information on datastores. It has long been tied to the Hadoop suite of products and is offered with services such as AWS EMR.
- SAML is an XML-based security protocol that facilitates Single Sign On (SSO) (Fantastic Guide here)
- Azure is Microsoft’s public cloud offering of various functionalities; the one we’ll be focusing on is its SSO component
- SAML authentication depends on an Identity Provider (IdP) which in this case is Azure and a Service Provider (SP) which in this case is HUE to operate. While both will make efforts towards enabling this with ease for end users, the vast number of potential software combinations can lead to some troublesome cases.
Some useful links to get people started as well as ones I found particularly helpful:
Step 1 - Libraries
First and foremost, on any hosts that make up the cluster that HUE will be working with, we need to ensure the following libraries are installed:
RHEL/CentOS
yum install git gcc python-devel swig openssl xmlsec1 xmlsec1-openssl
Ubuntu/Debian
apt-get install git gcc python-dev swig openssl xmlsec1 libxmlsec1-openssl
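Once the packages are installed, a quick check on each host can confirm the binaries are actually on the PATH. The following is a minimal sketch, not part of HUE: the helper name `missing_binaries` and the list of names checked are my own assumptions for illustration.

```python
import shutil

# Binaries installed by the packages above (an illustrative list, adjust as needed).
REQUIRED_BINARIES = ["xmlsec1", "openssl", "git", "gcc", "swig"]


def missing_binaries(names, which=shutil.which):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries(REQUIRED_BINARIES)
    if missing:
        print("Missing binaries:", ", ".join(missing))
    else:
        print("All required binaries were found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; by default it uses `shutil.which` from the standard library.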
Step 2 - Azure SSO Application setup
On the Azure SSO UI, the setup is handled by a fairly intuitive wizard. You will need a few URLs that HUE uses for login, logout, etc. Once the application is set up, there will be an option to download a metadata.xml file and a certificate file. These will be important for HUE’s SAML configuration.
One thing that I did for my Azure application setup was to create a custom attribute with the content I wanted to map to the HUE username. We used a combination of existing Azure attributes to make up the username, so this was built on the Azure end and then mapped to a custom attribute named “uid” which is returned in the SAML responses sent by Azure.
The URLs you will need to provide are all suffixed off your base URL for HUE. For example if the base URL is: https://hue.mydomain.net then the following can be derived:
- Reply URL - https://hue.mydomain.net/saml2/acs/
- Sign on URL - https://hue.mydomain.net/saml2/login
- Relay State - https://hue.mydomain.net
- Logout URL - https://hue.mydomain.net/accounts/logout
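The derivation above is simple enough to script if you have several HUE environments to register. This is a small sketch; the function name `derive_saml_urls` and the dictionary keys are made up for illustration, and only the URL suffixes come from the list above.

```python
def derive_saml_urls(base_url):
    """Build the URLs the Azure wizard asks for from HUE's base URL."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the base URL
    return {
        "reply_url": base + "/saml2/acs/",
        "sign_on_url": base + "/saml2/login",
        "relay_state": base,
        "logout_url": base + "/accounts/logout",
    }


if __name__ == "__main__":
    for name, url in derive_saml_urls("https://hue.mydomain.net").items():
        print(name, "->", url)
```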
Also don’t forget to ensure any users that want access to the application are granted access through the Azure group membership. At the time of posting, it’s worth noting that nested groups don’t work for lookups so you’ll need to ensure there is only one level of abstraction in the group connected to the application.
Step 3 - HUE configuration files
On the node that is running the HUE server, you will need to deploy the metadata.xml and certificate (.cer) file that Azure generated. HUE also expects a key (.pem) file as part of its SAML configuration to allow for encryption and decryption of requests. This file can match the contents of the certificate file and just switch the header:
-----BEGIN CERTIFICATE-----
with
-----BEGIN RSA PRIVATE KEY-----
and similar with the footer of the .pem file.
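The header and footer swap can be scripted rather than done by hand. This is a rough sketch that assumes the certificate was downloaded in Base64 (text) form; the function name `cert_text_to_key_text` is hypothetical. It only reproduces the swap described above and does not generate a real private key.

```python
CERT_BEGIN = "-----BEGIN CERTIFICATE-----"
CERT_END = "-----END CERTIFICATE-----"
KEY_BEGIN = "-----BEGIN RSA PRIVATE KEY-----"
KEY_END = "-----END RSA PRIVATE KEY-----"


def cert_text_to_key_text(cert_text):
    """Return the certificate text with its header and footer switched."""
    if CERT_BEGIN not in cert_text or CERT_END not in cert_text:
        raise ValueError("input does not look like a Base64 (text) certificate")
    # The Base64 body is left untouched; only the two marker lines change.
    return cert_text.replace(CERT_BEGIN, KEY_BEGIN).replace(CERT_END, KEY_END)
```

Reading the .cer file, passing its text through this function, and writing the result out as the .pem file gives the same outcome as the manual edit.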
These files should be deployed somewhere that the “hue” user can access, and they should be executable by this user. For the purposes of illustration, I’ll be using “/opt/hue/saml” as the path for all these files, which you’ll see referenced in the configuration below.
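A quick way to catch permission problems before restarting HUE is to check the files from a process running as the “hue” user. A minimal sketch follows; the helper name `unreadable_files` is my own, and the example paths assume the “/opt/hue/saml” directory and the key and certificate file names used in this post's configuration.

```python
import os


def unreadable_files(paths):
    """Return the paths that are missing or not readable by the current user."""
    return [
        path
        for path in paths
        if not (os.path.isfile(path) and os.access(path, os.R_OK))
    ]


if __name__ == "__main__":
    # Run this as the "hue" user so the check reflects that user's permissions.
    problems = unreadable_files(
        ["/opt/hue/saml/saml.key", "/opt/hue/saml/host.pem"]
    )
    print("Problem files:", problems or "none")
```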
Step 4 - hue.ini file configuration
The hue.ini file is where all of HUE’s core configuration lives. There are a number of changes we need to make to this file to enable SAML authentication:
[desktop]
redirect_whitelist="^\/.*$,^https:\/\/login.microsoft.com/12345678-1234-1234-1234-abcdefghijkl\/.*$"
[[auth]]
backend=libsaml.backend.SAML2Backend
[libsaml]
xmlsec_binary=/usr/bin/xmlsec1
entity_id=https://hue.mydomain.net
metadata_file=/opt/hue/samlidp-openam-metadata.xml
key_file=/opt/hue/saml/saml.key
cert_file=/opt/hue/saml/host.pem
attribute_map_dir=/opt/hue/saml/attribute_mapping
user_attribute_mapping='{"uid":"username"}'
want_response_signed=false
username_source=attributes
- redirect_whitelist holds the regexes used to match the redirect URLs you will be using. The Azure entry is tied to your Azure SSO application and can be found in your metadata file.
- backend selects SAML as your authentication tool so HUE knows what to expect. This won’t change.
- xmlsec_binary is the location of your xmlsec1 binary. A handy command for finding it: sudo find / -name xmlsec1
- entity_id is the URL that HUE lives at. The SAML URLs are derived from this URL.
- metadata_file is the XML file you got from Azure. The full path and file name are required.
- key_file is the key file you derived from the certificate file. The full path and file name are also required.
- cert_file is the certificate you got from Azure. The full path and file name are required.
- attribute_map_dir is required to host our custom mapping. We need it because of the disconnect between Azure <-> HUE <-> Django; there is more about it in Step 5.
- user_attribute_mapping is the map that HUE will TRY to use. The uid is sent from Azure as per my note in Step 2. However, the uid on the left of this map is the pysaml variable that specifically maps to Django’s username. You do not need to use uid; you can use another attribute, but the ones from Azure are fairly lengthy, e.g. “http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress”
- want_response_signed controls whether HUE verifies a signature on IdP responses. As Azure seems to have issues with this, I set it to false to get past the problem.
- username_source determines where in the SAML response HUE looks for the username to match against the backend. Setting this to attributes means that one of the attributes sent by Azure, either default or custom, can be mapped to the username that HUE will use for authentication. This allows for some flexibility if you want to use names or email addresses, for example.
- create_users_on_login is not in the above configuration but appears in the libsaml section and can, in theory, create users once SSO has authenticated them. For additional security in our implementation, we created users in HUE’s Django backend as part of the application setup. This means any SAML authentication request maps to an existing user in the Django database.
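The redirect_whitelist value is the easiest of these settings to get subtly wrong, so it can help to try the regexes against a few URLs before restarting HUE. The sketch below assumes the value is split on commas and each pattern is matched from the start of the URL; that is my reading of the setting, not HUE's actual code, and the helper name `is_whitelisted` is made up.

```python
import re

# The value from the [desktop] section above, including the placeholder tenant ID.
REDIRECT_WHITELIST = (
    r"^\/.*$,"
    r"^https:\/\/login.microsoft.com/12345678-1234-1234-1234-abcdefghijkl\/.*$"
)


def is_whitelisted(url, whitelist=REDIRECT_WHITELIST):
    """Return True if any comma-separated pattern matches the URL."""
    patterns = [re.compile(pattern) for pattern in whitelist.split(",") if pattern]
    return any(pattern.match(url) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "/hue/editor",
        "https://login.microsoft.com/12345678-1234-1234-1234-abcdefghijkl/saml2",
        "https://example.com/",
    ):
        print(candidate, "->", is_whitelisted(candidate))
```

If your own IdP login URL comes back as not whitelisted here, compare the pattern against the login URL in your metadata file character by character.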
Step 5 - Attribute Mapping
If you are following the steps so far exactly, feel free to just copy the code below and create a file in the location you set for the attribute_map_dir configuration in the hue.ini file. However, if you’re adapting your own configuration or you just want to know a bit more about why we need to do these steps, I highly recommend reading the thread in the second link I put at the top of the article.
In a nutshell, there is a chain of mapping that happens which is called out in the link above:
SAML response attribute/value ====> pysaml attribute/value
pysaml attribute/value ====> djangosaml “username” attribute
djangosaml user ====> Hue user
The uid parameter doesn’t map natively to one of the existing parameters in pysaml. There is an existing mapping for uid in its config, but it is looking for a separate OID (seriously, read the thread here to learn more), so we need to create our own mapping instead. To do that, we create the following Python file:
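# saml_uri.py
# Custom pysaml attribute map, placed in the attribute_map_dir set in hue.ini.
MAP = {
    # Name format of the attributes this map applies to.
    "identifier": "urn:oasis:names:tc:SAML:2.0:attrname-format:uri",
    # "fro": attribute name in the SAML response -> local name.
    "fro": {
        "uid": "uid",
    },
    # "to": local name -> attribute name in the SAML response.
    "to": {
        "uid": "uid",
    },
}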
This file maps the uid coming from the Azure SAML response to the uid that pysaml understands, bypassing the OID lookup. In theory, this is where you can map any of your SAML response attributes to any of the attributes that pysaml can handle (the full list is at desktop/libs/libsaml/attribute-maps/SAML2.py in HUE deployments).
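To make the chain of mappings concrete, here is a deliberately simplified model of it. This is not how pysaml (pysaml2) or djangosaml are actually implemented; the function name `resolve_username` and the sample attribute values are made up, and the two dictionaries are copied from the configuration in this post.

```python
import json

# Same content as saml_uri.py.
MAP = {
    "identifier": "urn:oasis:names:tc:SAML:2.0:attrname-format:uri",
    "fro": {"uid": "uid"},
    "to": {"uid": "uid"},
}

# Same value as user_attribute_mapping in hue.ini.
USER_ATTRIBUTE_MAPPING = json.loads('{"uid":"username"}')


def resolve_username(saml_attributes, attribute_map=MAP,
                     user_mapping=USER_ATTRIBUTE_MAPPING):
    """Follow SAML attribute -> local attribute -> Django username."""
    # Step 1: keep only the response attributes the "fro" map knows about.
    local = {
        attribute_map["fro"][name]: values
        for name, values in saml_attributes.items()
        if name in attribute_map["fro"]
    }
    # Step 2: find the local attribute that feeds the Django "username" field.
    for local_name, django_field in user_mapping.items():
        if django_field == "username" and local_name in local:
            return local[local_name][0]
    return None


if __name__ == "__main__":
    print(resolve_username({"uid": ["jsmith"]}))
```

An attribute that is missing from the "fro" map never reaches step 2 in this model, which matches the symptom described above where the username cannot be resolved until the custom map is in place.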
I have no idea why they use “fro” instead of “from”.
Conclusion
And that’s it! This is the configuration that I successfully implemented. If you’re having problems doing the same, feel free to contact me on Twitter.