Fine grained authorization

In this blog I will explain how to use Beeline in a secured cluster. The CDH 5.1.0 cluster is secured with Kerberos (authentication) and Sentry (authorization). If you want to set up a secured cluster, check out the related blog kerberos-cloudera-setup. Cloudera uses Sentry for fine-grained authorization of data and metadata stored on a Hadoop cluster.
This blog only concerns the hive command-line tool; using Hive through HUE is fine!
Why change from Hive CLI to Beeline?

The primary difference between the two involves how the clients connect to Hive. The Hive CLI connects directly to the Hive Driver and requires that Hive be installed on the same machine as the client. Beeline, however, connects to HiveServer2 and does not require the installation of Hive libraries on the same machine as the client. Beeline is a thin client that also uses the Hive JDBC driver but instead executes queries through HiveServer2, which allows multiple concurrent client connections and supports authentication.
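To make that difference concrete, here is a minimal sketch of the same query issued through both clients. The credentials and sample_table are placeholders, not part of the original setup:

# Hive CLI: talks to the Hive Driver directly, needs Hive installed locally
hive -e "SELECT COUNT(*) FROM sample_table;"

# Beeline: a thin JDBC client, executes the query through HiveServer2
beeline -u "jdbc:hive2://master01:10000/default" -n username -p password \
  -e "SELECT COUNT(*) FROM sample_table;"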
Cloudera's Sentry security works through HiveServer2, not HiveServer1, which is used by the Hive CLI. So Hive through the command line will not follow the policy from Sentry. According to the Cloudera docs you should not use Hive CLI and WebHCat. Use beeline or impala-shell instead.
The hive command line will bypass Sentry security!
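A hypothetical session to illustrate the warning, assuming the Sentry policy does not grant the current user access to a table called secret_table:

# Through Beeline / HiveServer2 the Sentry policy is enforced and the query is denied:
beeline -u "jdbc:hive2://master01:10000/default;principal=hive/master01@MYREALM.COM" \
  -e "SELECT * FROM secret_table;"

# The Hive CLI talks to the Hive Driver directly, never consults Sentry, and happily returns the data:
hive -e "SELECT * FROM secret_table;"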
Connect with Beeline

# beeline with params
beeline -u url -n username -p password
# url is a jdbc connection string, pointing to the HiveServer2 host.

# or use the !connect action
beeline
beeline> !connect jdbc:hive2://HiveServer2Host:Port
When using a kerberized cluster you can connect using your principal:
# initialize your kerberos ticket
kinit

# Connect with your ticket, no username / password required.
# master01 is the node where HiveServer2 is running.
# In the url, add the parameter principal with the hive principal.
beeline -u "jdbc:hive2://master01:10000/default;principal=hive/master01@MYREALM.COM"
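If the connection fails, first make sure you actually hold a valid ticket. A quick sanity check (the SHOW TABLES query is just an example; the url matches the setup above):

# show the cached kerberos ticket obtained by kinit
klist

# run a query over the kerberized connection
beeline -u "jdbc:hive2://master01:10000/default;principal=hive/master01@MYREALM.COM" \
  -e "SHOW TABLES;"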
You can find the full principal name in Cloudera Manager (or from the command line, as sketched after this list):
- Administration -> Kerberos
- Credentials -> search for hive
- Use the principal of the host where HiveServer2 is running
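Alternatively you can read the principal from the HiveServer2 configuration on the node itself. The config path below is the CDH default and is an assumption; it may differ on your cluster:

# print the configured HiveServer2 principal
# (a _HOST placeholder, if used, resolves to the local hostname)
grep -A1 "hive.server2.authentication.kerberos.principal" /etc/hive/conf/hive-site.xml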
Export a query to a file with beeline:
HIVESERVER2_URL="jdbc:hive2://master01:10000/default;principal=hive/master01@MYREALM.COM"

# Just a simple export (as a pretty-printed table).
beeline -u "$HIVESERVER2_URL" -f query.sql > result.txt
# Result: first line is the query, then a pretty table, last line is '0: jdbc:hive2://master01:10000/default>'

# Remove the first and last line of the result for a proper csv-file with a header:
beeline -u "$HIVESERVER2_URL" -f query.sql --outputformat=csv --showHeader=true | tail -n +2 | head -n -1 > result.csv
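For ad-hoc exports you can also pass the query inline instead of via a script file. A sketch, where sample_table is a placeholder and --silent=true suppresses most of the non-data chatter:

beeline -u "$HIVESERVER2_URL" --silent=true --outputformat=csv --showHeader=true \
  -e "SELECT * FROM sample_table;" > result.csv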
Troubleshoot

All the errors look the same: Error: Invalid URL ... (state=08S01,code=0). The --verbose=true option does not help much, unfortunately. When you run into problems, check the HiveServer2 logs for hints.
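On a CDH cluster managed by Cloudera Manager the HiveServer2 log usually ends up under /var/log/hive on the server node; the exact path and file name are an assumption, so adjust them to your cluster:

# on the HiveServer2 node, watch the log while you reproduce the error
tail -f /var/log/hive/*HIVESERVER2*.log.out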
Problem:
[alexanderbij@tools01 ~]$ beeline -u jdbc:hive2://master01:10000/default;principal=hive/master01@MYREALM.COM
scan complete in 3ms
Connecting to jdbc:hive2://master01:10000/default
Error: Invalid URL: jdbc:hive2://master01:10000/default (state=08S01,code=0)
...
Solution:
Note that the Invalid URL message does not contain the principal part! Use "quotes around the url"; without them the shell treats the semicolon as a command separator, so the principal argument never reaches beeline.
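The shell parsing is the whole story here. Without quotes the semicolon splits the line into two commands, the second of which is just a harmless shell variable assignment:

# what the shell actually executes without quotes:
beeline -u jdbc:hive2://master01:10000/default
principal=hive/master01@MYREALM.COM   # a separate variable assignment, never seen by beeline

# correct: quote the url so the principal part is passed along
beeline -u "jdbc:hive2://master01:10000/default;principal=hive/master01@MYREALM.COM"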
Problem:
# beeline shell
14/08/08 09:44:23 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7 ...
Caused by: KrbException: Server not found in Kerberos database (7) ...
Caused by: KrbException: Identifier doesn't match expected value (906)
Solution:
There is a keytab-file on the HiveServer2-node initialized with the principal. The connection string is using the wrong Kerberos principal for that keytab-file. Make sure you provide the correct hive principal in the connection url.
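To check which principal the keytab actually contains, you can list it on the HiveServer2 node. The keytab path below is an assumption; Cloudera Manager generates one per server process:

# list the principals stored in the hive keytab (path may differ on your cluster)
klist -kt /var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2/hive.keytab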