Foreword
Preface
1. IntroductionSecurity OverviewConfidentialityIntegrityAvailabilityAuthentication, Authorization, and AccountingHadoop Security: A Brief HistoryHadoop Components and EcosystemApache HDFSApache YARNApache MapReduceApache HiveCloudera ImpalaApache Sentry (Incubating)Apache HBaseApache AccumuloApache SolrApache OozieApache ZooKeeperApache FlumeApache SqoopCloudera HueSummaryPart I. Security Architecture
2. Securing Distributed SystemsThreat CategoriesUnauthorized Access/MasqueradeInsider ThreatDenial of ServiceThreats to DataThreat and Risk AssessmentUser AssessmentEnvironment AssessmentVulnerabilitiesDefense in DepthSummary
3. System ArchitectureOperating EnvironmentNetwork SecurityNetwork SegmentationNetwork FirewallsIntrusion Detection and PreventionHadoop Roles and Separation StrategiesMaster NodesWorker NodesManagement NodesEdge NodesOperating System SecurityRemote Access ControlsHost FirewallsSELinuxSummary
4. KerberosWhy Kerberos?Kerberos OverviewKerberos Workflow: A Simple ExampleKerberos TrustsMIT KerberosServer ConfigurationClient ConfigurationSummaryPart II. Authentication, Authorization, and Accounting
5. Identity and AuthenticationIdentityMapping Kerberos Principals to UsernamesHadoop User to Group MappingProvisioning of Hadoop UsersAuthenticationKerberosUsername and Password AuthenticationTokensImpersonationConfigurationSummary
6. AuthorizationHDFS AuthorizationHDFS Extended ACLsService-Level AuthorizationMapReduce and YARN AuthorizationMapReduce (MR1)YARN (MR2)ZooKeeper ACLsOozie AuthorizationHBase and Accumulo AuthorizationSystem, Namespace, and Table-Level AuthorizationColumn- and Cell-Level AuthorizationSummary
7. Apache Sentry (Incubating)Sentry ConceptsThe Sentry ServiceSentry Service ConfigurationHive AuthorizationHive Sentry ConfigurationImpala AuthorizationImpala Sentry ConfigurationSolr AuthorizationSolr Sentry ConfigurationSentry Privilege ModelsSQL Privilege ModelSolr Privilege ModelSentry Policy AdministrationSQL CommandsSQL Policy FileSolr Policy FilePolicy File Verification and ValidationMigrating From Policy FilesSummary
8. AccountingHDFS Audit LogsMapReduce Audit LogsYARN Audit LogsHive Audit LogsCloudera Impala Audit LogsHBase Audit LogsAccumulo Audit LogsSentry Audit LogsLog AggregationSummaryPart III. Data Security
9. Data ProtectionEncryption AlgorithmsEncrypting Data at RestEncryption and Key ManagementHDFS Data-at-Rest EncryptionMapReduce2 Intermediate Data EncryptionImpala Disk Spill EncryptionFull Disk EncryptionFilesystem EncryptionImportant Data Security Consideration for HadoopEncrypting Data in TransitTransport Layer SecurityHadoop Data-in-Transit EncryptionData Destruction and DeletionSummary
10. Securing Data IngestIntegrity of Ingested DataData Ingest ConfidentialityFlume EncryptionSqoop EncryptionIngest WorkflowsEnterprise ArchitectureSummary
11. Data Extraction and Client Access Security.Hadoop Command-Line InterfaceSecuring ApplicationsHBaseHBase ShellHBase REST GatewayHBase Thrift GatewayAccumuloAccumulo ShellAccumulo Proxy ServerOozieSqoopSQL AccessImpalaHiveWebHDFS/HttpFSSummary
12. Cloudera HueHue HTTPSHue AuthenticationSPNEGO BackendSAML BackendLDAP BackendHue AuthorizationHue SSL Client ConfigurationsSummaryPart IV. Putting It All Together
13. Case StudiesCase Study: Hadoop Data WarehouseEnvironment SetupUser ExperienceSummaryCase Study: Interactive HBase Web ApplicationDesign and ArchitectureSecurity RequirementsCluster ConfigurationImplementation NotesSummary
Afterword
Index