DWBI-TECH BLOGS (Pradeep Kannadiga): Big Data - Good Books for Hadoop, Hive, Pig, Impala, HBase.

Big Data - Good Books for Hadoop, Hive, Pig, Impala, HBase.

1) Hadoop: The Definitive Guide

Hadoop: The Definitive Guide: Ready to unlock the power of your data? With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN).

Store large datasets with the Hadoop Distributed File System (HDFS)

Run distributed computations with MapReduce

Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence

Discover common pitfalls and advanced features for writing real-world MapReduce programs

Design, build, and administer a dedicated Hadoop cluster—or run Hadoop in the cloud

Load data from relational databases into HDFS, using Sqoop

Perform large-scale data processing with the Pig query language

Analyze datasets with Hive, Hadoop’s data warehousing system

Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems

2) Programming Hive

Programming Hive: Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.
This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods
Learn best practices for creating user-defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

3) Programming Pig

Programming Pig: This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application—making it easy for you to experiment with new datasets.

Programming Pig introduces new users to Pig, and provides experienced users with comprehensive coverage of key features such as the Pig Latin scripting language, the Grunt shell, and user-defined functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

Delve into Pig’s data model, including scalar and complex data types
Write Pig Latin scripts to sort, group, join, project, and filter your data
Use Grunt to work with the Hadoop Distributed File System (HDFS)
Build complex data processing pipelines with Pig’s macros and modularity features
Embed Pig Latin in Python for iterative processing and other advanced tasks
Create your own load and store functions to handle data formats and storage mechanisms
Get performance tips for running scripts on Hadoop clusters in less time

4) HBase: The Definitive Guide

HBase: The Definitive Guide: If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's Bigtable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.

Discover how tight integration with Hadoop makes scalability with HBase easier

Distribute large datasets across an inexpensive cluster of commodity servers

Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs

Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more

Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs

Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks

5) Getting Started with Impala: Interactive SQL for Apache Hadoop

Getting Started with Impala: Interactive SQL for Apache Hadoop: Learn how to write, tune, and port SQL queries and other statements for a Big Data environment, using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that not only interoperate with other Hadoop components and are convenient for administrators to manage and monitor, but also accommodate future expansion in data size and evolution of software capabilities.

Ideal for database developers and business analysts, Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers.

Learn how Impala integrates with a wide range of Hadoop components
Attain high performance and scalability for huge data sets on production clusters
Explore common developer tasks, such as porting code to Impala and optimizing performance
Use tutorials for working with billion-row tables, date- and time-based values, and other techniques
Learn how to transition from rigid schemas to a flexible model that evolves as needs change
Take a deep dive into joins and the roles of statistics

Friday, 26 December 2014
