Apache Cassandra
Key improvements
Cassandra Query Language (CQL) Enhancements
One of the main objectives of Cassandra 1.1 was to bring CQL up to parity with the legacy API and command line interface (CLI) that have shipped with Cassandra for several years. This release achieves that goal, and CQL is now the primary interface into the DBMS.
Composite Primary Key Columns
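The most significant enhancement of CQL is support for composite primary key columns and wide rows. The first column of a composite primary key (the partition key) determines how column family data is distributed among the nodes; the remaining key columns order the columns stored within each wide row. New querying capabilities are a beneficial side effect of wide-row support: for example, you can use an ORDER BY clause on those remaining key columns to sort the result set.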
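The following is a minimal Python sketch of how a composite key splits its two jobs, not Cassandra's storage engine. It assumes a hypothetical table timeline with PRIMARY KEY (user_id, posted_at) and a hypothetical three-node cluster; the MD5-modulo owner() function is a simplified stand-in for the real partitioner and token ring, and rows are sorted at read time for brevity, whereas Cassandra keeps them sorted on disk.

```python
import hashlib

# Hypothetical model of a CQL table declared along the lines of
#   CREATE TABLE timeline (user_id text, posted_at int, body text,
#                          PRIMARY KEY (user_id, posted_at));
# user_id is the partition key; posted_at orders the columns inside each wide row.

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def owner(partition_key):
    """Map a partition key to a node by hashing it (a stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class WideRowStore:
    def __init__(self):
        # node -> partition key -> {clustering key: value}
        self.data = {node: {} for node in NODES}

    def insert(self, user_id, posted_at, body):
        # Writing the same primary key again overwrites the value (upsert).
        row = self.data[owner(user_id)].setdefault(user_id, {})
        row[posted_at] = body

    def select(self, user_id, descending=False):
        """Roughly: SELECT posted_at, body ... WHERE user_id = ? ORDER BY posted_at [DESC]."""
        row = self.data[owner(user_id)].get(user_id, {})
        return sorted(row.items(), reverse=descending)
```

Every column for a given user_id lands on one node, so a single-partition query touches one place, while the clustering column provides the sort order that ORDER BY exposes.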
Global Row and Key Caches
Memory caches for column families are now managed globally instead of at the individual column family level, simplifying configuration and tuning. Cassandra automatically distributes memory among column families based on the overall workload and on how heavily each column family is used. Administrators can include or exclude individual column families from caching through the caching parameter set when creating or modifying a column family.
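To illustrate why a single global cache adapts to the workload, here is a minimal sketch assuming a plain least-recently-used (LRU) policy over one shared entry budget. The class name GlobalRowCache and its boolean per-column-family flag are hypothetical; the flag stands in for the caching parameter, which in Cassandra takes richer values than on/off, and real capacity is measured in memory rather than entry count.

```python
from collections import OrderedDict


class GlobalRowCache:
    """One cache budget shared by all column families (hypothetical sketch)."""

    def __init__(self, capacity, caching):
        self.capacity = capacity       # total entries allowed across every column family
        self.caching = caching         # column family -> True (cached) / False (excluded)
        self.entries = OrderedDict()   # (column family, row key) -> row, oldest first

    def get(self, cf, key):
        k = (cf, key)
        if k not in self.entries:
            return None
        self.entries.move_to_end(k)    # mark as most recently used
        return self.entries[k]

    def put(self, cf, key, row):
        if not self.caching.get(cf, False):
            return                     # column family excluded from caching
        k = (cf, key)
        self.entries[k] = row
        self.entries.move_to_end(k)
        while len(self.entries) > self.capacity:
            # Evict the least recently used entry, whichever column family owns it.
            self.entries.popitem(last=False)
```

Because eviction ignores which column family an entry belongs to, a busy column family naturally ends up holding more of the shared budget than a quiet one, with no per-family size to tune.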
Row-Level Isolation
Full row-level isolation is now in place: writes to a row are isolated to the client performing the write and are not visible to any other client until they are complete. From a transactional ACID (atomic, consistent, isolated, durable) standpoint, this enhancement gives Cassandra AID support at the row level. Consistency in the ACID sense typically involves referential integrity with foreign keys among related tables, which Cassandra does not have. Cassandra instead offers tunable consistency in the CAP theorem sense, where data is made consistent across all the nodes in a distributed database cluster. A user can choose, on a per-operation basis, how many replicas must acknowledge a write (a DML command) or respond to a read (a SELECT query).
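The arithmetic behind tunable consistency can be sketched in a few lines. This covers only three of Cassandra's consistency levels (ONE, QUORUM, ALL); the function names are hypothetical. A quorum is a strict majority of the replicas, and a read is guaranteed to overlap the most recent successful write whenever the read and write replica counts together exceed the replication factor.

```python
def replicas_required(level, replication_factor):
    """Replicas that must acknowledge a write or answer a read at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1   # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap, while ONE plus ONE (1 + 1) trades that guarantee for lower latency.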
Hadoop Integration
The following low-level features have been added to Cassandra’s support for Hadoop:
- Secondary index support for the column family input format. Hadoop jobs can now make use of Cassandra secondary indexes.
- Wide row support. Previously, wide rows that had, for example, millions of columns could not be accessed, but now they can be read and paged through in Hadoop.
- The bulk output format provides a more efficient way to load data into Cassandra from a Hadoop job.
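Paging through a wide row works because the columns in a row are kept sorted by name, so a reader only needs to remember the last column it saw and ask for the next slice after it. The following is a minimal sketch of that idea over an in-memory dict, assuming a hypothetical page_wide_row helper; it is not Cassandra's ColumnFamilyInputFormat code.

```python
from bisect import bisect_right


def page_wide_row(columns, page_size):
    """Yield a wide row's (name, value) pairs in pages, resuming after the last name seen."""
    names = sorted(columns)       # columns within a row are ordered by name
    last = None                   # only state carried between pages
    while True:
        start = 0 if last is None else bisect_right(names, last)
        page = names[start:start + page_size]
        if not page:
            return
        yield [(name, columns[name]) for name in page]
        last = page[-1]
```

Since each page request depends only on the last column name, a job can stream a row with millions of columns without ever holding the whole row in memory.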
Basic architecture
A Cassandra instance is a collection of independent nodes that are configured together into a cluster. In a Cassandra cluster, all nodes are peers, meaning there is no master node or centralized management process. A node joins a Cassandra cluster based on certain aspects of its configuration. This section explains those aspects of the Cassandra cluster architecture.
Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.
In Cassandra, the gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
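The following is a toy simulation of the gossip behavior described above, intended only to show why versioned state converges quickly. It is a simplification: real Cassandra tracks a generation and heartbeat version per application state and uses a three-message SYN/ACK/ACK2 digest handshake, whereas this sketch keeps one version number per node and exchanges whole views. The Node class, gossip_round function, and state strings are hypothetical.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        # This node's beliefs about every node it knows: name -> (version, state)
        self.view = {name: (1, "NORMAL")}

    def bump(self, state):
        """Change our own state and increase its version so peers adopt it."""
        version, _ = self.view[self.name]
        self.view[self.name] = (version + 1, state)

    def merge(self, other_view):
        for name, (version, state) in other_view.items():
            if name not in self.view or version > self.view[name][0]:
                self.view[name] = (version, state)  # newer information wins


def gossip_round(nodes, rng, fanout=3):
    """Each node exchanges views with up to `fanout` randomly chosen peers."""
    for node in nodes:
        others = [n for n in nodes if n is not node]
        for peer in rng.sample(others, min(fanout, len(others))):
            node.merge(peer.view)   # two-way exchange: both sides keep
            peer.merge(node.view)   # the newest version they have seen
```

Because stale entries never overwrite newer ones, a state change on any node spreads outward round by round until every node holds the same view.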
When a node first starts up, it looks at its configuration file to determine the name of the Cassandra cluster it belongs to and which node(s), called seeds, to contact to obtain information about the other nodes in the cluster. These cluster contact points are configured in the cassandra.yaml configuration file for a node.
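As an illustration, the sketch below models a node reading its cluster name and seed list at startup. The CONFIG dict is a hypothetical mirror of the shape of cassandra.yaml's cluster_name and seed_provider settings (the seeds parameter is a comma-separated string), and the addresses are made up. The helper functions are illustrative only; they also drop the node's own address from the seed list and refuse gossip from a node reporting a different cluster name.

```python
# Hypothetical node configuration, mirroring the shape of cassandra.yaml:
#   cluster_name: 'Test Cluster'
#   seed_provider:
#       - class_name: org.apache.cassandra.locator.SimpleSeedProvider
#         parameters:
#             - seeds: "10.0.0.1,10.0.0.2"
CONFIG = {
    "cluster_name": "Test Cluster",
    "listen_address": "10.0.0.3",
    "seed_provider": [{
        "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
        "parameters": [{"seeds": "10.0.0.1,10.0.0.2"}],
    }],
}


def seed_addresses(config):
    """Return the seed contact points, excluding this node's own address."""
    raw = config["seed_provider"][0]["parameters"][0]["seeds"]
    seeds = [s.strip() for s in raw.split(",") if s.strip()]
    return [s for s in seeds if s != config["listen_address"]]


def accept_gossip(config, sender_cluster_name):
    """Only gossip with nodes that report the same cluster name."""
    return sender_cluster_name == config["cluster_name"]
```

Seeds are only the first contact points: once a new node has gossiped with a seed, it learns about the rest of the cluster through ordinary gossip.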
Failure detection is a method for locally determining, from gossip state, whether another node in the system is up or down. Cassandra also uses failure detection information to avoid routing client requests to unreachable nodes whenever possible.
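Cassandra's failure detector follows the phi accrual approach: rather than a fixed timeout, each node computes a suspicion level (phi) from how long it has been since the last heartbeat relative to the usual interval, and treats the peer as down once phi exceeds a threshold (phi_convict_threshold in cassandra.yaml, 8 by default). The following is a minimal sketch under the assumption that heartbeat gaps are exponentially distributed; the class name and window size are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual failure detector (exponential inter-arrival model)."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last = None                       # time of the latest heartbeat

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        # phi = -log10(P(heartbeat later than elapsed)), with P = exp(-elapsed / mean)
        return (now - self.last) / (mean * math.log(10))

    def is_up(self, now):
        return self.phi(now) < self.threshold
```

With heartbeats arriving about once per second, phi crosses 8 after roughly 18.4 seconds of silence (8 × ln 10), so a brief network hiccup raises suspicion gradually instead of marking the node down immediately.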
BigDataTraining.IN has a strong focus and established thought leadership in the area of Big Data and Analytics. We use a global delivery model to help you to evaluate and implement solutions tailored to your specific technical and business context.
http://www.bigdatatraining.in/hadoop-development/training-schedule/
http://www.bigdatatraining.in/contact/
Mail:
info@bigdatatraining.in
Call:
+91 9789968765
044 - 42645495
Visit Us:
#67, 2nd Floor, Gandhi Nagar 1st Main Road, Adyar, Chennai - 20
[Opp to Adyar Lifestyle Super Market]