Saturday 27 July 2013

Apache Pig Hadoop Training in Chennai - BigDataTraining.IN

Pig Latin Statements

A Pig Latin statement is an operator that takes a relation as input and produces another relation as output. (This definition applies to all Pig Latin operators except LOAD and STORE, which read data from and write data to the file system.) Pig Latin statements can span multiple lines and must end with a semicolon ( ; ). Pig Latin statements are generally organized in the following manner:
  1. A LOAD statement reads data from the file system.
  2. A series of "transformation" statements process the data.
  3. A STORE statement writes output to the file system; or, a DUMP statement displays output to the screen.

Pig Latin is a relatively simple language built around statements. A statement is an operation that takes input (such as a bag, which is a collection of tuples) and emits another bag as its output. A bag is a relation, similar to a table in a relational database: tuples correspond to the rows, and each tuple is made up of fields.

A script in Pig Latin often follows a specific format: data is read from the file system, a number of operations are performed on the data (transforming it in one or more ways), and the resulting relation is then written back to the file system.
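
The read, transform, write flow described above can be modelled outside Pig to make the bag/tuple/field vocabulary concrete. What follows is a plain Python sketch, not Pig itself. The sample records, the field layout (name, age, gpa), and the helper names `load` and `store` are all assumptions made for illustration. Each step is annotated with the Pig Latin statement it is meant to mirror.

```python
import csv
import io

# Hypothetical tab-separated input standing in for a file in the file system.
RAW = "alice\t21\t3.9\nbob\t17\t3.1\ncarol\t19\t2.8\n"

def load(text):
    """Model of: A = LOAD 'students' AS (name:chararray, age:int, gpa:float);"""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # A relation is a bag of tuples; each tuple is an ordered group of fields.
    return [(name, int(age), float(gpa)) for name, age, gpa in reader]

def store(relation):
    """Model of: STORE B INTO 'adults';  (returns the text that would be written)"""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(relation)
    return out.getvalue()

students = load(RAW)

# Model of: B = FILTER A BY age >= 18;
adults = [t for t in students if t[1] >= 18]

result = store(adults)
print(result, end="")
```

In real Pig the statements only describe the data flow, and the work runs when a STORE or DUMP is reached. This sketch evaluates each step eagerly, so it illustrates the shape of a script rather than Pig's execution model.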

Pig has a rich set of data types, supporting not only high-level concepts like bags, tuples, and maps, but also simple data types such as ints, longs, floats, doubles, chararrays, and bytearrays. With the simple types, you'll find a range of arithmetic operators (add, subtract, multiply, divide, and modulo) in addition to a conditional operator called bincond that works like the C ternary operator. As you'd expect, there is also a full suite of comparison operators, including rich pattern matching using regular expressions.
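
As a rough illustration of the operators mentioned above, here is a Python sketch with the corresponding Pig Latin expression shown in a comment beside each line. The variable names, the sample values, and the `matches` helper are hypothetical. Pig's MATCHES operator tests the whole string against the pattern, which is why the sketch uses `re.fullmatch` rather than `re.search`.

```python
import re

age = 17

# Pig bincond: (age >= 18 ? 'adult' : 'minor')
label = "adult" if age >= 18 else "minor"

# Pig modulo: age % 5
remainder = age % 5

def matches(value, pattern):
    """Model of: value MATCHES 'pattern'  (the whole string must match)."""
    return re.fullmatch(pattern, value) is not None

print(label, remainder, matches("alice", "a.*"), matches("malice", "a.*"))
# prints: minor 2 True False
```
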
All Pig Latin statements operate on relations (and are called relational operators). There are operators for loading data from and storing data in the file system. The FILTER operator iterates over the rows of a relation and keeps only those that satisfy a condition; it is commonly used to remove data that is not needed for subsequent operations. If you need to work on the columns of a relation rather than select its rows, you can use the FOREACH operator. FOREACH permits nested operations such as FILTER and ORDER to transform the data during the iteration.
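
The difference between FILTER (rows) and FOREACH (columns) can be sketched in Python as follows. The relation, its field layout (name, age, credits), and the variable names are invented for this example. The Pig Latin in the comments shows the statement each line is meant to approximate.

```python
# Hypothetical relation with fields (name, age, credits).
students = [("alice", 21, 30), ("bob", 17, 12), ("carol", 19, 24)]

# Model of: B = FILTER A BY credits >= 20;
# FILTER keeps or drops whole tuples; the fields of each tuple are unchanged.
enough_credits = [t for t in students if t[2] >= 20]

# Model of: C = FOREACH B GENERATE name, credits * 2;
# FOREACH visits every tuple and generates a new tuple from selected or
# computed fields, so the set of columns can change.
doubled = [(name, credits * 2) for name, _age, credits in enough_credits]

print(enough_credits)
print(doubled)
```
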
The ORDER operator provides the ability to sort a relation based on one or more fields. The JOIN operator performs an inner or outer join of two or more relations based on common fields. The SPLIT operator provides the ability to split a relation into two or more relations based on a user-defined expression. Finally, the GROUP operator groups the data in one or more relations based on some expression.
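
The four operators above can be approximated in Python as shown below. This is a sketch under assumed data: the two relations, their field layouts, and all variable names are hypothetical. The comments give the Pig Latin statement each step is intended to mirror.

```python
from collections import defaultdict

# Hypothetical relations: students(name, dept, age) and depts(code, title).
students = [("alice", "cs", 21), ("bob", "math", 17), ("carol", "cs", 19)]
depts = [("cs", "Computer Science"), ("math", "Mathematics")]

# Model of: S = ORDER students BY age DESC;
by_age = sorted(students, key=lambda t: t[2], reverse=True)

# Model of: J = JOIN students BY dept, depts BY code;   (inner join)
# Each output tuple is the concatenation of the two matching input tuples.
joined = [s + d for s in students for d in depts if s[1] == d[0]]

# Model of: SPLIT students INTO minors IF age < 18, adults IF age >= 18;
minors = [t for t in students if t[2] < 18]
adults = [t for t in students if t[2] >= 18]

# Model of: G = GROUP students BY dept;
# Each output tuple pairs the group key with a bag of the matching tuples.
groups = defaultdict(list)
for t in students:
    groups[t[1]].append(t)
grouped = sorted(groups.items())

print(by_age[0], joined[0], minors, grouped, sep="\n")
```

The nested loop used for the join is only meant to show the result; it says nothing about how Pig actually executes a JOIN.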

