Go Back on CCA175 Exam
Available in 1, 3, 6 and 12 Months Free Updates Plans
PDF: $15 $60

Test Engine: $20 $80

PDF + Engine: $25 $99

CCA175 Practice Test


Page 1 out of 20 Pages

Problem Scenario 52 : You have been given below code snippet.
val b = sc.parallelize(List(1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1))
Operation_xyz
Write a correct code snippet for Operation_xyz which will produce below output.
scalaxollection.Map[lnt,Long] = Map(5 -> 1, 8 -> 1, 3 -> 1, 6 -> 1, 1 -> S, 2 -> 3, 4 -> 2, 7 ->
1)






Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
b.countByValue
countByValue
Returns a map that contains all unique values of the RDD and their respective occurrence
counts. (Warning: This operation will finally aggregate the information in a single
reducer.)
Listing Variants
def countByValue(): Map[T, Long]

Problem Scenario 81 : You have been given MySQL DB with following details. You have
been given following product.csv file
product.csv
productID,productCode,name,quantity,price1001,PEN,Pen Red,5000,1.23
1002,PEN,Pen Blue,8000,1.25
1003,PEN,Pen Black,2000,1.25
1004,PEC,Pencil 2B,10000,0.48
1005,PEC,Pencil 2H,8000,0.49
1006,PEC,Pencil HB,0,9999.99
Now accomplish following activities.
1. Create a Hive ORC table using SparkSql
2. Load this data in Hive table.
3. Create a Hive parquet table using SparkSQL and load data in it.






Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create this tile in HDFS under following directory (Without header}
/user/cloudera/he/exam/task1/productcsv
Step 2 : Now using Spark-shell read the file as RDD
// load the data into a new RDD
val products = sc.textFile("/user/cloudera/he/exam/task1/product.csv")
// Return the first element in this RDD
prod u cts.fi rst()
Step 3 : Now define the schema using a case class
case class Product(productid: Integer, code: String, name: String, quantity:lnteger, price:
Float)
Step 4 : create an RDD of Product objects
val prdRDD = products.map(_.split(",")).map(p =>
Product(p(0).tolnt,p(1),p(2),p(3}.tolnt,p(4}.toFloat))
prdRDD.first()
prdRDD.count()
Step 5 : Now create data frame val prdDF = prdRDD.toDF()
Step 6 : Now store data in hive warehouse directory. (However, table will not be created }
import org.apache.spark.sql.SaveMode
prdDF.write.mode(SaveMode.Overwrite).format("orc").saveAsTable("product_orc_table")
step 7: Now create table using data stored in warehouse directory. With the help of hive.
hive
show tables
CREATE EXTERNAL TABLE products (productid int,code string,name string .quantity int,
price float}
STORED AS ore
LOCATION 7user/hive/warehouse/product_orc_table';
Step 8 : Now create a parquet table
import org.apache.spark.sql.SaveMode
prdDF.write.mode(SaveMode.Overwrite).format("parquet").saveAsTable("product_parquet_
table")
Step 9 : Now create table using this
CREATE EXTERNAL TABLE products_parquet (productid int,code string,name string
.quantity int, price float}
STORED AS parquet
LOCATION 7user/hive/warehouse/product_parquet_table';
Step 10 : Check data has been loaded or not.
Select * from products;
Select * from products_parquet;

Problem Scenario 19 : You have been given following mysql database details as well as
other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Now accomplish following activities.
1. Import departments table from mysql to hdfs as textfile in departments_text directory.
2. Import departments table from mysql to hdfs as sequncefile in departments_sequence
directory.
3. Import departments table from mysql to hdfs as avro file in departments avro directory.
4. Import departments table from mysql to hdfs as parquet file in departments_parquet
directory.






Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Import departments table from mysql to hdfs as textfile
sqoop import \
-connect jdbc:mysql://quickstart:3306/retail_db \
~username=retail_dba \
-password=cloudera \
-table departments \
-as-textfile \
-target-dir=departments_text
verify imported data
hdfs dfs -cat departments_text/part"
Step 2 : Import departments table from mysql to hdfs as sequncetlle
sqoop import \
-connect jdbc:mysql://quickstart:330G/retaiI_db \
~username=retail_dba \
-password=cloudera \
-table departments \
-as-sequencetlle \
-~target-dir=departments sequence
verify imported data
hdfs dfs -cat departments_sequence/part*
Step 3 : Import departments table from mysql to hdfs as sequncetlle
sqoop import \
-connect jdbc:mysql://quickstart:330G/retaiI_db \
~username=retail_dba \
-password=cloudera \
-table departments \
-as-avrodatafile \
-target-dir=departments_avro
verify imported data
hdfs dfs -cat departments avro/part*
Step 4 : Import departments table from mysql to hdfs as sequncetlle
sqoop import \
-connect jdbc:mysql://quickstart:330G/retaiI_db \
~username=retail_dba \
-password=cloudera \
-table departments \
-as-parquetfile \
-target-dir=departments_parquet
verify imported data
hdfs dfs -cat departmentsparquet/part*

Problem Scenario 58 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2) val b =
a.keyBy(_.length)
operation1
Write a correct code snippet for operationl which will produce desired output, shown below.
Array[(lnt, Seq[String])] = Array((4,ArrayBuffer(lion)), (6,ArrayBuffer(spider)),
(3,ArrayBuffer(dog, cat)), (5,ArrayBuffer(tiger, eagle}}}






Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key-component of each
pair will automatically be presented to the partitioner.
Listing Variants
def groupByKeyQ: RDD[(K, lterable[V]}]
def groupByKey(numPartittons: Int): RDD[(K, lterable[V] )]
def groupByKey(partitioner: Partitioner): RDD[(K, lterable[V])]

Problem Scenario 53 : You have been given below code snippet.
val a = sc.parallelize(1 to 10, 3)
operation1
b.collect
Output 1
Array[lnt] = Array(2, 4, 6, 8,10)
operation2
Output 2
Array[lnt] = Array(1,2, 3)
Write a correct code snippet for operation1 and operation2 which will produce desired
output, shown above.






Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
valb = a.filter(_%2==0)
a.filter(_ < 4).collect
filter
Evaluates a boolean function for each data item of the RDD and puts the items for which
the function returned true into the resulting RDD.
When you provide a filter function, it must be able to handle all data items contained in the
RDD. Scala provides so-called partial functions to deal with mixed data types (Tip: Partial
functions to deal are very useful if you have some data which may be bad and you do not
want to handle but for the good data (matching data) you want to apply some Kind of map
function. The following article is good. It teaches you about partial functions in a very nice
way and explains why case has to be used for partial functions:article)
Examples for mixed data without partial functions
val b = sc.parallelize(1 to 8)
b.filter(_ < 4)xollect
res15: Arrayjlnt] = Array(1, 2, 3)
val a = sc.parallelize(List("cat'\ "horse", 4.0, 3.5, 2, "dog"))
a.filter(_<4).collect
error: value < is not a member of Any


Page 1 out of 20 Pages