Question # 1
Problem Scenario 62 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 which will produce the desired output shown below.
Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
Answer Description
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
b.mapValues("x" + _ + "x").collect
mapValues [Pair] : Takes the values of an RDD that consists of two-component tuples and applies the provided function to transform each value. It then forms new two-component tuples from the key and the transformed value and stores them in a new RDD.
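For reference, a minimal end-to-end sketch of the same mapValues approach that can be pasted into spark-shell (the variable result is only an illustrative name; the operation simply wraps each value with the literal character "x" while leaving the keys untouched):
// build the pair RDD of (word length, word)
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
// operation1: transform only the values, keeping the keys as they are
val result = b.mapValues("x" + _ + "x").collect
// result: Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))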
Question # 2
Problem Scenario 59 : You have been given the below code snippet.
val x = sc.parallelize(1 to 20)
val y = sc.parallelize(10 to 30)
operation1
z.collect
Write a correct code snippet for operation1 which will produce the desired output shown below.
Array[Int] = Array(16, 12, 20, 13, 17, 14, 18, 10, 19, 15, 11)
Answer Description
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
val z = x.intersection(y)
intersection : Returns only the elements that appear in both RDDs. The result contains no duplicates, and the ordering of the output is not guaranteed, which is why the array above is not sorted.
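As a quick sketch, the whole snippet can be run in spark-shell as follows (the intersection of 1 to 20 and 10 to 30 is the range 10 to 20, printed in an arbitrary order):
// two overlapping ranges
val x = sc.parallelize(1 to 20)
val y = sc.parallelize(10 to 30)
// operation1: keep only the elements common to both RDDs
val z = x.intersection(y)
z.collect
// e.g. Array(16, 12, 20, 13, 17, 14, 18, 10, 19, 15, 11)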
Question # 3
Problem Scenario 78 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities.
1. Copy the "retail_db.orders" and "retail_db.order_items" tables to hdfs in the respective directories p92_orders and p92_order_items.
2. Join these data using order_id in Spark and Python.
3. Calculate total revenue per day and per customer.
4. Calculate the maximum revenue customer.
Answer Description
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Import the tables one by one using Sqoop (Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs). Make sure there is no space before or after the '=' sign.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p92_orders -m 1
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p92_order_items -m 1
Step 2 : Read the data from one of the partitions created by the above commands.
hadoop fs -cat p92_orders/part-m-00000
hadoop fs -cat p92_order_items/part-m-00000
Step 3 : Load the two directories above as RDDs using Spark and Python (open a pyspark terminal and do the following).
orders = sc.textFile("p92_orders")
orderItems = sc.textFile("p92_order_items")
Step 4 : Convert each RDD into key-value form, with order_id as the key and the whole line as the value.
# the first column of orders is order_id
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
# the second column of order_items is order_item_order_id
orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5 : Join both RDDs on order_id.
joinedData = orderItemsKeyValue.join(ordersKeyValue)
# print the joined data
for line in joinedData.collect(): print(line)
# format of joinedData: (order_id, (order_items line, orders line))
Step 6 : Calculate total revenue per day and per customer.
# key = (order_date, order_customer_id) taken from the orders line, value = order_item_subtotal taken from the order_items line
ordersPerDatePerCustomer = joinedData.map(lambda line: ((line[1][1].split(",")[1], line[1][1].split(",")[2]), float(line[1][0].split(",")[4])))
amountCollectedPerDayPerCustomer = ordersPerDatePerCustomer.reduceByKey(lambda runningSum, amount: runningSum + amount)
# output record format is ((date, customer_id), totalAmount)
for line in amountCollectedPerDayPerCustomer.collect(): print(line)
Step 7 : Calculate the maximum amount collected by a customer for each day.
# change the record format to (date, (customer_id, total_amount))
revenuePerDatePerCustomerRDD = amountCollectedPerDayPerCustomer.map(lambda threeElementTuple: (threeElementTuple[0][0], (threeElementTuple[0][1], threeElementTuple[1])))
for line in revenuePerDatePerCustomerRDD.collect(): print(line)
# keep, for each date, the customer tuple with the highest amount
perDateMaxAmountCollectedByCustomer = revenuePerDatePerCustomerRDD.reduceByKey(lambda runningAmountTuple, newAmountTuple: runningAmountTuple if runningAmountTuple[1] >= newAmountTuple[1] else newAmountTuple)
for line in perDateMaxAmountCollectedByCustomer.sortByKey().collect(): print(line)
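The steps above stop at the maximum-revenue customer per day, while activity 4 also asks for the maximum revenue customer overall. A minimal pyspark sketch of that final aggregation, continuing from the amountCollectedPerDayPerCustomer RDD built above (totalRevenuePerCustomer and maxRevenueCustomer are illustrative names, not part of the original solution):
# re-key by customer_id only and sum revenue across all days
totalRevenuePerCustomer = amountCollectedPerDayPerCustomer \
    .map(lambda rec: (rec[0][1], rec[1])) \
    .reduceByKey(lambda a, b: a + b)
# take the single customer with the highest total revenue
maxRevenueCustomer = totalRevenuePerCustomer.sortBy(lambda rec: rec[1], ascending=False).first()
print(maxRevenueCustomer)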
Question # 4
Problem Scenario 36 : You have been given a file named spark8/data.csv (type,name).
data.csv
1,Lokesh
2,Bhupesh
2,Amit
2,Ratan
2,Dinesh
1,Pavan
1,Tejas
2,Sheela
1,Kumar
1,Venkat
1. Load this file from hdfs and save it back as (id, (all names of the same type)) in the results directory. However, make sure that while saving it is written out as a single file.
Answer Description
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Create the file in hdfs (we will do it using Hue). Alternatively, you can first create it in the local filesystem and then upload it to hdfs.
Step 2 : Load the data.csv file from hdfs and create a PairRDD.
val name = sc.textFile("spark8/data.csv")
val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))
Step 3 : Swap the namePairRDD RDD if you need (name, type) pairs; since the type id is already the key here, the swapped RDD is not used in the next step.
val swapped = namePairRDD.map(item => item.swap)
Step 4 : Combine the RDD by key, collecting all names of the same type into a list.
val combinedOutput = namePairRDD.combineByKey(List(_), (x: List[String], y: String) => y :: x, (x: List[String], y: List[String]) => x ::: y)
Step 5 : Save the output as a text file; the output must be written to a single file.
combinedOutput.repartition(1).saveAsTextFile("spark8/result.txt")
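If combineByKey feels heavyweight for this case, a sketch of an equivalent alternative using groupByKey produces the same (type, list of names) shape; the combineByKey version above is the one from the original solution, and the output path below is only illustrative:
// group all names under their type id and materialise each group as a List
val grouped = namePairRDD.groupByKey().mapValues(_.toList)
grouped.repartition(1).saveAsTextFile("spark8/result_groupByKey")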
Question # 5
Problem Scenario 26 : You need to implement a near real time solution for collecting information as it is submitted in files to the below directory location (if it is not available then create it): /tmp/nrtcontent. Assume your department's upstream service is continuously committing data into this directory as new files (not as a stream of data, because it is a near real time solution). As soon as a file is committed to this directory, it needs to be available in hdfs under /tmp/flume.
Data
echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt
mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt
After a few minutes
echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt
mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt
Write a flume configuration file named flume6.conf and use it to load the data into hdfs with the following additional properties.
1. Spool /tmp/nrtcontent
2. File prefix in hdfs should be events
3. File suffix should be .log
4. If a file is not committed and is still in use then it should have _ as prefix.
5. Data should be written as text to hdfs
Answer Description
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Create the directory.
mkdir /tmp/nrtcontent
Step 2 : Create the flume configuration file with the below configuration for source, sink and channel and save it as flume6.conf.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/nrtcontent
# a channel type is required for the agent to start; memory is the simplest choice here
agent1.channels.channel1.type = memory
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
Step 3 : Run the below command, which will use this configuration file and append data into hdfs. Start the flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume6.conf --name agent1
Step 4 : Open another terminal and create a file in /tmp/nrtcontent.
echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt
mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt
After a few minutes
echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt
mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt