Question # 1
Problem Scenario 62 : You have been given the code snippet below.

    val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
    val b = a.map(x => (x.length, x))
    operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.

    Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

    b.mapValues("x" + _ + "x").collect

mapValues [Pair] : Takes the values of an RDD that consists of two-component tuples and applies the provided function to transform each value. Then it forms new two-component tuples using the key and the transformed value and stores them in a new RDD.
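To make the behaviour of mapValues concrete, here is a minimal sketch in plain Python. It does not use Spark: the helper name map_values is hypothetical and only mimics what the Scala call does to each (key, value) pair on a small in-memory list.

```python
def map_values(pairs, fn):
    """Apply fn to the value of each (key, value) pair and keep the key unchanged."""
    return [(key, fn(value)) for key, value in pairs]

animals = ["dog", "tiger", "lion", "cat", "panther", "eagle"]

# Same shape as a.map(x => (x.length, x)) in the scenario.
b = [(len(name), name) for name in animals]

# Same idea as b.mapValues("x" + _ + "x"): only the value is transformed.
result = map_values(b, lambda value: "x" + value + "x")
print(result)
```

The keys (the word lengths) pass through untouched, which is the point of using mapValues instead of map on a pair RDD.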
Question # 2
Problem Scenario 59 : You have been given the code snippet below.

    val x = sc.parallelize(1 to 20)
    val y = sc.parallelize(10 to 30)
    operation1
    z.collect

Write a correct code snippet for operation1 which will produce the desired output, shown below.

    Array[Int] = Array(16, 12, 20, 13, 17, 14, 18, 10, 19, 15, 11)

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

    val z = x.intersection(y)

intersection : Returns the elements that are present in both RDDs. The result contains no duplicates, and the order of the elements is not guaranteed.
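The result of the intersection can be checked with a small plain-Python sketch. It uses sets instead of RDDs, so it is only an illustration of the semantics, and it sorts the output because a set, like the Spark result, has no guaranteed order.

```python
x = range(1, 21)    # same values as sc.parallelize(1 to 20); Scala's "to" is inclusive
y = range(10, 31)   # same values as sc.parallelize(10 to 30)

# Elements present in both collections, without duplicates.
z = sorted(set(x) & set(y))
print(z)
```

The eleven values from 10 to 20 are the same ones that appear, in shuffled order, in the expected output of the scenario.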
Question # 3
Problem Scenario 78 : You have been given a MySQL DB with the following details.

    user=retail_dba
    password=cloudera
    database=retail_db
    table=retail_db.orders
    table=retail_db.order_items
    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)

Please accomplish the following activities.
1. Copy the "retail_db.orders" and "retail_db.order_items" tables to hdfs in the respective directories p92_orders and p92_order_items.
2. Join these data using order_id in Spark and Python.
3. Calculate total revenue per day and per customer.
4. Calculate maximum revenue customer.

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Import each table.

    sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p92_orders -m 1
    sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p92_order_items -m 1

Note : Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs.

Step 2 : Read the data from one of the partitions created by the above commands.

    hadoop fs -cat p92_orders/part-m-00000
    hadoop fs -cat p92_order_items/part-m-00000

Step 3 : Load the above two directories as RDDs using Spark and Python (open a pyspark terminal and do the following).

    orders = sc.textFile("p92_orders")
    orderItems = sc.textFile("p92_order_items")

Step 4 : Convert each RDD into key value pairs (order_id as the key and the whole line as the value).

    # First value is order_id
    ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
    # Second value is order_id
    orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))

Step 5 : Join both RDDs using order_id.

    joinedData = orderItemsKeyValue.join(ordersKeyValue)
    # print the joined data
    for line in joinedData.collect():
        print(line)
    # Format of joinedData:
    # (order_id, ('all columns from orderItemsKeyValue', 'all columns from ordersKeyValue'))

Step 6 : Calculate total revenue per day and per customer.

    ordersPerDatePerCustomer = joinedData.map(lambda line: ((line[1][1].split(",")[1], line[1][1].split(",")[2]), float(line[1][0].split(",")[4])))
    amountCollectedPerDayPerCustomer = ordersPerDatePerCustomer.reduceByKey(lambda runningSum, amount: runningSum + amount)
    # Output record format is ((date, customer_id), totalAmount)
    for line in amountCollectedPerDayPerCustomer.collect():
        print(line)

Step 7 : Calculate the maximum revenue customer for each day.

    # now change the format of each record to (date, (customer_id, total_amount))
    revenuePerDatePerCustomerRDD = amountCollectedPerDayPerCustomer.map(lambda threeElementTuple: (threeElementTuple[0][0], (threeElementTuple[0][1], threeElementTuple[1])))
    for line in revenuePerDatePerCustomerRDD.collect():
        print(line)
    # Calculate maximum amount collected by a customer for each day
    perDateMaxAmountCollectedByCustomer = revenuePerDatePerCustomerRDD.reduceByKey(lambda runningAmountTuple, newAmountTuple: (runningAmountTuple if runningAmountTuple[1] >= newAmountTuple[1] else newAmountTuple))
    for line in perDateMaxAmountCollectedByCustomer.sortByKey().collect():
        print(line)
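The join and the two reduceByKey steps can be traced on a few rows without a cluster. The sketch below is plain Python, not pyspark, and the sample rows are made up for illustration; only the column layout follows the orders and order_items tables described in the scenario.

```python
from collections import defaultdict

# Made-up rows in the comma-separated layout that Sqoop writes.
orders = [
    "1,2013-07-25 00:00:00.0,100,CLOSED",
    "2,2013-07-25 00:00:00.0,200,PENDING",
    "3,2013-07-26 00:00:00.0,100,COMPLETE",
]
order_items = [
    "1,1,957,1,299.5,299.5",
    "2,2,1073,1,200.0,200.0",
    "3,2,502,5,250.0,50.0",
    "4,3,403,1,129.0,129.0",
]

# Key the orders by order_id (column 0), as in Step 4.
orders_by_id = {int(line.split(",")[0]): line for line in orders}

# Join on order_id and sum order_item_subtotal (column 4) per (date, customer_id).
revenue = defaultdict(float)
for item in order_items:
    item_fields = item.split(",")
    order_fields = orders_by_id[int(item_fields[1])].split(",")
    revenue[(order_fields[1], order_fields[2])] += float(item_fields[4])

# For each date keep the (customer_id, total) pair with the largest total.
best_per_day = {}
for (date, customer), total in revenue.items():
    if date not in best_per_day or total > best_per_day[date][1]:
        best_per_day[date] = (customer, total)

for date in sorted(best_per_day):
    print(date, best_per_day[date])
```

The dictionary keyed by (date, customer_id) plays the role of the first reduceByKey, and the per-date maximum plays the role of the second one.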
Question # 4
Problem Scenario 36 : You have been given a file named spark8/data.csv (type,name).

data.csv

    1,Lokesh
    2,Bhupesh
    2,Amit
    2,Ratan
    2,Dinesh
    1,Pavan
    1,Tejas
    2,Sheela
    1,Kumar
    1,Venkat

1. Load this file from hdfs and save it back as (id, (all names of same type)) in results directory. However, make sure while saving it should be

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Create the file in hdfs (we will do this using Hue). Alternatively, you can first create it in the local filesystem and then upload it to hdfs.

Step 2 : Load the data.csv file from hdfs and create a pair RDD.

    val name = sc.textFile("spark8/data.csv")
    val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))

Step 3 : Now swap namePairRDD. The swapped RDD has the name as its key; it is not used in the following steps, because namePairRDD is already keyed by type.

    val swapped = namePairRDD.map(item => item.swap)

Step 4 : Now combine the RDD by key.

    val combinedOutput = namePairRDD.combineByKey(List(_), (x:List[String], y:String) => y :: x, (x:List[String], y:List[String]) => x ::: y)

Step 5 : Save the output as a text file; the output must be written in a single file.

    combinedOutput.repartition(1).saveAsTextFile("spark8/result.txt")
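The three functions passed to combineByKey are easier to follow on a single in-memory list. The sketch below is plain Python with a hypothetical helper named combine_by_key; it models one partition only, so the third function (which merges combiners across partitions) is not needed here.

```python
def combine_by_key(pairs, create_combiner, merge_value):
    """Fold (key, value) pairs into one combiner per key, single-partition model."""
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = merge_value(combined[key], value)
        else:
            combined[key] = create_combiner(value)
    return combined

rows = ["1,Lokesh", "2,Bhupesh", "2,Amit", "2,Ratan", "2,Dinesh",
        "1,Pavan", "1,Tejas", "2,Sheela", "1,Kumar", "1,Venkat"]
pairs = [tuple(row.split(",")) for row in rows]

grouped = combine_by_key(
    pairs,
    lambda value: [value],             # like List(_): start a list from the first value
    lambda acc, value: [value] + acc,  # like y :: x: prepend each later value
)
print(grouped)
```

Because each new name is prepended, the names of a type come out in reverse order of appearance, which mirrors what y :: x does in the Scala solution within one partition.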
Question # 5
Problem Scenario 26 : You need to implement a near real time solution for collecting information as it is submitted in files, with the information below. You have been given the directory location /tmp/nrtcontent (if it is not available, then create it). Assume your department's upstream service is continuously committing data to this directory as new files (not as a stream of data, because it is a near real time solution). As soon as a file is committed to this directory, it needs to be available in hdfs in the /tmp/flume location.

Data

    echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt
    mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt

After few mins

    echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt
    mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt

Write a flume configuration file named flume6.conf and use it to load data in hdfs with the following additional properties.
1. Spool /tmp/nrtcontent
2. File prefix in hdfs should be events
3. File suffix should be .log
4. If a file is not committed and is in use, then it should have _ as prefix.
5. Data should be written as text to hdfs

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Create the directory.

    mkdir /tmp/nrtcontent

Step 2 : Create the flume configuration file, with the configuration below for source, sink and channel, and save it in flume6.conf.

    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /tmp/nrtcontent
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /tmp/flume
    agent1.sinks.sink1.hdfs.filePrefix = events
    agent1.sinks.sink1.hdfs.fileSuffix = .log
    agent1.sinks.sink1.hdfs.inUsePrefix = _
    agent1.sinks.sink1.hdfs.fileType = DataStream
    # Flume requires a channel type; file is assumed here
    agent1.channels.channel1.type = file

Step 4 : Run the command below, which will use this configuration file and append data in hdfs. Start the flume service:

    flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume6.conf --name agent1

Step 5 : Open another terminal and create a file in /tmp/nrtcontent.

    echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt
    mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt

After few mins

    echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt
    mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt
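The echo-then-mv pair in the scenario writes each file under a dot-prefixed name and renames it only once the content is complete, presumably so that the spooling directory never exposes a half-written file under its final name. A minimal Python sketch of the same pattern follows; the function name commit_file is hypothetical and a temporary directory stands in for /tmp/nrtcontent.

```python
import os
import tempfile

def commit_file(directory, name, text):
    """Write text to a hidden temporary file, then rename it to its final name."""
    hidden_path = os.path.join(directory, "." + name)
    final_path = os.path.join(directory, name)
    with open(hidden_path, "w") as handle:
        handle.write(text)
    # A rename within one filesystem is atomic on POSIX, so readers
    # see either no file or the complete file, never a partial one.
    os.rename(hidden_path, final_path)
    return final_path

spool_dir = tempfile.mkdtemp()  # stand-in for /tmp/nrtcontent
path = commit_file(spool_dir, "he1.txt", "I am preparing for CCA175 from ABCTECH.com\n")
print(sorted(os.listdir(spool_dir)))
```

After the call only the final file name is present in the directory; the hidden name no longer exists.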
Question # 6
Problem Scenario 82 : You have been given a table in Hive with the following structure (which you created in a previous exercise).

    productid int
    code string
    name string
    quantity int
    price float

Using SparkSQL, accomplish the following activities.
1. Select all the products' name and quantity having quantity <= 2000
2. Select name and price of the products having code as 'PEN'
3. Select all the products whose name starts with PENCIL
4. Select all products whose "name" begins with 'P', followed by any two characters, followed by a space, followed by zero or more characters

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Copy the following file (mandatory step in Cloudera QuickVM) if you have not done it already.

    sudo su root
    cp /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf/

Step 2 : Now start spark-shell.

Step 3 : Select all the products' name and quantity having quantity <= 2000

    val results = sqlContext.sql("""SELECT name, quantity FROM products WHERE quantity <= 2000""")
    results.show()

Step 4 : Select name and price of the products having code as 'PEN'

    val results = sqlContext.sql("""SELECT name, price FROM products WHERE code = 'PEN'""")
    results.show()

Step 5 : Select all the products whose name starts with PENCIL

    val results = sqlContext.sql("""SELECT name, price FROM products WHERE upper(name) LIKE 'PENCIL%'""")
    results.show()

Step 6 : Select all products whose "name" begins with 'P', followed by any two characters, followed by a space, followed by zero or more characters

    val results = sqlContext.sql("""SELECT name, price FROM products WHERE name LIKE 'P__ %'""")
    results.show()
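The LIKE patterns can be tried out locally with Python's built-in sqlite3 module. This is only a sketch: the rows are made up, and SQLite is not Hive or SparkSQL (for example, SQLite's LIKE ignores ASCII case by default), but the meaning of % (any run of characters) and _ (exactly one character) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (productid INT, code TEXT, name TEXT, quantity INT, price REAL)"
)
# Made-up sample rows that follow the table structure from the scenario.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "PEN", "Pen Red", 5000, 1.23),
        (1002, "PEN", "Pen Blue", 8000, 1.25),
        (1003, "PEC", "Pencil 2B", 1000, 0.48),
        (1004, "PEC", "Pencil HB", 500, 0.49),
    ],
)

def names(where_clause):
    """Return product names matching the given WHERE clause, in productid order."""
    query = "SELECT name FROM products WHERE " + where_clause + " ORDER BY productid"
    return [row[0] for row in conn.execute(query)]

low_stock = names("quantity <= 2000")
pens = names("code = 'PEN'")
pencils = names("upper(name) LIKE 'PENCIL%'")
p_two_chars_space = names("name LIKE 'P__ %'")
print(low_stock, pens, pencils, p_two_chars_space)
```

In the last pattern the two underscores match "en" and the space must come fourth, so "Pen Red" matches while "Pencil 2B" does not.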
Question # 7
Problem Scenario 75 : You have been given a MySQL DB with the following details.

    user=retail_dba
    password=cloudera
    database=retail_db
    table=retail_db.orders
    table=retail_db.order_items
    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.
1. Copy the "retail_db.order_items" table to hdfs in the respective directory p90_order_items.
2. Do the summation of the entire revenue in this table using pyspark.
3. Find the maximum and minimum revenue as well.
4. Calculate average revenue.

Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Import a single table.

    sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p90_order_items -m 1

Note : Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs.

Step 2 : Read the data from one of the partitions created by the above command.

    hadoop fs -cat p90_order_items/part-m-00000

Step 3 : In pyspark, get the total revenue across all days and orders.

    entireTableRDD = sc.textFile("p90_order_items")
    # Cast string to float
    extractedRevenueColumn = entireTableRDD.map(lambda line: float(line.split(",")[4]))

Step 4 : Verify the extracted data.

    for revenue in extractedRevenueColumn.collect():
        print(revenue)
    # use the reduce function to sum a single column value
    totalRevenue = extractedRevenueColumn.reduce(lambda a, b: a + b)

Step 5 : Calculate the maximum revenue.

    maximumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a >= b else b))

Step 6 : Calculate the minimum revenue.

    minimumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a <= b else b))

Step 7 : Calculate the average revenue.

    count = extractedRevenueColumn.count()
    averageRev = totalRevenue / count
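The same lambdas work with functools.reduce on an ordinary list, which makes it easy to check the arithmetic by hand. The sketch below is plain Python rather than pyspark, and the four rows are made up; only the position of order_item_subtotal (column 4) follows the scenario.

```python
from functools import reduce

# Made-up order_items rows; column 4 is order_item_subtotal.
lines = [
    "1,1,957,1,299.5,299.5",
    "2,2,1073,1,200.0,200.0",
    "3,2,502,5,250.0,50.0",
    "4,3,403,1,129.0,129.0",
]

# Cast the revenue column from string to float, as in Step 3.
revenues = [float(line.split(",")[4]) for line in lines]

total_revenue = reduce(lambda a, b: a + b, revenues)
maximum_revenue = reduce(lambda a, b: a if a >= b else b, revenues)
minimum_revenue = reduce(lambda a, b: a if a <= b else b, revenues)
average_revenue = total_revenue / len(revenues)

print(total_revenue, maximum_revenue, minimum_revenue, average_revenue)
```

Each reduce call folds the list pairwise with a two-argument function, which is the contract the RDD reduce method also expects.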
Question # 8
Problem Scenario 29 : Please accomplish the following exercises using HDFS command line options.
1. Create a directory in hdfs named hdfs_commands.
2. Create a file in hdfs named data.txt in hdfs_commands.
3. Now copy this data.txt file to the local filesystem; while copying, make sure file properties (e.g. file permissions) are not changed.
4. Now create a file in a local directory named data_local.txt and move this file to hdfs in the hdfs_commands directory.
5. Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local file system.
6. Create a file in the local filesystem named file1.txt and put it to hdfs.

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Create the directory.

    hdfs dfs -mkdir hdfs_commands

Step 2 : Create a file in hdfs named data.txt in hdfs_commands.

    hdfs dfs -touchz hdfs_commands/data.txt

Step 3 : Now copy this data.txt file to the local filesystem; while copying, make sure file properties (e.g. file permissions) are not changed.

    hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam

Step 4 : Now create a file in a local directory named data_local.txt and move this file to hdfs in the hdfs_commands directory.

    touch data_local.txt
    hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/data_local.txt hdfs_commands/

Step 5 : Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local file system.

    hdfs dfs -touchz hdfs_commands/data_hdfs.txt
    hdfs dfs -get hdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam/

Step 6 : Create a file in the local filesystem named file1.txt and put it to hdfs.

    touch file1.txt
    hdfs dfs -put /home/cloudera/Desktop/HadoopExam/file1.txt hdfs_commands/
Question # 9
Problem Scenario 11 : You have been given the following mysql database details as well as other info.

    user=retail_dba
    password=cloudera
    database=retail_db
    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following.
1. Import the departments table in a directory called departments.
2. Once the import is done, insert the following 5 records in the departments mysql table.

    Insert into departments(10, physics);
    Insert into departments(11, Chemistry);
    Insert into departments(12, Maths);
    Insert into departments(13, Science);
    Insert into departments(14, Engineering);

3. Now import only the newly inserted records and append them to the existing directory, which was created in the first step.

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Clean already imported data. (In the real exam, make sure you don't delete data generated by a previous exercise.)

    hadoop fs -rm -R departments

Step 2 : Import data in the departments directory.

    sqoop import \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --table departments \
    --target-dir /user/cloudera/departments

Step 3 : Insert the five records in the departments table.

    mysql --user=retail_dba --password=cloudera retail_db
    Insert into departments values(10, "physics");
    Insert into departments values(11, "Chemistry");
    Insert into departments values(12, "Maths");
    Insert into departments values(13, "Science");
    Insert into departments values(14, "Engineering");
    commit;
    select * from departments;

Step 4 : Get the maximum value of department_id from the last import; it should be 7.

    hdfs dfs -cat /user/cloudera/departments/part*

Step 5 : Do the incremental import based on the last import and append the results.

    sqoop import \
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username=retail_dba \
    --password=cloudera \
    --table departments \
    --target-dir /user/cloudera/departments \
    --append \
    --check-column "department_id" \
    --incremental append \
    --last-value 7

Step 6 : Now check the result.

    hdfs dfs -cat /user/cloudera/departments/part*
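In append mode, an incremental import picks up only the rows whose check column is greater than the given last value. The plain-Python sketch below models that selection rule; the function name incremental_append is hypothetical, and the single already-imported row is a placeholder, since the original department rows are not listed in the scenario.

```python
def incremental_append(source_rows, imported_rows, last_value):
    """Append rows whose id (first field) exceeds last_value; return the new state."""
    new_rows = [row for row in source_rows if row[0] > last_value]
    new_last_value = max([last_value] + [row[0] for row in new_rows])
    return imported_rows + new_rows, new_last_value

# Placeholder for the data imported in Step 2 (highest department_id is 7).
imported = [(7, "placeholder for an already imported row")]

# The MySQL table after the five inserts from Step 3.
source = imported + [
    (10, "physics"),
    (11, "Chemistry"),
    (12, "Maths"),
    (13, "Science"),
    (14, "Engineering"),
]

merged, last_value = incremental_append(source, imported, last_value=7)
print(len(merged), last_value)

# Running again with the updated last value picks up nothing new.
merged_again, last_value_again = incremental_append(source, merged, last_value)
```

Tracking the highest imported id between runs is what keeps a repeated import from duplicating rows.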
Question # 10
Problem Scenario 20 : You have been given a MySQL DB with the following details.

    user=retail_dba
    password=cloudera
    database=retail_db
    table=retail_db.categories
    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.
1. Write a Sqoop Job which will import the "retail_db.categories" table to hdfs, in a directory named "categories_target_job".

Answer: See the explanation for Step by Step Solution and configuration.

Explanation: Solution :

Step 1 : Connect to the existing MySQL database.

    mysql --user=retail_dba --password=cloudera retail_db

Step 2 : Show all the available tables.

    show tables;

Step 3 : Below is the command to create the Sqoop Job (note that the space in "-- import" is mandatory).

    sqoop job --create sqoopjob \
    -- import \
    --connect "jdbc:mysql://quickstart:3306/retail_db" \
    --username=retail_dba \
    --password=cloudera \
    --table categories \
    --target-dir categories_target_job \
    --fields-terminated-by '|' \
    --lines-terminated-by '\n'

Step 4 : List all the Sqoop Jobs.

    sqoop job --list

Step 5 : Show details of the Sqoop Job.

    sqoop job --show sqoopjob

Step 6 : Execute the Sqoop Job.

    sqoop job --exec sqoopjob

Step 7 : Check the output of the import job.

    hdfs dfs -ls categories_target_job
    hdfs dfs -cat categories_target_job/part*
Exam Code: CCA175
Exam Name: CCA Spark and Hadoop Developer Exam
Official CCA Spark and Hadoop Developer exam info is available on Cloudera website at https://www.cloudera.com/services-and-support/training/cdhhdp-certification.html