
Professional-Data-Engineer Practice Test



Topic 5: Practice Questions

Which Google Cloud Platform service is an alternative to Hadoop with Hive?

A. Cloud Dataflow
B. Cloud Bigtable
C. BigQuery
D. Cloud Datastore

Answer: C. BigQuery

Explanation:
Apache Hive is data warehouse software built on top of Apache Hadoop that provides data summarization, query, and analysis. Google BigQuery is a serverless enterprise data warehouse, so it fills the same role on GCP without requiring you to manage a Hadoop cluster.
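As a concrete illustration, the kind of summarization query you might write in HiveQL runs directly in BigQuery. A minimal sketch using the Python client library, with hypothetical project, dataset, and table names:

    from google.cloud import bigquery

    # Hypothetical project ID for illustration.
    client = bigquery.Client(project="my-project")

    # A Hive-style aggregation, expressed in BigQuery standard SQL.
    query = """
        SELECT department, COUNT(*) AS headcount
        FROM `my-project.hr.employees`
        GROUP BY department
        ORDER BY headcount DESC
    """

    # result() blocks until the query finishes and returns an iterator of rows.
    for row in client.query(query).result():
        print(row.department, row.headcount)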

Which of the following is NOT one of the three main types of triggers that Dataflow supports?

A. Trigger based on element size in bytes
B. Trigger that is a combination of other triggers
C. Trigger based on element count
D. Trigger based on time

Answer: A. Trigger based on element size in bytes

Explanation:
Dataflow supports three major kinds of triggers:
1. Time-based triggers.
2. Data-driven triggers. You can set a trigger to emit results from a window when that window has received a certain number of data elements.
3. Composite triggers. These combine multiple time-based or data-driven triggers in some logical way.
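All three kinds appear in the Apache Beam SDK, which Dataflow pipelines are written in. A minimal sketch in Python; the input PCollection and the exact thresholds are illustrative assumptions:

    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows
    from apache_beam.transforms.trigger import (
        AccumulationMode, AfterAny, AfterCount, AfterProcessingTime, Repeatedly)

    with beam.Pipeline() as p:
        events = p | beam.Create(["a", "b", "c"])  # stand-in for a streaming source
        windowed = events | beam.WindowInto(
            FixedWindows(60),                 # fixed 1-minute windows
            trigger=Repeatedly(
                AfterAny(                     # 3. composite: fires when either child fires
                    AfterCount(100),          # 2. data-driven: after 100 elements in the pane
                    AfterProcessingTime(30),  # 1. time-based: 30 s after the first element
                )
            ),
            accumulation_mode=AccumulationMode.DISCARDING,
        )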

Scaling a Cloud Dataproc cluster typically involves ____.

A. increasing or decreasing the number of worker nodes
B. increasing or decreasing the number of master nodes
C. moving memory to run more applications on a single node

Answer: A. increasing or decreasing the number of worker nodes

Explanation:
After creating a Cloud Dataproc cluster, you can scale the cluster by increasing or decreasing the number of worker nodes at any time, even when jobs are running on the cluster. Cloud Dataproc clusters are typically scaled to:
1) increase the number of workers to make a job run faster
2) decrease the number of workers to save money
3) increase the number of nodes to expand available Hadoop Distributed File System (HDFS) storage
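A resize can also be issued programmatically. A minimal sketch using the google-cloud-dataproc Python client, with hypothetical project, region, and cluster names:

    from google.cloud import dataproc_v1

    # Dataproc clients must point at the cluster's regional endpoint.
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    # Describe only the field being changed: the worker count.
    cluster = dataproc_v1.Cluster(
        cluster_name="my-cluster",
        config=dataproc_v1.ClusterConfig(
            worker_config=dataproc_v1.InstanceGroupConfig(num_instances=5)
        ),
    )

    operation = client.update_cluster(
        project_id="my-project",
        region="us-central1",
        cluster_name="my-cluster",
        cluster=cluster,
        update_mask={"paths": ["config.worker_config.num_instances"]},
    )
    operation.result()  # block until the resize completes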

You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output. Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?

A. Cancel
B. Drain
C. Stop
D. Finish

Answer: B. Drain

Explanation:
Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job will immediately stop ingesting new data from input sources, but the Dataflow service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline.
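Besides the monitoring console, a drain can be requested through the Dataflow REST API by setting the job's requestedState. A minimal sketch with the google-api-python-client, using hypothetical project, region, and job IDs:

    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")  # uses application-default credentials

    # Setting requestedState to JOB_STATE_DRAINED is the API equivalent
    # of clicking Drain in the monitoring console.
    dataflow.projects().locations().jobs().update(
        projectId="my-project",
        location="us-central1",
        jobId="my-job-id",
        body={"requestedState": "JOB_STATE_DRAINED"},
    ).execute()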

What is the recommended action to do in order to switch between SSD and HDD storage for your Google Cloud Bigtable instance?

A. create a third instance and sync the data from the two storage types via batch jobs
B. export the data from the existing instance and import the data into a new instance
C. run parallel instances where one is HDD and the other is SSD
D. the selection is final and you must continue using the same storage type

Answer: B. export the data from the existing instance and import the data into a new instance

Explanation:
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster. If you need to convert an existing HDD cluster to SSD, or vice versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another.
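As an illustration of the copy-job approach, the sketch below moves rows between instances with the google-cloud-bigtable Python client. The instance and table IDs are hypothetical, and a row-at-a-time loop is only suitable for small tables; for production-sized data use the documented export/import path or a Dataflow job:

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    src = client.instance("hdd-instance").table("my-table")
    dst = client.instance("ssd-instance").table("my-table")

    # Copy every cell of every row, preserving families, qualifiers,
    # values, and timestamps.
    for row in src.read_rows():
        out = dst.direct_row(row.row_key)
        for family, columns in row.cells.items():
            for qualifier, cells in columns.items():
                for cell in cells:
                    out.set_cell(family, qualifier, cell.value, timestamp=cell.timestamp)
        out.commit()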

