Professional-Data-Engineer Practice Test


Page 1 out of 23 Pages

Topic 5: Practice Questions

When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?


A.

500 TB


B.

1 GB


C.

1 TB


D.

500 GB





Answer: C. 1 TB



Explanation
Cloud Bigtable is not a relational database. It does not support SQL queries, joins, or multi-row transactions. It
is not a good solution for less than 1 TB of data.

In order to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster you should use a(n) _____.


A.

VPN connection


B.

Special browser


C.

SSH tunnel


D.

FTP connection





Answer: C. SSH tunnel



Explanation
To connect to the web interfaces, it is recommended to use an SSH tunnel to create a secure connection to the master node.
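As an illustration of the explanation above, the following Python sketch assembles the kind of `gcloud compute ssh` command commonly used to open a SOCKS proxy tunnel to a cluster's master node. The cluster name, project, zone, and port are hypothetical placeholders, and the `-m` master-node suffix assumes default single-master naming. The sketch only builds the argument list; it does not run anything.

```python
import shlex


def build_tunnel_command(cluster, project, zone, port=1080):
    """Build (but do not run) an SSH tunnel command for a Dataproc master node.

    Assumption: the master node is named "<cluster>-m", the default naming
    for a single-master cluster.
    """
    master = f"{cluster}-m"
    return [
        "gcloud", "compute", "ssh", master,
        f"--project={project}",
        f"--zone={zone}",
        "--",             # everything after this is passed to ssh itself
        "-D", str(port),  # dynamic (SOCKS) port forwarding on localhost
        "-N",             # no remote command, just hold the tunnel open
    ]


# Hypothetical values for illustration only.
command = build_tunnel_command("example-cluster", "example-project", "us-central1-a")
print(shlex.join(command))
```

With the tunnel open, the browser would typically be pointed at `localhost` on the chosen port as a SOCKS proxy, so that traffic to the cluster's web interfaces travels through the encrypted SSH connection.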

Suppose you have a dataset of images that are each labeled as to whether or not they contain a human face. To create a neural network that recognizes human faces in images using this labeled dataset, what approach would likely be the most effective?


A.

Use K-means Clustering to detect faces in the pixels.


B.

Use feature engineering to add features for eyes, noses, and mouths to the input data.


C.

Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.


D.

Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two categories.





Answer: C. Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.



Explanation
Traditional machine learning relies on shallow nets, composed of one input layer, one output layer, and at most one hidden layer in between. More than three layers (including input and output) qualifies as “deep” learning, so “deep” here means more than one hidden layer.
In deep-learning networks, each layer of nodes trains on a distinct set of features based on the previous layer’s output. The further you advance into the neural net, the more complex the features your nodes can recognize, since they aggregate and recombine features from the previous layer.
A neural network with only one hidden layer (option D) would be unable to automatically recognize high-level features of faces, such as eyes, because it could not "build" these features from earlier hidden layers that detect low-level features, such as lines.
Feature engineering (option B) is difficult to perform on raw image data.
K-means clustering (option A) is an unsupervised learning method used to categorize unlabeled data, so it does not make use of the labels in this dataset.
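To make the layer counting concrete, here is a minimal sketch of a forward pass through a network with two hidden layers, written in plain Python. The layer sizes, the ReLU and softmax choices, and all function names are assumptions made for illustration. The weights are random and untrained, so the output probabilities carry no meaning; the point is only the structure.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random, untrained weights; a real model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(pixels, layers):
    """Pass pixels through every hidden layer, then a two-class output layer."""
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    return softmax(dense(activations, out_weights, out_biases))


rng = random.Random(0)
# Hypothetical sizes: 16 input pixels, hidden layers of 8 and 4 nodes, 2 outputs.
sizes = [16, 8, 4, 2]
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]
hidden_layer_count = len(layers) - 1  # 2 hidden layers, so "deep" as defined above

pixels = [rng.random() for _ in range(16)]
probabilities = forward(pixels, layers)  # two class scores that sum to 1
```

Each hidden layer consumes the previous layer's activations, which is the mechanism that lets later layers combine low-level features into higher-level ones once the weights are trained.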

You are planning to use Google's Dataflow SDK to analyze customer data such as that displayed below. Your project requirement is to extract only the customer name from the data source and then write it to an output PCollection.
Tom,555 X street
Tim,553 Y street
Sam, 111 Z street
Which operation is best suited for the above data processing requirement?


A.

ParDo


B.

Sink API


C.

Source API


D.

Data extraction





Answer: A. ParDo
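ParDo fits this requirement because it applies a user-defined function to each element independently and emits zero or more outputs per element. The sketch below imitates that behavior in plain Python rather than calling the Dataflow SDK. `ExtractNameFn` and `par_do` are hypothetical stand-ins, not SDK classes.

```python
class ExtractNameFn:
    """Stand-in for a DoFn: handles one input element at a time."""

    def process(self, element):
        # Each record looks like "name,address"; keep only the name.
        name, _, _address = element.partition(",")
        yield name.strip()


def par_do(elements, fn):
    """Toy stand-in for ParDo: apply fn.process to every element and flatten."""
    return [output for element in elements for output in fn.process(element)]


records = ["Tom,555 X street", "Tim,553 Y street", "Sam, 111 Z street"]
names = par_do(records, ExtractNameFn())
print(names)  # ['Tom', 'Tim', 'Sam']
```

The Source and Sink APIs, by contrast, deal with reading data into and writing data out of a pipeline, not with transforming individual elements.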



Cloud Dataproc charges you only for what you really use with _____ billing.


A.

month-by-month


B.

minute-by-minute


C.

week-by-week


D.

hour-by-hour





Answer: B. minute-by-minute



Explanation
One of the advantages of Cloud Dataproc is its low cost. Dataproc charges for what you really use with
minute-by-minute billing and a low, ten-minute-minimum billing period.
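The billing rule described in this explanation can be sketched as a small calculation. The rounding of partial minutes up to the next whole minute is an assumption made for illustration, and `billed_minutes` is a hypothetical helper, not part of any Google Cloud library.

```python
import math

MINIMUM_BILLED_MINUTES = 10  # the ten-minute minimum from the explanation


def billed_minutes(runtime_minutes):
    """Minutes charged under per-minute billing with a ten-minute minimum.

    Assumption: partial minutes are rounded up to the next whole minute.
    """
    return max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))


print(billed_minutes(3.5))   # 10, the minimum applies
print(billed_minutes(42.2))  # 43, rounded up to a whole minute
```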

