Professional-Data-Engineer Practice Test


Page 7 out of 23 Pages

Topic 5: Practice Questions

What are two methods that can be used to denormalize tables in BigQuery?


A. 1) Split table into multiple tables; 2) Use a partitioned table
B. 1) Join tables into one table; 2) Use nested repeated fields
C. 1) Use a partitioned table; 2) Join tables into one table
D. 1) Use nested repeated fields; 2) Use a partitioned table

Answer: B. 1) Join tables into one table; 2) Use nested repeated fields



Explanation
The conventional method of denormalizing data involves simply writing a fact, along with all its dimensions,
into a flat table structure. For example, if you are dealing with sales transactions, you would write each
individual fact to a record, along with the accompanying dimensions such as order and customer information.
The other method for denormalizing data takes advantage of BigQuery’s native support for nested and
repeated structures in JSON or Avro input data. Expressing records using nested and repeated structures can
provide a more natural representation of the underlying data. In the case of the sales order, the outer part of a
JSON structure would contain the order and customer information, and the inner part of the structure would
contain the individual line items of the order, which would be represented as nested, repeated elements.
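
The two approaches can be sketched in plain Python. This is only an illustration of the resulting record shapes, not BigQuery code; the table contents and the field names (order_id, customer_name, sku, qty) are made up for the example.

```python
# Hypothetical normalized source tables for a single sales order.
orders = [{"order_id": 1, "customer_id": 10}]
customers = [{"customer_id": 10, "name": "Ada"}]
line_items = [
    {"order_id": 1, "sku": "A-1", "qty": 2},
    {"order_id": 1, "sku": "B-7", "qty": 1},
]

customers_by_id = {c["customer_id"]: c for c in customers}
orders_by_id = {o["order_id"]: o for o in orders}


def flatten(items):
    """Method 1: one flat row per fact, with its dimensions copied onto each row."""
    rows = []
    for item in items:
        order = orders_by_id[item["order_id"]]
        customer = customers_by_id[order["customer_id"]]
        rows.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"],
            "sku": item["sku"],
            "qty": item["qty"],
        })
    return rows


def nest(order_rows, items):
    """Method 2: one record per order, with line items as a nested, repeated field."""
    records = []
    for order in order_rows:
        customer = customers_by_id[order["customer_id"]]
        records.append({
            "order_id": order["order_id"],
            "customer": {"name": customer["name"]},
            "line_items": [
                {"sku": i["sku"], "qty": i["qty"]}
                for i in items
                if i["order_id"] == order["order_id"]
            ],
        })
    return records


flat_rows = flatten(line_items)
nested_records = nest(orders, line_items)
```

In the flat form the customer name appears once per line item; in the nested form it appears once per order, with the line items carried inside the order record.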

To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what would your command start with?


A. gcloud ml-engine local train
B. gcloud ml-engine jobs submit training
C. gcloud ml-engine jobs submit training local
D. You can't run a TensorFlow program on your own computer using Cloud ML Engine.

Answer: A. gcloud ml-engine local train



Explanation
The gcloud ml-engine local train command runs a Cloud ML Engine training job locally.
It runs the specified module in an environment similar to that of a live Cloud ML Engine
training job.
This is especially useful for testing distributed models, because it lets you validate that your code
interacts properly with the Cloud ML Engine cluster configuration.
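
As a rough sketch, a full invocation could be assembled as follows. The snippet only builds and prints the command line; it does not run gcloud. The module name trainer.task, the package path trainer/, and the --epochs argument are placeholders for your own training package, not values from the question.

```python
import shlex


def local_train_command(module_name, package_path, user_args=()):
    """Build the argument list for a local training run; nothing is executed."""
    argv = [
        "gcloud", "ml-engine", "local", "train",
        "--module-name", module_name,
        "--package-path", package_path,
    ]
    if user_args:
        # Arguments after a bare "--" are passed through to the training module.
        argv += ["--", *user_args]
    return argv


# Placeholder values: substitute your own module, package path, and arguments.
cmd = local_train_command("trainer.task", "trainer/", ["--epochs", "1"])
command_line = shlex.join(cmd)
print(command_line)
```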

Cloud Bigtable is a recommended option for storing very large amounts of ____________________________?


A. multi-keyed data with very high latency
B. multi-keyed data with very low latency
C. single-keyed data with very low latency
D. single-keyed data with very high latency

Answer: C. single-keyed data with very low latency



Explanation
Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns,
allowing you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is
known as the row key. Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very
low latency. It supports high read and write throughput at low latency, and it is an ideal data source for
MapReduce operations.
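
To make "single-keyed" concrete, here is a toy in-memory model in which the row key is the only indexed value. It is a sketch of the access pattern, not the Cloud Bigtable client API; the class name, the method names, and the sensor-style row keys are all invented for the example.

```python
import bisect


class SingleKeyTable:
    """Toy model: rows are kept sorted by one indexed value, the row key."""

    def __init__(self):
        self._keys = []  # sorted list of row keys (the only index)
        self._rows = {}  # row key -> {column: value}; unset columns take no space

    def write(self, row_key, **columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def read(self, row_key):
        """Point lookup by row key."""
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Return the row keys that start with `prefix`, in sorted order."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


table = SingleKeyTable()
table.write("sensor2#2024-01-01", temp=18)
table.write("sensor1#2024-01-02", temp=21, humidity=40)
table.write("sensor1#2024-01-01", temp=20)
sensor1_keys = table.scan_prefix("sensor1#")
```

Lookups and range scans go through the row key only. Finding rows by a column value such as temp would require reading every row, which is why this kind of store suits single-keyed data.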

Which of these statements about BigQuery caching is true?


A. By default, a query's results are not cached.
B. BigQuery caches query results for 48 hours.
C. Query results are cached even if you specify a destination table.
D. There is no charge for a query that retrieves its results from cache.

Answer: D. There is no charge for a query that retrieves its results from cache.



Explanation
When query results are retrieved from a cached results table, you are not charged for the query, so D is true.
A is false: query results are cached by default, except under certain conditions, such as when you specify a destination table.
B is false: BigQuery caches query results for approximately 24 hours, not 48 hours.
C is false: query results are not cached if you specify a destination table.
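
The rules above can be captured in a small toy model. It is a deliberate simplification, not the BigQuery API: it assumes a fixed 24-hour lifetime, treats the query text as the cache key, and ignores every other condition that can disable caching. ToyQueryCache, run, and the compute callback are hypothetical names.

```python
CACHE_TTL_SECONDS = 24 * 60 * 60  # simplified fixed lifetime of 24 hours


class ToyQueryCache:
    def __init__(self):
        self._entries = {}  # query text -> (result, time the result was cached)

    def run(self, query, now, compute, destination_table=None):
        """Return (result, billed). `compute` stands in for actually running the query."""
        if destination_table is None:
            entry = self._entries.get(query)
            if entry is not None and now - entry[1] < CACHE_TTL_SECONDS:
                return entry[0], False  # served from cache: no charge
        result = compute(query)
        if destination_table is None:
            # Results are cached only when no destination table is specified.
            self._entries[query] = (result, now)
        return result, True


cache = ToyQueryCache()
hour = 60 * 60
first = cache.run("SELECT 1", now=0, compute=lambda q: [1])
second = cache.run("SELECT 1", now=1 * hour, compute=lambda q: [1])
expired = cache.run("SELECT 1", now=25 * hour, compute=lambda q: [1])
with_dest = cache.run("SELECT 2", now=0, compute=lambda q: [2], destination_table="t")
after_dest = cache.run("SELECT 2", now=1 * hour, compute=lambda q: [2])
```

In this model the second run is free, the run after 25 hours is billed again, and the run that wrote to a destination table leaves nothing in the cache for the next run to reuse.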

When a Cloud Bigtable node fails, ____ is lost.


A. all data
B. no data
C. the last transaction
D. the time dimension

Answer: B. no data



Explanation
A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload
of queries. Tablets are stored on Colossus, Google's file system, in SSTable format. Each tablet is associated
with a specific Cloud Bigtable node.
Data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are
stored on Colossus. As a result:
Rebalancing tablets from one node to another is very fast, because the actual data is not copied. Cloud
Bigtable simply updates the pointers for each node.
Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to
the replacement node.
Consequently, when a Cloud Bigtable node fails, no data is lost.
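
The separation between nodes and storage can be illustrated with a toy model. It is a sketch only: the names are invented, a dictionary stands in for Colossus, and orphaned tablets are handed to surviving nodes rather than to a replacement node, which assumes at least one node survives.

```python
class ToyCluster:
    def __init__(self, node_names, tablets):
        # Shared storage (standing in for Colossus): tablet id -> rows.
        self.storage = dict(tablets)
        # Each node holds only pointers (tablet ids), never the rows themselves.
        self.nodes = {name: [] for name in node_names}
        names = sorted(self.nodes)
        for i, tablet_id in enumerate(sorted(self.storage)):
            self.nodes[names[i % len(names)]].append(tablet_id)

    def fail_node(self, name):
        """Drop a node and hand its tablet pointers to the surviving nodes."""
        orphaned = self.nodes.pop(name)
        survivors = sorted(self.nodes)  # assumption: at least one node is left
        for i, tablet_id in enumerate(orphaned):
            self.nodes[survivors[i % len(survivors)]].append(tablet_id)

    def readable_tablets(self):
        """Tablet ids currently reachable through some node."""
        return sorted(t for pointers in self.nodes.values() for t in pointers)


cluster = ToyCluster(
    ["node-a", "node-b"],
    {"tablet-1": ["row-a"], "tablet-2": ["row-m"], "tablet-3": ["row-t"]},
)
storage_before = dict(cluster.storage)
cluster.fail_node("node-a")
storage_after = dict(cluster.storage)
reachable = cluster.readable_tablets()
```

Failing a node changes only the pointer lists. The storage dictionary is never touched, which mirrors why recovery only needs metadata to move.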

