An insurance company has raw data in JSON format that is sent without a predefined schedule through an
Amazon Kinesis Data Firehose delivery stream to an Amazon S3 bucket. An AWS Glue crawler is scheduled
to run every 8 hours to update the schema in the data catalog of the tables stored in the S3 bucket. Data
analysts analyze the data using Apache Spark SQL on Amazon EMR set up with AWS Glue Data Catalog as
the metastore. Data analysts say that, occasionally, the data they receive is stale. A data engineer needs to
provide access to the most up-to-date data.
Which solution meets these requirements?
A.
Create an external schema based on the AWS Glue Data Catalog on the existing Amazon Redshift
cluster to query new data in Amazon S3 with Amazon Redshift Spectrum.
B.
Use Amazon CloudWatch Events with the rate(1 hour) expression to execute the AWS Glue crawler
every hour.
C.
Using the AWS CLI, modify the execution schedule of the AWS Glue crawler from 8 hours to 1 minute.
D.
Run the AWS Glue crawler from an AWS Lambda function triggered by an s3:ObjectCreated:* event
notification on the S3 bucket.
Create an external schema based on the AWS Glue Data Catalog on the existing Amazon Redshift
cluster to query new data in Amazon S3 with Amazon Redshift Spectrum.
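The event-driven approach described in option D could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the crawler name `raw-json-crawler` is hypothetical, and the helper takes the Glue client as a parameter so the logic can be exercised without AWS credentials. In Lambda, the client would come from `boto3.client("glue")`.

```python
CRAWLER_NAME = "raw-json-crawler"  # hypothetical crawler name, not from the question


def start_crawler_if_idle(glue, crawler_name=CRAWLER_NAME):
    """Start the Glue crawler unless a run is already in progress.

    Returns True if a new run was started, False if one was already running.
    """
    try:
        glue.start_crawler(Name=crawler_name)
        return True
    except glue.exceptions.CrawlerRunningException:
        # Several objects may arrive close together; an in-progress run
        # is not treated as an error here.
        return False


def lambda_handler(event, context):
    """Entry point for an s3:ObjectCreated:* notification."""
    import boto3  # imported lazily; provided by the Lambda Python runtime

    glue = boto3.client("glue")
    # Collect the keys of the newly created objects for logging purposes.
    keys = [record["s3"]["object"]["key"] for record in event.get("Records", [])]
    started = start_crawler_if_idle(glue)
    return {"new_objects": keys, "crawler_started": started}
```

Because Firehose may deliver objects in bursts, the sketch swallows the "already running" error rather than failing the invocation; whether that is the right trade-off depends on how quickly the catalog must reflect each new object.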
A media content company has a streaming playback application. The company wants to collect and analyze
the data to provide near-real-time feedback on playback issues. The company needs to consume this data and
return results within 30 seconds according to the service-level agreement (SLA). The company needs the
consumer to identify playback issues, such as quality during a specified timeframe. The data will be emitted as
JSON and may change schemas over time.
Which solution will allow the company to collect data for processing while meeting these requirements?
A.
Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure an S3 event
to trigger an AWS Lambda function to process the data. The Lambda function will consume the data and
process it to identify potential playback issues. Persist the raw data to Amazon S3.
B.
Send the data to Amazon Managed Streaming for Apache Kafka and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify
potential playback issues. Persist the raw data to Amazon DynamoDB.
C.
Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure Amazon S3 to
trigger an event for AWS Lambda to process. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
D.
Send the data to Amazon Kinesis Data Streams and configure an Amazon Kinesis Analytics for Java
application as the consumer. The application will consume the data and process it to identify potential
playback issues. Persist the raw data to Amazon S3.
Send the data to Amazon Managed Streaming for Apache Kafka and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify
potential playback issues. Persist the raw data to Amazon DynamoDB.
A company is migrating its existing on-premises ETL jobs to Amazon EMR. The code consists of a series of jobs written in Java. The company needs to reduce overhead for the system administrators without changing the underlying code. Due to the sensitivity of the data, compliance requires that the company use root device volume encryption on all nodes in the cluster. Corporate standards require that environments be provisioned through AWS CloudFormation when possible.
Which solution satisfies these requirements?
A.
Install open-source Hadoop on Amazon EC2 instances with encrypted root device volumes. Configure
the cluster in the CloudFormation template.
B.
Use a CloudFormation template to launch an EMR cluster. In the configuration section of the cluster,
define a bootstrap action to enable TLS.
C.
Create a custom AMI with encrypted root device volumes. Configure Amazon EMR to use the custom
AMI using the CustomAmiId property in the CloudFormation template.
D.
Use a CloudFormation template to launch an EMR cluster. In the configuration section of the cluster,
define a bootstrap action to encrypt the root device volume of every node.
Create a custom AMI with encrypted root device volumes. Configure Amazon EMR to use the custom
AMI using the CustomAmiId property in the CloudFormation template.
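For illustration, a CloudFormation template that points an EMR cluster at a custom AMI might look something like the sketch below, generated here from Python so it can be printed as JSON. The resource name `EtlCluster`, the cluster name, the release label, the instance types and counts, and the AMI ID are all placeholder assumptions. The roles shown are the EMR default role names and may differ in a given account.

```python
import json


def emr_cluster_template(custom_ami_id):
    """Build a minimal CloudFormation template for an EMR cluster on a custom AMI."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "EtlCluster": {  # hypothetical logical resource name
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "etl-cluster",          # placeholder
                    "ReleaseLabel": "emr-5.30.0",   # placeholder release
                    # AMI built beforehand with an encrypted root device volume.
                    "CustomAmiId": custom_ami_id,
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        "MasterInstanceGroup": {
                            "InstanceCount": 1,
                            "InstanceType": "m5.xlarge",
                        },
                        "CoreInstanceGroup": {
                            "InstanceCount": 2,
                            "InstanceType": "m5.xlarge",
                        },
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # The AMI ID below is a made-up placeholder.
    print(json.dumps(emr_cluster_template("ami-0123456789abcdef0"), indent=2))
```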
A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company’s fact table.
How should the company meet these requirements?
A.
Use multiple COPY commands to load the data into the Amazon Redshift cluster.
B.
Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS
connector to ingest the data into the Amazon Redshift cluster.
C.
Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in
parallel into each node.
D.
Use a single COPY command to load the data into the Amazon Redshift cluster.
Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS
connector to ingest the data into the Amazon Redshift cluster.
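To make the COPY-based options concrete: a single COPY that names a common S3 prefix lets Amazon Redshift load all matching files in parallel across the cluster. The sketch below only builds such a statement as a string. The table name, bucket prefix, and IAM role ARN in the usage example are hypothetical, and running the statement against a cluster is left to whichever SQL client is in use.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Return one COPY statement that loads every object under s3_prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # All identifiers below are placeholders for illustration only.
    print(
        build_copy_statement(
            "sales_fact",
            "s3://example-bucket/sales/2020-01-01/",
            "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
        )
    )
```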
A company wants to improve the data load time of a sales data dashboard. Data has been collected as .csv files
and stored within an Amazon S3 bucket that is partitioned by date. The data is then loaded to an Amazon
Redshift data warehouse for frequent analysis. The data volume is up to 500 GB per day.
Which solution will improve the data loading performance?
A.
Compress .csv files and use an INSERT statement to ingest data into Amazon Redshift.
B.
Split large .csv files, then use a COPY command to load data into Amazon Redshift.
C.
Use Amazon Kinesis Data Firehose to ingest data into Amazon Redshift.
D.
Load the .csv files in an unsorted key order and vacuum the table in Amazon Redshift.
Use Amazon Kinesis Data Firehose to ingest data into Amazon Redshift.
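The file-splitting step in option B can be sketched as follows. This is one possible approach, assuming the source file has no header row and fits a simple round-robin split. The function name and the `.partNNN.csv` naming scheme are illustrative choices, not something the question prescribes.

```python
import csv
import os


def split_csv(src_path, out_dir, parts):
    """Split one CSV file into `parts` files by distributing rows round-robin.

    Returns the list of output paths. Assumes the source has no header row.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    paths = [os.path.join(out_dir, f"{base}.part{i:03d}.csv") for i in range(parts)]
    files = [open(p, "w", newline="") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        with open(src_path, newline="") as src:
            for n, row in enumerate(csv.reader(src)):
                # Row n goes to output file n mod parts, keeping sizes even.
                writers[n % parts].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths
```

Choosing `parts` as a multiple of the number of slices in the cluster is a common guideline, so that each slice receives a similar share of the work when the split files are loaded with COPY.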