A company is building a data lake and needs to ingest data from a relational database that has time-series data. The company wants to use managed services to accomplish this. The process needs to be scheduled daily and bring incremental data only from the source into Amazon S3.
What is the MOST cost-effective approach to meet these requirements?
A.
Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.
B.
Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an
Amazon DynamoDB table and ingest the data using the updated key as a filter.
C.
Use AWS Glue to connect to the data source using JDBC Drivers and ingest the entire dataset. Use
appropriate Apache Spark libraries to compare the dataset, and find the delta.
D.
Use AWS Glue to connect to the data source using JDBC Drivers and ingest the full data. Use AWS
DataSync to ensure the delta only is written into Amazon S3.
Answer: B. Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an Amazon DynamoDB table and ingest the data using the updated key as a filter.
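The selected approach keeps a "last updated" watermark in DynamoDB and uses it to filter the JDBC read. A minimal AWS Glue (PySpark) sketch of that pattern follows; the source endpoint, table and column names, watermark table, credentials handling, and S3 path are all illustrative assumptions, not details given in the question.

```python
# Minimal AWS Glue (PySpark) sketch of the selected pattern: a "last updated"
# watermark kept in DynamoDB drives an incremental JDBC read. All identifiers
# below are assumptions for illustration only.
import boto3
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read the watermark (last ingested timestamp) from an assumed DynamoDB control table.
control = boto3.resource("dynamodb").Table("ingest_watermarks")
item = control.get_item(Key={"source": "orders"}).get("Item", {})
last_updated = item.get("last_updated", "1970-01-01 00:00:00")

# Pull only rows newer than the watermark with a JDBC pushdown subquery.
incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://source-db.example.com:3306/sales")  # assumed source
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", f"(SELECT * FROM orders WHERE updated_at > '{last_updated}') t")
    .option("user", "glue_user")
    .option("password", "********")
    .load()
    .cache()
)

# Land the daily delta in the data lake as Parquet.
incremental.write.mode("append").parquet("s3://example-data-lake/raw/orders/")

# Advance the watermark for tomorrow's scheduled run.
if incremental.count() > 0:
    new_max = incremental.agg({"updated_at": "max"}).collect()[0][0]
    control.put_item(Item={"source": "orders", "last_updated": str(new_max)})
```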
A financial company uses Apache Hive on Amazon EMR for ad-hoc queries. Users are complaining of sluggish performance. A data analyst notes the following:
Approximately 90% of queries are submitted 1 hour after the market opens.
Hadoop Distributed File System (HDFS) utilization never exceeds 10%.
Which solution would help address the performance issues?
A.
Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance fleet based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch CapacityRemainingGB metric.
B.
Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance fleet based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch YARNMemoryAvailablePercentage metric.
C.
Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch
CapacityRemainingGB metric.
D.
Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch
YARNMemoryAvailablePercentage metric.
Answer: C. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch CapacityRemainingGB metric.
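As context for the selected option, the sketch below shows roughly how an automatic scaling policy keyed to the CloudWatch CapacityRemainingGB metric could be attached to an EMR task instance group with boto3. The cluster ID, instance group ID, thresholds, and capacity limits are assumptions; a mirrored rule using GREATER_THAN would provide the scale-in side.

```python
# Rough boto3 sketch of the scale-out half of the selected option: an automatic
# scaling policy on an EMR task instance group, triggered by the CloudWatch
# CapacityRemainingGB metric. IDs, thresholds, and limits are assumed values.
import boto3

emr = boto3.client("emr")

emr.put_auto_scaling_policy(
    ClusterId="j-EXAMPLECLUSTER",           # assumed cluster ID
    InstanceGroupId="ig-EXAMPLETASKGROUP",  # assumed task instance group ID
    AutoScalingPolicy={
        "Constraints": {"MinCapacity": 2, "MaxCapacity": 20},
        "Rules": [
            {
                "Name": "ScaleOutOnLowRemainingCapacity",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 2,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "LESS_THAN",
                        "EvaluationPeriods": 1,
                        "MetricName": "CapacityRemainingGB",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": 300,
                        "Statistic": "AVERAGE",
                        "Threshold": 50.0,
                    }
                },
            },
        ],
    },
)
```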
A technology company is creating a dashboard that will visualize and analyze time-sensitive data. The data will come in through Amazon Kinesis Data Firehose with the buffer interval set to 60 seconds. The dashboard must support near-real-time data.
Which visualization solution will meet these requirements?
A.
Select Amazon Elasticsearch Service (Amazon ES) as the endpoint for Kinesis Data Firehose. Set up a Kibana dashboard using the data in Amazon ES with the desired analyses and visualizations.
B.
Select Amazon S3 as the endpoint for Kinesis Data Firehose. Read data into an Amazon SageMaker
Jupyter notebook and carry out the desired analyses and visualizations.
C.
Select Amazon Redshift as the endpoint for Kinesis Data Firehose. Connect Amazon QuickSight with
SPICE to Amazon Redshift to create the desired analyses and visualizations.
D.
Select Amazon S3 as the endpoint for Kinesis Data Firehose. Use AWS Glue to catalog the data and
Amazon Athena to query it. Connect Amazon QuickSight with SPICE to Athena to create the desired
analyses and visualizations.
Answer: A. Select Amazon Elasticsearch Service (Amazon ES) as the endpoint for Kinesis Data Firehose. Set up a Kibana dashboard using the data in Amazon ES with the desired analyses and visualizations.
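The selected option sends Firehose output to Amazon ES so a Kibana dashboard stays within roughly a minute of real time. A sketch of creating such a delivery stream with a 60-second buffer interval follows; the stream name, IAM roles, ES domain, and backup bucket are assumed values, not part of the question.

```python
# Sketch (assumed ARNs and names) of a Firehose delivery stream that buffers for
# 60 seconds and delivers to an Amazon Elasticsearch Service domain, where a
# Kibana dashboard can read the index in near-real time.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="dashboard-metrics",  # assumed stream name
    DeliveryStreamType="DirectPut",
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-es",          # assumed role
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/dashboards",  # assumed domain
        "IndexName": "metrics",
        "IndexRotationPeriod": "OneDay",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-es",
            "BucketARN": "arn:aws:s3:::dashboard-metrics-backup",            # assumed bucket
        },
    },
)
```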
A company’s marketing team has asked for help in identifying a high-performing, long-term storage service for their data based on the following requirements:
The data size is approximately 32 TB uncompressed.
There is a low volume of single-row inserts each day.
There is a high volume of aggregation queries each day.
Multiple complex joins are performed.
The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?
A.
Amazon Aurora MySQL
B.
Amazon Redshift
C.
Amazon Neptune
D.
Amazon Elasticsearch Service
Answer: B. Amazon Redshift
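Amazon Redshift suits this workload because its columnar storage and MPP engine favor aggregation-heavy queries that touch only a few columns of a large table, while still supporting complex joins and low-volume inserts. As a rough illustration only, the sketch below submits an assumed fact-table DDL with a distribution key and sort key through the Redshift Data API; every identifier in it is hypothetical.

```python
# Illustrative sketch only: defining a columnar fact table in Amazon Redshift via
# the Redshift Data API. Cluster, database, user, and column names are assumptions;
# the point is that DISTKEY/SORTKEY plus columnar storage fit the workload above.
import boto3

rsd = boto3.client("redshift-data")

ddl = """
CREATE TABLE campaign_events (
    event_id      BIGINT,
    campaign_id   INT,
    event_date    DATE,
    channel       VARCHAR(32),
    spend         DECIMAL(12,2),
    impressions   BIGINT
)
DISTKEY (campaign_id)
SORTKEY (event_date);
"""

rsd.execute_statement(
    ClusterIdentifier="marketing-dw",  # assumed cluster
    Database="analytics",              # assumed database
    DbUser="analyst",                  # assumed user
    Sql=ddl,
)
```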
A company has 1 million scanned documents stored as image files in Amazon S3. The documents contain
typewritten application forms with information including the applicant first name, applicant last name,
application date, application type, and application text. The company has developed a machine learning
algorithm to extract the metadata values from the scanned documents. The company wants to allow internal
data analysts to analyze and find applications using the applicant name, application date, or application text.
The original images should also be downloadable. Cost control is secondary to query performance.
Which solution organizes the images and metadata to drive insights while meeting the requirements?
A.
For each image, use object tags to add the metadata. Use Amazon S3 Select to retrieve the files based on the applicant name and application date.
B.
Index the metadata and the Amazon S3 location of the image file in Amazon Elasticsearch Service.
Allow the data analysts to use Kibana to submit queries to the Elasticsearch cluster.
C.
Store the metadata and the Amazon S3 location of the image file in an Amazon Redshift table. Allow the data analysts to run ad-hoc queries on the table.
D.
Store the metadata and the Amazon S3 location of the image files in an Apache Parquet file in Amazon
S3, and define a table in the AWS Glue Data Catalog. Allow data analysts to use Amazon Athena to
submit custom queries.
Answer: A. For each image, use object tags to add the metadata. Use Amazon S3 Select to retrieve the files based on the applicant name and application date.
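For reference, the indexing step described in option B could look roughly like the sketch below, which writes one document's extracted metadata and its S3 location into an Elasticsearch index that Kibana can then search by name, date, or application text. The domain endpoint, index name, field names, and sample values are assumptions, and request signing/authentication against the Amazon ES domain is omitted for brevity.

```python
# Minimal sketch of indexing one scanned application's metadata plus its S3 location
# into Elasticsearch (option B's approach). Endpoint, index, fields, and values are
# assumed; authentication/request signing is omitted for brevity.
import json
import requests

ES_ENDPOINT = "https://search-applications-xxxxxxxx.us-east-1.es.amazonaws.com"  # assumed

doc = {
    "applicant_first_name": "Jane",
    "applicant_last_name": "Doe",
    "application_date": "2020-03-15",
    "application_type": "loan",
    "application_text": "Requesting review of the attached application...",
    "s3_location": "s3://scanned-applications/2020/03/15/doc-000123.png",  # assumed key
}

resp = requests.put(
    f"{ES_ENDPOINT}/applications/_doc/doc-000123",
    data=json.dumps(doc),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```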