Which solution will meet these requirements with the LEAST operational overhead?
A. Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.
B. Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.
C. Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.
D. Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.
Explanation:
Scenario: The ML engineer needs a low-overhead solution to query thousands of existing
and new CSV objects stored in Amazon S3 based on a transaction date.
Why Athena?
Serverless: Amazon Athena is a serverless query service that allows direct
querying of data stored in S3 using standard SQL, reducing operational overhead.
Ease of Use: By using the CTAS statement, the engineer can create a table with
optimized partitions based on the transaction date. Partitioning improves query
performance and minimizes cost by scanning only the relevant data.
Low Operational Overhead: No additional infrastructure needs to be managed or
provisioned. Athena integrates seamlessly with S3, and CTAS simplifies table
creation and optimization.
Steps to Implement:
Organize Data in S3: Store CSV files in a bucket in a consistent format and
directory structure if possible.
Configure Athena: Use the AWS Management Console or the AWS CLI to point
Athena at the S3 bucket.
Run CTAS Statement:
CREATE TABLE processed_data
WITH (
    format = 'PARQUET',
    external_location = 's3://processed-bucket/',
    partitioned_by = ARRAY['transaction_date']
) AS
SELECT *
FROM input_data;
-- Note: Athena CTAS requires the partition column (transaction_date) to be
-- the last column in the SELECT list.
This creates a new table with data partitioned by transaction date.
Query the Data: Use standard SQL queries to fetch data based on the transaction
date (see the sketch below).
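For illustration, a minimal boto3 sketch of such a partition-filtered query follows; the database name, output location, and date value are hypothetical placeholders, not part of the original scenario.

import boto3

athena = boto3.client("athena")

# Filter on the partition column so Athena scans only that partition's
# Parquet files instead of the whole table.
response = athena.start_query_execution(
    QueryString=(
        "SELECT * FROM processed_data "
        "WHERE transaction_date = '2024-01-15'"  # assumes a string partition column
    ),
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
)
print(response["QueryExecutionId"])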
A company stores time-series data about user clicks in an Amazon S3 bucket. The raw
data consists of millions of rows of user activity every day. ML engineers access the data to
develop their ML models.
The ML engineers need to generate daily reports and analyze click trends over the past 3
days by using Amazon Athena. The company must retain the data for 30 days before
archiving the data.
Which solution will provide the HIGHEST performance for data retrieval?
Explanation: Partitioning the time-series data by date prefix in the S3 bucket significantly improves query performance in Amazon Athena by reducing the amount of data that needs to be scanned during queries. This allows the ML engineers to efficiently analyze trends over specific time periods, such as the past 3 days. Applying S3 Lifecycle policies to archive partitions older than 30 days to S3 Glacier Flexible Retrieval ensures cost-effective data retention and storage management while maintaining high performance for recent data retrieval.
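A minimal sketch of the lifecycle part, assuming a hypothetical bucket name and a "clicks/" prefix for the date-partitioned objects (boto3; "GLACIER" is the API storage-class value for S3 Glacier Flexible Retrieval):

import boto3

s3 = boto3.client("s3")

# Transition click data older than 30 days to S3 Glacier Flexible Retrieval.
s3.put_bucket_lifecycle_configuration(
    Bucket="clickstream-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-clicks-after-30-days",
                "Filter": {"Prefix": "clicks/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)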
A company needs to host a custom ML model to perform forecast analysis. The forecast
analysis will occur with predictable and sustained load during the same 2-hour period every
day.
Multiple invocations during the analysis period will require quick responses. The company
needs AWS to manage the underlying infrastructure and any auto scaling activities.
Which solution will meet these requirements?
A. Schedule an Amazon SageMaker batch transform job by using AWS Lambda.
B. Configure an Auto Scaling group of Amazon EC2 instances to use scheduled scaling.
C. Use Amazon SageMaker Serverless Inference with provisioned concurrency.
D. Run the model on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2 with pod auto scaling.
Explanation: SageMaker Serverless Inference is ideal for workloads with predictable, intermittent demand. By enabling provisioned concurrency, the model can handle multiple invocations quickly during the high-demand 2-hour period. AWS manages the underlying infrastructure and scaling, ensuring the solution meets performance requirements with minimal operational overhead. This approach is cost-effective since it scales down when not in use.
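As a hedged sketch of what option C could look like with boto3 (the model, endpoint, memory, and concurrency values are illustrative assumptions):

import boto3

sm = boto3.client("sagemaker")

# Serverless endpoint config; ProvisionedConcurrency keeps capacity warm
# so invocations during the 2-hour window respond quickly.
sm.create_endpoint_config(
    EndpointConfigName="forecast-serverless-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "forecast-model",  # assumes the model already exists
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 20,
                "ProvisionedConcurrency": 10,
            },
        }
    ],
)
sm.create_endpoint(
    EndpointName="forecast-endpoint",
    EndpointConfigName="forecast-serverless-config",
)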
A company runs an Amazon SageMaker domain in a public subnet of a newly created
VPC. The network is configured properly, and ML engineers can access the SageMaker
domain.
Recently, the company discovered suspicious traffic to the domain from a specific IP
address. The company needs to block traffic from the specific IP address.
Which update to the network configuration will meet this requirement?
A. Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.
B. Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.
C. Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.
D. Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.
Explanation: Network ACLs (Access Control Lists) operate at the subnet level and allow rules that explicitly deny traffic from specific IP addresses. By creating an inbound rule in the network ACL to deny traffic from the suspicious IP address, the company can block traffic to the Amazon SageMaker domain from that IP. Security groups, by contrast, support only allow rules, so they cannot be used to deny a specific IP address. Network ACLs are also evaluated before traffic reaches the security groups, making them effective for blocking traffic at the subnet level.
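A minimal boto3 sketch of such a deny rule, assuming a placeholder network ACL ID and IP address:

import boto3

ec2 = boto3.client("ec2")

# Inbound deny rule; the rule number must be lower than the allow rule so
# the deny is evaluated first.
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",
    RuleNumber=50,
    Protocol="-1",        # all protocols
    RuleAction="deny",
    Egress=False,         # inbound rule
    CidrBlock="203.0.113.25/32",
)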
A company uses Amazon SageMaker Studio to develop an ML model. The company has a
single SageMaker Studio domain. An ML engineer needs to implement a solution that
provides an automated alert when SageMaker compute costs reach a specific threshold.
Which solution will meet these requirements?
A. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.
B. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.
C. Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.
D. Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.
Explanation:
Adding resource tagging to the SageMaker user profile enables tracking and monitoring of
costs associated with specific SageMaker resources.
AWS Budgets allows setting thresholds and automated alerts for costs and usage, making
it the ideal service to notify the ML engineer when compute costs reach a specified limit.
This solution is efficient and integrates seamlessly with SageMaker and AWS cost
management tools.
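A hedged boto3 sketch of such a budget alert, assuming a hypothetical cost-allocation tag, account ID, limit, and notification address:

import boto3

budgets = boto3.client("budgets")

# Budget scoped to the tag applied to the SageMaker user profile; sends an
# email alert when actual cost reaches 80% of the limit.
budgets.create_budget(
    AccountId="111122223333",
    Budget={
        "BudgetName": "sagemaker-compute-budget",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        "CostFilters": {"TagKeyValue": ["user:team$ml-engineering"]},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "ml-team@example.com"}
            ],
        }
    ],
)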