Q106. You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?
Z-score normalization (standardization) transforms the values of a numeric variable into standardized units, such that the mean is zero and the standard deviation is one. This makes variables with different scales and ranges directly comparable and keeps large-magnitude features from dominating model training. The formula for z-score normalization is:
z = (x - mu) / sigma
where x is the original value, mu is the mean of the variable, and sigma is the standard deviation of the variable.
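As a quick worked example (values chosen purely for illustration), the short Python sketch below computes z-scores for three temperatures:

values = [10.0, 20.0, 30.0]                 # example temperatures
mu = sum(values) / len(values)              # mean = 20.0
sigma = (sum((x - mu) ** 2 for x in values) / len(values)) ** 0.5  # population std dev, about 8.165
z_scores = [(x - mu) / sigma for x in values]
print(z_scores)                             # about [-1.2247, 0.0, 1.2247]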
Dataflow is a service that allows you to create and run data processing pipelines on Google Cloud. You can use Dataflow to preprocess raw data prior to model training and prediction, such as applying z-score normalization on data stored in BigQuery. However, using Dataflow for this task may not be the most efficient option, as it involves reading and writing data from and to BigQuery, which can be time-consuming and costly. Moreover, using Dataflow requires manual intervention to update the pipeline whenever new training data is added.
A more efficient way to perform z-score normalization on data stored in BigQuery is to translate the normalization algorithm into SQL and run it directly in BigQuery. BigQuery is a service that allows you to analyze large-scale and complex data using SQL queries. You can compute z-scores with the AVG() and STDDEV_POP() aggregate functions used as window functions via an OVER() clause. For example, the following SQL query normalizes the values of a column called temperature in a table called weather:

SELECT (temperature - AVG(temperature) OVER ()) / STDDEV_POP(temperature) OVER () AS normalized_temperature FROM weather;

By using SQL to perform z-score normalization in BigQuery, you make the process more efficient by minimizing computation time and manual intervention. You can also leverage the scalability and performance of BigQuery to handle large and complex datasets. Therefore, translating the normalization algorithm into SQL for use with BigQuery is the best option for this use case.
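In practice, a BigQuery scheduled query is usually enough to refresh the normalized table each week with no code at all. For completeness, here is a minimal sketch of running the same query from Python with the BigQuery client library; the project, dataset, and table names are hypothetical placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical source and destination tables; adjust to your project.
NORMALIZE_SQL = """
SELECT
  * EXCEPT (temperature),
  (temperature - AVG(temperature) OVER ()) / STDDEV_POP(temperature) OVER ()
    AS normalized_temperature
FROM `my-project.analytics.weather`
"""

job_config = bigquery.QueryJobConfig(
    destination="my-project.analytics.weather_normalized",       # results written back to BigQuery
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # rebuild the table on each run
)

# The normalization runs entirely inside BigQuery; no Dataflow pipeline is needed.
client.query(NORMALIZE_SQL, job_config=job_config).result()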
Q107. You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep Learning VM Image, you receive the following error: The resource
‘projects/deeplearning-platforn/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80’ was not found. What should you do?
The error message indicates that the selected GPU type (nvidia-tesla-k80) is not available in the selected zone (europe-west4-c). This can happen when the GPU type is not offered in that zone, or when your project has no quota for it in the corresponding region. To avoid this error, you should ensure that the required GPU is available in the selected zone before creating a Deep Learning VM Image. You can use the following steps to check GPU availability and quota:
* To check GPU availability, you can use the gcloud compute accelerator-types list command with the --filter flag to specify the GPU type and the zone. For example, to check the availability of nvidia-tesla-k80 in europe-west4-c, you can run:
gcloud compute accelerator-types list --filter="name=nvidia-tesla-k80 AND zone:europe-west4-c"
* If the command returns an empty result, the GPU type is not offered in that zone. You can either choose a different GPU type or a different zone that offers it. To see every GPU type available in a given zone, keep only the zone condition in the filter. For example, to list all the GPU types available in europe-west4-c, you can run:
gcloud compute accelerator-types list --filter="zone:europe-west4-c"
* To check the GPU quota, use the gcloud compute regions describe command on the region that contains the zone (GPU quotas are tracked per region, so use europe-west4 rather than europe-west4-c) and inspect the quotas list. For example, to view the quotas for europe-west4, you can run:
gcloud compute regions describe europe-west4 --format="yaml(quotas)"
* If the limit shown for the NVIDIA_K80_GPUS metric is 0, or its usage has already reached the limit, the GPU quota is exhausted in that region. You can either request more quota from Google Cloud or choose a different region that has enough quota for the GPU type. A programmatic alternative to these checks is sketched below.
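If you prefer to perform the same checks from Python rather than the CLI, a minimal sketch using the google-cloud-compute client library (installed separately with pip install google-cloud-compute) might look like the following; the project ID is a placeholder.

from google.cloud import compute_v1

PROJECT = "my-project"   # placeholder: your project ID
ZONE = "europe-west4-c"
REGION = "europe-west4"  # quotas are tracked per region, not per zone

# Check whether the K80 accelerator type is offered in the zone.
accel_client = compute_v1.AcceleratorTypesClient()
offered = {a.name for a in accel_client.list(project=PROJECT, zone=ZONE)}
print("nvidia-tesla-k80 offered in zone:", "nvidia-tesla-k80" in offered)

# Print the regional quotas; look for the NVIDIA_K80_GPUS entry in the output.
region_client = compute_v1.RegionsClient()
for quota in region_client.get(project=PROJECT, region=REGION).quotas:
    print(quota.metric, "limit:", quota.limit, "usage:", quota.usage)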
References:
* Troubleshooting | Deep Learning VM Images | Google Cloud
* Checking GPU availability
* Checking GPU quota
Q109. You created an ML pipeline with multiple input parameters. You want to investigate the tradeoffs between different parameter combinations. The parameter options are
* input dataset
* Max tree depth of the boosted tree regressor
* Optimizer learning rate
You need to compare the pipeline performance of the different parameter combinations measured in F1 score, time to train and model complexity. You want your approach to be reproducible and track all pipeline runs on the same platform. What should you do?
The best option for investigating the tradeoffs between different parameter combinations is to create an experiment in Vertex AI Experiments, create a Vertex AI pipeline with a custom model training job as part of the pipeline, configure the pipeline's parameters to include those you are investigating, and submit multiple runs to the same experiment using different values for the parameters. This option allows you to leverage the power and flexibility of Google Cloud to compare the pipeline performance of the different parameter combinations measured in F1 score, time to train, and model complexity.

Vertex AI Experiments is a service that can track and compare the results of multiple machine learning runs. Vertex AI Experiments can record the metrics, parameters, and artifacts of each run, and display them in a dashboard for easy visualization and analysis. Vertex AI Experiments can also help users optimize the hyperparameters of their models by using different search algorithms, such as grid search, random search, or Bayesian optimization1.

Vertex AI Pipelines is a service that can orchestrate machine learning workflows using Vertex AI. Vertex AI Pipelines can run preprocessing and training steps on custom Docker images, and evaluate, deploy, and monitor the machine learning model. A custom model training job is a type of pipeline step that can train a custom model by using a user-provided script or container. A custom model training job can accept pipeline parameters as inputs, which can be used to control the training logic or data source.

By creating an experiment in Vertex AI Experiments, creating a Vertex AI pipeline with a custom model training job as part of the pipeline, configuring the pipeline's parameters to include those you are investigating, and submitting multiple runs to the same experiment using different values for the parameters, you can create a reproducible and trackable approach to investigate the tradeoffs between different parameter combinations.
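A minimal sketch of this approach with the Vertex AI SDK for Python might look like the following; the experiment name, compiled pipeline template path, and parameter names are hypothetical, and the experiment argument to submit() assumes a recent version of the google-cloud-aiplatform package.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                experiment="forecast-param-tradeoffs")  # hypothetical experiment name

# Each entry is one parameter combination to evaluate.
param_combinations = [
    {"input_dataset": "gs://my-bucket/data_v1.csv", "max_tree_depth": 6, "learning_rate": 0.1},
    {"input_dataset": "gs://my-bucket/data_v2.csv", "max_tree_depth": 10, "learning_rate": 0.05},
]

for i, params in enumerate(param_combinations):
    job = aiplatform.PipelineJob(
        display_name=f"tradeoff-run-{i}",
        template_path="gs://my-bucket/pipelines/training_pipeline.json",  # compiled pipeline spec
        parameter_values=params,  # pipeline parameters, including the input dataset
    )
    # Associating each run with the experiment records its parameters and metrics
    # so they can be compared in the Vertex AI Experiments dashboard.
    job.submit(experiment="forecast-param-tradeoffs")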
The other options are not as good as option D, for the following reasons:
* Option A: Using BigQuery ML to create a boosted tree regressor with its hyperparameter tuning capability, configuring the hyperparameter syntax to select different input datasets, max tree depths, and optimizer learning rates, and choosing the grid search option would not be able to handle different input datasets as a hyperparameter, and would not be as flexible and scalable as using Vertex AI Experiments and Vertex AI Pipelines. BigQuery ML is a service that can create and train machine learning models by using SQL queries on BigQuery. BigQuery ML supports hyperparameter tuning through the CREATE MODEL statement, by setting the NUM_TRIALS option and declaring search spaces (ranges or candidate lists) for tunable training options such as max_tree_depth and learn_rate; it can use search algorithms such as grid search, random search, or Bayesian optimization to find the optimal values (a minimal sketch of this syntax appears after this list). However, BigQuery ML can only tune hyperparameters that are related to the model architecture or training process, such as max tree depth or learning rate; it cannot tune parameters that are related to the data source, such as the input dataset. Moreover, BigQuery ML is not designed to work with Vertex AI Experiments or Vertex AI Pipelines, which provide more features and flexibility for tracking and orchestrating machine learning workflows2.
* Option B: Creating a Vertex AI pipeline with a custom model training job as part of the pipeline, configuring the pipeline's parameters to include those you are investigating, and using the Bayesian optimization method with F1 score as the target to maximize in the custom training step would not track and compare the results of multiple runs, and would require more skills and steps than also using Vertex AI Experiments. Implementing Bayesian optimization inside the custom training step would require writing code for the optimization algorithm and defining the objective function yourself. Moreover, without an associated experiment, the pipeline runs alone do not give you a single dashboard for recording and comparing the metrics, parameters, and artifacts of each run3.
* Option C: Creating a Vertex AI Workbench notebook for each of the different input datasets, running different local training jobs with different combinations of the max tree depth and optimizer learning rate parameters, and appending the results to a BigQuery table would not track and compare the results of multiple runs on the same platform, and would require more skills and steps than using Vertex AI Experiments and Vertex AI Pipelines. Vertex AI Workbench is a service that provides an integrated development environment for data science and machine learning: it lets you create and run Jupyter notebooks on Google Cloud and access various tools and libraries for data analysis and machine learning. However, this option would require creating multiple notebooks, writing code, setting up local environments, connecting to BigQuery, loading and preprocessing the data, training and evaluating the model, and writing the results to a BigQuery table. Moreover, it would not be able to track and compare the results of multiple runs on the same platform, because BigQuery is a separate service from Vertex AI Workbench and does not provide a dashboard for visualizing and analyzing the metrics, parameters, and artifacts of each run4.
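For reference, the BigQuery ML hyperparameter tuning syntax mentioned in Option A looks roughly like the following sketch, run here through the BigQuery Python client; the dataset, table, and label column names are hypothetical. Note that the training table is fixed inside the CREATE MODEL statement, which is why the input dataset itself cannot be treated as a tunable hyperparameter.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical dataset, table, and label names; only training options are tunable.
HPARAM_TUNING_SQL = """
CREATE OR REPLACE MODEL `mydataset.demand_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_REGRESSOR',
  input_label_cols = ['demand'],
  NUM_TRIALS = 9,
  HPARAM_TUNING_ALGORITHM = 'GRID_SEARCH',
  HPARAM_TUNING_OBJECTIVES = ['mean_squared_error'],
  max_tree_depth = HPARAM_CANDIDATES([4, 6, 8]),
  learn_rate = HPARAM_CANDIDATES([0.05, 0.1, 0.3])
) AS
SELECT * FROM `mydataset.training_data`
"""

client.query(HPARAM_TUNING_SQL).result()  # the tuning trials run entirely inside BigQuery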
References:
* Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 3: MLOps
* Google Cloud Professional Machine Learning Engineer Exam Guide, Section 1: Architecting low-code ML solutions, 1.1 Developing ML models by using BigQuery ML
* Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 3: Data Engineering for ML, Section 3.2: BigQuery for ML
* Vertex AI Experiments
* Vertex AI Pipelines
* BigQuery ML
* Vertex AI Workbench
Q111. You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard. What should you do?
Kubeflow Pipelines is a service that allows you to create and run machine learning workflows on Google Cloud using various features, model architectures, and hyperparameters. You can use Kubeflow Pipelines to scale up your workflows, leverage distributed training, and access specialized hardware such as GPUs and TPUs1. An experiment in Kubeflow Pipelines is a workspace where you can try different configurations of your pipelines and organize your runs into logical groups. You can use experiments to compare the performance of different models and track the evaluation metrics in the same dashboard2.
For the use case of designing a customized deep neural network in Keras that will predict customer purchases based on their purchase history, the best option is to create an experiment in Kubeflow Pipelines to organize multiple runs. This option allows you to explore model performance using multiple model architectures, store training data, and compare the evaluation metrics in the same dashboard. You can use Keras to build and train your deep neural network models, and then package them as pipeline components that can be reused and combined with other components. You can also use Kubeflow Pipelines SDK to define and submit your pipelines programmatically, and use Kubeflow Pipelines UI to monitor and manage your experiments.
Therefore, creating an experiment in Kubeflow Pipelines to organize multiple runs is the best option for this use case.
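A minimal sketch of organizing such runs with the Kubeflow Pipelines SDK (assuming the KFP v1 client API, a hypothetical host URL, and an already compiled Keras training pipeline) might look like the following.

import kfp

# Connect to the Kubeflow Pipelines endpoint (hypothetical host URL).
client = kfp.Client(host="https://my-kfp-endpoint.example.com")

# Group all architecture comparisons under one experiment so their
# metrics appear side by side in the Kubeflow Pipelines UI.
experiment = client.create_experiment(name="purchase-prediction-architectures")

architectures = ["wide_2x256", "deep_4x128", "deep_6x64"]  # hypothetical model variants

for arch in architectures:
    client.run_pipeline(
        experiment_id=experiment.id,
        job_name=f"train-{arch}",
        pipeline_package_path="keras_training_pipeline.yaml",  # compiled pipeline spec
        params={"architecture": arch},  # hypothetical pipeline parameter
    )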
References:
* Kubeflow Pipelines documentation
* Experiment | Kubeflow
Q120. You work for a bank with strict data governance requirements. You recently implemented a custom model to detect fraudulent transactions. You want your training code to download internal data by using an API endpoint hosted in your project's network. You need the data to be accessed in the most secure way, while mitigating the risk of data exfiltration. What should you do?
The best option for accessing internal data in the most secure way, while mitigating the risk of data exfiltration, is to enable VPC Service Controls for peerings, and add Vertex AI to a service perimeter. This option allows you to leverage the power and simplicity of VPC Service Controls to isolate and protect your data and services on Google Cloud.

VPC Service Controls is a service that can create a secure perimeter around your Google Cloud resources, such as BigQuery, Cloud Storage, and Vertex AI. VPC Service Controls can help you prevent unauthorized access and data exfiltration from your perimeter, and enforce fine-grained access policies based on context and identity. Peerings are connections that can allow traffic to flow between different networks. Peerings can help you connect your Google Cloud network with other Google Cloud networks or external networks, and enable communication between your resources and services. By enabling VPC Service Controls for peerings, you can allow your training code to download internal data by using an API endpoint hosted in your project’s network, and restrict the data transfer to only authorized networks and services.

Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex AI can support various types of models, such as linear regression, logistic regression, k-means clustering, matrix factorization, and deep neural networks. Vertex AI can also provide various tools and services for data analysis, model development, model deployment, model monitoring, and model governance. By adding Vertex AI to a service perimeter, you can isolate and protect your Vertex AI resources, such as models, endpoints, pipelines, and feature store, and prevent data exfiltration from your perimeter1.
The other options are not as good as option A, for the following reasons:
* Option B: Creating a Cloud Run endpoint as a proxy to the data, and using Identity and Access Management (IAM) authentication to secure access to the endpoint from the training job would require more skills and steps than enabling VPC Service Controls for peerings, and adding Vertex AI to a service perimeter. Cloud Run is a service that can run your stateless containers on a fully managed environment or on your own Google Kubernetes Engine cluster. Cloud Run can help you deploy and scale your containerized applications quickly and easily, and pay only for the resources you use. A Cloud Run endpoint is a URL that can expose your containerized application to the internet or to other Google Cloud services. A Cloud Run endpoint can help you access and invoke your application from anywhere, and handle the load balancing and traffic routing. A proxy is a server that can act as an intermediary between a client and a target server. A proxy can help you modify, filter, or redirect the requests and responses between the client and the target server, and provide additional functionality or security. IAM is a service that can manage access control for Google Cloud resources. IAM can help you define who (identity) has what access (role) to which resource, and enforce the access policies. By creating a Cloud Run endpoint as a proxy to the data, and using IAM authentication to secure access to the endpoint from the training job, you can access internal data by using an API endpoint hosted in your project’s network, and restrict the data access to only authorized identities and roles. However, creating a Cloud Run endpoint as a proxy to the data, and using IAM authentication to secure access to the endpoint from the training job would require more skills and steps than enabling VPC Service Controls for peerings, and adding Vertex AI to a service perimeter. You would need to write code, create and configure the Cloud Run endpoint, implement the proxy logic, deploy and monitor the Cloud Run endpoint, and set up the IAM policies. Moreover, this option would not prevent data exfiltration from your network, as the Cloud Run endpoint can be accessed from outside your network2.
* Option C: Configuring VPC Peering with Vertex AI and specifying the network of the training job would not allow you to access internal data by using an API endpoint hosted in your project’s network, and could cause errors or poor performance. VPC Peering is a service that can create a peering connection between two VPC networks. VPC Peering can help you connect your Google Cloud network with another Google Cloud network or an external network, and enable communication between your resources and services. By configuring VPC Peering with Vertex AI and specifying the network of the training job, you can allow your training code to access Vertex AI resources, such as models, endpoints, pipelines, and feature store, and use the same network for the training job. However, configuring VPC Peering with Vertex AI and specifying the network of the training job would not allow you to access internal data by using an API endpoint hosted in your project’s network, and could cause errors or poor performance. You would need to write code, create and configure the VPC Peering connection, and specify the network of the training job. Moreover, this option would not isolate and protect your data and services on Google Cloud, as the VPC Peering connection can expose your network to other networks and services3.
* Option D: Downloading the data to a Cloud Storage bucket before calling the training job would not allow you to access internal data by using an API endpoint hosted in your project’s network, and could increase the complexity and cost of the data access. Cloud Storage is a service that can store and manage your data on Google Cloud. Cloud Storage can help you upload and organize your data, and track the data versions and metadata. A Cloud Storage bucket is a container that can hold your data on Cloud Storage. A Cloud Storage bucket can help you store and access your data from anywhere, and provide various storage classes and options. By downloading the data to a Cloud Storage bucket before calling the training job, you can access the data from Cloud Storage, and use it as the input for the training job.
However, downloading the data to a Cloud Storage bucket before calling the training job would not allow you to access internal data by using an API endpoint hosted in your project’s network, and could increase the complexity and cost of the data access. You would need to write code, create and configure the Cloud Storage bucket, download the data to the Cloud Storage bucket, and call the training job. Moreover, this option would create an intermediate data source on Cloud Storage, which can increase the storage and transfer costs, and expose the data to unauthorized access or data exfiltration4.
References:
* Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 1: Data Engineering
* Google Cloud Professional Machine Learning Engineer Exam Guide, Section 1: Framing ML problems, 1.2 Defining data needs
* Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 2: Data Engineering, Section 2.2: Defining Data Needs
* VPC Service Controls
* Cloud Run
* VPC Peering
* Cloud Storage