Google Cloud Platform - freeCodeCamp.org

How to Use Google Dataproc – Example with PySpark and Jupyter Notebook

Sameer Shukla — Tue, 03 May 2022 15:14:31 +0000

In this article, I'll explain what Dataproc is and how it works.

Dataproc is a Google Cloud Platform managed service for Spark and Hadoop which helps you with Big Data Processing, ETL, and Machine Learning. It provides a Hadoop cluster and supports Hadoop ecosystem tools like Flink, Hive, Presto, Pig, and Spark.

Dataproc provides auto-scaling clusters and manages logging, monitoring, cluster creation, and job orchestration for you. You'll need to manually provision the cluster, but once the cluster is provisioned you can submit jobs to Spark, Flink, Presto, and Hadoop.

Dataproc has implicit integration with other GCP products like Compute Engine, Cloud Storage, Bigtable, BigQuery, Cloud Monitoring, and so on. The jobs supported by Dataproc are MapReduce, Spark, PySpark, SparkSQL, SparkR, Hive and Pig.

Apart from that, Dataproc allows native integration with Jupyter Notebooks as well, which we'll cover later in this article.

In this article, we are going to cover:

Dataproc cluster types and how to set Dataproc up
How to submit a PySpark job to Dataproc
How to create a Notebook instance and execute PySpark jobs through Jupyter Notebook.

How to Create a Dataproc Cluster

Dataproc has three cluster types:

Standard
Single Node
High Availability

The Standard cluster consists of 1 master and N worker nodes. The Single Node cluster has only 1 master and 0 worker nodes. For production purposes, you should use the High Availability cluster, which has 3 masters and N worker nodes.

For our learning purposes, a Single Node cluster, which has only 1 master node, is sufficient.

Creating Dataproc clusters in GCP is straightforward. First, we'll need to enable Dataproc, and then we'll be able to create the cluster.

Start Dataproc cluster creation

When you click "Create Cluster", GCP gives you the option to select Cluster Type, Name of Cluster, Location, Auto-Scaling Options, and more.

Parameters required for Cluster

Since we've selected the Single Node Cluster option, this means that auto-scaling is disabled as the cluster consists of only 1 master node.

The Configure Nodes option allows us to select the type of machine family like Compute Optimized, GPU and General-Purpose.

In this tutorial, we'll be using the General-Purpose machine option. Through this, you can select Machine Type, Primary Disk Size, and Disk-Type options.

The Machine Type we're going to select is n1-standard-2, which has 2 vCPUs and 7.5 GB of memory. The Primary Disk size is 100 GB, which is sufficient for our demo purposes here.

Master Node Configuration

We've selected the cluster type of Single Node, which is why the configuration consists only of a master node. If you select any other Cluster Type, then you'll need to configure both the master node and the worker nodes.

From the Customise Cluster option, select the default network configuration:

Use the "Scheduled Deletion" option if you know the cluster won't be needed after a certain point in time (say, after a few minutes, hours, or days).

Schedule Deleting Setting

Here, we've set "Timeout" to be 2 hours, so the cluster will be automatically deleted after 2 hours.

We'll use the default security option which is a Google-managed encryption key. When you click "Create", it'll start creating the cluster.

You can also create the cluster using the ‘gcloud’ command, which you'll find under the ‘EQUIVALENT COMMAND LINE’ option as shown in the image below.

And you can create a cluster using a POST request which you'll find in the ‘Equivalent REST’ option.

gcloud and REST option for Cluster creation
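
As a rough sketch of what the 'Equivalent REST' request body for the cluster configured above might look like, the snippet below builds it as a Python dictionary. The field names follow my understanding of the Dataproc v1 cluster resource, and the project ID is a placeholder, so treat them as assumptions and compare them with the body the console generates for you.

```python
import json


def build_cluster_body(project_id, cluster_name):
    """Build a request body for a single-node Dataproc cluster (illustrative only)."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # One master and zero workers: the Single Node cluster type.
            "masterConfig": {
                "numInstances": 1,
                "machineTypeUri": "n1-standard-2",
                "diskConfig": {"bootDiskSizeGb": 100},
            },
            # Assumption: this property is what marks the cluster as single node.
            "softwareConfig": {
                "properties": {"dataproc:dataproc.allow.zero.workers": "true"}
            },
            # Scheduled deletion: delete the cluster 2 hours after creation.
            "lifecycleConfig": {"autoDeleteTtl": "7200s"},
        },
    }


# "my-project" is a hypothetical project ID.
body = build_cluster_body("my-project", "first-data-proc-cluster")
print(json.dumps(body, indent=2))
```

Keeping the body in a function makes it easy to reuse the same settings for several clusters by changing only the name.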

After a few minutes, the cluster with 1 master node will be ready for use.

Cluster Up and Running

You can find details about the VM instances if you click on "Cluster Name":

How to Submit a PySpark Job

Let’s briefly understand how a PySpark Job works before submitting one to Dataproc. The job is simple: it identifies the distinct elements in a list that contains duplicates.

#! /usr/bin/python

import pyspark

#Create List
numbers = [1,2,1,2,3,4,4,6]

#SparkContext
sc = pyspark.SparkContext()

# Creating RDD using parallelize method of SparkContext
rdd = sc.parallelize(numbers)

#Returning distinct elements from RDD
distinct_numbers = rdd.distinct().collect()

#Print
print('Distinct Numbers:', distinct_numbers)
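
To check what the job above should print before running it on a cluster, the same computation can be sketched in plain Python. This is only a local stand-in for rdd.distinct(), not how Spark computes it. The result is sorted here because Spark does not guarantee the order of the collected elements.

```python
numbers = [1, 2, 1, 2, 3, 4, 4, 6]

# A set keeps one copy of each value, which mirrors what distinct() returns.
expected_distinct = sorted(set(numbers))

print("Distinct Numbers:", expected_distinct)
```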

Upload the PySpark .py file to a GCS bucket. We'll need its location when configuring the PySpark job.

Job GCS Location

Submitting jobs in Dataproc is straightforward. You just need to select the “Submit Job” option:

Job Submission

To submit a job, you'll need to provide the Job ID (the name of the job), the region, the cluster name (in our case, "first-data-proc-cluster"), and the job type, which is going to be PySpark.

Parameters required for Job Submission

You can get the Python file location from the GCS bucket where the Python file is uploaded. You'll find it under "gsutil URI".

No other additional parameters are required, and we can now submit the job:

After execution, you should be able to find the distinct numbers in the logs:

Logs
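
For reference, the submission parameters described above can also be written down as a request body. The sketch below builds one in Python. The field names reflect my understanding of the Dataproc v1 job resource, and the bucket, file name, and job ID are hypothetical, so verify them against your own setup.

```python
import json


def build_pyspark_job(cluster_name, main_file_uri, job_id):
    """Build a job submission body for a PySpark job (illustrative only)."""
    if not main_file_uri.startswith("gs://"):
        raise ValueError("expected a gsutil URI such as gs://bucket/path/job.py")
    return {
        "job": {
            "reference": {"jobId": job_id},
            "placement": {"clusterName": cluster_name},
            "pysparkJob": {"mainPythonFileUri": main_file_uri},
        }
    }


# Hypothetical bucket, file name, and job ID.
request = build_pyspark_job(
    "first-data-proc-cluster",
    "gs://my-bucket/distinct_numbers.py",
    "distinct-numbers-job",
)
print(json.dumps(request, indent=2))
```

The check on the gs:// prefix catches the common mistake of pasting a browser URL instead of the gsutil URI.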

How to Create a Jupyter Notebook Instance

You can associate a notebook instance with Dataproc Hub. To do that, GCP provisions a cluster for each Notebook Instance. We can execute PySpark and SparkR types of jobs from the notebook.

To create a notebook, use the "Workbench" option like below:

Make sure you go through the usual configurations like Notebook Name, Region, Environment (Dataproc Hub), and Machine Configuration (we're using 2 vCPUs with 7.5 GB RAM). We're using the default Network settings, and in the Permission section, select the "Service account" option.

Parameters required for Notebook Cluster Creation

Click Create:

Notebook Cluster Up & Running

The "OPEN JUPYTERLAB" option allows users to specify the cluster options and zone for their notebook.

Once the provisioning is completed, the Notebook gives you a few kernel options:

Click on PySpark which will allow you to execute jobs through the Notebook.

A SparkContext instance will already be available, so you don't need to explicitly create SparkContext. Apart from that, the program remains the same.

Code snapshot on Notebook

Conclusion

Working on Spark and Hadoop becomes much easier when you're using GCP Dataproc. The best part is that you can create a notebook cluster which makes development simpler.

Google Cloud Platform Tutorial: From Zero to Hero with GCP

freeCodeCamp — Fri, 09 Oct 2020 15:24:46 +0000

By Sergio Fuentes Navarro

Do you have the knowledge and skills to design a mobile gaming analytics platform that collects, stores, and analyzes large amounts of bulk and real-time data?

Well, after reading this article, you will.

I aim to take you from zero to hero in Google Cloud Platform (GCP) in just one article. I will show you how to:

Get started with a GCP account for free
Reduce costs in your GCP infrastructure
Organize your resources
Automate the creation and configuration of your resources
Manage operations: logging, monitoring, tracing, and so on.
Store your data
Deploy your applications and services
Create networks in GCP and connect them with your on-premise networks
Work with Big Data, AI, and Machine Learning
Secure your resources

Once I have explained all the topics in this list, I will share with you a solution to the system I described.

If you do not understand some parts of it, you can go back to the relevant sections. And if that is not enough, visit the links to the documentation that I have provided.

Are you up for a challenge? I have selected a few questions from old GCP Professional Certification exams. They will test your understanding of the concepts explained in this article.

I recommend trying to solve both the design and the questions on your own, going back to the guide if necessary. Once you have an answer, compare it to the proposed solution.

Try to go beyond what you are reading and ask yourself what would happen if requirement X changed:

Batch vs streaming data
Regional vs global solution
A small number of users vs huge volume of users
Latency is not a problem vs real-time applications

And any other scenarios you can think of.

At the end of the day, you are not paid just for what you know but for your thought process and the decisions you make. That is why it is vitally important that you exercise this skill.

At the end of the article, I'll provide more resources and next steps if you want to continue learning about GCP.

How to get started with Google Cloud Platform for free

GCP currently offers a 3-month free trial with $300 of free credit. You can use it to get started, play around with GCP, and run experiments to decide if it is the right option for you.

You will NOT be charged at the end of your trial. You will be notified and your services will stop running unless you decide to upgrade your plan.

I strongly recommend using this trial to practice. To learn, you have to try things on your own, face problems, break things, and fix them. It doesn't matter how good this guide is (or the official documentation for that matter) if you do not try things out.

Why would you migrate your services to Google Cloud Platform?

Consuming resources from GCP, like storage or computing power, provides the following benefits:

No need to spend a lot of money upfront for hardware
No need to upgrade your hardware and migrate your data and services every few years
Ability to scale to adjust to the demand, paying only for the resources you consume
Create proofs of concept quickly since provisioning resources can be done very fast
Secure and manage your APIs
Not just infrastructure: data analytics and machine learning services are available in GCP

GCP makes it easy to experiment and use the resources you need in an economical way.

How to optimize your VMs to reduce costs in GCP

In general, you will only be charged for the time your instances are running. Google will not charge you for stopped instances. However, if they consume resources, like disks or reserved IPs, you might incur charges.

Here are some ways you can optimize the cost of running your applications in GCP.

Custom Machine Types

GCP provides different machine families with predefined amounts of RAM and CPUs:

General-purpose. Offers the best price-performance ratio for a variety of workloads.
Memory-optimized. Ideal for memory-intensive workloads. They offer more memory per core than other machine types.
Compute-optimized. They offer the highest performance per core and are optimized for compute-intensive workloads.
Shared-core. These machine types timeshare a physical core. This can be a cost-effective method for running small applications.

You can also create a custom machine type with exactly the amount of RAM and CPUs you need.

Preemptible VMs

You can use preemptible virtual machines to save up to 80% of your costs. They are ideal for fault-tolerant, non-critical applications. You can save the progress of your job in a persistent disk using a shut-down script to continue where you left off.

Google may stop your instances at any time (with a 30-second warning) and will always stop them after 24 hours.
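
To make the idea of continuing where you left off concrete, here is a minimal checkpointing sketch. It saves progress after every item rather than only in a shut-down script, which is a design choice of this example. The file path, the doubling "work", and the stop_after parameter that simulates a preemption are all hypothetical. On a real VM, the checkpoint file would live on a persistent disk.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write atomically so a sudden stop cannot leave a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)


def process(items, checkpoint_path, stop_after=None):
    """Process items in order, resuming from the last checkpoint.

    stop_after simulates a preemption after that many items in this run.
    """
    done = []
    start = load_checkpoint(checkpoint_path)
    for index in range(start, len(items)):
        if stop_after is not None and len(done) >= stop_after:
            break
        done.append(items[index] * 2)  # stand-in for the real work
        save_checkpoint(checkpoint_path, index + 1)
    return done


checkpoint = os.path.join(tempfile.mkdtemp(), "progress.json")
first_run = process([1, 2, 3, 4, 5], checkpoint, stop_after=2)  # "preempted" early
second_run = process([1, 2, 3, 4, 5], checkpoint)  # resumes at the third item
print(first_run, second_run)
```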

To reduce the chances of getting your VMs shut down, Google recommends:

Using many small instances and
Running your jobs during off-peak times.

Note: Start-up and shut-down scripts apply to non-preemptible VMs as well. You can use them to control the behavior of your machine when it starts or stops, for instance, to install software, download data, or back up logs.

There are two options to define these scripts:

When you are creating your instance in the Google Console, there is a field to paste your code.
Using the metadata server URL to point your instance to a script stored in Google Cloud Storage.

The latter is preferred because it is easier to create many instances and to manage the script.

Sustained Use Discounts

The longer you use your virtual machines (and Cloud SQL instances), the higher the discount - up to 30%. Google does this automatically for you.

Committed Use Discounts

You can get up to 57% discount if you commit to a certain amount of CPU and RAM resources for a period of 1 to 3 years.

To estimate your costs, use the Pricing Calculator. This helps prevent surprises in your bills. You can also create budget alerts.
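
As a quick back-of-the-envelope illustration, the snippet below applies the maximum discount rates quoted in this section to a hypothetical monthly bill. These are best-case figures only. Actual discounts depend on machine type, usage, and commitment, so this is not a substitute for the calculator.

```python
# Upper-bound discount rates quoted in this section (best case only).
MAX_DISCOUNTS = {
    "preemptible": 0.80,
    "sustained_use": 0.30,
    "committed_use": 0.57,
}


def best_case_cost(on_demand_cost, option):
    """Return the lowest possible cost for an option, rounded to cents."""
    return round(on_demand_cost * (1 - MAX_DISCOUNTS[option]), 2)


monthly_on_demand = 100.0  # hypothetical monthly bill in US dollars
for option in MAX_DISCOUNTS:
    print(option, best_case_cost(monthly_on_demand, option))
```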

How to manage resources in GCP

In this section, I will explain how you can manage and administer your Google Cloud resources.

Resource Hierarchy

There are four types of resources that can be managed through Resource Manager:

The organization resource. It is the root node in the resource hierarchy. It represents an organization, for example, a company.
The project resource. Projects are required to create resources. You can use them, for example, to separate production and development environments.
The folder resource. Folders provide an extra level of project isolation. For example, you can create a folder for each department in a company.
Resources. Virtual machines, database instances, load balancers, and so on.

There are quotas that limit the maximum number of resources you can create to prevent unexpected spikes in billing. However, most quotas can be increased by opening a support ticket.

Resources in GCP follow a hierarchy via a parent/child relationship, similar to a traditional file system, where:

Permissions are inherited as we descend the hierarchy. For example, permissions granted at the organization level will be propagated to all the folders and projects.
More permissive parent policies always overrule more restrictive child policies.
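
The inheritance rule can be sketched as a walk up the hierarchy that collects every role granted along the way. The resource names and grants below are hypothetical, and the model is deliberately simplified to a single user.

```python
# Hypothetical hierarchy: each resource points to its parent (None for the root).
PARENTS = {
    "organization": None,
    "folder-engineering": "organization",
    "project-prod": "folder-engineering",
}

# Roles granted directly at each level to one user (hypothetical data).
GRANTED = {
    "organization": {"roles/viewer"},
    "folder-engineering": set(),
    "project-prod": {"roles/storage.admin"},
}


def effective_roles(resource):
    """Union of the roles granted on the resource and on all of its ancestors."""
    roles = set()
    while resource is not None:
        roles |= GRANTED.get(resource, set())
        resource = PARENTS[resource]
    return roles


print(sorted(effective_roles("project-prod")))
```

Because the result is a union, a grant made higher up cannot be taken away further down, which is why permissive parent policies win.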

This hierarchical organization helps you manage common aspects of your resources, such as access control and configuration settings.

You can create super admin accounts that have access to every resource in your organization. Since they are very powerful, make sure you follow Google's best practices.

Labels

Labels are key-value pairs you can use to organize your resources in GCP. Once you attach a label to a resource (for instance, to a virtual machine), you can filter based on that label. This is also useful for breaking down your bills by label.

Some common use cases:

Environments: prod, test, and so on.
Team or product owners
Components: backend, frontend, and so on.
Resource state: active, archive, and so on.
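
To illustrate filtering and cost breakdown by label, here is a small sketch over an in-memory list. The resources, labels, and costs are made up for the example and do not come from any GCP API.

```python
from collections import defaultdict

# Hypothetical resources with labels and a monthly cost in US dollars.
resources = [
    {"name": "vm-1", "labels": {"env": "prod", "component": "backend"}, "cost": 40.0},
    {"name": "vm-2", "labels": {"env": "test", "component": "backend"}, "cost": 10.0},
    {"name": "db-1", "labels": {"env": "prod", "component": "database"}, "cost": 25.0},
]


def filter_by_label(items, key, value):
    """Names of the resources whose label `key` equals `value`."""
    return [r["name"] for r in items if r["labels"].get(key) == value]


def cost_by_label(items, key):
    """Total cost grouped by the value of label `key`."""
    totals = defaultdict(float)
    for r in items:
        totals[r["labels"].get(key, "unlabeled")] += r["cost"]
    return dict(totals)


print(filter_by_label(resources, "env", "prod"))
print(cost_by_label(resources, "env"))
```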

Labels vs Network tags

These two similar concepts seem to generate some confusion. I have summarized the differences in this table:

Labels	Network tags
Applied to any GCP resource	Applied only for VPC resources
Just organize resources	Affect how resources work (ex: through application of firewall rules)

Cloud IAM

Simply put, Cloud IAM controls who can do what on which resource. A resource can be a virtual machine, a database instance, a user, and so on.

It is important to notice that permissions are not directly assigned to users. Instead, they are bundled into roles, which are assigned to members. A policy is a collection of one or more bindings of a set of members to a role.
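
As an illustration of that structure, the snippet below represents a policy as a list of bindings and looks up the roles a member holds. The members and project name are hypothetical. The "bindings", "role", and "members" keys follow the JSON shape I understand Cloud IAM policies to use.

```python
# A policy is a list of bindings, each tying a set of members to one role.
policy = {
    "bindings": [
        {
            "role": "roles/storage.admin",
            "members": ["user:alice@example.com", "group:data-team@example.com"],
        },
        {
            "role": "roles/viewer",
            "members": [
                "user:alice@example.com",
                "serviceAccount:app@my-project.iam.gserviceaccount.com",
            ],
        },
    ]
}


def roles_for(policy, member):
    """Roles bound to a member in this policy, sorted by name."""
    return sorted(b["role"] for b in policy["bindings"] if member in b["members"])


print(roles_for(policy, "user:alice@example.com"))
```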

Identities

In a GCP project, identities are represented by Google accounts, created outside of GCP, and defined by an email address (not necessarily @gmail.com). There are different types:

Google accounts. To represent people: engineers, administrators, and so on.
Service accounts. Used to identify non-human users: applications, services, virtual machines, and others. The authentication process is defined by account keys, which can be managed by Google or by users (only for user-created service accounts).
Google Groups. A collection of Google accounts and service accounts.
G Suite Domain. The type of account you can use to identify organizations. If your organization is already using Active Directory, it can be synchronized with Cloud IAM using Cloud Identity.
allAuthenticatedUsers. To represent any authenticated user in GCP.
allUsers. To represent anyone, authenticated or not.

Regarding service accounts, some of Google's best practices include:

Not using the default service account
Applying the Principle of Least Privilege. For instance:
Restrict who can act as a service account
Grant only the minimum set of permissions that the account needs
Create service accounts for each service, only with the permissions the account needs

Roles

A role is a collection of permissions. There are three types of roles:

Primitive. Original GCP roles that apply to the entire project. There are three concentric roles: Viewer, Editor, and Owner. Editor contains Viewer and Owner contains Editor.
Predefined. Provides access to specific services, for example, storage.admin.
Custom. Lets you create your own roles, combining the specific permissions you need.

When assigning roles, follow the principle of least privilege, too. In general, prefer predefined over primitive roles.

Cloud Deployment Manager

Cloud Deployment Manager automates repeatable tasks like provisioning, configuration, and deployments for any number of machines.

It is Google's Infrastructure as Code service, similar to Terraform - although you can deploy only GCP resources. It is used by GCP Marketplace to create pre-configured deployments.

You define your configuration in YAML files, listing the resources (created through API calls) you want to create and their properties. Resources are defined by their name (VM-1, disk-1), type (compute.v1.disk, compute.v1.instance) and properties (zone:europe-west4, boot:false).

To increase performance, resources are deployed in parallel. Therefore you need to specify any dependencies using references. For instance, do not create virtual machine VM-1 until the persistent disk disk-1 has been created. In contrast, Terraform would figure out the dependencies on its own.

You can modularize your configuration files using templates so that they can be independently updated and shared. Templates can be defined in Python or Jinja2. The contents of your templates will be inlined in the configuration file that references them.

Deployment Manager will create a manifest containing your original configuration, any templates you have imported, and the expanded list of all the resources you want to create.
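
Here is a minimal sketch of what a Python template with an explicit reference could look like, assuming the GenerateConfig(context) entry point and the context.properties dictionary that Deployment Manager templates use. The instance definition is trimmed for brevity, the zone value is an assumption, and FakeContext exists only so the example can run locally.

```python
def GenerateConfig(context):
    """Template sketch: a data disk and a VM that references it."""
    zone = context.properties["zone"]

    disk = {
        "name": "disk-1",
        "type": "compute.v1.disk",
        "properties": {"zone": zone, "sizeGb": 10},
    }
    vm = {
        "name": "vm-1",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            # Trimmed: a deployable instance also needs a machine type,
            # a boot disk, and a network interface.
            "disks": [
                {
                    "boot": False,
                    # The reference makes vm-1 wait until disk-1 exists.
                    "source": "$(ref.disk-1.selfLink)",
                }
            ],
        },
    }
    return {"resources": [disk, vm]}


class FakeContext:
    """Local stand-in for the context object passed to templates."""

    properties = {"zone": "europe-west4-a"}


config = GenerateConfig(FakeContext())
print(config["resources"][1]["properties"]["disks"][0]["source"])
```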

Cloud Operations (formerly Stackdriver)

Cloud Operations provides a set of tools for monitoring, logging, debugging, error reporting, profiling, and tracing of resources in GCP, AWS, and even on-premise.

Cloud Logging

Cloud Logging is GCP's centralized solution for real-time log management. For each of your projects, it allows you to store, search, analyze, monitor, and alert on logging data:

By default, data will be stored for a certain period of time. The retention period varies depending on the type of log. You cannot retrieve logs after they have passed this retention period.
Logs can be exported for different purposes. To do this, you create a sink, which is composed of a filter (to select what you want to log) and a destination: Google Cloud Storage (GCS) for long term retention, BigQuery for analytics, or Pub/Sub to stream it into other applications.
You can create log-based metrics in Cloud Monitoring and even get alerted when something goes wrong.
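
The filter-plus-destination shape of a sink can be sketched as follows. The destination prefixes reflect my understanding of how Cloud Logging addresses each service, and the sink name, filter, and bucket are hypothetical, so check them against the documentation before relying on them.

```python
def build_sink(name, log_filter, destination_kind, target):
    """Return a sink definition: a filter plus a destination."""
    prefixes = {
        "gcs": "storage.googleapis.com/",
        "bigquery": "bigquery.googleapis.com/",
        "pubsub": "pubsub.googleapis.com/",
    }
    return {
        "name": name,
        "filter": log_filter,
        "destination": prefixes[destination_kind] + target,
    }


# Hypothetical sink: keep error logs in a bucket for long-term retention.
sink = build_sink("errors-to-gcs", "severity>=ERROR", "gcs", "my-log-archive")
print(sink)
```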

A log is a named collection of log entries. Each log entry records a status or an event and includes the name of its log, for example, compute.googleapis.com/activity. There are two main types of logs:

First, User Logs:

These are generated by your applications and services.
They are written to Cloud Logging using the Cloud Logging API, client libraries, or logging agents installed on your virtual machines.
The logging agents can also stream logs from common third-party applications like MySQL, MongoDB, or Tomcat.

Second, Security logs, divided into:

Audit logs, for administrative changes, system events, and data access to your resources. For example, who created a particular database instance or to log a live migration. Data access logs must be explicitly enabled and may incur additional charges. The rest are enabled by default, cannot be disabled, and are free of charge.
Access Transparency logs, for actions taken by Google staff when they access your resources, for example, to investigate an issue you reported to the support team.

VPC Flow Logs

They are specific to VPC networks (which I will introduce later). VPC flow logs record a sample of network flows sent from and received by VM instances, which can later be accessed in Cloud Logging.

They can be used to monitor network performance, usage, forensics, real-time security analysis, and expense optimization.

Note: you may want to log your billing data for analysis. In this case, you do not create a sink. You can directly export your reports to BigQuery.

Cloud Monitoring

Cloud Monitoring lets you monitor the performance of your applications and infrastructure, visualize it in dashboards, create uptime checks to detect resources that are down and alert you based on these checks so that you can fix problems in your environment. You can monitor resources in GCP, AWS, and even on-premise.

It is recommended to create a separate project for Cloud Monitoring since it can keep track of resources across multiple projects.

Also, it is recommended to install a monitoring agent in your virtual machines to send application metrics (including many third-party applications) to Cloud Monitoring. Otherwise, Cloud Monitoring will only display CPU, disk traffic, network traffic, and uptime metrics.

Alerts

To receive alerts, you must declare an alerting policy. An alerting policy defines the conditions under which a service is considered unhealthy. When the conditions are met, a new incident will be created and notifications will be sent (via email, Slack, SMS, PagerDuty, etc).

A policy belongs to an individual workspace, which can contain a maximum of 500 policies.

Trace

Trace helps find bottlenecks in your services. You can use this service to figure out how long it takes to handle a request, which microservice takes the longest to respond, where to focus to reduce the overall latency, and so on.

It is enabled by default for applications running on Google App Engine (GAE) - Standard environment - but can be used for applications running on GCE, GKE, and Google App Engine Flexible.

Error Reporting

Error Reporting aggregates and displays errors produced in services written in Go, Java, Node.js, PHP, Python, Ruby, or .NET running on GCE, GKE, GAE, Cloud Functions, or Cloud Run.

Debug

Debug lets you inspect the application's state without stopping your service. Currently supported for Java, Go, Node.js and Python. It is automatically integrated with GAE but can be used on GCE, GKE, and Cloud Run.

Profile

Profiler continuously gathers CPU usage and memory-allocation information from your applications. To use it, you need to install a profiling agent.

How to store data in GCP

In this section, I will cover Google Cloud Storage (for any type of data, including files, images, video, and so on), the different database services available in GCP, and how to decide which storage option works best for you.

Google Cloud Storage (GCS)

GCS is Google's storage service for unstructured data: pictures, videos, files, scripts, database backups, and so on.

Objects are placed in buckets, from which they inherit permissions and storage classes.

Storage classes provide different SLAs for storing your data to minimize costs for your use case. A bucket's storage class can be changed (under some restrictions), but it will affect new objects added to the bucket only.

In addition to Google's console, you can interact with GCS from your command line, using gsutil. You can specify:

Multithreaded uploads when you need to upload a large number of small files. The command looks like gsutil -m cp files gs://my-bucket
Parallel uploads when you need to upload large files. For more details and restrictions, visit this link.

Another option to upload files to GCS is Storage Transfer Service (STS), a service that imports data to a GCS bucket from:

An AWS S3 bucket
A resource that can be accessed through HTTP(S)
Another Google Cloud Storage bucket

If you need to upload huge amounts of data (from hundreds of terabytes up to one petabyte) consider Data Transfer Appliance: ship your data to a Google facility. Once they have uploaded the data to GCS, the process of data rehydration reconstitutes the files so that they can be accessed again.

Object lifecycle management

You can define rules that determine what will happen to an object (will it be archived or deleted) when a certain condition is met.

For example, you could define a policy to automatically change the storage class of an object from Standard to Nearline after 30 days and to delete it after 180 days.

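Rules are defined in JSON. For instance, the following configuration deletes live objects older than 30 days, noncurrent objects that have at least 2 newer versions, and noncurrent objects older than 180 days: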

{
   "lifecycle":{
      "rule":[
         {
            "action":{
               "type":"Delete"
            },
            "condition":{
               "age":30,
               "isLive":true
            }
         },
         {
            "action":{
               "type":"Delete"
            },
            "condition":{
               "numNewerVersions":2
            }
         },
         {
            "action":{
               "type":"Delete"
            },
            "condition":{
               "age":180,
               "isLive":false
            }
         }
      ]
   }
}

You can apply the configuration with gsutil or a REST API call. Rules can also be created through the Google Cloud Console.
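
The Standard-to-Nearline policy mentioned earlier in this section could be expressed in the same JSON shape. The sketch below builds it in Python. The SetStorageClass action and matchesStorageClass condition are the names I understand the lifecycle configuration to use, so double-check them against the GCS documentation.

```python
import json

lifecycle_policy = {
    "lifecycle": {
        "rule": [
            {
                # Move Standard objects to Nearline once they are 30 days old.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]},
            },
            {
                # Delete objects once they are 180 days old.
                "action": {"type": "Delete"},
                "condition": {"age": 180},
            },
        ]
    }
}

policy_json = json.dumps(lifecycle_policy, indent=2)
print(policy_json)
```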

Permissions in GCS

In addition to IAM roles, you can use Access Control Lists (ACLs) to manage access to the resources in a bucket.

Use IAM roles when possible, but remember that ACLs grant access to buckets and individual objects, while IAM roles are project or bucket wide permissions. Both methods work in tandem.

To grant temporary access to users outside of GCP, use Signed URLs.

Bucket lock

Bucket locks allow you to enforce a minimum retention period for objects in a bucket. You may need this for auditing or legal reasons.

Once a bucket is locked, it cannot be unlocked. To delete a locked bucket, you first need to remove all the objects in it, which you can only do after they have all reached the retention period specified by the retention policy. Only then can you delete the bucket.

You can include the retention policy when you are creating the bucket or add a retention policy to an existing bucket (it retroactively applies to existing objects in the bucket too).

Fun fact: the maximum retention period is 100 years.
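
The retention rule boils down to a date comparison, which this small sketch illustrates. The upload time and the one-year policy are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def earliest_deletion(created_at, retention_period):
    """An object can only be deleted once it is at least retention_period old."""
    return created_at + retention_period


def can_delete(created_at, retention_period, now):
    return now >= earliest_deletion(created_at, retention_period)


created = datetime(2022, 1, 1, tzinfo=timezone.utc)  # hypothetical upload time
retention = timedelta(days=365)  # hypothetical 1-year policy

print(can_delete(created, retention, datetime(2022, 6, 1, tzinfo=timezone.utc)))
print(can_delete(created, retention, datetime(2023, 1, 1, tzinfo=timezone.utc)))
```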

Relational Managed Databases in GCP

Cloud SQL and Cloud Spanner are two managed database services available in GCP. If you do not want to deal with all the work necessary to maintain a database online, they are a great option. Alternatively, you can always spin up a virtual machine and manage your own database.

Cloud SQL

Cloud SQL provides access to a managed MySQL or PostgreSQL database instance in GCP. Each instance is limited to a single region and has a maximum capacity of 30 TB.

Google will take care of the installation, backups, scaling, monitoring, failover, and read replicas. For availability reasons, replicas must be defined in the same region but a different zone from the primary instances.

Data can be easily imported (first uploading the data to Google Cloud Storage and then to the instance) and exported using SQL dumps or CSV files. Data can be compressed to reduce costs (you can directly import .gz files). For "lift and shift" migrations, this is a great option.

If you need global availability or more capacity, consider using Cloud Spanner.

Cloud Spanner

Cloud Spanner is globally available and can scale (horizontally) very well.

These two features make it capable of supporting different use cases than Cloud SQL, but also more expensive. Cloud Spanner is not an option for lift and shift migrations.

NoSQL Managed Databases in GCP

Similarly, GCP provides two managed NoSQL databases, Bigtable and Datastore, as well as an in-memory database service, Memorystore.

Datastore

Datastore is a completely no-ops, highly-scalable document database ideal for web and mobile applications: game states, product catalogs, real-time inventory, and so on. It's great for:

User profiles - mobile apps
Game save states

By default, Datastore has a built-in index that improves performance on simple queries. You can create your own indices, called composite indexes, defined in YAML format.

If you need extreme throughput (huge number of reads/writes per second), use Bigtable instead.

Bigtable

Bigtable is a NoSQL database ideal for analytical workloads where you can expect a very high volume of writes, reads in the milliseconds, and the ability to store terabytes to petabytes of information. It's great for:

Financial analysis
IoT data
Marketing data

Bigtable requires the creation and configuration of your nodes (as opposed to the fully-managed Datastore or BigQuery). You can add or remove nodes to your cluster with zero downtime. The simplest way to interact with Bigtable is the command-line tool cbt.

Bigtable's performance will depend on the design of your database schema.

You can only define one key per row and must keep all the information associated with an entity in the same row. Think of it as a hash table.
Tables are sparse: if there is no information associated with a column, no space is required.
To make reads more efficient, try to store related entities in adjacent rows.
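
To see why key design matters, the sketch below builds row keys that start with the entity ID and sorts them. Bigtable keeps rows ordered by key, so the sort stands in for the stored order. The device IDs, timestamps, and the "#" separator are assumptions made for the example.

```python
def row_key(device_id, timestamp):
    """Prefix with the entity ID so all rows for one device sort together."""
    return f"{device_id}#{timestamp}"


# Hypothetical IoT readings arriving in mixed order.
readings = [
    ("sensor-b", "2022-05-03T10:00:00"),
    ("sensor-a", "2022-05-03T10:05:00"),
    ("sensor-b", "2022-05-03T10:05:00"),
    ("sensor-a", "2022-05-03T10:00:00"),
]

# Sorting the keys shows the order in which the rows would be stored.
stored_order = sorted(row_key(device, ts) for device, ts in readings)
for key in stored_order:
    print(key)
```

With this layout, reading everything for one device is a scan over a contiguous range of rows instead of lookups scattered across the table.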

Since this topic is worth an article on its own, I recommend you read the documentation.

Memorystore

It provides a managed version of Redis and Memcached (in-memory databases), resulting in very fast performance. Instances are regional, like Cloud SQL, and have a capacity of up to 300 GB.

How to choose your database

Google loves decision trees. This one will help you choose the right database for your projects. For unstructured data, consider GCS or process it using Dataflow (discussed later).
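
As a rough paraphrase of the guidance in this section (not Google's official decision tree), the choice can be sketched as a small function. The parameter names are made up for the example and the logic is deliberately simplified.

```python
def choose_storage(structured, relational=False, in_memory=False,
                   global_scale=False, extreme_throughput=False):
    """Pick a storage service following the rules of thumb in this section."""
    if not structured:
        return "Cloud Storage"
    if in_memory:
        return "Memorystore"
    if relational:
        # Cloud SQL is regional; Spanner covers global availability or more capacity.
        return "Cloud Spanner" if global_scale else "Cloud SQL"
    # NoSQL: Bigtable for a huge number of reads/writes, Datastore otherwise.
    return "Bigtable" if extreme_throughput else "Datastore"


print(choose_storage(structured=True, relational=True))
print(choose_storage(structured=True, extreme_throughput=True))
```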

How does networking work in GCP?

Virtual Private Cloud (VPC)

You can use the same network infrastructure that Google uses to run its services: YouTube, Search, Maps, Gmail, Drive, and so on.

Google infrastructure is divided into:

Regions: Independent geographical areas, at least 100 miles apart from each other, where Google hosts datacenters. Each region consists of 3 or more zones. For example, us-central1.
Zones: Multiple individual datacenters within a region. For example, us-central1-a.
Edge Points of Presence: points of connection between Google's network and the rest of the internet.

GCP infrastructure is designed in a way that all traffic between regions travels through a global private network, resulting in better security and performance.

On top of this infrastructure, you can build networks for your resources, Virtual Private Clouds. They are software-defined networks, where all the traditional network concepts apply:

Subnets. Logical partitions of a network defined using CIDR notation. They belong to one region only but can span multiple zones. If you have multiple subnets (including your on-premise networks if they are connected to GCP), make sure the CIDR ranges do not overlap.
IP addresses. Can be internal (for private communication within GCP) or external (to communicate with the rest of the internet). For external IP addresses, you can use an ephemeral IP or pay for a static IP. In general, you need an external IP address to connect to GCP services. However, in some cases, you can configure private access for instances that only have an internal IP.
Firewalls rules, to allow or deny traffic to your virtual machines, both incoming (ingress) and outgoing (egress). By default, all ingress traffic is denied and all egress traffic is allowed. Firewall rules are defined at the VPC level but they apply to individual instances or groups of instances using network tags or IP ranges.
Common issue: If you know your VMs are working correctly but you cannot access them through HTTP(s) or cannot SSH into them, have a look at your firewall rules.
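
The advice about non-overlapping CIDR ranges is easy to check before you create a subnet. Here is a minimal sketch using Python's standard ipaddress module. The function name find_overlaps and the example ranges are made up for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR ranges that share at least one address."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


# Hypothetical ranges: two GCP subnets and one on-premise network.
ranges = ["10.0.0.0/20", "10.0.16.0/20", "10.0.8.0/24"]
print(find_overlaps(ranges))
```

In this example the third range sits inside the first one, so that pair is reported and would need to be changed.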

You can create hybrid networks connecting your on-premise infrastructure to your VPC.

When you create a project, a default network will be created with subnets in each region (auto mode). You can delete this network, but you need to create at least one network to be able to create virtual machines.

You can also create your custom networks, where no subnets are created by default and you have full control over subnet creation (custom mode).

The main goal of a VPC is the separation of network resources. A GCP project is a way to organize resources and manage permissions.

Users of project A need permissions to access resources in project B. All users can access any VPC defined in any project to which they belong. Within the same VPC, resources in subnet 1 need to be granted access to resources in subnet 2.

In terms of IAM roles, there is a distinction between who can create network resources (Network admin, to create subnets, virtual machines, and so on) and who is responsible for the security of the resources (Security Admin, to create firewall rules, SSL certificates, and so on).

The Compute Instance Admin role combines both roles.

As usual, there are quotas and limits to what you can do in a VPC, amongst them:

The maximum number of VPCs in a project.
The maximum number of virtual machines per VPC.
No broadcast or multicast.
VPCs cannot use IPv6 to communicate internally, although global load balancers support IPv6 traffic.

Shared VPC

Shared VPCs are a way to share resources between different projects within the same organization. This allows you to control billing and manage access to the resources in different projects, following the principle of least privilege. Otherwise you'd have to put all the resources in a single project.

To design a shared VPC, projects fall under three categories:

Host project. It is the project that hosts the common resources. There can only be one host project.
Service project. A project that can access the resources in the host project. A project cannot be both host and service.
Standalone project. Any project that does not make use of the shared VPC.

You will only be able to communicate between resources created after you define your host and service projects. Any existing resources before this will not be part of the shared VPC.

VPC Network Peering

Shared VPCs can be used when all the projects belong to the same organization. However, if:

You need private communication across VPCs.
The VPCs are in projects that may belong to different organizations.
You want decentralized control, that is, no need to define host projects, service projects, and so on.
You want to reuse existing resources.

VPC Network Peering is the right solution.

In the next section, I will discuss how to connect your VPC(s) with networks outside of GCP.

How to connect on-premise and GCP infrastructures

There are three options to connect your on-premise infrastructure to GCP:

Cloud VPN
Cloud Interconnect
Cloud Peering

Each of them has different capabilities, use cases, and prices, which I will describe in the following sections.

Cloud VPN

With Cloud VPN, your traffic travels through the public internet over an encrypted tunnel. Each tunnel has a maximum capacity of 3 Gb per second and you can use a maximum of 8 for better performance. These two characteristics make VPN the cheapest option.

You can define two types of routes between your VPC and your on-premise networks:

Static routes. You have to manually define and update them, for example when you add a new subnet. This is not the preferred option.
Dynamic routes. Routes are automatically handled (defined and updated) for you using Cloud Router. This is the preferred option when BGP is available.

Your traffic gets encrypted and decrypted by VPN Gateways (in GCP, they are regional resources).

To have a more robust connection, consider using multiple VPN gateways and tunnels. In case of failure, this redundancy guarantees that traffic will still flow.

Cloud Interconnect

With Cloud VPN, traffic travels through the public internet. With Cloud Interconnect, there is a direct physical connection between your on-premises network and your VPC. This option will be more expensive but will provide the best performance.

There are two types of interconnect available, depending on how you want your connection to GCP to materialize:

Dedicated interconnect. There is "a direct cable" connecting your infrastructure and GCP. This is the fastest option, with a capacity of 10 to 200 Gb per second. However, it is not available everywhere: at the time of this writing, only in 62 locations in the world.
Partner interconnect. You connect through a service provider. This option is more widely available geographically, but it is not as fast as a dedicated interconnect: from 50 Mb per second to 10 Gb per second.

Cloud Peering

Cloud Peering is not a GCP service, but you can use it to connect your network to Google's network and access services like YouTube, Drive, or GCP services.

A common use case is when you need to connect to Google but don't want to do it over the public internet.

Other networking services

Load Balancers (LB)

In GCP, load balancers are pieces of software that distribute user requests among a group of instances.

A load balancer may have multiple backends associated with it, having rules to decide the appropriate backend for a given request.

There are different types of load balancers. They differ in the type of traffic (HTTP vs TCP/UDP - Layer 7 or Layer 4), whether they handle external or internal traffic, and whether their scope is regional or global:

HTTP(s). Global LB that handles HTTP(s) requests, distributing traffic to multiple regions based on user location (to the closest region with available instances) or URL maps (the LB can be configured to forward requests to URL/news to a backend service and URL/videos to a different one). It can receive both IPv4 and IPv6 traffic (but this one is terminated at the LB level and proxied as IPv4 to the backends) and has native support for WebSockets.
SSL Proxy LB. Global LB that handles encrypted TCP traffic, managing SSL certificates for you.
TCP Proxy LB. Global LB that handles unencrypted TCP traffic. Like SSL Proxy LB, by default, it will not preserve the client's IP, but this can be changed.
Network Load Balancer. Regional LB that handles TCP/UDP external traffic, based on IP address and port.
Internal Load Balancer. Like a Network LB, but for internal traffic.
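
To make the differences easier to remember, the list above can be condensed into a small lookup function. This is only a simplified sketch of the decision, not a Google API. The function name and its parameters are invented for this example.

```python
def choose_load_balancer(protocol, internal=False, encrypted=False, global_scope=True):
    """Map traffic characteristics to one of the load balancer types listed above."""
    if internal:
        # Internal traffic is handled by the internal flavor of the Network LB.
        return "Internal Load Balancer"
    if protocol in ("HTTP", "HTTPS"):
        return "HTTP(S) Load Balancer"
    if protocol == "TCP" and global_scope:
        # Global TCP traffic goes through a proxy; SSL Proxy handles encrypted traffic.
        return "SSL Proxy LB" if encrypted else "TCP Proxy LB"
    if protocol in ("TCP", "UDP"):
        return "Network Load Balancer"
    raise ValueError(f"unsupported protocol: {protocol}")


print(choose_load_balancer("HTTPS"))
print(choose_load_balancer("TCP", encrypted=True))
print(choose_load_balancer("UDP"))
```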

Cloud DNS

Cloud DNS is Google's managed Domain Name System (DNS) host, both for internal and external (public) traffic. It will map URLs like https://www.freecodecamp.org/ to an IP address. It is the only service in GCP with 100% SLA - it is available 100% of the time.

Google Cloud CDN

Cloud CDN is Google's Content Delivery Network. If you have data that does not change often (images, videos, CSS, etc.) it makes sense to cache it close to your users. Cloud CDN provides 90 edge Points of Presence (POPs) to cache the data close to your end-users.

After the first request, static data can be stored in a POP, usually much closer to your user than your main servers. Thus, in subsequent requests, you can retrieve the data faster from the POP and reduce the load on your backend servers.
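
The caching behavior described above can be pictured with a toy model of a single POP. This is not how Cloud CDN is implemented. The EdgeCache class and the origin function are hypothetical stand-ins for the POP and your backend.

```python
class EdgeCache:
    """Toy model of one Point of Presence that caches static content."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_hits = 0

    def get(self, path):
        if path not in self._store:
            # Cache miss: go to the backend once and keep the response.
            self._store[path] = self._fetch(path)
            self.origin_hits += 1
        return self._store[path]


def origin(path):
    # Hypothetical backend that returns placeholder content.
    return f"content of {path}"


pop = EdgeCache(origin)
pop.get("/static/logo.png")   # first request reaches the backend
pop.get("/static/logo.png")   # second request is served from the POP
print(pop.origin_hits)
```

However many times the same file is requested, the backend in this model is contacted only once.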

Where can you run your applications in GCP?

I will present 4 places where your code can run in GCP:

Google Compute Engine
Google Kubernetes Engine
App Engine
Cloud Functions

Note: there is a fifth option, Firebase, Google's mobile platform that helps you quickly develop apps.

Compute Engine (GCE)

Compute engine allows you to spin up virtual machines in GCP. This section will be longer since GCE provides the infrastructure where GKE and GAE run.

In the introduction, I talked about the different types of VMs you can create in GCE. Now, I will cover where to store the data, how to back it up, and how to create instances with all the data and configuration you need.

Where to store your VM's data: disks

Your data can be stored in Persistent disks, Local SSDs, or in Cloud Storage.

Persistent Disk

Persistent disks provide durable and reliable block storage. They are not local to the machine. Rather, they are network-attached, which has its pros and cons:

Disks can be resized, attached, or detached from a VM even if the instance is in use.
They have high reliability.
Disks can outlive the instance: they can be kept after the instance is deleted.
If you need more space, simply attach more disks.
Larger disks will provide higher performance.
Being network-attached, they are less performant than local options. SSD persistent disks are also available for more demanding workloads.

Every instance will need one boot disk and it must be of this type.

Local SSD

Local SSDs are attached to a VM to which they provide high-performance ephemeral storage. As of now, you can attach up to eight 375GB local SSDs to the same instance. However, this data will be lost if the VM is killed.

Local SSDs can only be attached to a machine when it is created, but you can attach both local SSDs and persistent disks to the same machine.

Both types of disks are zonal resources.

Cloud Storage

We have extensively covered GCS in a previous section. GCS is not a filesystem, but you can use GCS-Fuse to mount GCS buckets as filesystems in Linux or macOS systems. You can also let apps download and upload data to GCS using standard filesystem semantics.

How to back up your VM's data: Snapshots

Snapshots are backups of your disks. To reduce space, they are created incrementally:

Backup 1 contains all your disk content
Backup 2 only contains the data that has changed since backup 1
Backup 3 only contains the data that has changed since backup 2, and so on

This is enough to restore the state of your disk.
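
To see why a chain of incremental backups is enough, here is a toy model in which a disk is a dictionary of block IDs. It only illustrates the idea. It does not reflect how GCE stores snapshots internally, and the function names are made up.

```python
def take_snapshot(disk, previous_state):
    """Record only the blocks that changed or disappeared since previous_state."""
    changed = {b: c for b, c in disk.items() if previous_state.get(b) != c}
    deleted = [b for b in previous_state if b not in disk]
    return {"changed": changed, "deleted": deleted}


def restore(snapshots):
    """Rebuild the disk by replaying snapshots from oldest to newest."""
    disk = {}
    for snap in snapshots:
        disk.update(snap["changed"])
        for block in snap["deleted"]:
            disk.pop(block, None)
    return disk


# Hypothetical disk modelled as block_id -> content.
disk = {0: "boot", 1: "data-v1"}
backup_1 = take_snapshot(disk, {})        # contains every block
state_1 = dict(disk)

disk[1] = "data-v2"
disk[2] = "logs"
backup_2 = take_snapshot(disk, state_1)   # contains blocks 1 and 2 only

assert restore([backup_1, backup_2]) == disk
```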

Even though snapshots can be taken without stopping the instance, it is best practice to at least reduce its activity, stop writing data to disk, and flush buffers. This helps you make sure you get an accurate representation of the content of the disk.

Images

Images refer to the operating system images needed to create boot disks for your instances. There are two types of images:

Public images. They are provided and maintained by Google, open-source communities, and third-party vendors. They are available to anyone and ready to use as soon as you create your project.
Custom images. Images that you have created.
They are linked to the project in which you created them but you can share them with other projects.
You can create images from persistent disks and other images, both from the same project or shared from another project.
Related images can be grouped in image families to simplify the management of the different image versions.
For Linux-based images, you can share them also by exporting them to Cloud Storage as a tar.gz file.

You might be asking yourself what is the difference between an image and a snapshot. Mainly, their purpose. Snapshots are taken as incremental backups of a disk while images are created to spin up new virtual machines and configure instance templates.

Note on images vs startup scripts:

For simple setups, startup scripts are also an option. They can be used to test changes quickly, but the VMs will take longer to be ready compared to using an image where all the needed software is installed, configured, and so on.

Instance groups

Instance groups let you treat a group of instances as a single unit and they come in two flavors:

Unmanaged instance group. Formed by a heterogeneous group of instances that require individual configuration settings.
Managed instance group (MIG). This is the preferred option when possible. All the machines look the same, making it easy to configure them, create them in multiple zones (high availability), replace them if they become unhealthy (auto-healing), balance the traffic among them, and create new instances if the traffic increases (horizontal scaling).

To create a MIG, you need to define an instance template, specifying your machine type, zone, OS image, startup and shutdown scripts, and so on. Instance templates are immutable.

To update a MIG, you need to create a new template and use the Managed Instance Group Updater to deploy the new version to every machine in the group.

This functionality can be used to create canary tests, deploying your changes to a small fraction of your machines first.

Visit this link to know more about Google's recommendations to ensure an application deployed via a managed instance group can handle the load even if an entire zone fails.

Security best practices for GCE

To increase the security of your infrastructure in GCE, have a look at:

Shielded VMs
Prevent instances from being reached from the public internet
Trusted images (https://cloud.google.com/compute/docs/images/restricting-image-access) to make sure your users can only create disks from images in specific projects

App Engine

App Engine is a great choice when you want to focus on the code and let Google handle your infrastructure. You just need to choose the region where your app will be deployed (this cannot be changed once it is set). Amongst its main use cases are websites, mobile apps, and game backends.

You can easily update the version of your app that is running via the command line or the Google Console.

Also, if you need to deploy a risky update to your application, you can split the traffic between the old and the risky versions for a canary deployment. Once you are happy with the results, you can route all the traffic to the new version.
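
App Engine takes care of the traffic split for you, but the idea behind a canary deployment is simple enough to sketch. The function below is an illustration and not App Engine's actual mechanism. The version names and the hashing scheme are assumptions made for this example.

```python
import hashlib


def pick_version(user_id, canary_share, stable="v1", canary="v2"):
    """Route a fixed share of users to the canary version, deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000   # 0..9999
    return canary if bucket < canary_share * 10_000 else stable


# Send roughly 10% of users to the risky version.
users = [f"user-{i}" for i in range(1000)]
on_canary = sum(pick_version(u, 0.10) == "v2" for u in users)
print(f"{on_canary} of {len(users)} users see the new version")
```

Hashing the user ID keeps each user on the same version between requests. Raising canary_share to 1.0 routes all the traffic to the new version.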

There are two App Engine environments:

Standard. This version can quickly scale up or down (even to zero instances) to adjust to the demand. Currently, only a few programming languages are supported (Go, Java, PHP, and Python) and you do not have access to a VPC (including VPN connections).
Flexible. Your code runs in Docker containers in GCE, hence more flexible than the Standard environment. However, creating new instances is slower and it cannot be scaled down to zero instances. It is suited for more consistent traffic.

Regardless of the environment, there are no up-front costs and you only pay for what you use (billed per second).

Memcache is built into App Engine, giving you the possibility to choose between a shared cache (default, free option) or a dedicated cache for better performance.

Visit this link to know more about the best practices you should follow to maximize the performance of your app.

Google Kubernetes Engine (GKE)

Kubernetes is an open-source container orchestration system, developed by Google.

Kubernetes is a very extensive topic in itself and I will not cover it here. You just need to know that GKE makes it easy to run and manage your Kubernetes clusters on GCP.

Google also provides Container Registry to store your container images - think of it as your private Docker Hub.

Note: You can use Cloud Build to run your builds in GCP and, among other things, produce Docker images and store them in Container Registry. Cloud Build can import your code from Google Cloud Storage, Cloud Source Repository, GitHub, or Bitbucket.

Cloud Functions

Cloud Functions are the equivalent of Lambda functions in AWS. Cloud Functions are serverless: they let you focus on the code and not worry about the infrastructure where it is going to run.

With Cloud Functions it is easy to respond to events such as uploads to a GCS bucket or messages in a Pub/Sub topic. You are only charged for the time your function is running in response to an event.
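
As an illustration, a function reacting to an upload might look like the sketch below. It assumes the Python runtime's background-function signature (event, context) and an event payload that carries the bucket and object name under the keys bucket and name. The function name is hypothetical.

```python
def on_file_uploaded(event, context):
    """Background function meant to run when an object is uploaded to a bucket."""
    bucket = event["bucket"]   # assumed payload key: bucket that received the object
    name = event["name"]       # assumed payload key: path of the uploaded object
    message = f"New object gs://{bucket}/{name}"
    print(message)
    return message


# Local smoke test with a fake event; in GCP the platform supplies both arguments.
on_file_uploaded({"bucket": "my-bucket", "name": "reports/2020.csv"}, None)
```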

How to work with Big Data in GCP

BigQuery

BigQuery is Google's serverless data warehouse, providing analytics capabilities for petabyte-scale databases.

BigQuery automatically backs up your tables, but you can always export them to GCS to be on the safe side - incurring extra costs.

Data can be ingested in batches (for instance, from a GCS bucket) or from a stream in multiple formats: CSV, JSON, Parquet, or Avro (most performant). Also, you can query data that resides in external sources, called federated sources, for example, GCS buckets.

You can interact with your data in BigQuery using SQL via:

Google Console.
Command-line, running commands like bq query 'SELECT field FROM ....
REST API.
Code using client libraries.

User-Defined Functions allow you to combine SQL queries with JavaScript functions to create complex operations.

BigQuery is a columnar data store: records are stored in columns. Tables are collections of columns and datasets are collections of tables.

Jobs are actions to load, export, query, or copy data that BigQuery runs on your behalf.

Views are virtual tables defined by a SQL query and are useful for sharing data with others when you want to control exactly what they have access to.

Two important concepts related to tables are:

Partitioned tables. To limit the amount of data that needs to be queried, tables can be divided into partitions. This can be done based on ingestion time, a timestamp or date column, or an integer range. This way it is easy to query for certain periods without querying the full table. To reduce costs, you can define an expiration period after which the partition will be deleted.
Clustered tables. Data are clustered by column (for instance, order_id). When you query your table, only the rows associated with this column will be read. BigQuery will perform this clustering automatically based on one or more columns.
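
To get a feeling for why partitioning reduces the amount of data scanned, consider this toy model of a table partitioned by day. It is a conceptual sketch and not BigQuery code. The class and its methods are invented for this example.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy table that stores rows in one partition per day."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, day, row):
        self._partitions[day].append(row)

    def query(self, start, end):
        """Return the rows in [start, end] and the number of rows scanned."""
        scanned, rows = 0, []
        for day, partition in self._partitions.items():
            if start <= day <= end:   # every other partition is skipped entirely
                scanned += len(partition)
                rows.extend(partition)
        return rows, scanned


table = PartitionedTable()
table.insert(date(2020, 1, 1), {"order_id": 1})
table.insert(date(2020, 1, 2), {"order_id": 2})
table.insert(date(2020, 1, 2), {"order_id": 3})

rows, scanned = table.query(date(2020, 1, 2), date(2020, 1, 2))
print(rows, scanned)
```

The query for January 2nd only touches the two rows stored in that partition, no matter how large the rest of the table is.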

Using IAM roles, you can control access at a project, dataset, or view level, but not at the table level. Roles are complex for BigQuery, so I recommend checking the documentation.

For instance, the jobUser role only lets you run jobs while the user role lets you run jobs and create datasets (but not tables).

Your costs depend on how much data you store and stream into BigQuery and how much data you query. To reduce costs, BigQuery automatically caches previous queries (per user). This behavior can be disabled.

When you don't edit data for 90 days, it automatically moves to a cheaper storage class. You pay for what you use, but it is possible to opt for a flat rate (only if you need more than the 2000 slots that are allocated by default).

Check these links to see how to optimize your performance and costs.

Cloud Pub/Sub

Pub/Sub is Google's fully-managed message queue, allowing you to decouple publishers (adding messages to the queue) and subscribers (consuming messages from the queue).

Although it is similar to Kafka, Pub/Sub is not a direct substitute. They can be combined in the same pipeline (Kafka deployed on-premise or even in GKE). There are open-source plugins to connect Kafka to GCP, like Kafka Connect.

Pub/Sub guarantees that every message will be delivered at least once but it does not guarantee that messages will be processed in order. It is usually connected to Dataflow to process the data, ensure that the messages are processed in order, and so on.
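
In practice, at-least-once delivery without ordering means a consumer may see duplicates and out-of-order messages. The sketch below shows the kind of clean-up a downstream step has to do. The message fields id, seq, and payload are hypothetical, and a real Dataflow pipeline would express this very differently.

```python
def process_in_order(deliveries):
    """Drop duplicate deliveries and return payloads sorted by sequence number."""
    unique = {}
    for msg in deliveries:
        # Keep only the first delivery of each message ID.
        unique.setdefault(msg["id"], msg)
    ordered = sorted(unique.values(), key=lambda m: m["seq"])
    return [m["payload"] for m in ordered]


# Message "b" arrives first and is delivered twice.
deliveries = [
    {"id": "b", "seq": 2, "payload": "second"},
    {"id": "a", "seq": 1, "payload": "first"},
    {"id": "b", "seq": 2, "payload": "second"},
]
print(process_in_order(deliveries))
```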

Pub/Sub supports both push and pull modes:

Push. Messages are sent to subscribers, resulting in lower latency.
Pull. Subscribers pull messages from topics, better suited for a large volume of messages.

Cloud Pub/Sub vs Cloud Tasks

Cloud Tasks is another fully-managed service to execute tasks asynchronously and manage messages between services. However, there are differences between Cloud Tasks and Pub/Sub:

In Pub/Sub, publishers and subscribers are decoupled. Publishers know nothing about their subscribers. When they publish a message, they implicitly cause one or multiple subscribers to react to a publishing event.
In Cloud Tasks, the publisher stays in control of the execution. Besides, Cloud Tasks provides other features unavailable in Pub/Sub, like scheduling specific delivery times, delivery rate controls, configurable retries, access and management of individual tasks in a queue, and deduplication of task/message creation.

For more details, check out this link.

Cloud Dataflow

Cloud Dataflow is Google's managed service for stream and batch data processing, based on Apache Beam.

You can define pipelines that will transform your data, for example before it is ingested in another service like BigQuery, BigTable, or Cloud ML. The same pipeline can process both stream and batch data.

A common pattern is to stream data into Pub/Sub, let's say from IoT devices, process it in Dataflow, and store it for analysis in BigQuery.

But Pub/Sub does not guarantee that the order in which messages are pushed to the topics will be the order in which the messages are consumed. However, this can be done with Dataflow.

Cloud Dataproc

Cloud Dataproc is Google's managed Hadoop and Spark ecosystem. It lets you create and manage your clusters easily and turn them off when you are not using them, to reduce costs.

Dataproc can only be used to process batch data, while Dataflow can also handle streaming data.

Google recommends using Dataproc for a lift and leverage migration of your on-premise Hadoop clusters to the cloud:

Reduce costs by turning your cluster off when you are not using it.
Leverage Google's infrastructure
Use some preemptible virtual machines to reduce costs
Add larger (SSD) persistent disks to improve performance
BigQuery can replace Hive and BigTable can replace HBase
Cloud Storage replaces HDFS. Just upload your data to GCS and change the prefixes hdfs:// to gs://

Otherwise, you should choose Cloud Dataflow.
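
The prefix change from hdfs:// to gs:// mentioned in the list above can be scripted when you have many job configurations to migrate. This is a minimal sketch. It assumes you uploaded your HDFS directory tree to the root of a bucket with the same layout, and the bucket name is a placeholder.

```python
from urllib.parse import urlsplit


def hdfs_to_gcs(path, bucket):
    """Rewrite an hdfs:// URI so it points at the same path inside a GCS bucket."""
    parts = urlsplit(path)
    if parts.scheme != "hdfs":
        return path   # leave any other path untouched
    # The namenode host and port are dropped; only the file path is kept.
    return f"gs://{bucket}{parts.path}"


print(hdfs_to_gcs("hdfs://namenode:8020/data/logs/2020.csv", "my-bucket"))
```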

Dataprep

Cloud Dataprep provides you with a web-based interface to clean and prepare your data before processing. The input and output formats include, among others, CSV, JSON, and Avro.

After defining the transformations, a Dataflow job will run. The transformed data can be exported to GCS, BigQuery, etc.

Cloud Composer

Cloud Composer is Google's fully-managed Apache Airflow service to create, schedule, monitor, and manage workflows. It handles all the infrastructure for you so that you can concentrate on combining the services I have described above to create your own workflows.

Under the hood, a GKE cluster will be created with Airflow in it and GCS will be used to store files.

AI and Machine Learning in GCP

Covering the basics of machine learning would take another article. So here, I assume you are familiar with it and will show you how to train and deploy your models in GCP.

We'll also look at what APIs are available to leverage Google's machine learning capabilities in your services, even if you are not an expert in this area.

AI Platform

AI Platform provides you with a fully-managed platform to use machine learning libraries like TensorFlow. You just need to focus on your model and Google will handle all the infrastructure needed to train it.

After your model is trained, you can use it to get online and batch predictions.

Cloud AutoML

Google lets you use your data to train their models. You can leverage models to build applications that are based on natural language processing (for example, document classification or sentiment analysis applications), speech processing, machine translation, or video processing (video classification or object detection).

How to explore and visualize your data in GCP

Cloud Data Studio

Data Studio lets you create visualizations and dashboards based on data that resides in Google services (YouTube Analytics, Sheets, AdWords, local upload), Google Cloud Platform (BigQuery, Cloud SQL, GCS, Spanner), and many third-party services, storing your reports in Google Drive.

Data Studio is not part of GCP but of G Suite, so its permissions are not managed using IAM.

There are no additional costs for using Data Studio, other than the storage of the data, queries in BigQuery, and so on. Caching can be used to improve performance and reduce costs.

Cloud Datalab

Datalab lets you explore, analyze, and visualize data in BigQuery, ML Engine, Compute Engine, Cloud Storage, and Stackdriver.

It is based on Jupyter notebooks and supports Python, SQL, and JavaScript code. Your notebooks can be shared via the Cloud Source Repository.

Cloud Datalab itself is free of charge, but it will create a virtual machine in GCE for which you will be billed.

Security in GCP

Encryption on Google Cloud Platform

Google Cloud encrypts data both at rest (data stored on disk) and in transit (data traveling in the network), using AES implemented via BoringSSL.

You can manage the encryption keys yourself (storing them either in GCP or on-premise) or let Google handle them.

Encryption at rest

GCP encrypts data stored at rest by default. Your data will be divided into chunks. Each chunk is distributed across different machines and encrypted with a unique key, called a data encryption key (DEK).

Keys are generated and managed by Google but you can also manage the keys yourself, as we will see later in this guide.

Encryption in Transit

To add an extra security layer, all communications between two GCP services or from your infrastructure to GCP are encrypted at one or more network layers. Your data would not be compromised if your messages were to be intercepted.

Cloud Key Management Service (KMS)

As I mentioned earlier, you can let Google manage the keys for you or you can manage them yourself.

Google KMS is the service that allows you to manage your encryption keys. You can create, rotate, and destroy symmetric encryption keys. All key-related activity is registered in logs. These keys are referred to as customer-managed encryption keys.

In GCS, they are used to encrypt:

The object's data.
The object's CRC32C checksum.
The object's MD5 hash.

And Google uses server-side keys to handle the rest of the metadata, including the object's name.

The DEKs used to encrypt your data are also encrypted using key encryption keys (KEKs), in a process called envelope encryption. By default, KEKs are rotated every 90 days.
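
The structure of envelope encryption is easier to follow in code. In the sketch below, toy_cipher is a deliberately simple stand-in for AES and must not be used to protect real data. All function names are made up. Only the layering matters: the data is encrypted with a DEK, and the DEK is encrypted with a KEK.

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """XOR data with a SHA-256-derived keystream. NOT secure; stands in for AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(plaintext, kek):
    dek = secrets.token_bytes(32)             # fresh data encryption key
    ciphertext = toy_cipher(dek, plaintext)   # data encrypted with the DEK
    wrapped_dek = toy_cipher(kek, dek)        # DEK encrypted with the KEK
    return ciphertext, wrapped_dek


def envelope_decrypt(ciphertext, wrapped_dek, kek):
    dek = toy_cipher(kek, wrapped_dek)        # unwrap the DEK first
    return toy_cipher(dek, ciphertext)


kek = secrets.token_bytes(32)                 # in GCP, this key would live in KMS
ciphertext, wrapped_dek = envelope_encrypt(b"my sensitive data", kek)
assert envelope_decrypt(ciphertext, wrapped_dek, kek) == b"my sensitive data"
```

Because only the wrapped DEK is stored next to the data, rotating a KEK means re-wrapping small keys rather than re-encrypting every chunk of data.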

It is important to note that KMS does not store secrets. KMS is a central repository for KEKs: it holds only the keys that GCP needs to encrypt secrets that are stored somewhere else, for instance in Secrets management.

Note: For GCE and GCS, you have the possibility of keeping your keys on-premise and letting Google retrieve them to encrypt and decrypt your data. These are known as customer-supplied keys.

Identity-Aware Proxy (IAP)

Identity-Aware Proxy allows you to control access to GCP applications via HTTPS without installing any VPN software or adding extra code in your application to handle login.

Your applications are visible to the public internet, but only accessible to authorized users, implementing a zero-trust security access model.

Furthermore, with TCP forwarding you can prevent services like SSH from being exposed to the public internet.

Cloud Armor

Cloud Armor protects your infrastructure from distributed denial of service (DDoS) attacks. You define rules (for example to whitelist or deny certain IP addresses or CIDR ranges) to create security policies, which are enforced at the Point of Presence level (closer to the source of the attack).
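
Conceptually, a security policy is a list of rules matched against the client's IP address. The sketch below illustrates that idea with Python's ipaddress module. It is not the Cloud Armor API. The rule format, the first-match-wins evaluation, and the default action are assumptions made for this example.

```python
import ipaddress


def evaluate(rules, client_ip, default="allow"):
    """Return the action of the first rule whose CIDR range contains client_ip."""
    ip = ipaddress.ip_address(client_ip)
    for cidr, action in rules:
        if ip in ipaddress.ip_network(cidr):
            return action
    return default


# Hypothetical policy: allow one trusted address, deny the rest of its range.
rules = [
    ("203.0.113.7/32", "allow"),
    ("203.0.113.0/24", "deny"),
]
print(evaluate(rules, "203.0.113.7"))
print(evaluate(rules, "203.0.113.50"))
print(evaluate(rules, "198.51.100.1"))
```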

Cloud Armor gives you the option of previewing the effects of your policies before activating them.

Cloud Data Loss Prevention

Data Loss Prevention is a fully-managed service designed to help you discover, classify, and protect sensitive data, like:

Personally Identifiable Information (PII): name, Social Security number, driver's license number, bank account number, passport number, email address, and so on.
Secrets
Credentials

DLP is integrated with GCS, BigQuery, and Datastore. Also, the source of the data can be outside of GCP.

You can specify what type of data you're interested in, called info type, define your own types (based on dictionaries of words and phrases or based on regular expressions), or let Google use the defaults, which can be time-consuming for large amounts of data.

For each result, DLP will return the likelihood that the piece of data matches a certain info type: LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY.

After detecting a piece of PII, DLP can transform it so that it cannot be mapped back to the user. DLP uses multiple techniques to de-identify your sensitive data like tokenization, bucketing, and date shifting. DLP can detect and redact sensitive data in images too.
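
To illustrate the idea of regex-based custom types and redaction, here is a small local sketch. It does not call the DLP API, and the two detectors are hypothetical and far less thorough than Google's built-in info types.

```python
import re

# Hypothetical custom detectors, keyed by an illustrative info type name.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN_LIKE": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace every match with its detector name and list what was found."""
    findings = []
    for name, pattern in DETECTORS.items():
        findings.extend((name, m.group()) for m in pattern.finditer(text))
        text = pattern.sub(f"[{name}]", text)
    return text, findings


redacted, findings = redact("Contact jane.doe@example.com, SSN 123-45-6789.")
print(redacted)
print(findings)
```

Replacing a value with its type name is the simplest form of redaction. Techniques like tokenization or date shifting keep the data useful for analysis while still hiding the original value.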

VPC Service Control

VPC Service Control helps prevent data exfiltration. It allows you to define a perimeter around resources you want to protect. You can define what services and from what networks these resources can be accessed.

Cloud Web Security Scanner

Cloud Web Security Scanner scans applications running in Compute Engine, GKE, and App Engine for common vulnerabilities such as passwords in plain text, invalid headers, outdated libraries, and cross-site scripting attacks. It simulates a real user trying to click on your buttons, inputting text in your text fields, and so on.

It is part of Cloud Security Command Center.

More GCP resources

If you're interested in learning more about GCP, I recommend checking the free practice exams for the different certifications. Whether you are preparing for a GCP certification or not, you can use them to find gaps in your knowledge.

Note: Some questions are based on case studies. Links to the case studies will be provided in the exams so that you have the full context to properly understand and answer the question.

Time to test your knowledge

I've extracted 10 questions from some of the exams above. Some of them are pretty straightforward. Others require deep thought and deciding what is the best solution when more than one option is a viable solution.

Question 1

Your customer is moving their corporate applications to Google Cloud. The security team wants detailed visibility of all resources in the organization. You use the Resource Manager to set yourself up as the Organization Administrator.

Which Cloud Identity and Access Management (Cloud IAM) roles should you give to the security team while following Google's recommended practices?

A. Organization viewer, Project owner

B. Organization viewer, Project viewer

C. Organization administrator, Project browser

D. Project owner, Network administrator

Question 2

Your company wants to try out the cloud with low risk. They want to archive approximately 100 TB of their log data to the cloud and test the serverless analytics features available to them there, while also retaining that data as a long-term disaster recovery backup.

Which two steps should they take? (Choose two)

A. Load logs into BigQuery.

B. Load logs into Cloud SQL.

C. Import logs into Cloud Logging.

D. Insert logs into Cloud Bigtable.

E. Upload log files into Cloud Storage.

Question 3

Your company wants to track whether someone is present in a meeting room reserved for a scheduled meeting.

There are 1000 meeting rooms across 5 offices on 3 continents. Each room is equipped with a motion sensor that reports its status every second.

You want to support the data ingestion needs of this sensor network. The receiving infrastructure needs to account for the possibility that the devices may have inconsistent connectivity.

Which solution should you design?

A. Have each device create a persistent connection to a Compute Engine instance and write messages to a custom application.

B. Have devices poll for connectivity to Cloud SQL and insert the latest messages on a regular interval to a device-specific table.

C. Have devices poll for connectivity to Cloud Pub/Sub and publish the latest messages on a regular interval to a shared topic for all devices.

D. Have devices create a persistent connection to an App Engine application fronted by Cloud Endpoints, which ingest messages and write them to Cloud Datastore.

Question 4

To reduce costs, the Director of Engineering has required all developers to move their development infrastructure resources from on-premises virtual machines (VMs) to Google Cloud.

These resources go through multiple start/stop events during the day and require the state to persist.

You have been asked to design the process of running a development environment in Google Cloud while providing cost visibility to the finance department.

Which two steps should you take? (Choose two)

A. Use persistent disks to store the state. Start and stop the VM as needed.

B. Use the --auto-delete flag on all persistent disks before stopping the VM.

C. Apply the VM CPU utilization label and include it in the BigQuery billing export.

D. Use BigQuery billing export and labels to relate cost to groups.

E. Store all state in a Local SSD, snapshot the persistent disks and terminate the VM.

Question 5

The database administration team has asked you to help them improve the performance of their new database server running on Compute Engine.

The database is used for importing and normalizing the company’s performance statistics. It is built with MySQL running on Debian Linux. They have an n1-standard-8 virtual machine with 80 GB of SSD zonal persistent disk which they can't restart until the next maintenance event.

What should they change to get better performance from this system as soon as possible and in a cost-effective manner?

A. Increase the virtual machine’s memory to 64 GB.

B. Create a new virtual machine running PostgreSQL.

C. Dynamically resize the SSD persistent disk to 500 GB.

D. Migrate their performance metrics warehouse to BigQuery.

Question 6

Your organization has a 3-tier web application deployed in the same Google Cloud Virtual Private Cloud (VPC).

Each tier (web, API, and database) scales independently of the others. Network traffic should flow through the web to the API tier, and then on to the database tier. Traffic should not flow between the web and the database tier.

How should you configure the network with minimal steps?

A. Add each tier to a different subnetwork.

B. Set up software-based firewalls on individual VMs.

C. Add tags to each tier and set up routes to allow the desired traffic flow.

D. Add tags to each tier and set up firewall rules to allow the desired traffic flow.

Question 7

You are developing an application on Google Cloud that will label famous landmarks in users’ photos. You are under competitive pressure to develop a predictive model quickly. You need to keep service costs low.

What should you do?

A. Build an application that calls the Cloud Vision API. Inspect the generated MID values to supply the image labels.

B. Build an application that calls the Cloud Vision API. Pass client image locations as base64-encoded strings.

C. Build and train a classification model with TensorFlow. Deploy the model using the AI Platform Prediction. Pass client image locations as base64-encoded strings.

D. Build and train a classification model with TensorFlow. Deploy the model using the AI Platform Prediction. Inspect the generated MID values to supply the image labels.

Question 8

You set up an autoscaling managed instance group to serve web traffic for an upcoming launch.

After configuring the instance group as a backend service to an HTTP(S) load balancer, you notice that virtual machine (VM) instances are being terminated and re-launched every minute. The instances do not have a public IP address.

You have verified that the appropriate web response is coming from each instance using the curl command. You want to ensure that the backend is configured correctly.

What should you do?

A. Ensure that a firewall rule exists to allow source traffic on HTTP/HTTPS to reach the load balancer.

B. Assign a public IP to each instance and configure a firewall rule to allow the load balancer to reach the instance public IP.

C. Ensure that a firewall rule exists to allow load balancer health checks to reach the instances in the instance group.

D. Create a tag on each instance with the name of the load balancer. Configure a firewall rule with the name of the load balancer as the source and the instance tag as the destination.

Question 9

You created a job that runs daily to import highly sensitive data from an on-premises location to Cloud Storage. You also set up a streaming data insert into Cloud Storage via a Kafka node that is running on a Compute Engine instance.

You need to encrypt the data at rest and supply your own encryption key. Your key should not be stored in the Google Cloud.

What should you do?

A. Create a dedicated service account and use encryption at rest to reference your data stored in Cloud Storage and Compute Engine data as part of your API service calls.

B. Upload your own encryption key to Cloud Key Management Service and use it to encrypt your data in Cloud Storage. Use your uploaded encryption key and reference it as part of your API service calls to encrypt your data in the Kafka node hosted on Compute Engine.

C. Upload your own encryption key to Cloud Key Management Service and use it to encrypt your data in your Kafka node hosted on Compute Engine.

D. Supply your own encryption key, and reference it as part of your API service calls to encrypt your data in Cloud Storage and your Kafka node hosted on Compute Engine.

Question 10

You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably.

What should you do?

A. Use Cloud Spanner for storage. Monitor storage usage and increase node count if more than 70% utilized.

B. Use Cloud Spanner for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.

C. Use Cloud Bigtable for storage. Monitor data stored and increase node count if more than 70% is utilized.

D. Use Cloud Bigtable for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.

Answers

B
A, E
C
A, D
C
D
B
C
D
B

Back to the initial proposition

At the beginning of this article, I said you'd learn how to design a mobile gaming analytics platform that collects, stores, and analyzes vast amounts of player telemetry, both from batch uploads and from real-time events.

So, do you think you can do it?

Take a pen and a piece of paper and try to come up with your own solution based on the services I have described here. If you get stuck, the following questions might help:

The platform needs to collect real-time events from the game:
Where might the game be running?
How can you ingest streaming data from the game into GCP?
How can you store it?
How can you collect and store the uploads of batches of data?
Can you analyze all the ingested data as it comes? Does it need to be processed?
What services can you use to analyze the data? How would this change if low-latency was now a new requirement?

I have purposely defined the problem in a very vague way. This is what you can expect when you are facing this sort of challenge: uncertainty. It is part of your job to gather requirements and document your assumptions.

Do not worry if your solution does not look like Google's. This is just one possible solution. Learning to design complex systems is a skill that takes a lifetime to master. Luckily, you're headed in the right direction.

Conclusion

This guide will help you get started on GCP and give you a broad perspective of what you can do with it.

By no means will you be an expert after finishing this guide, or any other guide for that matter. The only way to really learn is by practicing.

You are going to learn far more by doing than by reading or watching. I strongly recommend using your free trial and Codelabs if you are serious about learning.

You can visit my blog www.yourdevopsguy.com and follow me on Twitter for more high-quality technical content.

Disclaimer: At the time of publishing this article, I don't work, nor have I ever worked, for Google. I wanted to organize and summarize the knowledge I have acquired via the Google documentation, YouTube videos, the courses that I have taken and, most importantly, through hands-on practice using GCP daily in my job.

All of this information is free out there. The figures, numbers, and versions that you see here come from the documentation at the time I am publishing this article. To make sure you are using up-to-date data, please visit the official documentation.

How to Pass Almost Every Google Cloud Platform Professional Certification Exam

freeCodeCamp — Mon, 15 Jun 2020 19:59:54 +0000

By Ivam Luz

Are you interested in becoming a Google Cloud Platform certified professional?

Last year, I took five of the seven GCP professional exams available at the time of this writing.

In this post, I'll share some information about the exams, my strategies for passing them, as well as a link to the study guides I created along the way. These guides have been battle tested by more than a hundred professionals (so far) who successfully got certified with their help.

About the certification exams

Professional Cloud Architect

Professional Cloud Architect certification logo

Length: 2 hours
Registration fee: $200 (plus tax where applicable)
Languages: English, Japanese.
Exam format: Multiple choice and multiple select, taken remotely or in person at a test center. Locate a test center near you.
Exam Delivery Method:
• Take the online-proctored exam from a remote location, review the online testing requirements.
• Take the onsite-proctored exam at a testing center, locate a test center near you.
Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

Reference: https://cloud.google.com/certification/cloud-architect

Professional Data Engineer

Professional Data Engineer certification logo

Length: 2 hours
Registration fee: $200 (plus tax where applicable)
Languages: English, Japanese.
Exam format: Multiple choice and multiple select taken remotely or in person at a test center. Locate a test center near you.
Exam Delivery Method:
• Take the online-proctored exam from a remote location, review the online testing requirements.
• Take the onsite-proctored exam at a testing center, locate a test center near you.
Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

Reference: https://cloud.google.com/certification/data-engineer

Professional Cloud Security Engineer

Professional Cloud Security Engineer certification logo

Length: 2 hours
Registration fee: $200 (plus tax where applicable)
Languages: English.
Exam format: Multiple choice and multiple select, taken in person at a test center. Locate a test center near you.
Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

Reference: https://cloud.google.com/certification/cloud-security-engineer

Professional Cloud Network Engineer

Professional Cloud Network Engineer certification logo

Length: 2 hours
Registration fee: $200 (plus tax where applicable)
Languages: English.
Exam format: Multiple choice and multiple select, taken in person at a test center. Locate a test center near you.
Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

Reference: https://cloud.google.com/certification/cloud-network-engineer

Professional Cloud Developer

Professional Cloud Developer certification logo

Honestly, this one caught me by surprise. Back in December 2019, when I took the exam, there wasn't anything saying it was in beta. At that time, this was the information available on the exam page:

Length: 2 hours
Registration fee: $200 (plus tax where applicable)
Languages: English, Japanese.
Exam format: Multiple choice and multiple select, taken in person at a test center. Locate a test center near you.
Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

At the time of this writing, though, the information available on the exam page is very different, as you can see below:

Beta certification exams are newly developed assessments. We gather performance statistics on the questions and use these statistics to create the certification standards for the final exams. If you pass, you are Google Cloud Certified.

Save 40% on the cost of certification
Prove early adoption by claiming a low certificate number if you pass
Get exclusive Google-branded apparel
Refer to our FAQs for more details

Specifics about the beta

Length: 4 hours
Registration fee: $120 USD (40% discount on retail price of $200 USD) (plus tax where applicable)
Languages: English.
Exam format: Multiple choice and multiple select, taken in person at a test center. Locate a test center near you.
Prerequisites: None
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.
Beta exam preparation resources:
To take the upcoming beta exam, use the revised exam guide.

Reference: https://cloud.google.com/certification/cloud-developer

The Preparation Process

Now that you have all the basic information about the exams, it's time to study and get ready to pass them. My preparation process for the five exams I took involved the following steps:

Read the exam overviews:
• Professional Cloud Architect exam overview
• Professional Data Engineer exam overview
• Professional Cloud Security Engineer exam overview
• Professional Cloud Network Engineer exam overview
• Professional Cloud Developer exam overview
Read the exam guides:
• Professional Cloud Architect exam guide
• Professional Data Engineer exam guide
• Professional Cloud Security Engineer exam guide
• Professional Cloud Network Engineer exam guide
• Professional Cloud Developer exam guide
Next, visit the products page of the platform and identify each product that may be related to the topics listed on the exam guides. For GCP, you can find this list here.
For each of the products identified in the prior step, visit its Documentation/Concepts page and start reading about each of the concepts that are relevant for the given product. Check the GCE concepts page, for example.
You’ll probably notice some products seem to overlap with each other, and you might find it difficult to know when to use one or the other. Google Search is your best friend here. :)
For the Cloud Architect exam, after going through each product and its concepts, read the sample study cases provided by Google and try to design potential solutions that could address the requirements described on them.
• Mountkirk games study case
• Dress4Win study case
• TerramEarth study case

For the Data Engineer exam, the sample study cases had been removed and were no longer part of the exam, at least as of June 2019.

All the other exams didn't make use of sample study cases, as of 2019.

Finally, take the practice exams. The practice exams provide an explanation for each of the questions after you finish them. They also help you get an idea of the format of the questions you’ll face on each exam and will help you know how prepared you are.
• Professional Cloud Architect practice exam
• Professional Data Engineer practice exam
• Professional Cloud Security Engineer practice exam
• Professional Cloud Network Engineer practice exam
• Professional Cloud Developer practice exam
After finishing the practice exams, take notes of the topics that didn’t go well and re-read the relevant documentation collected on step 4 above.
Take the practice exams again (you can take them as many times as you want) and keep repeating steps from 6 to 8 until you feel confident to take the real exams.

Study guides

From my own experience, I can tell you it's a lot of work. For this reason, I decided to contribute back to the community and share the study guides I created throughout my preparation process.

The study guides are contained in the spreadsheet linked below, each on a separate tab.

https://docs.google.com/spreadsheets/d/1LUtqhOEjUMySCfn3zj8Arhzcmazr3vrPzy7VzJwIshE/edit#gid=0

To use it, create your own copy. Once you do, the spreadsheet will be writable for you and you’ll be able to update the Status column, which will help you track your progress through the material:

A screenshot of the spreadsheet with reference material for both professional certification exams.

You can freely copy, change and distribute this material. The only thing I kindly ask is that you keep a reference to the original material and give me proper credit, if you feel it's helpful.

Even though the Cloud Developer exam is back in beta, I believe the guide I created is still relevant, as it covers a lot of the topics (probably even more than what's needed).

Disclaimer

I'm sharing these guides with the only intent of helping people aiming to take the Google Cloud Professional certification exams. Be advised there is no guarantee that following the guides will make you pass the exams. Use them at your own discretion.

Tips for taking your exams

Know what each product does, what it’s good for and what it’s not good for, as well as its billing characteristics.
As you can see above, except for the Cloud Developer exam, which seems to be back in beta, you have 2 hours to finish the exams. Keep in mind that good time management is crucial for your success.
Don’t spend too much time on questions you don’t know. If you aren’t sure about an answer, mark the question to be reviewed later and move on to the next questions.
Practice as much as possible using the practice exams.

Conclusion

In my opinion, the greatest value of a certification is to help you know which subjects are important to learn as a professional, if you are willing to work with a specific technology.

The Professional Cloud Security Engineer certification exam helped me a lot in guiding my studies to learn more about the security aspects of the Google Cloud Platform. It helped me learn about specific concerns we should have when considering or using many of the platform products from a security standpoint.

The Cloud Network Engineer certification was the hardest one I took out of the five. I believe that's because my whole career so far has been focused on Software Development.

I recognize that, just because I got certified, it doesn’t mean I am now a network specialist (and, honestly, I don’t really intend to be). As some people say, a certification is just a “piece of paper”, right? On the other hand, I certainly learned a lot during this process and having some networking skills under my belt certainly makes me a better professional.

In fact, some of these skills have already helped me solve some infrastructure issues from one of my clients.

Besides all that, certifications are still highly valued by the market and may help you stand out from the crowd.

As a final tip, if you aren't sure which certification you should take first, this is the order I'd recommend (unless you have specific needs related to your job or are strongly focused on a specific area):

Professional Cloud Architect
Professional Cloud Security Engineer
Professional Data Engineer
Professional Cloud Network Engineer
Professional Cloud Developer (because it's back in beta, otherwise it would probably be number 3 in this list)

I hope this article and the referenced study guides help you in your journey to become a Google Cloud certified professional and I wish you all the success in your career!

Photo by Mahdi Dastmard / Unsplash (https://unsplash.com/@mahdigp)

Final Verdict:

Normally in Laravel, we call the event from the respective controller directly. But for using Google cloud tasks, we create a Task API which in turn calls the route that in turn calls the event to process our data. So in short we create multiple APIs for our Laravel events and jobs which are then called (by Google Task API) based on the route that you pass in Step 3 and Step 6.

Since I am using Google Cloud Tasks, I don’t need to worry about the supervisor or managing the jobs in the queue table, as everything is taken care of by the Google Task Queue. All I have to do is monitor the Task Queue for failed tasks.

Using Google Cloud API, I can create multiple queues for different target applications I deploy on Google App Engine irrespective of whether the environment is Standard or Flexible.

How to secure and manage secrets using Google Cloud KMS

freeCodeCamp — Mon, 07 Jan 2019 22:22:23 +0000

By Ramesh Lingappa

Let’s jump right in. We all know it’s a bad idea to store application secrets within our code. So why are we still storing them there? Let’s take an example.

We could store those secrets in a file and add it to the gitignore so it’s not added to version control. But there are a couple of hurdles:

How do we manage those secrets?
What happens when the local copy is deleted?
How do we share it with other developers?
How do we manage versioning of those secrets during changes and an audit log of who changed what?

A lot of questions! So we end up storing it within the code, since it’s too much complexity to deal with.

For a big application, or one that needs a higher level of security, we can use production-grade secret management services like HashiCorp Vault.

In this article, we will look at a decent approach to dealing with secrets while still achieving better security. We are going to achieve this using Google KMS + Git + IAM + automation.

The idea is not new. This is what we are going to do:

We are going to store the encrypted version of plaintext in version control using Google KMS
We will use KMS IAM to allow appropriate users to manage secrets for each environment by granting encrypt/decrypt roles
We’ll deploy the application with encrypted secret files
We will allow permission for the server to decrypt secrets for each environment
At runtime, we’ll load encrypted files, decrypt using KMS APIs and use it.

Cloud KMS is a cloud-hosted key management service that lets you manage cryptographic keys for your cloud services. You can generate, use, rotate, and destroy cryptographic keys. Cloud KMS is integrated with Cloud IAM and Cloud Audit Logging so that you can manage permissions on individual keys and monitor how these are used.

So Cloud KMS will encrypt and decrypt our secrets so we don’t have to store the keys. Only an authorised user or a service account can perform encrypt or decrypt operations.

Let’s get started!

Step 1: Preparing Secrets

For our use case, we are going to have application secrets for each environment: prod, stag, and dev. We do so by creating a new folder called credentials under the root project folder and then creating one folder for each environment.

credentials per each environment

Make sure this folder is not tracked under version control by adding the following line in the .gitignore file:

/credentials/

Here I am using a properties file, but it could be anything like JSON, YAML etc. Now you can add any sensitive information in these files. I have added the following:

# dev credentials
oauth_client_id=1234
oauth_client_secret=abcd
api_key=api_123
# ...

Okay, our secrets are ready for hiding.

Step 2: Creating KMS Secret Keys

We need to create encryption keys for each environment in order to use this service. For us, each environment will be a different Google Cloud project (recommended). It’s better this way since it gives isolation and access control (more on this later).

So go ahead and create a key for each environment by following the Creating Symmetric Keys guide (recommended). It has step-by-step instructions for the different ways to create those keys. We are creating those keys using the command line like below:

# create key-ring (think of this as grouping)
gcloud kms keyrings create [KEYRING_NAME] \
  --location [LOCATION] \
  --project live-project-id

# create the encryption key
gcloud kms keys create [KEY_NAME] \
  --location [LOCATION] \
  --keyring [KEYRING_NAME] \
  --purpose encryption \
  --project live-project-id

Here I am creating a key for production using the production project id. Repeat this process for each environment by replacing the Project ID for stag and other environments.

Note: You need to have four pieces of information for each key: location, keyring, cryptokey, and project. This information is not sensitive, so you can store it in your code or build scripts.
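As a minimal sketch of the note above, the four non-sensitive coordinates can live in a small per-environment table and be joined into the resource name that Cloud KMS uses to address a key. The project IDs, key ring, and key names below are placeholders chosen for illustration, not values taken from the sample application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRef:
    """The four non-sensitive coordinates that identify one Cloud KMS key."""
    project: str
    location: str
    keyring: str
    cryptokey: str

    def resource_name(self) -> str:
        # Cloud KMS addresses a key by this hierarchical path.
        return (
            f"projects/{self.project}/locations/{self.location}"
            f"/keyRings/{self.keyring}/cryptoKeys/{self.cryptokey}"
        )


# Placeholder values: one key per environment, each in its own project.
KEYS = {
    "prod": KeyRef("live-project-id", "global", "secrets-key-ring", "secrets-enc-key"),
    "stag": KeyRef("stag-project-id", "global", "secrets-key-ring", "secrets-enc-key"),
}

if __name__ == "__main__":
    print(KEYS["prod"].resource_name())
```

Because none of these values are secret, a table like this can be committed alongside the encrypted files.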

Step 3: Assigning Permission to use these keys

Here comes the beauty of the KMS IAM system: in order to use each key, we need to explicitly grant access for an individual user or a service account. This makes it very powerful since now we can define who can manage secrets, who can view those secrets, and more.

Check out Using IAM with Cloud KMS for more information. With this, we can achieve the following:

Production Environment:

No one should be able to see the secrets except the few people who can make changes to secrets. We can do so by granting them the role:

cloudkms.cryptoKeyEncrypterDecrypter

So in this way, even though the encrypted credentials are stored in version control, other developers won't be able to use them. Note, even those developers can make live deployments without ever needing to know the secrets (more on this later).

Staging Environment:

Every developer can see the secrets and use them in development, but only a few people can make changes to secrets. We can do so by granting them the role:

// for read only
cloudkms.cryptoKeyDecrypter

// for managing
cloudkms.cryptoKeyEncrypterDecrypter

Likewise, you can grant key roles for different environments depending on the need. For the exact commands, refer to Granting Permission in the docs.
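To make the staging role split concrete, here is a rough Python sketch that builds (but does not run) one gcloud kms keys add-iam-policy-binding command per developer: maintainers get the encrypt/decrypt role and everyone else gets the decrypt-only role. The e-mail addresses, key name, key ring, and project ID are made-up placeholders, and the helper names are hypothetical.

```python
from typing import List

READ_ONLY = "roles/cloudkms.cryptoKeyDecrypter"
MANAGE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def binding_command(member: str, role: str, key: str, keyring: str,
                    location: str, project: str) -> List[str]:
    """Return the argv that grants `role` on one key to one member."""
    return [
        "gcloud", "kms", "keys", "add-iam-policy-binding", key,
        "--location", location,
        "--keyring", keyring,
        "--project", project,
        "--member", member,
        "--role", role,
    ]


def staging_bindings(developers: List[str], maintainers: List[str],
                     **key_info: str) -> List[List[str]]:
    """Maintainers can encrypt and decrypt; other developers can only decrypt."""
    commands = []
    for email in developers:
        role = MANAGE if email in maintainers else READ_ONLY
        commands.append(binding_command(f"user:{email}", role, **key_info))
    return commands


if __name__ == "__main__":
    # Hypothetical team and key coordinates, for illustration only.
    for cmd in staging_bindings(
        developers=["ana@example.com", "raj@example.com"],
        maintainers=["raj@example.com"],
        key="secrets-enc-key", keyring="secrets-key-ring",
        location="global", project="stag-project-id",
    ):
        print(" ".join(cmd))
```

For production, the same helper could be called with only the maintainers in the developers list, so that nobody else receives any role on the key.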

Step 4: Encrypting Secrets

We are done with the prep work, and now it’s time to hide some secrets. Assuming you have the encrypter role, you can encrypt a file using the following command:

gcloud kms encrypt --location global \
  --keyring secrets-key-ring --key quickstart \
  --plaintext-file credentials/stag/credentials.properties \
  --ciphertext-file credentials-encrypted/stag/credentials.properties.encrypted

Since it’s a shell gcloud command, you can easily integrate it with any build system to encrypt all files under the credentials folder. For example, I am using gradle for this:

Basically, there are two helper functions:

kmsEncryptSecrets, which takes the src folder, encrypts each file within it, and writes the result to the target folder with the .enc (encrypted) extension, and
kmsDecryptSecrets, which does the reverse process.

So each time we modify secrets, we can call the encrypt helper method with a simple task:

Now the encrypted folder will look like below:

encrypted credential files

This folder can be added to version control so each time an authorised user changes secrets, a new encrypted file is generated and logs the history in version control.

Similarly, there is a Decrypt Task for the reverse process.
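The Gradle helpers themselves are not reproduced here. As a rough stand-in, the Python sketch below walks a credentials folder and builds one gcloud kms encrypt command per file, mirroring the command shown at the start of this step. The function name, the way the .enc suffix is appended, and the default key names are assumptions for illustration; the sketch only prints the commands, so actually running them (for example with subprocess.run) is left to the reader.

```python
from pathlib import Path
from typing import List


def encrypt_commands(src: Path, dst: Path, key: str = "quickstart",
                     keyring: str = "secrets-key-ring",
                     location: str = "global") -> List[List[str]]:
    """Build one `gcloud kms encrypt` argv per file found under `src`."""
    commands = []
    for plaintext in sorted(p for p in src.rglob("*") if p.is_file()):
        # Mirror the folder layout (e.g. stag/credentials.properties) under dst.
        relative = plaintext.relative_to(src)
        ciphertext = dst / relative.parent / (relative.name + ".enc")
        commands.append([
            "gcloud", "kms", "encrypt",
            "--location", location,
            "--keyring", keyring,
            "--key", key,
            "--plaintext-file", str(plaintext),
            "--ciphertext-file", str(ciphertext),
        ])
    return commands


if __name__ == "__main__":
    for cmd in encrypt_commands(Path("credentials"), Path("credentials-encrypted")):
        print(" ".join(cmd))
```

A decrypt counterpart would swap the subcommand to gcloud kms decrypt and exchange the two file arguments.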

Step 4: Using Encrypted Secrets in deployment

Now that we are done encrypting secrets and properly managing them in version control, let's look at how it can be used at runtime, meaning when the app is actually running in staging or production. We can do that in two ways:

1. Decrypting secrets and passing during deployment:

So during deployment, an authorised user can simply decrypt those encrypted secrets and add them to the deployment (e.g. the build directory), thus making them available to the code at runtime. We are not going to cover this deeply.

This approach is good when the set of deployers needs to be very restricted or the process is automated using a CD pipeline.

2. Passing encrypted secrets during deployment and decrypting at runtime:

Here we are not going to decrypt and send raw secrets during deployment. Instead, we are simply passing encrypted secrets. And during runtime we will decrypt those secrets and use them.

Note: this works best within the Google Cloud Platform. Otherwise you need to generate a service account so you can use this approach with external providers.

This approach is even more secure since we are not relying on any intermediate user action or a pipeline, but instead only on authorised servers that can decrypt content at runtime.

For example, we can allow the staging server (service account) the ability to decrypt staging secrets and not the ability to decrypt production secrets.

With this approach, even a developer who doesn’t have access to decrypt production secrets is able to perform a production deployment, and everything still works fine.

Step 5: Using secrets at runtime

We are going to use the second approach (passing encrypted secrets).

For the demo, assume we are going to deploy to App Engine since it has a default service account generated already. We will grant it access to decrypt secrets like below:

gcloud kms keys add-iam-policy-binding secrets-enc-key \
  --project kms-demo \
  --location global \
  --keyring secrets-key-ring \
  --member serviceAccount:kms-demo@appspot.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyDecrypter

Thus when the server starts, we could simply load the encrypted file and use the KMS client libraries to decrypt its content.
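The sample application is not shown here, so the following is only a sketch of what that startup step might look like in Python, assuming the google-cloud-kms client library is installed and the server runs with a service account that holds the decrypter role. The function names are hypothetical; the decrypt step is passed in as a callable so the file loading and parsing can be exercised without calling KMS.

```python
from typing import Callable, Dict


def kms_decrypt(key_name: str, ciphertext: bytes) -> bytes:
    """Decrypt with Cloud KMS using the server's default credentials."""
    # Imported lazily so the rest of the module works without the library.
    from google.cloud import kms  # pip install google-cloud-kms

    client = kms.KeyManagementServiceClient()
    response = client.decrypt(request={"name": key_name, "ciphertext": ciphertext})
    return response.plaintext


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        secrets[name.strip()] = value.strip()
    return secrets


def load_secrets(path: str, decrypt: Callable[[bytes], bytes]) -> Dict[str, str]:
    """Read an encrypted properties file and return its decrypted entries."""
    with open(path, "rb") as handle:
        plaintext = decrypt(handle.read())
    return parse_properties(plaintext.decode("utf-8"))
```

At startup, the server could call load_secrets(path, lambda data: kms_decrypt(key_name, data)), where key_name is the full resource name of the environment's key (projects/.../locations/.../keyRings/.../cryptoKeys/...) and path points at the deployed encrypted file.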

Step 6: KMS Audit Logs

Finally, you can see audit logs for operations on each key by enabling KMS audit logging (not enabled by default). Thus we can now keep track of all operations performed for future auditing.

You can enable the audit log using gcloud, but we have seen enough of the command line way. Alternatively, we can enable this configuration using the Cloud Console UI. From the left menu, choose IAM & admin -> Audit Logs.

Click Cloud Key Management Service and enable Data Read and Data Write and hit Save.

Google IAM Audit Log Console

That's it! Now if any encrypt, decrypt or any other sorts of operations are performed, an audit log is generated and you can check those in the Logging section under Cloud KMS CryptoKey.

Audit Logs for IAM operations

As you can see, it has audit logs for all sorts of operations including failures like Invalid permissions, or requests etc. It shows which user performed what operation using which key (or if it was done under a service account). That's a pretty neat solution. For more info, read Using Cloud Audit Logging with Cloud KMS.

Conclusion

With this approach, we can store, manage and use application secrets or any sensitive information securely and also track changes using version control. The techniques discussed in this article can be used with any language, and they can be used fully or partially on other platforms as well, such as iOS, Android, external servers, etc.

For a list of kms commands, refer to KMS Commands. Also, check out the sample application for the complete code:

ramesh-dev/gae-dynamic-config-demo
AppEngine Dynamic Configuration Demo (github.com)


The voice memo’s BFF — how to make Speech2Text easy with Machine Learning

freeCodeCamp — Wed, 19 Dec 2018 17:07:43 +0000

By Rafael Belchior

Do you think recording voice memos is inconvenient because you have to transcribe them? Do you waste your precious voice memos because you never write them down? Do you feel like you are not unlocking the full potential of what you record?

Yeah, that sucks.

Write, write, write.

I’m a Computer Science masters student. As I think that all work and no play makes me a dull boy, I’ve decided to invest some time in doing something different. Where? In the student’s group to which I belong, by interviewing a professor.

I’ve talked to professor Rui Henriques, a teaching assistant @ Técnico Lisboa and researcher @ INESC-ID. He is an expert in Data Mining and Bioinformatics. The 20-minute interview turned into an almost hour-long conversation.

Rui is not only a brilliant academic but also a very honest, cheerful and easygoing person, which made it very easy. I learned a lot while talking to him, and I’m sure you can too. The interview will be online soon enough.

Anyway, I had a problem and a need. I wanted to save time by not having to transcribe the whole interview. The idea was to invest only twenty to sixty minutes in order to skyrocket performance when it comes to transcribing. This is not limited to interviews, of course. You can transcribe audio notes taken from several sources like classes, writing notes, thoughts, your shopping list, or your most philosophical pieces.

So, how do we do that?

I’m also lecturing on IT Infrastructure Management and Administration @ Técnico Lisboa. In classes, we have used Google Cloud Engine. I remembered a service called Google Speech-To-Text, which we could use in this case. And no, Google is not paying me to write this.

So, how do we turn a 55-minute interview into easily editable text? How do we reduce our effort and focus on what matters?

By the way, to make the most out of this method, please cut noise and try to record with a loud, clear voice.

Step 1: Installing the required software

I use Vagrant to manage virtual machines. The advantage is that it gives you a ready-made environment with everything you need to use the Speech-To-Text service. In this article, I show step by step how to configure these tools (read it up to the section “The Experiment”). If you prefer to do this on your local machine, go directly to the third step.

Step 2: Start the virtual machine

Now, open your console and run:

$ vagrant up --provision && vagrant ssh

The virtual machine is booting, installing all the required dependencies. This may take a while.

Wait a bit. Done. Nice. Kudos to you.

Step 3: Getting the support files

Fork this repository containing the support files and then clone it to your computer. Put it in the folder that is being synced with your guest machine.

Step 4: Creating an account at Google Cloud Engine

You can request a free credit ($300) for this experiment. After creating the account, go to the Google Cloud Console. Create a project. You can name it “easy-interview” if you are confident enough. You should see something like this:

After that, go to “APIs & Services”, in order to activate the API we need to get the job done.

Click on “Create Credentials”. Choose “Cloud Speech API”. On “Are you planning to use this API with App Engine or Compute Engine?” say “No”. On step 2, “Create a service account” name the service “transcribing”. The role is Project => Owner. Key type: JSON.

By now, you should have downloaded a file called “file.txt”. It contains the credentials you need to use the service. Rename the file to “terraform-credentials.json”. Copy it to the folder containing the support files. As that folder is synced with your virtual machine, you will have access to those files from the guest machine. Now, run:

$ gcloud auth login

Follow the instructions. Authenticate yourself following the link that is shown. Now, analyze the request.json file:

{
  "config": {
      "encoding":"FLAC",
      "sampleRateHertz": 16000,
      "languageCode": "en-US",
      "enableWordTimeOffsets": false
  },
  "audio": {
      "uri":"gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

Make sure to tune the parameters to fit your case. Beware that there are limitations on the encoding that you can use. If your file is in a format other than FLAC or WAV, you will need to convert it. You can convert audio files with Audacity, a free, open-source audio editor. After converting the audio, you have to upload it to Google Cloud Storage. For that, you have to create a bucket.

The settings may be:

After that, upload your file to the bucket. On the Bucket menu, you should be able to access the URI associated with your file. The format is gs://BUCKET/FILE.EXTENSION. Take that URI and put it in the file my-request.json, replacing the sample one.

Your file should look something like this:

{
  "config": {
      "encoding":"FLAC",
      "sampleRateHertz": 16000,
      "languageCode": "pt-PT",
      "enableWordTimeOffsets": false
  },
  "audio": {
      "uri":"gs://easy-interview/interview.flac"
  }
}
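If you would rather generate my-request.json than edit it by hand, a minimal Python sketch could look like the one below. The helper name build_request and its default values are my own assumptions and are not part of the support files; the field names are the ones shown in the request file above.

```python
import json


def build_request(gcs_uri, language_code="en-US", encoding="FLAC",
                  sample_rate_hertz=16000):
    """Build a request body shaped like my-request.json.

    Hypothetical helper for illustration; it is not part of the support files.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a URI like gs://BUCKET/FILE.EXTENSION")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            "enableWordTimeOffsets": False,
        },
        "audio": {"uri": gcs_uri},
    }


request = build_request("gs://easy-interview/interview.flac",
                        language_code="pt-PT")
print(json.dumps(request, indent=2))
```

Redirecting the printed output to my-request.json gives you a file you can pass to curl with -d @my-request.json.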

Before we use the API, we need to load the credentials. Run the script load-credentials.sh to load them:

$ source load-credentials.sh

This has set the GOOGLE_APPLICATION_CREDENTIALS environment variable. Next, to test if the connection is successful, run:

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:recognize \
    -d @test-request.json

You should be able to see a response with some transcribed text. Note that we ran test-request.json, which is just for testing purposes. Now, to make the call with your data, run:

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @my-request.json >> name.out

If you run more name.out, you will see that the response contains a field called name. That name corresponds to the operation name that was created to meet the request. Now you have to wait a bit until the operation completes. Run (replace NAME with your operation’s name):

$ curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     "https://speech.googleapis.com/v1/operations/NAME" >> result.out

Until the operation finishes, your result.out will contain something similar to this:

{
  "name": "8254262642733152416",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 33,
    "startTime": "2018-12-08T01:15:08.969852Z",
    "lastUpdateTime": "2018-12-08T01:19:25.105683Z"
  }
}
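To avoid eyeballing the file each time, you could check the progress with a few lines of Python. This is only a sketch: the function name operation_status is hypothetical, and it assumes that a finished operation carries a top-level "done": true field, as long-running Google API operations usually do.

```python
import json


def operation_status(raw_json):
    """Return (done, progress_percent) for one operation response."""
    operation = json.loads(raw_json)
    # "done" is absent while the operation is still running (assumption
    # based on the usual long-running operation format).
    done = operation.get("done", False)
    progress = operation.get("metadata", {}).get("progressPercent", 0)
    return done, progress


# The in-progress response shown above.
sample = """
{
  "name": "8254262642733152416",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 33,
    "startTime": "2018-12-08T01:15:08.969852Z",
    "lastUpdateTime": "2018-12-08T01:19:25.105683Z"
  }
}
"""
done, progress = operation_status(sample)
print(f"done={done}, progress={progress}%")
```

Keep in mind that the curl command appends to result.out with >>, so the file holds one JSON document per run. This sketch expects a single document, so delete the file before polling again or pass only the latest response.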

For a 60 MB file encoded with FLAC, it took about 12 minutes. You will have a file called results.out with your precious content. It will be in your host machine as well. I’ve written a very simple Python script that parses results.out. The script redirects the output to a file named results-parsed.out. To execute it, run:

$ python parse.py
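The actual parse.py lives in the support repository and is not reproduced here. To give a rough idea of what such parsing involves, here is a minimal sketch that assumes the finished operation has the usual Speech-To-Text shape, with the text under response.results[].alternatives[].transcript. The function name and the sample data are made up for illustration.

```python
import json


def extract_transcript(raw_json):
    """Join the top alternative of every result into one block of text."""
    operation = json.loads(raw_json)
    results = operation.get("response", {}).get("results", [])
    lines = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the most likely transcription.
            lines.append(alternatives[0].get("transcript", "").strip())
    return "\n".join(line for line in lines if line)


# Made-up sample standing in for a finished operation.
sample = json.dumps({
    "name": "8254262642733152416",
    "done": True,
    "response": {
        "results": [
            {"alternatives": [{"transcript": "first sentence of the memo",
                               "confidence": 0.98}]},
            {"alternatives": [{"transcript": " second sentence of the memo",
                               "confidence": 0.95}]},
        ]
    },
})
print(extract_transcript(sample))
```

Each result covers a consecutive chunk of the audio, so joining the chunks in order gives you text that you can edit afterwards.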

If you don’t like the results, tune the parameters and try again.

Enjoy your content! You are done. To finish this experiment, exit the machine:

$ gcemgmt: exit

Now, stop the virtual machine:

$ vagrant halt

Don’t forget to delete the files that you uploaded to Google Cloud.

Well done!

Well, this took me several hours to write, but at least I didn’t have to transcribe the whole interview.

Bottomline

Firstly, I would ❤️to hear your opinion! Do you record lots of voice memos? Do you find this procedure useful? Do you have a different one?

If you liked this article, please click the ? button on the left. Do you have a friend or family member that would benefit from this solution? Share this article!

Keep Rocking

Entrepreneurship

Top 8 lessons I’ve learned in European Innovation Academy 2017
_Imagine you are seeing the opportunity to improve yourself at every level. Would you take it?_blog.startuppulse.net

DevOps101 ☄️

DevOps101 — Improve Your Workflow! First Steps on Vagrant
_And make clients and developers happier._hackernoon.com DevOps101 — Infrastructure as Code With Vagrant
_And deploying a simple IT infrastructure (Two LAMP web servers and a client machine)._hackernoon.com

Blockchain For Students ⛓️

Blockchain For Students 101 -The Basics (Part 1)
_Are you ready to dig deep into this life-changing technology?_hackernoon.com

How to set up JHipster microservices with Istio service mesh on Kubernetes

freeCodeCamp — Sat, 17 Nov 2018 16:01:37 +0000

By Deepu K Sasidharan

You can find a more up to date version of this post that uses JHipster 6 and latest Istio & Kubernetes versions here.

Istio is the coolest kid on the DevOps and Cloud block now. For those of you who aren’t following close enough — Istio is a service mesh for distributed application architectures, especially the ones that you run on the cloud with Kubernetes. Istio plays extremely nice with Kubernetes, so nice that you might think that it’s part of Kubernetes.

If you are still wondering what the heck a service mesh or Istio is, let's start with an overview of Istio.

Istio provides the following functionality in a distributed application architecture:

Service discovery: Traditionally provided by platforms like Netflix Eureka or Consul.
Automatic load balancing: You might have used Netflix Zuul for this.
Routing, circuit breaking, retries, fail-overs, fault injection: Think of Netflix Ribbon, Hystrix and so on.
Policy enforcement for access control, rate limiting, A/B testing, traffic splits, and quotas: Again, you might have used Zuul to do some of these.
Metrics, logs, and traces: Think of ELK or Stackdriver.
Secure service-to-service communication

Below is the architecture of Istio.

Istio architecture

It can be classified into 2 distinct planes.

Data plane: Is made of Envoy proxies deployed as sidecars to the application containers. They control all the incoming and outgoing traffic to the container.

Control plane: It uses Pilot to manage and configure the proxies to route traffic. It also configures Mixer to enforce policies and to collect telemetry. It also has other components like Citadel, to manage security, and Galley, to manage configurations.

Istio also configures an instance of Grafana, Prometheus and Jaeger for Monitoring and Observability. You can use this or use your existing monitoring stack as well.

I hope this provides an overview of Istio. Now let's focus on the goal of this article.

Devoxx 2018

I did a talk at Devoxx 2018 along with Julien Dubois doing the same demo and promised that I’d write a detailed blog about it.

You can watch the video to see JHipster + Istio in action.

You can watch the slides on Speaker Deck as well.

https://speakerdeck.com/deepu105/jhipster-5-whats-new-and-noteworthy

Preparing the Kubernetes cluster

First, let us prepare a Kubernetes cluster to deploy Istio and our application containers. Follow the instructions for any one of the platforms you prefer.

Prerequisites

kubectl: The command line tool to interact with Kubernetes. Install and configure it.

Create a cluster on Azure Kubernetes Service(AKS)

If you are going to use Azure, then install Azure CLI to interact with Azure. Install and log in with your Azure account (you can create a free account if you don’t have one already).

First let us create a resource group. You can use any region you like here instead of East US.

$ az group create --name eCommerceCluster --location eastus

Create the Kubernetes cluster:

$ az aks create \
--resource-group eCommerceCluster \
--name eCommerceCluster \
--node-count 4 \
--kubernetes-version 1.11.4 \
--enable-addons monitoring \
--generate-ssh-keys

The node-count flag is important as the setup requires at least four nodes with the default CPU to run everything. You can try to use a higher kubernetes-version if it is supported, else stick to 1.11.4.

The cluster creation could take a while, so sit back and relax.

Once the cluster is created, fetch its credentials to be used from kubectl by running the below command. It automatically injects the credentials to your kubectl configuration under ~/.kube/config

$ az aks get-credentials \
--resource-group eCommerceCluster \
--name eCommerceCluster

You can view the created cluster in the Azure portal:

Kubernetes cluster in AKS

Run kubectl get nodes to see it in the command line and to verify that kubectl can connect to your cluster.

Cluster Nodes

Proceed to the Install and setup Istio section.

Create a cluster on Google Kubernetes Engine(GKE)

If you are going to use Google Cloud Platform (GCP), then install the gcloud CLI to interact with GCP. Install and log in with your GCP account (you can create a free account if you don’t have one already).

First, we need a GCP project. You can either use an existing project or create a new one using the gcloud CLI with the command below:

$ gcloud projects create jhipster-demo-deepu

Set the project you want to use as the default project:

$ gcloud config set project jhipster-demo-deepu

Now let us create a cluster for our application with the below command:

$ gcloud container clusters create hello-hipster \
   --cluster-version 1.10 \
   --num-nodes 4 \
   --machine-type n1-standard-2

The num-nodes and machine-type flags are important as the setup requires at least four nodes with a bigger CPU to run everything. You can try to use a higher cluster-version if it is supported, else stick to 1.10.

The cluster creation could take a while, so sit back and relax. Once the cluster is created, fetch its credentials for kubectl by running the command below:

$ gcloud container clusters get-credentials hello-hipster

You can view the created cluster in the GCP GUI.

Kubernetes cluster on GKE

Run kubectl get nodes to see it in the command line and to verify that kubectl can connect to your cluster.

Cluster Nodes

Install and setup Istio

Install Istio on your machine by following these steps:

$ cd ~/

$ export ISTIO_VERSION=1.0.2

$ curl -L https://git.io/getLatestIstio | sh -

$ ln -sf istio-$ISTIO_VERSION istio

$ export PATH=~/istio/bin:$PATH

Make sure to use version 1.0.2 since the latest version seems to have issues connecting to the MySQL database containers.

Now let us install Istio on our Kubernetes cluster by applying the provided Kubernetes manifests and helm templates from Istio.

$ kubectl apply -f ~/istio/install/kubernetes/helm/istio/templates/crds.yaml
$ kubectl apply -f ~/istio/install/kubernetes/istio-demo.yaml \
    --as=admin --as-group=system:masters

Wait for the pods to run. These will be deployed to the istio-system namespace.

$ watch kubectl get pods -n istio-system

Once the pods are in running status, exit the watch loop and run the below to get the Ingress gateway service details. This is the only service that is exposed to an external IP.

$ kubectl get svc istio-ingressgateway -n istio-system

NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP
istio-ingressgateway   LoadBalancer   10.27.249.83   35.195.81.130

The external IP is very important here. Let us save it to an environment variable so that we can use it in further commands.

$ export \
  INGRESS_IP=$(kubectl -n istio-system get svc \
  istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
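If the jsonpath expression looks cryptic, the following Python sketch mirrors what it extracts. The sample document is a trimmed, hand-written stand-in for the output of kubectl get svc with -o json, and the function name ingress_ip is hypothetical.

```python
import json


def ingress_ip(service_json):
    """Mirror the jsonpath {.status.loadBalancer.ingress[0].ip}."""
    service = json.loads(service_json)
    ingress = (service.get("status", {})
                      .get("loadBalancer", {})
                      .get("ingress", []))
    if not ingress:
        return None  # The load balancer has no external address yet.
    return ingress[0].get("ip")


# Hand-written stand-in for the service JSON (not real kubectl output).
sample = json.dumps({
    "metadata": {"name": "istio-ingressgateway", "namespace": "istio-system"},
    "spec": {"type": "LoadBalancer", "clusterIP": "10.27.249.83"},
    "status": {"loadBalancer": {"ingress": [{"ip": "35.195.81.130"}]}},
})
print(ingress_ip(sample))
```

While the cloud provider is still provisioning the load balancer, the ingress list is empty, which is why the sketch returns None in that case instead of failing.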

Now our Kubernetes cluster is ready for Istio.

For advanced Istio setup options refer to https://istio.io/docs/setup/kubernetes/

Creating the microservice application stack

In one of my previous posts, I showcased how to create a full stack microservice architecture using JHipster and JDL. You can read the post here if you want to learn more details about it. For this exercise, we will use the same application but we will not use the Eureka service discovery option we used earlier. Also, note that the store application is further split into Gateway and Product applications.

Architecture

Here is the architecture of the microservice that we are going to create and deploy today.

Microservice architecture with Istio

It has a gateway application and three microservice applications. Each of them has its own database. You can see that each application has an Envoy proxy attached to the pod as a sidecar. Istio control plane components are also deployed to the same cluster along with Prometheus, Grafana, and Jaeger.

The Ingress gateway from Istio is the only entry point for traffic and it routes traffic to all microservices accordingly. Telemetry is collected from all the containers running in the cluster, including the applications, databases, and Istio components.

Compared to the architecture of the original application here, you can clearly see that we replaced the JHipster registry and Netflix OSS components with Istio. The ELK monitoring stack is replaced with Prometheus, Grafana and Jaeger configured by Istio. Here is the original architecture diagram without Istio for a quick visual comparison.

Microservice architecture with Netflix OSS

Application JDL

Let’s take a look at the modified JDL declaration. You can see that we have declared serviceDiscoveryType no here since we will be using Istio for that.


application {
  config {
    baseName store
    applicationType gateway
    packageName com.jhipster.demo.store
    serviceDiscoveryType no
    authenticationType jwt
    prodDatabaseType mysql
    cacheProvider hazelcast
    buildTool gradle
    clientFramework react
    useSass true
    testFrameworks [protractor]
  }
  entities *
}


application {
  config {
    baseName product
    applicationType microservice
    packageName com.jhipster.demo.product
    serviceDiscoveryType no
    authenticationType jwt
    prodDatabaseType mysql
    cacheProvider hazelcast
    buildTool gradle
    serverPort 8081
  }
  entities Product, ProductCategory, ProductOrder, OrderItem
}

application {
  config {
    baseName invoice
    applicationType microservice
    packageName com.jhipster.demo.invoice
    serviceDiscoveryType no
    authenticationType jwt
    prodDatabaseType mysql
    buildTool gradle
    serverPort 8082
  }
  entities Invoice, Shipment
}

application {
  config {
    baseName notification
    applicationType microservice
    packageName com.jhipster.demo.notification
    serviceDiscoveryType no
    authenticationType jwt
    databaseType mongodb
    cacheProvider no
    enableHibernateCache false
    buildTool gradle
    serverPort 8083
  }
  entities Notification
}

/**
 * Entities for Store Gateway
 */

// Customer for the store
entity Customer {
    firstName String required
    lastName String required
    gender Gender required
    email String required pattern(/^[^@\s]+@[^@\s]+\.[^@\s]+$/)
    phone String required
    addressLine1 String required
    addressLine2 String
    city String required
    country String required
}

enum Gender {
    MALE, FEMALE, OTHER
}

relationship OneToOne {
    Customer{user(login) required} to User
}

service Customer with serviceClass
paginate Customer with pagination


/**
 * Entities for product microservice
 */


// Product sold by the Online store 
entity Product {
    name String required
    description String
    price BigDecimal required min(0)
    size Size required
    image ImageBlob
}

enum Size {
    S, M, L, XL, XXL
}

entity ProductCategory {
    name String required
    description String
}

entity ProductOrder {
    placedDate Instant required
    status OrderStatus required
    code String required
    invoiceId Long
    customer String required
}

enum OrderStatus {
    COMPLETED, PENDING, CANCELLED
}

entity OrderItem {
    quantity Integer required min(0)
    totalPrice BigDecimal required min(0)
    status OrderItemStatus required
}

enum OrderItemStatus {
    AVAILABLE, OUT_OF_STOCK, BACK_ORDER
}

relationship ManyToOne {
    OrderItem{product(name) required} to Product
}

relationship OneToMany {
   ProductOrder{orderItem} to OrderItem{order(code) required} ,
   ProductCategory{product} to Product{productCategory(name)}
}

service Product, ProductCategory, ProductOrder, OrderItem with serviceClass
paginate Product, ProductOrder, OrderItem with pagination
microservice Product, ProductOrder, ProductCategory, OrderItem with product


/**
 * Entities for Invoice microservice
 */


// Invoice for sales
entity Invoice {
    code String required
    date Instant required
    details String
    status InvoiceStatus required
    paymentMethod PaymentMethod required
    paymentDate Instant required
    paymentAmount BigDecimal required
}

enum InvoiceStatus {
    PAID, ISSUED, CANCELLED
}

entity Shipment {
    trackingCode String
    date Instant required
    details String
}

enum PaymentMethod {
    CREDIT_CARD, CASH_ON_DELIVERY, PAYPAL
}

relationship OneToMany {
    Invoice{shipment} to Shipment{invoice(code) required}
}

service Invoice, Shipment with serviceClass
paginate Invoice, Shipment with pagination
microservice Invoice, Shipment with invoice


/**
 * Entities for notification microservice
 */


entity Notification {
    date Instant required
    details String
    sentDate Instant required
    format NotificationType required
    userId Long required
    productId Long required
}

enum NotificationType {
    EMAIL, SMS, PARCEL
}

microservice Notification with notification

/**
 * Deployments
 */

deployment {
  deploymentType kubernetes
  appsFolders [store, invoice, notification, product]
  dockerRepositoryName "deepu105"
  serviceDiscoveryType no
  istio autoInjection
  istioRoute true
  kubernetesServiceType Ingress
  kubernetesNamespace jhipster
  ingressDomain "35.195.81.130.nip.io"
}

Deployment JDL

JHipster version 5.7.0 introduced support for deployment declarations straight in the JDL.

We have the below in our JDL which declares our Kubernetes deployment:

deployment {
  deploymentType kubernetes
  appsFolders [store, invoice, notification, product]
  dockerRepositoryName "deepu105"
  serviceDiscoveryType no
  istio autoInjection
  istioRoute true
  kubernetesServiceType Ingress
  kubernetesNamespace jhipster
  ingressDomain "35.195.81.130.nip.io"
}

The serviceDiscoveryType is disabled and we have enabled Istio with autoInjection support, so the Envoy sidecars are injected automatically for the selected applications. Istio routes are also generated for the applications by enabling the istioRoute option.

The kubernetesServiceType is set as Ingress, which is very important as Istio can only work with an Ingress controller service type. For Ingress, we need to set the domain DNS and this is where the Istio ingress gateway IP is needed. Now we need a DNS for our IP. For real use cases, you should map a DNS for the IP, but for testing and demo purposes we can use a wildcard DNS service like nip.io to resolve our IP. Just append nip.io to our IP and use that as the ingress domain.
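As a small illustration of the nip.io convention, the sketch below builds the ingress domain and an application hostname from an IP address. The function name nip_io_domain is hypothetical; the IP is the example ingress gateway address used in this article.

```python
import ipaddress


def nip_io_domain(ip, subdomain=None):
    """Build a nip.io hostname that resolves to the given IPv4 address."""
    address = ipaddress.IPv4Address(ip)  # Raises ValueError on malformed input.
    domain = f"{address}.nip.io"
    return f"{subdomain}.{domain}" if subdomain else domain


# Value to use for ingressDomain in the deployment block.
print(nip_io_domain("35.195.81.130"))
# Hostname under which the store gateway would be reachable.
print(nip_io_domain("35.195.81.130", "store"))
```

Any name that ends in the IP followed by nip.io resolves back to that IP, which is why one wildcard domain is enough for all the applications.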

Generate the applications and deployment manifests

Now that our JDL is ready, let us scaffold our applications and Kubernetes manifests. Create a new directory and save the above JDL in the directory. Let us name it app-istio.jdl and then run the import-jdl command.

$ mkdir istio-demo && cd istio-demo
$ jhipster import-jdl app-istio.jdl

This will generate all the applications and install the required NPM dependencies in each of them. Once the applications are generated, the deployment manifests will be generated and some useful instructions will be printed to the console.

Generation output

Open the generated code in your favorite IDE/Editor and explore the code.

Deploy to Kubernetes cluster using Kubectl

Now let us build and deploy our applications. Run the ./gradlew bootWar -Pprod jibDockerBuild command in the store, product, invoice, and notification folders to build the docker images. Once the images are built, push them to the docker repo with these commands:

$ docker image tag store deepu105/store

$ docker push deepu105/store

$ docker image tag invoice deepu105/invoice

$ docker push deepu105/invoice

$ docker image tag notification deepu105/notification

$ docker push deepu105/notification

$ docker image tag product deepu105/product

$ docker push deepu105/product
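Since the same two commands repeat for every application, you could also generate them. The sketch below is a dry run that only prints the commands; the function name docker_commands is hypothetical, and the repository name is the one used in this article, so replace it with your own.

```python
import shlex

APPS = ["store", "invoice", "notification", "product"]


def docker_commands(apps, repository="deepu105"):
    """Return the tag and push commands for each locally built image."""
    commands = []
    for app in apps:
        commands.append(["docker", "image", "tag", app, f"{repository}/{app}"])
        commands.append(["docker", "push", f"{repository}/{app}"])
    return commands


for command in docker_commands(APPS):
    # Printing keeps this a dry run; subprocess.run(command, check=True)
    # would execute each command instead.
    print(shlex.join(command))
```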

Once the images are pushed, navigate into the generated Kubernetes directory and run the provided startup script. (If you are on Windows, you can run the steps in kubectl-apply.sh manually one by one.)

$ cd kubernetes
$ ./kubectl-apply.sh

Run watch kubectl get pods -n jhipster to monitor the status.

Deployed applications

Once all the pods are in running status, we can explore the deployed applications.

Application gateway

The store gateway application is the entry point for our microservices. Get the URL for the store app by running echo store.$INGRESS_IP.nip.io (we already stored INGRESS_IP as an environment variable while setting up Istio). Visit the URL in your favorite browser and explore the application. Try creating some entities for the microservices:

Store gateway application

Monitoring

Istio setup includes Grafana and Prometheus configured to collect and show metrics from our containers. Let's take a look.

By default, only the Ingress gateway is exposed to an external IP, and hence we will use kubectl port forwarding to set up a secure tunnel to the required services.

Let us create a tunnel for Grafana:

$ kubectl -n istio-system \
    port-forward $(kubectl -n istio-system get pod \
    -l app=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000

Open localhost:3000 to view the Grafana dashboard.

Grafana dashboard for the Store application

Grafana uses the metrics scraped by Prometheus. We can look at Prometheus directly by creating a tunnel for it and opening localhost:9090:

$ kubectl -n istio-system \
    port-forward $(kubectl -n istio-system get pod -l \
    app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090

Prometheus dashboard

Observability

Istio configures Jaeger for distributed tracing and service graph for service observability. Let us take a look at them.

Create a tunnel for Jaeger and open localhost:16686:

$ kubectl -n istio-system \
    port-forward $(kubectl -n istio-system get pod -l \
    app=jaeger -o jsonpath='{.items[0].metadata.name}') 16686:16686

Jaeger tracing dashboard

You can make some requests in the application and find it in the tracing dashboard by querying for the service. Click on the request to see tracing details:

Tracing for product category listing request

Let us now create a tunnel for the service graph and open it in localhost:8088/force/forcegraph.html:

$ kubectl -n istio-system \
    port-forward $(kubectl -n istio-system get pod -l \
    app=servicegraph -o jsonpath='{.items[0].metadata.name}') 8088:8088

Istio service graph

Conclusion

Istio provides building blocks to build distributed microservices in a more Kubernetes-native way and takes the complexity and responsibility of maintaining those blocks away from you. This means you do not have to worry about maintaining the code or deployments for service discovery, tracing and so on.

The Istio documentation says:

Deploying a microservice-based application in an Istio service mesh allows one to externally control service monitoring and tracing, request (version) routing, resiliency testing, security and policy enforcement, etc., in a consistent way across the services, for the application as a whole.

Werner Vogels (CTO of AWS) said at AWS Re:Invent:

“In the future, all the code you ever write will be business logic.”

The Istio service mesh helps make that statement a reality. It lets you worry only about the applications that you are developing, and with JHipster that future is truly here: you just need to worry about writing your business logic.

While this is great, it is not a silver bullet. Keep in mind that Istio is fairly new compared to other stable and battle-tested solutions like JHipster Registry (Eureka) or Consul.

Another thing to keep in mind is the resource requirements. The same microservices with JHipster Registry or Consul can be deployed to a 2 node cluster with 1 vCPU and 3.75 GB of memory per node in GCP, while you need a 4 node cluster with 2 vCPUs and 7.5 GB of memory per node for Istio enabled deployments. The default Kubernetes manifest from Istio doesn’t apply any request limits for resources, and by adding and tuning those, the minimum requirement could be reduced. Still, I don’t think you can get it as low as what the JHipster Registry option needs.
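To put those numbers side by side, here is a quick back-of-the-envelope calculation in Python. It only multiplies the node counts and per-node sizes quoted above; the function name cluster_totals is made up for this sketch.

```python
def cluster_totals(nodes, vcpus_per_node, memory_gb_per_node):
    """Return (total vCPUs, total memory in GB) for a cluster."""
    return nodes * vcpus_per_node, nodes * memory_gb_per_node


registry_cluster = cluster_totals(nodes=2, vcpus_per_node=1,
                                  memory_gb_per_node=3.75)
istio_cluster = cluster_totals(nodes=4, vcpus_per_node=2,
                               memory_gb_per_node=7.5)

print("JHipster Registry or Consul:", registry_cluster)
print("Istio:", istio_cluster)
print("vCPU ratio:", istio_cluster[0] / registry_cluster[0])
print("memory ratio:", istio_cluster[1] / registry_cluster[1])
```

With these figures, the Istio setup asks for four times the total vCPUs and four times the total memory of the smaller cluster.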

In a real-world use case, the advantages of not having to maintain the complex parts of your infra vs having to pay for more resources might be a decision that has to be taken based on your priorities and goals.

A huge shout out to Ray Tsang for helping me figure out an optimal cluster size for this application. Also a huge thank you from myself and the community to both Ray and Srinivasa Vasu for adding the Istio support to JHipster.

JHipster provides a great Kubernetes setup to start with which you can further tweak as per your needs and platform. The Istio support is recent and will improve further over time, but it's still a great starting point especially to learn.

To learn more about JHipster and Full stack development, check out my book “Full Stack Development with JHipster” on Amazon and Packt.

There is a great Istio tutorial from Ray Tsang here.

If you like JHipster don’t forget to give it a star on Github.

If you like this article, please leave some claps (Did you know that you can clap multiple times in Medium?). I hope to write more about Istio in the near future.

You can follow me on Twitter and LinkedIn.

My other related posts:

Continuous Deployment for Node.js on the Google Cloud Platform

freeCodeCamp — Wed, 15 Aug 2018 01:19:51 +0000

By Gautam Arora

Google Cloud Platform (GCP) provides a host of options for Node developers to easily deploy our apps. Want a managed hosting solution like Heroku? App Engine, check! Want to host a containerized app? Kubernetes Engine, check! Want to deploy a serverless app? Cloud Functions, check!

Recently at work, I’ve been enjoying using our in-house continuous deployment service that quickly builds, tests, and deploys new commits pushed to GitHub. So when I read about Google’s new Cloud Build service, I wanted to take it for a spin and see if I could recreate a similar seamless deployment experience for myself. Further, in a conversation with Fransizka from the Google Cloud team, she identified this as an area where a tutorial would be helpful. So here we go…

But wait, what is Cloud Build?

Cloud Build is a managed build service in GCP that can pull code from a variety of sources, run a series of build steps to create a build image for the application, and then deploy this image to a fleet of servers.

Cloud Build works well with Google’s own source code repository, Bitbucket, or GitHub. It can create a build image using a Docker configuration file (Dockerfile) or Cloud Build’s own configuration file (cloudbuild.yaml). It can deploy applications (and APIs) to App Engine, Kubernetes Engine, and Cloud Functions. A really cool feature is Build Triggers. These can be set up to watch for a new commit in the code repository and trigger a new build and deploy.

Before we jump into the deep end…

This post shares the detailed steps and code to set up continuous deployment for Node apps on GCP. It assumes that you’re familiar with developing simple Node applications, working with the command line, and have some high level understanding of deploying apps to cloud services like Heroku, AWS, Azure or GCP.

For each of the sections, a companion GitHub code repository is provided for you to follow along. Don’t sweat it though — feel free to skim over the article to learn about the high level ideas, and you can bookmark it and come to it later if you plan to set this up. The real fun of having a setup like this is that you get to deploy applications quickly.

Continuous Deployment for App Engine Standard

Deploying a Node app to App Engine is quite simple. Create a new project in Google Cloud Console, add an app.yaml configuration file in our code directory (which describes the node runtime we want to use — I’ve used Node 8), and run gcloud app deploy on our terminal — and done!

If you want to try this out for yourself, here are a couple of resources:

So, what we’ve done so far by following the quickstart guide above:

Created a new project in Google Cloud Console
Deployed our Node app to App Engine using gcloud app deploy

Now, how can we automate the setup so that code changes get deployed automatically on push to GitHub?

Here is what we need to do:

Put our code on GitHub
Head over to GitHub to create a new repository
Then follow the instructions to push code from your machine to GitHub
Enable Cloud Build
Enable the Cloud Build API for our project
Enable the App Engine API for our project.
Grant App Engine access to the Cloud Build service account: go to the IAM page, find the service account ending in @cloudbuild.gserviceaccount.com, edit it, and give it the App Engine Admin role.
Create a Cloud Build configuration file
Create a new file cloudbuild.yaml that looks like this:

steps:
- name: 'gcr.io/cloud-builders/npm'
  args: ['install']
- name: 'gcr.io/cloud-builders/npm'
  args: ['test']
- name: "gcr.io/cloud-builders/gcloud"
  args: ["app", "deploy"]
timeout: "1600s"

This configuration has three build steps (each line starting with a hyphen is a build step) that will run npm install, then npm test and if all looks good then deploy our code to App Engine.

Each build step is just like a command we run on our machine. But in this case, since this file is in yaml and each step is split over 2 lines of name and args, it can look like a bit of a mind-bender.

Let’s try this: for the line starting with “name”, read its last word and then read the values in the “args” line. I hope this file makes more sense now!
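The reading trick above can be sketched in a few lines of Python. This is only an illustration of how to read the file, not how Cloud Build executes steps internally: the steps from the configuration are rewritten as plain Python data, and step_to_command is a hypothetical helper name.

```python
# The three steps from the App Engine cloudbuild.yaml, written as Python data.
STEPS = [
    {"name": "gcr.io/cloud-builders/npm", "args": ["install"]},
    {"name": "gcr.io/cloud-builders/npm", "args": ["test"]},
    {"name": "gcr.io/cloud-builders/gcloud", "args": ["app", "deploy"]},
]


def step_to_command(step):
    """Join the last path segment of the builder name with the step's args."""
    tool = step["name"].rsplit("/", 1)[-1]
    return " ".join([tool, *step["args"]])


commands = [step_to_command(step) for step in STEPS]
for command in commands:
    print(command)
```

Read this way, each build step maps to a command we would otherwise type on our own machine.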

Run a Build manually (optional, just for verification)
We can now deploy our application from our machine using Cloud Build
Run the Cloud Build command on your terminal: gcloud builds submit --config cloudbuild.yaml . This command starts a build on Cloud Build using the configuration file we created above.
Head over to the Cloud Builds page to see the build being kicked off.
Wait for the build to end, and then test out your Node application using the App Engine URL for this app.
You can make changes to your Node app and run this command again to start more builds if you would like.
Create a Build Trigger
Head over to the Cloud Build Triggers page and select Create Trigger
On the Build Trigger setup page, choose GitHub as the Source Code Repository. This will require you to authorize GCP to access your GitHub repositories, which you will need to approve. Once done, select the GitHub repository for your Node app that you had pushed to GitHub earlier.
Create a trigger named continuous deployment, and for the trigger type choose Branch with master as the regex for the branch name. This ensures that the build, test, and deploy steps run only for pushes to the master branch and not for pushes to any other branch.
For the build configuration file, select cloudbuild.yaml
Now click the Build Trigger button
Run a Build automatically by pushing a commit to GitHub
With our build trigger created, make a simple change to your Node application, such as changing “Hello, World!” to “Hello, GCP!”, then commit and push this code to GitHub.
Head back to the Cloud Builds page and you will notice that a build was automatically triggered (if it wasn’t, give it a few more seconds or click the refresh button on the page).
Once the build is complete and you see a green check, you can visit your application using its App Engine URL and see that your changes are now live!
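The branch regex chosen for the trigger decides which pushes start a build. The small Python sketch below uses the standard re module as a stand-in for Cloud Build's own matcher. That is an assumption, since this post does not show Cloud Build's exact matching rules, and the branch names are made up for illustration. It shows why an anchored pattern such as ^master$ is the unambiguous choice when builds should run for master only.

```python
import re

# Hypothetical branch names that might receive pushes.
BRANCHES = ["master", "master-hotfix", "feature/login"]


def matching_branches(pattern, branches):
    """Return the branches in which the pattern is found anywhere."""
    return [branch for branch in branches if re.search(pattern, branch)]


unanchored = matching_branches("master", BRANCHES)
anchored = matching_branches("^master$", BRANCHES)
print(unanchored)
print(anchored)
```

If the matcher searches within the branch name, the bare pattern also selects master-hotfix, while the anchored pattern selects master alone under either interpretation.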

Here is a screenshot for builds being triggered through a GitHub push for our app:

Too good to be true? Run this last step a few more times to test it out. Our first application now gets deployed to App Engine on every commit to master!

_Photo by [Willian Justen de Vasconcellos](https://unsplash.com/photos/g5FyZvIzUS4?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on Unsplash_

Continuous Deployment for Kubernetes Engine

Great, so we’ve set up our application to deploy to App Engine on GitHub push, but what if we wanted the same setup for our containerized applications? Let’s give it a spin!

At a high level, deploying a Node app to Kubernetes Engine has two main tasks. First, get our app ready: containerize the application with Docker, build it, and push the Docker image to Google Container Registry. Then, set things up on the GCP end: create a Kubernetes cluster, create a Deployment with your application image, and then create a Service to allow access to your running application.

If you want to try this out for yourself, here are a few resources:

So, what we’ve done so far by using the guides above:

Created another new project in Google Cloud Console
Created a Kubernetes Cluster, Deployment, and Service
Deployed our Containerized Node app to Kubernetes Engine using kubectl

…but what we want is a continuous deployment setup such that a new commit kicks off a build and deployment.

Here is what we need to do:

Put our code on GitHub
We will follow the same steps as we did in the section earlier on App Engine. Create a new repository and push code from our machine to GitHub.
Enable Cloud Build
Enable the Cloud Build API for our project
Enable the Kubernetes Engine API for our project
Grant Kubernetes Engine access to the Cloud Build service account: go to the IAM page, find the service account ending in @cloudbuild.gserviceaccount.com, edit it, and give it the Kubernetes Engine Admin role.
Create a Cloud Build Configuration file
Create a new file cloudbuild.yaml that looks like this:

steps:
- name: 'gcr.io/cloud-builders/npm'
  args: ['install']
- name: 'gcr.io/cloud-builders/npm'
  args: ['test']
- name: 'gcr.io/cloud-builders/docker'
  args: ["build", "-t", "gcr.io/$PROJECT_ID/my-image:$REVISION_ID", "."]
- name: 'gcr.io/cloud-builders/docker'
  args: ["push", "gcr.io/$PROJECT_ID/my-image:$REVISION_ID"]
- name: 'gcr.io/cloud-builders/kubectl'
  args:
  - 'set'
  - 'image'
  - 'deployment/my-deployment'
  - 'my-container=gcr.io/$PROJECT_ID/my-image:$REVISION_ID'
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-east1-b'
  - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'

This configuration has five build steps: it runs npm install and then npm test to make sure our application works, builds a Docker image and pushes it to GCR, and then deploys our application to our Kubernetes cluster. The values _my-cluster, my-deployment and my-container_ in this file refer to resources in the Kubernetes cluster we have created (as per the guide we followed above). _$REVISION_ID_ is a variable value that Cloud Build injects into the configuration based on the GitHub commit that triggers this build.
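To picture what the variable injection does to the image name, here is a minimal Python sketch that uses string.Template as a stand-in for Cloud Build's substitution step. It is an analogy, not Cloud Build's implementation; render_image is a hypothetical helper, and the project id and revision values are made up.

```python
from string import Template

# Image reference as it appears in the build step, with both placeholders.
IMAGE_TEMPLATE = "gcr.io/$PROJECT_ID/my-image:$REVISION_ID"


def render_image(project_id, revision_id):
    """Fill in the two placeholders the way a substitution pass would."""
    return Template(IMAGE_TEMPLATE).substitute(
        PROJECT_ID=project_id,
        REVISION_ID=revision_id,
    )


# Hypothetical values: a manual build might pass a short revision id.
image = render_image("demo-project", "1")
print(image)
```

Because the build, push, and kubectl steps all render the same template, they all refer to the same image for a given commit.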

Run a Build manually (optional, for verification)
We can now deploy our application from our machine using Cloud Build
Run the Cloud Build command on your terminal: gcloud builds submit --config cloudbuild.yaml --substitutions=REVISION_ID=1 .

We’re also passing the revision id in this command, since we are manually running this build vs it being triggered by GitHub.

Head over to the Cloud Builds page to see the build in action.
At the end of the build, you can test out your Node application using the Kubernetes Service URL
You can make changes to your Node app and run this command again to kick off more builds if you would like.
Create a Build Trigger
The steps for setting this up are the same as in the App Engine section above. Go to the Cloud Build Triggers page for this project, select the right GitHub repository, create a trigger called continuous deployment just for the master branch, and you’re done.
Run a Build automatically by pushing to GitHub
This is also the same as in the App Engine section above: make a change, then add, commit, and push to GitHub, which will kick off a build that you can see on your Cloud Builds page. Once the build completes, you will be able to see the updated app using the Kubernetes Service URL.

Here is a screenshot for a build being triggered through a GitHub push for our app:

The steps in this section were pretty much the same as the App Engine section. The main differences were that we had to containerize our application with Docker, spin up our Kubernetes cluster, and then have a Cloud Build configuration with just a few more steps.

But at its core, Cloud Build and its Build Triggers work pretty much the same and give us a seamless deployment experience. Our second application now gets deployed to Kubernetes Engine on every commit to master!

_Photo by [Maximilian Weisbecker](https://unsplash.com/photos/Esq0ovRY-Zs?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on Unsplash_

Continuous Deployment for Cloud Functions

Sure, App Engine and Kubernetes Engine are great, but how about automated deployments for our Serverless app? I mean, having no servers to manage at all is really the best, right? Let’s do this!

Deploying a Node app to Cloud Functions requires us to create a new project. No configuration files are needed: once we run gcloud functions deploy on our terminal, our functions are deployed!

If you want to try this out for yourself, here are the resources you will need:

If you’ve been following along, you can probably already picture what steps we need to do:

Put our code on GitHub
We already know how to do this
Enable Cloud Build
Enable the Cloud Build API for our project
Enable the Cloud Functions API for our project.
Grant Cloud Functions access to the Cloud Build service account: go to the IAM page, find the service account ending in @cloudbuild.gserviceaccount.com, edit it, and give it the Project Editor role.
Create a Cloud Build Configuration file
Create a new file cloudbuild.yaml that looks like this:

steps:
- name: 'gcr.io/cloud-builders/npm'
  args: ['install']
- name: 'gcr.io/cloud-builders/npm'
  args: ['test']
- name: 'gcr.io/cloud-builders/gcloud'
  args:
  - beta
  - functions
  - deploy
  - helloWorld
  - --source=.
  - --runtime=nodejs8
  - --trigger-http

Similar to the App Engine configuration, this configuration has three steps: install the dependencies, run the tests, and, if all is good, deploy the function to Cloud Functions.

Run the Build manually (optional, for verification)
We can now deploy our function from our machine using Cloud Build
Run this in your terminal: gcloud builds submit --config cloudbuild.yaml .
Head over to the Cloud Builds page to see the build in action.
At the end of the build, you can test out your serverless app using the Cloud Function URL
Create a Build Trigger
The steps for setting this up are the same as in the App Engine and Kubernetes Engine sections above. Go to the Cloud Build Triggers page for this project, select the right GitHub repository, create a trigger called continuous deployment just for the master branch, and you’re done.
Run a Build automatically by pushing to GitHub
This is also the same as in the App Engine and Kubernetes Engine sections above: make a change, then add, commit, and push to GitHub, which will kick off a build that you can see on your Cloud Builds page. Once the build completes, you will be able to see the updated app using the Cloud Functions URL.

Here is a screenshot of a build being triggered through a GitHub push for our sample app:

Cloud Functions were super easy to set up with automated builds, and they make the “code → build → test → push → deploy” workflow really fast! Our third application now gets deployed to Cloud Functions on every commit to master!

_Photo by [Jassim Vailoces](https://unsplash.com/photos/kAjrml-a8R0?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on Unsplash_

Wrapping Up

Phew! We covered a lot of ground in this post. If this was your first time trying out GCP for Node, hopefully you got to see how easy and straightforward it is to try out the various options. If you were most eager to understand how to set up continuous deployment for apps on GCP, I hope you weren’t disappointed either!

Before you go, I just wanted to make sure that you didn’t miss the fact that all the sections had a sample app: Hello World for App Engine, Hello World for Kubernetes Engine and Hello World for Cloud Functions.

That’s it for now! Let’s go ship some code!

Thanks for following along. If you have any questions or want to report any mistakes in this post, do leave a comment.

And you can follow me on Twitter here.

Decentralize your application with Google Cloud Platform

freeCodeCamp — Thu, 21 Dec 2017 22:57:25 +0000

By Simeon Kostadinov

When first starting a new software project, you normally choose a certain programming language, a specific framework and libraries. Then you begin coding. After 2 - 3 months you end up with a nicely working single application.

But as the project grows and more functionality is added, you quickly run into the disadvantages of a centralized system. It becomes difficult to maintain and hard to scale, which will make you search for a better solution. This is where microservices come in.

What are Microservices?

Microservices are independently built systems, each running in its own process and often communicating through a REST API. They represent different parts of your application, are separately deployable, and each can be written in any language.

You can easily see how, by dealing with the problems of a monolithic system, Microservices have become a requirement for any state-of-the-art software.

I strongly recommend reading Microservices (by James Lewis) and On Monoliths and Microservices if you want to understand the key concepts of this architectural style in more depth.

What are you going to build?

This article will walk you through the process of implementing a Microservice using Google Cloud Platform.

Imagine you’re developing an application that accepts a text input from a user and determines the category of the key words within the input.

We’ll use an example to illustrate the functionality of the App. Consider the sample text below from the GCP Cloud Natural Language API website:

“Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. Sundar Pichai said in his keynote that users love their new Android phones.”

Our web App would accept the text above as input, and return the category that the key words belong to, as in the figure below:

_Source: [GCP Cloud Natural Language API website](https://cloud.google.com/natural-language/)_

This feature is quite likeable and people use it hundreds of times each day. Now, if you’re going to offer this functionality as a service that receives a high amount of daily traffic, you want to respond with a stable and reliable system.

That’s why we’ll build a lightweight Flask App, hosted on Google App Engine. Integrating it with Google Cloud Pub/Sub will help us handle all the asynchronous requests we receive and ensure that users don’t wait too long for a response.

Create and deploy the application

Let’s first start with the Flask app (you can also choose Django, Node.js, Go or anything used to build server-side applications). If you’re not very familiar with developing a Flask App, this Flask Series can show you step-by-step how to set up an application.

For the purpose of this tutorial we will use this simple example:

This embed is from an external site and no longer seems to be available

First, install the dependencies with pip install Flask gunicorn. You will be using **gunicorn** to run the application on Google App Engine. For local access you can run python text.py in the console and find the app on port 8080.

To deploy the app to Google App Engine, you need to take these steps:

Create a project (follow the ‘Before you begin’ instructions from the documentation). Save the project id for later.
Create an app.yaml file (shown below), which Google App Engine uses to recognize the application.
Run gcloud app deploy in the console.

The app.yaml file looks like this:

This embed is from an external site and no longer seems to be available

Line 3 is important, where you use **gunicorn** to tell Google App Engine to run the application **app** from a file called text.py (the Flask app). You can learn more about the .yaml file structure here. After deployment you should be able to access your project from https://[YOUR_PROJECT_ID].appspot.com.

When building production ready applications, you often want to test your code before pushing it live. One way to do this is to run your App within a server locally. A better approach is to have a development version of the app which can be tested not only from your local machine but also from a hosted environment. You can use Google App Engine versions for this.

Just deploy your App with gcloud app deploy -v textdev (for development) or gcloud app deploy -v textprod (for production).

Then navigate to https://textdev.[YOUR_PROJECT_ID].appspot.com or https://textprod.[YOUR_PROJECT_ID].appspot.com to access the specific version.

Scale to infinity

So far so good. You have a working application, hosted on the Google Cloud Platform. Now you need to add Google Cloud Pub/Sub and Google Natural Language API.

But first, let’s explain the architecture.

Once a request is received, the Flask app will publish a message with the text to a topic (created below). Then a subscriber (Python script) will pull this message and apply the Google Natural Language API to each text. Finally, the result will be saved to a database.

For multiple requests, the app asynchronously publishes them to the topic and the subscriber starts executing the first one. When ready, it picks the second one and so on.
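The flow described above can be imitated with an in-memory queue. The sketch below is only a stand-in for the architecture: queue.Queue plays the topic, a dictionary plays the database, and categorize is a made-up placeholder for the Natural Language API call. None of these names come from the original code.

```python
import queue

topic = queue.Queue()  # stand-in for the Pub/Sub topic
database = {}          # stand-in for the results database


def publish(email, text):
    """What the web app does: enqueue the request and return immediately."""
    topic.put({"email": email, "text": text})


def categorize(text):
    """Placeholder for the Natural Language API call (hypothetical logic)."""
    return "ORGANIZATION" if "Google" in text else "OTHER"


def run_worker():
    """What the subscriber does: handle messages one at a time, in order."""
    while not topic.empty():
        message = topic.get()
        database[message["email"]] = categorize(message["text"])
        topic.task_done()


publish("a@example.com", "Google unveiled a new phone")
publish("b@example.com", "Hello there")
run_worker()
print(database)
```

The web app never waits for the categorization itself, which is what keeps response times short when many requests arrive at once.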

Now you need to modify the text.py file:

This embed is from an external site and no longer seems to be available

The code on lines 15 and 16 creates the publisher. On line 18 it publishes a message containing the user email and text input.

You only need to fill in the project_id and topic_id (lines 6 and 7).

Since the project_id was used earlier, just add it here.

For the topic_id you need to do the following:

Enable Google Cloud Pub/Sub API
Go to the Pub/Sub page of your project
Create a topic and a subscription
Use the topic name as your topic_id
Keep the subscription name for later.
You will need it as your subscription_id

Wonderful! Now you have a working publisher.
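Since the original text.py listing is no longer available, here is a hedged sketch of one way the message payload could be prepared. Pub/Sub message data is sent as bytes, so the email and text have to be serialized first. Packing them as JSON is an assumption made for this sketch, and encode_message and decode_message are hypothetical helper names.

```python
import json


def encode_message(email, text):
    """Serialize the request into bytes suitable for a message payload."""
    return json.dumps({"email": email, "text": text}).encode("utf-8")


def decode_message(data):
    """Reverse of encode_message, as the worker would do on receipt."""
    return json.loads(data.decode("utf-8"))


payload = encode_message("user@example.com", "Sundar Pichai gave a keynote.")
restored = decode_message(payload)
print(type(payload).__name__, restored["email"])
```

The publisher would send payload to the topic, and the subscriber would call the decoding helper inside its callback.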

Let’s jump into setting up the subscriber. There are two files that need to be created: worker.py and startup-script.sh.

The worker.py looks like this:

This embed is from an external site and no longer seems to be available

The file is slightly larger but we will examine it step-by-step, starting from the bottom.

When the file is executed, the code on line 44 runs main(). This function sets the subscriber with your project_id and subscription_id and assigns a callback to it.

The callback (initialized on line 7) is going to receive all messages and perform the required task (to determine the category of a text). If you follow the code from the callback, you can easily see how the Google Natural Language API is being used.

The interesting line is 11, where message.ack() acknowledges the current message. You can think of this as the worker saying: “I am done with this message and ready to handle the next one”.
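The shape of such a callback can be sketched without any cloud dependency. The line numbers mentioned above refer to the original worker.py listing, not to this sketch. FakeMessage is a hypothetical stand-in that mimics only the data attribute and the ack() method of a received message, and the upper-casing is a placeholder for the real categorization work.

```python
class FakeMessage:
    """Minimal stand-in for a received message: payload plus ack()."""

    def __init__(self, data):
        self.data = data
        self.acked = False

    def ack(self):
        self.acked = True


results = []


def callback(message):
    """Process the payload first, then acknowledge the message."""
    text = message.data.decode("utf-8")
    results.append(text.upper())  # placeholder for the real work
    message.ack()


message = FakeMessage(b"hello")
callback(message)
print(message.acked, results)
```

Acknowledging only after the work is done means a message that fails midway is not marked as handled.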

Now, you need to implement startup-script.sh.

This is a shell script with several commands:

This embed is from an external site and no longer seems to be available

Before explaining the code above, I need to clarify the process.

Basically, Google Cloud Compute Engine gives you the ability to scale an application by providing as many virtual machines (VM) as needed to run several workers simultaneously.

You just need to add the code for the worker, which you already have, and set the configurations of the VM. Together with the worker.py, you also need to add a startup-script.sh which will run every time a new VM boots up.

New VM instances are booted up to prevent delay in responses when a high number of messages is received.

For a deeper and more technical explanation of this process check out the documentation.

Now, let me walk you through the script:

Line 1: means that the script should always be run with bash, rather than another shell.
Lines 2 and 3: create and enter a new directory where all of the files will be stored.
Line 4: copies the worker.py file from Google Cloud Storage into the VM (I will explain how to upload your files to the storage below).
Line 5: here you need to specify a JSON string of your key so that Google can verify your credentials. In order to get this string you need to create a service account. Select **Furnish a new private key** and for **Key type** use JSON. A file will be downloaded to your computer. Copy the content and turn it into a JSON string (using JSON.stringify(key_in_json_format) in a browser console). Paste it instead of SERVICE_ACCOUNT_KEY.
Line 6: exports the key as an environment variable which will be used by the Google APIs to verify your credentials.
Lines 7 - 12: sets up configurations and installs the python libraries.
Line 15: runs the worker.
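For the key mentioned in the Line 5 step, the same single-line JSON string can also be produced in Python instead of a browser console. This is a minimal sketch: key_to_json_string is a hypothetical helper, and the sample below is a made-up fragment, not a real key (a downloaded key file contains more fields).

```python
import json


def key_to_json_string(key_file_contents):
    """Collapse a pretty-printed JSON key into one compact line."""
    key = json.loads(key_file_contents)
    return json.dumps(key, separators=(",", ":"))


# Hypothetical, shortened stand-in for the downloaded key file.
sample = """
{
  "type": "service_account",
  "project_id": "demo-project"
}
"""

compact = key_to_json_string(sample)
print(compact)
```

The compact string can then be pasted in place of SERVICE_ACCOUNT_KEY in the startup script.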

Now you need to upload worker.py and startup-script.sh to your storage and set up the VM. To upload the files just go here and create a new bucket with the same name as your project id. Create a folder called workers and upload the scripts inside. Make sure to change the worker.py to a ‘Public link’ and edit the permissions of the _startup-script.sh_ to have your service account as an owner.

Configurations and testing

The final step is to set up the configurations of the VM and test the system. Just follow the ‘Create an instance template’ instructions from the documentation and you are good to go!

Once the VM boots up, you can try sending requests to your application and examine how it reacts by checking the logs.

Final thoughts

Going through Google’s documentation may help you a lot. Also check out this tutorial - you may find it useful while implementing some of the steps above.

I want to express my gratitude to Logan Allen for helping me better understand this process. I hope you find it useful.

Leave any questions or suggestions in the comment section.

Google Cloud Platform - freeCodeCamp.org

How to Use Google Dataproc – Example with PySpark and Jupyter Notebook

How to Create a Dataproc Cluster

How to Submit a PySpark Job

How to Create a Jupyter Notebook Instance

Conclusion

Google Cloud Platform Tutorial: From Zero to Hero with GCP

How to get started with Google Cloud Platform for free

Why would you migrate your services to Google Cloud Platform?

How to optimize your VMs to reduce costs in GCP

Custom Machine Types

Preemptible VM's

Sustained Use Discounts

Committed Use Discounts

Labels

Labels vs Network tags

Identities

Roles

Cloud Logging

VPC Flow Logs

Cloud Monitoring

Alerts

Trace

Error Reporting

Debug

Profile

How to store data in GCP

Permissions in GCS

Bucket lock

Relational Managed Databases in GCP

NoSQL Managed Databases in GCP

How to choose your database

How does networking work in GCP?

Virtual Private Cloud (VPC) - see the docs here

How to share resources between multiple VPCs

How to connect on-premise and GCP infrastructures

Other networking services

Where can you run your applications in GCP?

Where to store your VM's data: disks

Cloud Storage

How to back up your VM's data: Snapshots

Instance groups

Security best practices for GCE

How to work with Big Data in GCP

Cloud Pub/Sub vs Cloud Task

How to explore and visualize your data in GCP

Encryption on Google Cloud Platform

More GCP resources

Time to test your knowledge

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Question 7

Question 8

Question 9

Question 10

Answers

Back to the initial proposition

Conclusion

How to Pass Almost Every Google Cloud Platform Professional Certification Exam

About the certification exams

Professional Cloud Architect

Professional Data Engineer

Professional Cloud Security Engineer

Professional Cloud Network Engineer

Professional Cloud Developer

The Preparation Process

Study guides

Disclaimer

Tips for taking your exams

Conclusion

How to secure and manage secrets using Google Cloud KMS

Step1: Preparing Secrets

Step2: Creating KMS Secret Keys

Step3: Assigning Permission to use these keys

Production Environment:

Staging Environment: