Get Started with BigQuery

Prajwol KC
3 min read · Nov 12, 2020


BigQuery helps you perform interactive analysis of petabyte-scale datasets and enables near-real-time analysis of massive data. It offers a familiar query language based on the SQL 2011 standard, along with supporting functions.

Data stored in BigQuery is highly durable: Google replicates your data by default, at no additional charge for the replicas. With BigQuery, you pay only for the resources you use. Storage is inexpensive, and queries are billed by the amount of data they process: when you submit a query, you pay for compute only for the duration of that query. You never pay to keep a compute cluster up and running.
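
Because queries are billed by bytes processed, you can estimate a query's cost before running it. Here is a minimal sketch using the bq CLI's dry-run mode (it assumes the logdata.accesslog table you will create later in this lab):

# Ask BigQuery to plan the query without executing it. The response
# reports how many bytes the query would process, which you can plug
# into the pricing calculator. A dry run incurs no query charges.
bq query --dry_run "select count(*) from logdata.accesslog"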

Using BigQuery involves interacting with a number of Google Cloud Platform resources, including projects (covered elsewhere in this course), datasets, tables, and jobs. This lab introduces some of these resources, and the brief descriptions below summarize their roles in BigQuery.

Datasets: A dataset is a grouping mechanism that holds zero or more tables. A dataset is the lowest-level unit of access control. Datasets are owned by GCP projects, and each dataset can be shared with individual users.

Tables: A table is a row-column structure that contains actual data. Each table has a schema that describes strongly typed columns of values. Each table belongs to a dataset.
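
To make this hierarchy concrete, here is a sketch of how the same resources look from the bq command-line tool (the dataset name logdata matches the one created in the steps below):

# Create a dataset in the current project (one-time setup).
bq mk logdata

# List datasets in the project, then tables within the dataset.
bq ls
bq ls logdata

# Show a table's schema and metadata.
bq show logdata.accesslog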

Load data from Cloud Storage into BigQuery

  1. Create a new dataset within your project by selecting your project in the Resources section, then clicking CREATE DATASET on the right. Name the dataset logdata.
  2. For Data location, select the continent closest to the region your project was created in, then click Create dataset.
  3. Create a new table in the logdata dataset to store the data from the CSV file.
  • For Create table from, select Google Cloud Storage, and in the field, type gs://cloud-training/gcpfci/access_log.csv.
  • Verify File format is set to CSV.

Note: If you have created a table previously, the Create from Previous Job option lets you quickly reuse those settings to create similar tables.

  • For Dataset name, leave logdata selected.
  • For Table name, type accesslog.
  • For Table type, Native table should be selected.
  4. Accept the remaining default values and click Create table. BigQuery creates a load job to create the table and upload data into it (this may take a few seconds). A command-line equivalent of this load is sketched after this list.
  5. (Optional) To track job progress, click Job History.
  6. When the load job is complete, click logdata > accesslog.
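
If you prefer the command line, the same load can be expressed with bq load. A sketch, assuming the logdata dataset already exists and letting BigQuery autodetect the schema just as the console steps above do:

# Load the CSV from Cloud Storage into a native table, autodetecting
# the schema. Autodetection produces generated column names such as
# int64_field_6, which the queries below rely on.
bq load --source_format=CSV --autodetect \
  logdata.accesslog \
  gs://cloud-training/gcpfci/access_log.csv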

Perform a query on the data using the BigQuery web UI

Here, you use the BigQuery web UI to query the accesslog table you created previously.

  1. In the Query editor window, type (or copy-and-paste) the following query:

select int64_field_6 as hour, count(*) as hitcount from logdata.accesslog group by hour order by hour

  2. Because you told BigQuery to automatically detect the schema when you loaded the data, the hour of the day during which each web hit arrived is in a field with the generated name int64_field_6.

Notice that the Query Validator tells you that the query syntax is valid (indicated by the green checkmark) and indicates how much data the query will process. The amount of data processed allows you to determine the price of the query using the Cloud Platform Pricing Calculator.

3. Click Run and examine the results. At what time of day is the website busiest?
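
If you want the busiest hour to surface directly rather than scanning the ordered results by eye, a small variation of the same query (a sketch, reusing the autodetected field name) sorts by hit count and keeps only the top row:

select int64_field_6 as hour, count(*) as hitcount from logdata.accesslog group by hour order by hitcount desc limit 1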

Perform a query on the data using the bq command

Here, you use the bq command in Cloud Shell to query the accesslog table you created previously.

bq query "select string_field_10 as request, count(*) as requestcount from logdata.accesslog group by request order by requestcount desc"

The first time you use the bq command, it caches your Google Cloud Platform credentials and then asks you to choose your default project. Choose the project that Qwiklabs assigned you to. Its name will look like qwiklabs-gcp- followed by a hexadecimal number.

The bq command then performs the action requested on its command line.
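
By default, bq query uses BigQuery's legacy SQL dialect; the queries in this lab happen to be valid in both dialects. You can add flags to force standard SQL or to persist results. A sketch (the destination table name hourly_hits is just an example, not part of the lab):

# Force the standard SQL dialect and write the results to a new
# table instead of only printing them to the terminal.
bq query --nouse_legacy_sql \
  --destination_table=logdata.hourly_hits \
  "select int64_field_6 as hour, count(*) as hitcount from logdata.accesslog group by hour"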

Originally published at https://prajwol-kc.com.np on November 12, 2020.
