Snowplow Analytics Consulting
One tool among many to transform your business
Our clients Bark&Co and Valvoline both use Snowplow Analytics as an essential part of their data analytics workflow, in conjunction with Amazon Redshift for data storage and Looker or DOMO for business intelligence and making sense of their data. When considering a Snowplow Analytics implementation for your organization, you have to ensure all the essential pieces are in place. Snowplow is a great tool for getting your data from point A to point B, point B in this case being your database, but that is of little use if you don’t have the tools to process the information properly.
Valvoline’s Integration
For our client Valvoline we created a website with embedded Looker charts that the client can visit to view metrics across a whole host of areas, such as content metrics, commerce metrics, user metrics, and KPIs. To achieve this we wrote ten Drupal modules to track site data for Valvoline: three on the Drupal Commerce Pro Valvoline site and seven on the Team Valvoline site. We also have a Laravel application that runs a custom Artisan command daily to ingest CSV data we receive from their other partners, mostly data relating to rebates earned via in-store purchases. This external data is loaded directly into Redshift and bypasses Snowplow altogether, which illustrates how much control we have over the data: the tools bend to our needs; we don’t change our needs to fit the tools.
Snowplow is simply the tool that gets the website tracking data into Redshift in a sane way, and Redshift, at the end of the day, is just a database, so we can write to it ourselves.
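As a rough sketch of what that direct load can look like (the table, column, and bucket names below are illustrative, not Valvoline’s actual schema), the daily Artisan command could stage the partner CSV in S3 and run a plain Redshift COPY alongside the tables Snowplow maintains:

-- Hypothetical example: table, column, and bucket names are illustrative only.
CREATE TABLE IF NOT EXISTS partner.rebates (
  transaction_id VARCHAR(64),
  store_number   VARCHAR(16),
  rebate_amount  DECIMAL(10,2),
  purchased_at   TIMESTAMP
);

-- Load the partner CSV straight from S3; Snowplow is not involved at all.
COPY partner.rebates
FROM 's3://example-bucket/partner-rebates/latest.csv'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/example-redshift-load'
CSV
IGNOREHEADER 1
TIMEFORMAT 'auto';

Once the rows are in Redshift they can be joined against atomic.events and surfaced in Looker just like any of the Snowplow-collected data.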
Bark & Co’s Integration
For Bark&Co we implemented our Snowplow tracking via a WordPress plugin. BarkPost is the site we wrote the tracking for, but we also have data from the other sites in the Bark&Co network, such as BarkBox and BarkShop. With Snowplow we can track events from multiple domains in the same database; the events are differentiated by the app_id set in the code doing the tracking. This makes it easy to follow a user’s journey from site to site, for example a person who reads a story about a dog toy on BarkPost and then heads over to BarkShop to purchase that toy.
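Because every property’s events land in the same atomic.events table, a single query can lay out one reader’s path across the network. The sketch below is illustrative only: 'barkshop-web' is an assumed app_id, the user identifier is a placeholder, and it stitches the sites together via the collector’s network_userid, which is shared when the sites send events to the same collector.

-- Illustrative sketch: follow one visitor across Bark & Co properties.
-- 'barkpost-web' is the app_id from our tracking; 'barkshop-web' is assumed here.
SELECT
  app_id,
  collector_tstamp,
  event,
  page_urlpath
FROM atomic.events
WHERE network_userid = 'example-network-user-id'   -- placeholder identifier
  AND app_id IN ('barkpost-web', 'barkshop-web')
  AND event IN ('page_view', 'transaction')
ORDER BY collector_tstamp;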
Almost the entirety of the BarkPost reader base comes from social networks, so a big project we worked on with Bark&Co’s growth hacker was week-over-week user retention, broken out by social network. I wrote the query below to get this data.
-- The external referrer medium/source pairs that brought each user to the site
-- (first page view per user per referrer pair; internal referrals and Googlebot excluded).
WITH cohort_dfn_by_refr_channel_acquired_by_week AS (
  SELECT
    domain_userid,
    refr_acquired_medium,
    refr_acquired_source
  FROM (
    SELECT
      domain_userid,
      refr_medium AS refr_acquired_medium,
      refr_source AS refr_acquired_source,
      rank()
        OVER (PARTITION BY domain_userid, refr_medium, refr_source
          ORDER BY collector_tstamp) AS visit_number
    FROM atomic.events
    WHERE refr_medium != 'internal'
      AND refr_source != ''
      AND event = 'page_view'
      AND useragent != 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
      AND app_id = 'barkpost-web'
  ) t
  WHERE visit_number = 1
),
-- Ranks each user's active weeks (1 = first week seen, 2 = second week active, and so on).
retention_by_week_by_user AS (
  SELECT
    domain_userid,
    rank()
      OVER (PARTITION BY t.domain_userid
        ORDER BY t.weeks_active ASC) AS weeks_active
  FROM (
    SELECT
      domain_userid,
      DATE_TRUNC('week', collector_tstamp) AS weeks_active
    FROM atomic.events
    WHERE app_id = 'barkpost-web'
      AND event = 'page_view'
      AND useragent != 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
      AND refr_medium != 'internal'
    GROUP BY 1, 2
  ) t
),
-- Unique users per acquisition channel per week-of-activity rank.
retention_totals AS (
  SELECT
    first.refr_acquired_medium AS refr_acquired_medium,
    first.refr_acquired_source AS refr_acquired_source,
    total.weeks_active AS week_rank,
    count(DISTINCT total.domain_userid) AS uniques
  FROM retention_by_week_by_user AS total
    JOIN cohort_dfn_by_refr_channel_acquired_by_week AS first
      ON first.domain_userid = total.domain_userid
  GROUP BY 1, 2, 3
  ORDER BY 1, 2, 3
),
-- Starting cohort size for each referrer medium/source.
retention_starting_totals AS (
  SELECT
    refr_acquired_medium,
    refr_acquired_source,
    sum(count(DISTINCT domain_userid))
      OVER (PARTITION BY refr_acquired_medium, refr_acquired_source) AS starting_total
  FROM cohort_dfn_by_refr_channel_acquired_by_week
  GROUP BY 1, 2
),
-- Users active in each week rank as a percentage of the starting cohort.
retention_normalize AS (
  SELECT
    total.refr_acquired_medium,
    total.refr_acquired_source,
    total.week_rank,
    sum(uniques) AS uniques,
    start.starting_total,
    SUM(uniques) / sum(start.starting_total) :: REAL * 100 AS percentage
  FROM retention_totals AS total
    INNER JOIN retention_starting_totals AS start
      ON total.refr_acquired_source = start.refr_acquired_source
      AND total.refr_acquired_medium = start.refr_acquired_medium
  GROUP BY total.refr_acquired_medium, total.refr_acquired_source, total.week_rank, start.starting_total
)
SELECT *
FROM retention_normalize
ORDER BY 2, 3;
This query is then fed into Looker in the form of LookML files, which we use to build our visualizations. An example of a LookML file we have written is below.
# Copyright (c) 2016 Usterix LLC. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0,
# and you may not use this file except in compliance with the Apache License Version 2.0.
# You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the Apache License Version 2.0 is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.
#
# Version: 1-0-0
#
# Authors: William Wilkerson
# Copyright: Copyright (c) 2016 Usterix LLC
# License: Apache License Version 2.0
- view: weekly_retention_overall
  derived_table:
    sql: |
      WITH cohort_dfn_by_week_first_touch_website AS (
        SELECT
          domain_userid,
          DATE_TRUNC('week', MIN(collector_tstamp)) AS cohort
        FROM atomic.events
        WHERE event = 'page_view'
          AND useragent != 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
        GROUP BY 1),
      retention_by_user_by_week AS (
        SELECT
          domain_userid,
          DATE_TRUNC('week', collector_tstamp) AS weeks_active
        FROM "atomic".events
        WHERE event = 'page_view'
        GROUP BY 1, 2),
      retention_totals AS (
        SELECT
          cohort,
          weeks_active AS week_actual,
          rank()
            OVER (PARTITION BY cohort
              ORDER BY weeks_active ASC) AS week_rank,
          COUNT(DISTINCT (m.domain_userid)) AS uniques,
          COUNT(DISTINCT (m.domain_userid)) / (first_value(COUNT(DISTINCT (m.domain_userid)))
            OVER (PARTITION BY cohort)) :: REAL AS fraction_retained
        FROM cohort_dfn_by_week_first_touch_website c
          JOIN retention_by_user_by_week m
            ON c.domain_userid = m.domain_userid
        GROUP BY 1, 2
        ORDER BY 1, 2
      ),
      retention_normalize AS (
        SELECT
          week_rank,
          sum(uniques) AS uniques,
          (SELECT sum(uniques)
           FROM retention_totals
           WHERE week_rank = 1) AS starting_total
        FROM retention_totals
        GROUP BY week_rank
      )
      SELECT
        retention_normalize.uniques AS uniques,
        retention_normalize.week_rank AS week_rank,
        retention_normalize.starting_total AS starting_total,
        SUM(uniques) / first_value(starting_total) OVER (PARTITION BY week_rank) :: REAL * 100 AS percentage
      FROM retention_normalize
      GROUP BY 1, 2, 3
      ORDER BY 2 DESC
  fields:
    - dimension: week_rank
      type: number
      sql: ${TABLE}.week_rank
    - dimension: starting_total
      type: number
      sql: ${TABLE}.starting_total
    - dimension: returning
      type: number
      sql: ${TABLE}.uniques
    - dimension: percentage_users_retained
      type: number
      value_format: '0.00\%'
      sql: ${TABLE}.percentage
We were then able to go even further with this data, looking at the pages viewed in those sessions and putting together a ranked list of which pieces of content pulled in the most users from social networks and which ones kept readers coming back. We also wrote another query to pull a similar ranking broken out by the content’s vertical/category, which showed, for example, that articles in the humor vertical drew almost quadruple the traffic of the travel vertical.
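A much simplified sketch of that kind of ranking is below; it only counts unique visitors per page arriving from a social referrer, whereas the production queries also folded in return visits and the content vertical.

-- Simplified, illustrative sketch: top content by unique visitors arriving from social.
SELECT
  page_urlpath,
  COUNT(DISTINCT domain_userid) AS social_uniques
FROM atomic.events
WHERE app_id = 'barkpost-web'
  AND event = 'page_view'
  AND refr_medium = 'social'
GROUP BY 1
ORDER BY social_uniques DESC
LIMIT 50;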
In Summary
There’s a lot of information above, perhaps too much. Our hope is that we have educated you on how business analytics solutions are implemented using tools such as Snowplow.