Data Pipelines

Overview

⚠️

Managing pipelines through this API is deprecated. Customers who already used these APIs to create and manage pipelines can continue to do so. Otherwise, go to the Integrations page in Mixpanel to create and manage pipelines through the UI, and see the docs for more information.

Data Pipelines is a paid add-on. Visit our pricing page to add it to your plan.

Data Pipelines continuously exports the events in your Mixpanel project to a cloud storage bucket or data warehouse of your choice. It's useful if you want to analyze Mixpanel events using SQL in your own environment.

Using Data Pipelines requires 2 steps:

  1. Configuring your destination to allow Mixpanel to write to it.
  2. Telling Mixpanel to start exporting your data to that destination using the Pipelines API.

We offer a 30-day free trial of the Data Pipelines add-on. See the FAQ for how to enable it.

Step 1: Configuring your destination

Configuration depends on the type of Pipeline you want to set up.

Raw

Raw Pipelines export events as JSON to a cloud storage bucket. This is the simplest approach.

See our configuration guides for each raw destination:

Upon successful creation of a pipeline, events will be exported to the following locations:

  • Hourly: <BUCKET_NAME>/<PATH_PREFIX>/<MIXPANEL_PROJECT_ID>/<YEAR>/<MONTH>/<DAY>/<HOUR>
  • Daily: <BUCKET_NAME>/<PATH_PREFIX>/<MIXPANEL_PROJECT_ID>/<YEAR>/<MONTH>/<DAY>/full_day

An empty complete file will be written in the finished hour or day prefix to indicate that the export is complete. The absence of this file means there is an ongoing export for that hour or day.
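As a rough illustration of the layout above, the following Python sketch builds the hourly or daily prefix and checks a listing of object keys for the completion marker. The zero-padded date components, the marker name `complete`, and the function names are assumptions made for this example; verify them against the objects actually written to your bucket.

```python
from datetime import datetime


def export_prefix(bucket, path_prefix, project_id, when, hourly=True):
    """Build the prefix an export for `when` would be written under."""
    # Assumption: month, day and hour are zero-padded to two digits.
    date_part = f"{when.year:04d}/{when.month:02d}/{when.day:02d}"
    last_part = f"{when.hour:02d}" if hourly else "full_day"
    return f"{bucket}/{path_prefix}/{project_id}/{date_part}/{last_part}"


def is_export_complete(object_keys, prefix, marker="complete"):
    """Return True if the completion marker exists directly under `prefix`."""
    return f"{prefix}/{marker}" in set(object_keys)


# Hypothetical bucket, path prefix and project ID.
when = datetime(2024, 3, 5, 14)
hourly = export_prefix("my-bucket", "mixpanel", 12345, when)
daily = export_prefix("my-bucket", "mixpanel", 12345, when, hourly=False)
print(hourly)
print(daily)
```

A downstream job could list the keys under a prefix with its storage client of choice and only start loading once `is_export_complete` returns True.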

Schematized

Schematized Pipelines export events into tables with schemas generated by Mixpanel, inferred from your event history. There are two types of schemas, which you can configure:

  • Monoschema: A single table for all events, with the event name as one column and one column per property.
  • Multischema: One table per event name with the properties of that event as columns.
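To make the difference between the two layouts concrete, here is a small Python sketch that reshapes the same events both ways. It only illustrates the table shapes described above; it is not how Mixpanel generates schemas, and the `event_name` column name and the sample events are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical sample events.
events = [
    {"event": "Sign Up", "properties": {"plan": "free", "country": "US"}},
    {"event": "Purchase", "properties": {"amount": 20, "country": "DE"}},
]


def to_monoschema(events):
    """One table: an event-name column plus one column per property seen."""
    columns = sorted({key for e in events for key in e["properties"]})
    rows = []
    for e in events:
        row = {"event_name": e["event"]}
        # Properties an event does not carry are left empty (None).
        for col in columns:
            row[col] = e["properties"].get(col)
        rows.append(row)
    return rows


def to_multischema(events):
    """One table per event name, with that event's properties as columns."""
    tables = defaultdict(list)
    for e in events:
        tables[e["event"]].append(dict(e["properties"]))
    return dict(tables)


mono = to_monoschema(events)
multi = to_multischema(events)
print(mono)
print(multi)
```

The monoschema rows are sparse (every row carries every property column), while each multischema table holds only the columns relevant to its event.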

See our configuration guides for each schematized destination:

The Schematized Pipeline reference goes into the details of schematization and the output format.

Step 2: Creating the Pipeline

Once you’ve configured your destination, you need to tell Mixpanel to start exporting to that destination.

You can do this with our Create Pipeline API. You can create the Pipeline directly from our developer docs UI.

Limits:

  • For event export pipelines (data_source: events) in each Mixpanel project, we support at most two recurring pipelines (to_date is empty) and one non-recurring pipeline (has a to_date that is the ending date of the export window).
  • from_date must be no more than 6 months before the date the pipeline is created.
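The limits above can be checked on the client side before calling the API. The sketch below is one possible pre-flight check, not part of the Pipelines API: the function names and the dict shape are hypothetical, and it assumes the 6-month window is counted backwards in calendar months from the creation date.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the calendar date `months` months before `day`."""
    total = day.year * 12 + (day.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def check_limits(existing, new, today):
    """Return a list of limit violations for adding `new` to `existing`.

    Each pipeline is a dict with `data_source`, `from_date` (a date) and an
    optional `to_date`; a missing or None `to_date` marks a recurring pipeline.
    """
    problems = []
    if new["from_date"] < months_before(today, 6):
        problems.append("from_date is more than 6 months back")
    if new.get("data_source") == "events":
        recurring = new.get("to_date") is None
        same_kind = [
            p for p in existing
            if p.get("data_source") == "events"
            and (p.get("to_date") is None) == recurring
        ]
        # At most two recurring and one non-recurring event pipeline.
        allowed = 2 if recurring else 1
        if len(same_kind) >= allowed:
            problems.append("pipeline limit reached for this project")
    return problems


existing = [{"data_source": "events", "from_date": date(2024, 5, 1), "to_date": None}]
new = {"data_source": "events", "from_date": date(2024, 5, 15), "to_date": None}
print(check_limits(existing, new, today=date(2024, 6, 1)))
```

An empty list means the new pipeline stays within the limits described here; the API remains the authority on whether a request is accepted.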

FAQ

1. Why are some events or properties not exported to the destination?

This normally happens when you have thousands of unique event names or property names, which is usually an implementation mistake (e.g., including a UUID in the event or property name). This causes the export process to exceed table or column limits in the destination. Mixpanel itself imposes a limit of 10K unique properties in your schema after transformation rules have been applied. Any project exceeding this limit will have its pipelines paused until the issue is remediated. If you see a pipeline error about exceeding this limit, try to identify a regex selector that matches the properties you would like to filter out of your schema, then reach out to our support team for assistance.
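Before contacting support, it can help to test a candidate regex selector against a sample of your own events. The following Python sketch counts unique property names and shows which ones a selector would remove. The sample events, the `exp_` naming pattern, and the helper names are hypothetical, and the exact regex dialect Mixpanel applies is not specified here.

```python
import re


def unique_property_names(events):
    """Collect every distinct property name across a sample of events."""
    return {name for e in events for name in e["properties"]}


def split_by_selector(names, pattern):
    """Split names into (kept, filtered) using a regex selector."""
    selector = re.compile(pattern)
    filtered = {n for n in names if selector.search(n)}
    return names - filtered, filtered


# Hypothetical events where an ID leaked into the property name.
events = [
    {"event": "View", "properties": {"page": "/home", "exp_3f2a9c1e": True}},
    {"event": "View", "properties": {"page": "/cart", "exp_a81b07d4": False}},
]
names = unique_property_names(events)
# Hypothetical selector: "exp_" followed by exactly eight hex digits.
kept, filtered = split_by_selector(names, r"^exp_[0-9a-f]{8}$")
print(sorted(kept), len(filtered))
```

If the `kept` set is comfortably below the limit and contains everything you need, the pattern is a reasonable candidate to share with support.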

2. Why does the number of events in Mixpanel not match the number of exported events to my destination?

This can happen for a few reasons:

  • Data Sync is not enabled or not supported for your pipeline.
  • Data Delay: It can take up to 1 day for late-arriving data to be synced from Mixpanel to your destination.
  • Hidden Events: Mixpanel exports all events to your destination, even ones that are hidden in the UI via Lexicon. We recommend checking whether the count in your destination is mostly due to events that have been hidden in the Mixpanel UI.

3. What timezone is used for my event exports?

Timestamps on events exported by your pipeline are in UTC. For projects created before 1 Jan 2023, the raw event timestamps are transformed from Project Timezone to UTC. For projects created after 1 Jan 2023, raw event data is already stored in UTC. Learn more about managing timezones here.
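When comparing exported data with what you see in the Mixpanel UI, it can be useful to convert the UTC timestamps back to your project timezone. The sketch below assumes the exported timestamp is available as epoch seconds and uses a fixed UTC+05:30 offset as a stand-in project timezone; both are assumptions for the example. For timezones with daylight saving, a named zone from Python's `zoneinfo` module would be the better choice.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a project timezone of UTC+05:30, which has no daylight saving.
PROJECT_TZ = timezone(timedelta(hours=5, minutes=30))


def to_project_time(epoch_seconds, project_tz=PROJECT_TZ):
    """Convert a UTC epoch timestamp to the project's local wall-clock time."""
    utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc_time.astimezone(project_tz)


# An event exported with a timestamp of 20:00 UTC on 1 Jan 2023.
exported = datetime(2023, 1, 1, 20, 0, tzinfo=timezone.utc).timestamp()
local = to_project_time(exported)
print(local.isoformat())
```

In this example the event falls on 1 January in UTC but on 2 January in the project timezone, which is one way daily counts can appear to differ between the two views.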

4. How can I count events exported by Mixpanel in the warehouse?

Counting events can be slightly different for each warehouse, since we use different partitioning methods. Here are examples for BigQuery and Snowflake.

Free Trial

Mixpanel offers a 30-day trial of Data Pipelines. The trial allows one data export pipeline to be created per project. To create a trial pipeline, pass trial=true to our API.

Trial limitations:

  • Export scheduling is daily only.
  • Data sync is unavailable.
  • You can only create one pipeline per project.
  • Backfilled data will only include one day prior to the creation date.
  • Pipelines will, by default, include both event and user data (not available for raw pipelines).
  • The pipeline cannot filter by event name.
  • The “Create Pipeline” parameters will default to the values highlighted in the parameters table.
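As a minimal sketch of what a trial request might carry, the function below assembles parameters that respect the trial limitations listed above. Only `trial`, `from_date`, and `data_source` appear in this page; `project_id`, `frequency`, and the `YYYY-MM-DD` date format are assumptions, so check the Create Pipeline API reference for the actual names and formats before sending anything.

```python
from datetime import date, timedelta


def trial_pipeline_params(project_id, created_on, data_source="events"):
    """Build request parameters for a trial pipeline (illustrative only)."""
    return {
        "project_id": project_id,  # assumed parameter name
        "data_source": data_source,
        "trial": "true",
        # Trial backfill reaches only one day before the creation date.
        "from_date": (created_on - timedelta(days=1)).isoformat(),
        "frequency": "daily",  # assumed parameter name; trials are daily only
    }


# Hypothetical project ID and creation date.
params = trial_pipeline_params(12345, date(2024, 5, 10))
print(params)
```

The dict leaves out destination-specific settings, which depend on the pipeline type you configured in Step 1.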
