
Configuration reference

Introduction

This document describes all available configuration options for d8a.

When multiple configuration sources are provided, values are resolved in the following order of precedence:

  • CLI flags (highest priority)
  • Environment variables
  • YAML configuration file (lowest priority)

The configuration file is a YAML file. You can specify a custom location using the --config or -c flag.
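
A minimal configuration file, with the nesting implied by the dotted configuration keys documented below (all values illustrative):

```yaml
# config.yaml -- illustrative values; see the keys documented below
server:
  host: 0.0.0.0
  port: 8080

warehouse:
  driver: console
  table: events
```

Assuming the binary is invoked as d8a, running `d8a --config config.yaml --server-port 9090` listens on port 9090: the CLI flag takes precedence over the file.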

Configuration keys


--dbip-destination-directory

Directory where the DB-IP database files are stored after downloading from the OCI registry. If the database already exists at this location, the download is skipped. Defaults to a temporary directory if not specified.

Configuration key: dbip.destination_directory
Environment variable: DBIP_DESTINATION_DIRECTORY

Default: /tmp/dbip


--dbip-download-timeout

Maximum time to wait for downloading the DB-IP database from the OCI registry during program startup. If the download exceeds this timeout and DB-IP columns are enabled, the program fails to start.

Configuration key: dbip.download_timeout
Environment variable: DBIP_DOWNLOAD_TIMEOUT

Default: 1m0s


--dbip-enabled

When enabled, adds geolocation column implementations (city, country, etc.) using the DB-IP database. On startup, the database is downloaded from the OCI registry (ghcr.io/d8a-tech), cached locally, and reused on subsequent runs if already present.

Configuration key: dbip.enabled
Environment variable: DBIP_ENABLED


--device-detector-provider

Device detector provider (dd2 or stub)

Configuration key: device_detector.provider
Environment variable: DEVICE_DETECTOR_PROVIDER

Default: dd2


--filters-conditions

Array of filter conditions for traffic filtering. Each condition is a JSON-encoded string with fields: 'name' (string identifier), 'type' (exclude or allow), 'test_mode' (boolean), 'expression' (filter expression). Example: {"name":"internal_traffic","type":"exclude","test_mode":false,"expression":"ip_address == '10.0.0.1'"}. Can be set via CLI flag, environment variable (FILTERS_CONDITIONS), or YAML config (filters.conditions). Conditions from flag/env are appended to YAML conditions. See Traffic filtering for details.

Configuration key: filters.conditions
Environment variable: FILTERS_CONDITIONS
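
In YAML form, the condition from the example above could be written as follows (illustrative; note that each list entry is itself a JSON-encoded string, so the embedded quotes must survive YAML quoting):

```yaml
filters:
  conditions:
    - '{"name":"internal_traffic","type":"exclude","test_mode":false,"expression":"ip_address == ''10.0.0.1''"}'
```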


--filters-fields

Array of field names to make available to filter expressions. Can contain any event-scoped column names. These fields are injected into the expression environment and can be referenced in filter condition expressions. Example: ip_address, event_name, user_id, page_location. The default value includes ip_address for backward compatibility. See Traffic filtering for details.

Configuration key: filters.fields
Environment variable: FILTERS_FIELDS

Default: [ip_address]


--monitoring-enabled

Enable OpenTelemetry metrics

Configuration key: monitoring.enabled
Environment variable: MONITORING_ENABLED


--monitoring-otel-endpoint

OTel collector endpoint for metrics

Configuration key: monitoring.otel_endpoint
Environment variable: MONITORING_OTEL_ENDPOINT

Default: localhost:4317


--monitoring-otel-export-interval

Interval for exporting metrics to OTel collector

Configuration key: monitoring.otel_export_interval
Environment variable: MONITORING_OTEL_EXPORT_INTERVAL

Default: 30s


--monitoring-otel-insecure

Allow insecure (non-TLS) connection to OTel collector

Configuration key: monitoring.otel_insecure
Environment variable: MONITORING_OTEL_INSECURE


--property-id

Property ID, used to satisfy interfaces required by d8a cloud. Ends up as a column in the warehouse.

Configuration key: property.id
Environment variable: PROPERTY_ID

Default: default


--property-name

Property name, used to satisfy interfaces required by d8a cloud. Ends up as a column in the warehouse.

Configuration key: property.name
Environment variable: PROPERTY_NAME

Default: Default property


--property-settings-split-by-campaign

When enabled, splits a session into multiple sessions when the UTM campaign parameter value changes between events. This allows tracking separate sessions for different marketing campaigns within the same user visit.

Configuration key: property.settings.split_by_campaign
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_CAMPAIGN

Default: true


--property-settings-split-by-max-events

Splits a session into multiple sessions when the number of events exceeds this value. This prevents sessions with excessive event counts from being stored as a single large session.

Configuration key: property.settings.split_by_max_events
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_MAX_EVENTS

Default: 1000


--property-settings-split-by-time-since-first-event

Splits a session into multiple sessions when the time elapsed since the first event exceeds this duration. This prevents extremely long sessions from being grouped together, creating more meaningful session boundaries.

Configuration key: property.settings.split_by_time_since_first_event
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_TIME_SINCE_FIRST_EVENT

Default: 12h0m0s


--property-settings-split-by-user-id

When enabled, splits a session into multiple sessions when the user ID value changes between events. This ensures that events from different authenticated users are not grouped into the same session.

Configuration key: property.settings.split_by_user_id
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_USER_ID

Default: true
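
Taken together, the property settings above can be expressed in YAML like this (values illustrative; the `12h` duration assumes Go-style duration strings, consistent with the `12h0m0s` default shown above):

```yaml
property:
  id: shop-eu
  name: EU shop
  settings:
    split_by_campaign: true
    split_by_max_events: 1000
    split_by_time_since_first_event: 12h
    split_by_user_id: true
```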


--protocol

Protocol to use for tracking requests. Valid values are 'ga4', 'd8a'.

Configuration key: protocol
Environment variable: PROTOCOL

Default: ga4


--queue-backend

Queue backend used between receiver and worker (filesystem or objectstorage)

Configuration key: queue.backend
Environment variable: QUEUE_BACKEND

Default: filesystem


--queue-object-storage-gcs-bucket

QUEUE_OBJECT_STORAGE GCS bucket name (only used when queue-object-storage-type=gcs)

Configuration key: queue.object_storage.gcs.bucket
Environment variable: QUEUE_OBJECT_STORAGE_GCS_BUCKET


--queue-object-storage-gcs-creds-json

QUEUE_OBJECT_STORAGE GCS credentials JSON (raw or base64); empty uses ADC (only used when queue-object-storage-type=gcs)

Configuration key: queue.object_storage.gcs.creds_json
Environment variable: QUEUE_OBJECT_STORAGE_GCS_CREDS_JSON


--queue-object-storage-gcs-project

QUEUE_OBJECT_STORAGE GCS project ID (optional; only used when queue-object-storage-type=gcs)

Configuration key: queue.object_storage.gcs.project
Environment variable: QUEUE_OBJECT_STORAGE_GCS_PROJECT


--queue-object-storage-prefix

Object storage prefix/namespace for QUEUE_OBJECT_STORAGE objects

Configuration key: queue.object_storage.prefix
Environment variable: QUEUE_OBJECT_STORAGE_PREFIX

Default: d8a/queue


--queue-object-storage-s3-access-key

QUEUE_OBJECT_STORAGE S3/MinIO access key (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.access_key
Environment variable: QUEUE_OBJECT_STORAGE_S3_ACCESS_KEY


--queue-object-storage-s3-bucket

QUEUE_OBJECT_STORAGE S3/MinIO bucket name (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.bucket
Environment variable: QUEUE_OBJECT_STORAGE_S3_BUCKET


--queue-object-storage-s3-create-bucket

QUEUE_OBJECT_STORAGE: create bucket on startup if missing (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.create_bucket
Environment variable: QUEUE_OBJECT_STORAGE_S3_CREATE_BUCKET


--queue-object-storage-s3-host

QUEUE_OBJECT_STORAGE S3/MinIO host (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.host
Environment variable: QUEUE_OBJECT_STORAGE_S3_HOST


--queue-object-storage-s3-port

QUEUE_OBJECT_STORAGE S3/MinIO port (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.port
Environment variable: QUEUE_OBJECT_STORAGE_S3_PORT

Default: 9000


--queue-object-storage-s3-protocol

QUEUE_OBJECT_STORAGE S3 endpoint protocol (http or https; only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.protocol
Environment variable: QUEUE_OBJECT_STORAGE_S3_PROTOCOL

Default: http


--queue-object-storage-s3-region

QUEUE_OBJECT_STORAGE S3 region (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.region
Environment variable: QUEUE_OBJECT_STORAGE_S3_REGION

Default: us-east-1


--queue-object-storage-s3-secret-key

QUEUE_OBJECT_STORAGE S3/MinIO secret key (only used when queue-object-storage-type=s3)

Configuration key: queue.object_storage.s3.secret_key
Environment variable: QUEUE_OBJECT_STORAGE_S3_SECRET_KEY


--queue-object-storage-type

QUEUE_OBJECT_STORAGE object storage type (s3 or gcs)

Configuration key: queue.object_storage.type
Environment variable: QUEUE_OBJECT_STORAGE_TYPE


--queue-objectstorage-interval-exp-factor

Exponential backoff factor for objectstorage queue consumer polling interval (only used for objectstorage backend)

Configuration key: queue.object_storage.interval_exp_factor
Environment variable: QUEUE_OBJECTSTORAGE_INTERVAL_EXP_FACTOR

Default: 1.5


--queue-objectstorage-max-interval

Maximum polling interval for objectstorage queue consumer exponential backoff (only used for objectstorage backend)

Configuration key: queue.object_storage.max_interval
Environment variable: QUEUE_OBJECTSTORAGE_MAX_INTERVAL

Default: 1m0s


--queue-objectstorage-max-items-to-read-at-once

Maximum number of items to read in one batch from objectstorage queue (only used for objectstorage backend)

Configuration key: queue.object_storage.max_items_to_read_at_once
Environment variable: QUEUE_OBJECTSTORAGE_MAX_ITEMS_TO_READ_AT_ONCE

Default: 1000


--queue-objectstorage-min-interval

Minimum polling interval for objectstorage queue consumer (only used for objectstorage backend)

Configuration key: queue.object_storage.min_interval
Environment variable: QUEUE_OBJECTSTORAGE_MIN_INTERVAL

Default: 5s
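
A sketch of an object-storage queue configuration combining the keys above (all hosts, buckets, and credentials are illustrative placeholders; the nesting is inferred from the dotted configuration keys):

```yaml
queue:
  backend: objectstorage
  object_storage:
    type: s3
    prefix: d8a/queue
    min_interval: 5s
    max_interval: 1m
    interval_exp_factor: 1.5
    s3:
      host: minio.internal
      port: 9000
      protocol: http
      bucket: d8a-queue
      region: us-east-1
      access_key: <access-key>
      secret_key: <secret-key>
      create_bucket: true
```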


--receiver-batch-size

Maximum number of hits to accumulate before flushing to the queue storage. When this many hits are received, they are immediately flushed even if the timeout hasn't been reached.

Configuration key: receiver.batch_size
Environment variable: RECEIVER_BATCH_SIZE

Default: 5000


--receiver-batch-timeout

Maximum time to wait before flushing accumulated hits to the queue storage. Hits are flushed when either this timeout is reached or the batch size limit is exceeded, whichever comes first.

Configuration key: receiver.batch_timeout
Environment variable: RECEIVER_BATCH_TIMEOUT

Default: 1s


--receiver-max-hit-kbytes

Maximum size of a hit in kilobytes. Tracking requests are rejected if they contain a hit that exceeds this size.

Configuration key: receiver.max_hit_kbytes
Environment variable: RECEIVER_MAX_HIT_KBYTES

Default: 128
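
The receiver settings above form one small YAML block (values shown are the defaults):

```yaml
receiver:
  batch_size: 5000
  batch_timeout: 1s
  max_hit_kbytes: 128
```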


--server-host

Host to listen on for HTTP server

Configuration key: server.host
Environment variable: SERVER_HOST

Default: 0.0.0.0


--server-port

Port to listen on for HTTP server

Configuration key: server.port
Environment variable: SERVER_PORT

Default: 8080


--sessions-join-by-session-stamp

When enabled, the system will merge proto-sessions that share the same session stamp identifier, even if they have different client IDs. This allows tracking user sessions across different devices or browsers when they share a common session identifier, enabling cross-device session continuity for authenticated or identified users.

Configuration key: sessions.join_by_session_stamp
Environment variable: SESSIONS_JOIN_BY_SESSION_STAMP

Default: true


--sessions-join-by-user-id

When enabled, the system will merge proto-sessions that share the same user ID, even if they have different client IDs. This enables cross-device session tracking for authenticated users, allowing hits from different devices or browsers to be grouped into a single session when they share the same authenticated user identifier. Only hits that include a user ID value will participate in this joining behavior.

Configuration key: sessions.join_by_user_id
Environment variable: SESSIONS_JOIN_BY_USER_ID

Default: true


--sessions-timeout

Maximum time period of inactivity after which a proto-session is considered expired and ready to be closed. The system uses a timing wheel to schedule session closures based on each hit's server received time plus this duration. After this period elapses without new hits, the proto-session is finalized and written to the warehouse as a completed session.

Configuration key: sessions.timeout
Environment variable: SESSIONS_TIMEOUT

Default: 30m0s
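
The session settings above can be set together in YAML (values shown are the defaults; `30m` assumes Go-style duration strings):

```yaml
sessions:
  timeout: 30m
  join_by_user_id: true
  join_by_session_stamp: true
```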


--storage-bolt-directory

Directory path where BoltDB database files are stored. This directory hosts two databases: 'bolt.db' for proto-session data, identifier metadata, and timing wheel bucket information, and 'bolt_kv.db' for key-value storage. These databases persist session state across restarts and are essential for session management functionality.

Configuration key: storage.bolt_directory
Environment variable: STORAGE_BOLT_DIRECTORY

Default: .


--storage-queue-directory

Directory path where batched hits are stored in a filesystem-based queue before being processed by background workers. This directory acts as a persistent buffer between the receiver and the session processing pipeline.

Configuration key: storage.queue_directory
Environment variable: STORAGE_QUEUE_DIRECTORY

Default: ./queue


--storage-spool-directory

Directory path where sessions are stored in a filesystem-based spool before being written to the warehouse. This directory acts as a persistent buffer between the session writer and the warehouse.

Configuration key: storage.spool_directory
Environment variable: STORAGE_SPOOL_DIRECTORY

Default: ./spool


--storage-spool-enabled

Enable spooling of sessions to a filesystem-based spool before writing to the warehouse. This can improve performance by deferring the writes to the warehouse.

Configuration key: storage.spool_enabled
Environment variable: STORAGE_SPOOL_ENABLED

Default: true


--storage-spool-write-chan-buffer

Capacity of the spool writer's input channel. Larger values reduce blocking of the session-close path while an L2 flush is running (improving close-path p99 latency), at the cost of holding more sessions in memory that can be lost on a crash. Zero means unbuffered.

Configuration key: storage.spool_write_chan_buffer
Environment variable: STORAGE_SPOOL_WRITE_CHAN_BUFFER

Default: 1000


--telemetry-url

Telemetry endpoint URL for sending usage events. Telemetry is anonymous and non-invasive: only the app version and runtime duration are collected. The client ID (a UUID) is generated on each app start and never persisted, so it resets on every restart. If empty, telemetry is disabled.

Configuration key: telemetry.url
Environment variable: TELEMETRY_URL

Default: https://global.t.d8a.tech/28b4fbc6-a4d0-49c4-883f-58314f83416e/g/collect


--warehouse-bigquery-creds-json

BigQuery service account JSON (raw or base64). Only applicable when warehouse-driver is set to 'bigquery'.

Configuration key: warehouse.bigquery.creds_json
Environment variable: WAREHOUSE_BIGQUERY_CREDS_JSON


--warehouse-bigquery-dataset-name

BigQuery dataset name. Only applicable when warehouse-driver is set to 'bigquery'.

Configuration key: warehouse.bigquery.dataset_name
Environment variable: WAREHOUSE_BIGQUERY_DATASET_NAME


--warehouse-bigquery-partition-expiration-days

BigQuery partition expiration in days. A value of 0 means partitions do not expire; this is the default.

Configuration key: warehouse.bigquery.partition_expiration_days
Environment variable: WAREHOUSE_BIGQUERY_PARTITION_EXPIRATION_DAYS


--warehouse-bigquery-partition-field

BigQuery partition field (a top-level TIMESTAMP or DATE column). Defaults to the date_utc column.

Configuration key: warehouse.bigquery.partition_field
Environment variable: WAREHOUSE_BIGQUERY_PARTITION_FIELD

Default: date_utc


--warehouse-bigquery-partition-interval

BigQuery partition interval (HOUR, DAY, MONTH, or YEAR). Defaults to the DAY interval.

Configuration key: warehouse.bigquery.partition_interval
Environment variable: WAREHOUSE_BIGQUERY_PARTITION_INTERVAL

Default: DAY


--warehouse-bigquery-project-id

BigQuery GCP project ID. Only applicable when warehouse-driver is set to 'bigquery'.

Configuration key: warehouse.bigquery.project_id
Environment variable: WAREHOUSE_BIGQUERY_PROJECT_ID


--warehouse-bigquery-query-timeout

BigQuery query timeout. Only applicable when warehouse-driver is set to 'bigquery'.

Configuration key: warehouse.bigquery.query_timeout
Environment variable: WAREHOUSE_BIGQUERY_QUERY_TIMEOUT

Default: 30s


--warehouse-bigquery-table-creation-timeout

BigQuery table creation timeout. Only applicable when warehouse-driver is set to 'bigquery'.

Configuration key: warehouse.bigquery.table_creation_timeout
Environment variable: WAREHOUSE_BIGQUERY_TABLE_CREATION_TIMEOUT

Default: 10s


--warehouse-bigquery-writer-type

BigQuery writer type (loadjob or streaming). Only applicable when warehouse-driver is set to 'bigquery'.

Configuration key: warehouse.bigquery.writer_type
Environment variable: WAREHOUSE_BIGQUERY_WRITER_TYPE

Default: loadjob
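
A sketch of a BigQuery warehouse configuration combining the keys above (project, dataset, and expiration values are illustrative placeholders):

```yaml
warehouse:
  driver: bigquery
  table: events
  bigquery:
    project_id: my-gcp-project
    dataset_name: analytics
    creds_json: <base64-service-account-json>
    partition_field: date_utc
    partition_interval: DAY
    partition_expiration_days: 90
    writer_type: loadjob
```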


--warehouse-clickhouse-database

ClickHouse database name. Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.database
Environment variable: WAREHOUSE_CLICKHOUSE_DB


--warehouse-clickhouse-host

ClickHouse host. Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.host
Environment variable: WAREHOUSE_CLICKHOUSE_HOST


--warehouse-clickhouse-order-by

Comma-separated list of columns for ORDER BY clause (e.g., 'property_id,date_utc'). Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.order_by
Environment variable: WAREHOUSE_CLICKHOUSE_ORDER_BY

Default: property_id,date_utc,session_id


--warehouse-clickhouse-partition-by

Expression for PARTITION BY clause (e.g., 'toYYYYMM(date_utc)'). Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.partition_by
Environment variable: WAREHOUSE_CLICKHOUSE_PARTITION_BY

Default: toYYYYMM(date_utc)


--warehouse-clickhouse-password

ClickHouse password. Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.password
Environment variable: WAREHOUSE_CLICKHOUSE_PASSWORD


--warehouse-clickhouse-port

ClickHouse port. Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.port
Environment variable: WAREHOUSE_CLICKHOUSE_PORT

Default: 9000


--warehouse-clickhouse-username

ClickHouse username. Only applicable when warehouse-driver is set to 'clickhouse'.

Configuration key: warehouse.clickhouse.username
Environment variable: WAREHOUSE_CLICKHOUSE_USER
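
A sketch of a ClickHouse warehouse configuration combining the keys above (host and credentials are illustrative placeholders):

```yaml
warehouse:
  driver: clickhouse
  table: events
  clickhouse:
    host: clickhouse.internal
    port: 9000
    database: analytics
    username: d8a
    password: <password>
    order_by: property_id,date_utc,session_id
    partition_by: toYYYYMM(date_utc)
```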


--warehouse-driver

Target warehouse driver (clickhouse, bigquery, files, console, or noop)

Configuration key: warehouse.driver
Environment variable: WAREHOUSE_DRIVER

Default: console


--warehouse-files-compression

Compression algorithm for warehouse files (gzip, or empty for none)

Configuration key: warehouse.files.compression
Environment variable: WAREHOUSE_FILES_COMPRESSION


--warehouse-files-compression-level

Compression level for warehouse files (-1 = default, 1 = fastest, 9 = best compression)

Configuration key: warehouse.files.compression_level
Environment variable: WAREHOUSE_FILES_COMPRESSION_LEVEL

Default: -1


--warehouse-files-filesystem-path

Destination directory for filesystem storage (required when warehouse-files-storage=filesystem)

Configuration key: warehouse.files.filesystem.path
Environment variable: WAREHOUSE_FILES_FILESYSTEM_PATH


--warehouse-files-format

File format for warehouse output (csv)

Configuration key: warehouse.files.format
Environment variable: WAREHOUSE_FILES_FORMAT

Default: csv


--warehouse-files-gcs-bucket

WAREHOUSE_FILES GCS bucket name (only used when warehouse-files-type=gcs)

Configuration key: warehouse.files.gcs.bucket
Environment variable: WAREHOUSE_FILES_GCS_BUCKET


--warehouse-files-gcs-creds-json

WAREHOUSE_FILES GCS credentials JSON (raw or base64); empty uses ADC (only used when warehouse-files-type=gcs)

Configuration key: warehouse.files.gcs.creds_json
Environment variable: WAREHOUSE_FILES_GCS_CREDS_JSON


--warehouse-files-gcs-project

WAREHOUSE_FILES GCS project ID (optional; only used when warehouse-files-type=gcs)

Configuration key: warehouse.files.gcs.project
Environment variable: WAREHOUSE_FILES_GCS_PROJECT


--warehouse-files-max-segment-age

Maximum segment age before sealing (default: 1h)

Configuration key: warehouse.files.max_segment_age
Environment variable: WAREHOUSE_FILES_MAX_SEGMENT_AGE

Default: 1h0m0s


--warehouse-files-max-segment-size

Maximum segment size in bytes before sealing (default: 1 GiB)

Configuration key: warehouse.files.max_segment_size
Environment variable: WAREHOUSE_FILES_MAX_SEGMENT_SIZE

Default: 1073741824


--warehouse-files-path-template

Path template for warehouse file uploads. Variables: Table, Schema, SegmentID, Extension, Year, Month, MonthPadded, Day, DayPadded

Configuration key: warehouse.files.path_template
Environment variable: WAREHOUSE_FILES_PATH_TEMPLATE

Default: table={{.Table}}/schema={{.Schema}}/y={{.Year}}/m={{.MonthPadded}}/d={{.DayPadded}}/{{.SegmentID}}.{{.Extension}}
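
With the default template, a CSV segment written on 2025-03-07 would land at a path of roughly this shape (the Schema and SegmentID values here are purely illustrative):

```yaml
warehouse:
  files:
    path_template: "table={{.Table}}/schema={{.Schema}}/y={{.Year}}/m={{.MonthPadded}}/d={{.DayPadded}}/{{.SegmentID}}.{{.Extension}}"
    # Rendered (illustrative): table=events/schema=v1/y=2025/m=03/d=07/<segment-id>.csv
```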


--warehouse-files-prefix

Object storage prefix/namespace for WAREHOUSE_FILES objects

Configuration key: warehouse.files.prefix
Environment variable: WAREHOUSE_FILES_PREFIX


--warehouse-files-s3-access-key

WAREHOUSE_FILES S3/MinIO access key (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.access_key
Environment variable: WAREHOUSE_FILES_S3_ACCESS_KEY


--warehouse-files-s3-bucket

WAREHOUSE_FILES S3/MinIO bucket name (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.bucket
Environment variable: WAREHOUSE_FILES_S3_BUCKET


--warehouse-files-s3-create-bucket

WAREHOUSE_FILES: create bucket on startup if missing (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.create_bucket
Environment variable: WAREHOUSE_FILES_S3_CREATE_BUCKET


--warehouse-files-s3-host

WAREHOUSE_FILES S3/MinIO host (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.host
Environment variable: WAREHOUSE_FILES_S3_HOST


--warehouse-files-s3-port

WAREHOUSE_FILES S3/MinIO port (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.port
Environment variable: WAREHOUSE_FILES_S3_PORT

Default: 9000


--warehouse-files-s3-protocol

WAREHOUSE_FILES S3 endpoint protocol (http or https; only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.protocol
Environment variable: WAREHOUSE_FILES_S3_PROTOCOL

Default: http


--warehouse-files-s3-region

WAREHOUSE_FILES S3 region (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.region
Environment variable: WAREHOUSE_FILES_S3_REGION

Default: us-east-1


--warehouse-files-s3-secret-key

WAREHOUSE_FILES S3/MinIO secret key (only used when warehouse-files-type=s3)

Configuration key: warehouse.files.s3.secret_key
Environment variable: WAREHOUSE_FILES_S3_SECRET_KEY


--warehouse-files-seal-check-interval

How often to evaluate sealing triggers (default: 15s)

Configuration key: warehouse.files.seal_check_interval
Environment variable: WAREHOUSE_FILES_SEAL_CHECK_INTERVAL

Default: 15s


--warehouse-files-storage

Storage destination for warehouse files (s3, gcs, or filesystem)

Configuration key: warehouse.files.storage
Environment variable: WAREHOUSE_FILES_STORAGE


--warehouse-files-type

WAREHOUSE_FILES object storage type (s3 or gcs)

Configuration key: warehouse.files.type
Environment variable: WAREHOUSE_FILES_TYPE
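
A sketch of a files-driver configuration writing gzip-compressed CSV segments to S3, combining the keys above (bucket, prefix, and credentials are illustrative placeholders):

```yaml
warehouse:
  driver: files
  files:
    storage: s3
    type: s3
    format: csv
    compression: gzip
    prefix: d8a/exports
    max_segment_age: 1h
    max_segment_size: 1073741824
    s3:
      host: s3.amazonaws.com
      protocol: https
      port: 443
      bucket: my-exports
      region: us-east-1
      access_key: <access-key>
      secret_key: <secret-key>
```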


--warehouse-table

Target warehouse table name.

Configuration key: warehouse.table
Environment variable: WAREHOUSE_TABLE

Default: events