Configuration reference
Introduction
This document describes all available configuration options for d8a.
When multiple configuration sources are provided, values are resolved in the following order of precedence:
- CLI flags (highest priority)
- Environment variables
- YAML configuration file (lowest priority)
The configuration file is written in YAML. You can specify a custom location with the --config or -c flag.
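For illustration, here is how the three sources interact for a single key (the file name and values are assumptions):

```yaml
# config.yaml, passed as: d8a --config config.yaml
server:
  host: 0.0.0.0
  port: 8080   # lowest precedence: YAML value
# Setting SERVER_PORT=9090 in the environment overrides the YAML value,
# and passing --server-port 9091 on the command line overrides both.
```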
Configuration keys
--dbip-destination-directory
Directory where the DB-IP database files are stored after downloading from the OCI registry. If the database already exists at this location, the download is skipped. Defaults to a temporary directory if not specified.
Configuration key: dbip.destination_directory
Environment variable: DBIP_DESTINATION_DIRECTORY
Default: /tmp/dbip
--dbip-download-timeout
Maximum time to wait when downloading the DB-IP database from the OCI registry during program startup. If DB-IP columns are enabled and the download exceeds this timeout, the program fails to start.
Configuration key: dbip.download_timeout
Environment variable: DBIP_DOWNLOAD_TIMEOUT
Default: 1m0s
--dbip-enabled
When enabled, adds geolocation column implementations (city, country, etc.) using the DB-IP database. On program startup, the DB-IP database is downloaded from the OCI registry (ghcr.io/d8a-tech). The database is cached locally and reused on subsequent runs if already present.
Configuration key: dbip.enabled
Environment variable: DBIP_ENABLED
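A minimal sketch of enabling DB-IP geolocation with a persistent cache directory (the path and timeout values are illustrative):

```yaml
dbip:
  enabled: true
  destination_directory: /var/lib/d8a/dbip   # an existing database here skips the download
  download_timeout: 2m                       # startup fails if the download takes longer
```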
--device-detector-provider
Device detector provider (dd2 or stub)
Configuration key: device_detector.provider
Environment variable: DEVICE_DETECTOR_PROVIDER
Default: dd2
--filters-conditions
Array of filter conditions for traffic filtering. Each condition is a JSON-encoded string with fields: 'name' (string identifier), 'type' (exclude or allow), 'test_mode' (boolean), 'expression' (filter expression). Example: {"name":"internal_traffic","type":"exclude","test_mode":false,"expression":"ip_address == '10.0.0.1'"}. Can be set via CLI flag, environment variable (FILTERS_CONDITIONS), or YAML config (filters.conditions). Conditions from flag/env are appended to YAML conditions. See Traffic filtering for details.
Configuration key: filters.conditions
Environment variable: FILTERS_CONDITIONS
--filters-fields
Array of field names to make available to filter expressions. Can contain any event-scoped column names. These fields are injected into the expression environment and can be referenced in filter condition expressions. Example: ip_address, event_name, user_id, page_location. The default value includes ip_address for backward compatibility. See Traffic filtering for details.
Configuration key: filters.fields
Environment variable: FILTERS_FIELDS
Default: [ip_address]
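Putting the two filter keys together, a sketch of a YAML configuration (assuming the YAML form takes the same JSON-encoded condition strings as the flag; the condition itself is illustrative):

```yaml
filters:
  # Event-scoped columns injected into the expression environment:
  fields: [ip_address, event_name]
  # Each condition is a JSON-encoded string:
  conditions:
    - '{"name":"internal_traffic","type":"exclude","test_mode":false,"expression":"ip_address == ''10.0.0.1''"}'
```

Conditions supplied via the flag or environment variable are appended to those defined here.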
--monitoring-enabled
Enable OpenTelemetry metrics
Configuration key: monitoring.enabled
Environment variable: MONITORING_ENABLED
--monitoring-otel-endpoint
OTel collector endpoint for metrics
Configuration key: monitoring.otel_endpoint
Environment variable: MONITORING_OTEL_ENDPOINT
Default: localhost:4317
--monitoring-otel-export-interval
Interval for exporting metrics to OTel collector
Configuration key: monitoring.otel_export_interval
Environment variable: MONITORING_OTEL_EXPORT_INTERVAL
Default: 30s
--monitoring-otel-insecure
Allow insecure (non-TLS) connection to OTel collector
Configuration key: monitoring.otel_insecure
Environment variable: MONITORING_OTEL_INSECURE
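A sketch of the monitoring block combining all four keys (the collector address is an assumption):

```yaml
monitoring:
  enabled: true
  otel_endpoint: otel-collector.internal:4317   # illustrative collector address
  otel_export_interval: 15s
  otel_insecure: true   # plain (non-TLS) connection, e.g. inside a trusted network
```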
--property-id
Property ID, used to satisfy interfaces required by d8a cloud. Ends up as a column in the warehouse.
Configuration key: property.id
Environment variable: PROPERTY_ID
Default: default
--property-name
Property name, used to satisfy interfaces required by d8a cloud. Ends up as a column in the warehouse.
Configuration key: property.name
Environment variable: PROPERTY_NAME
Default: Default property
--property-settings-split-by-campaign
When enabled, splits a session into multiple sessions when the UTM campaign parameter value changes between events. This allows tracking separate sessions for different marketing campaigns within the same user visit.
Configuration key: property.settings.split_by_campaign
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_CAMPAIGN
Default: true
--property-settings-split-by-max-events
Splits a session into multiple sessions when the number of events exceeds this value. This prevents sessions with excessive event counts from being stored as a single large session.
Configuration key: property.settings.split_by_max_events
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_MAX_EVENTS
Default: 1000
--property-settings-split-by-time-since-first-event
Splits a session into multiple sessions when the time elapsed since the first event exceeds this duration. This prevents extremely long sessions from being grouped together, creating more meaningful session boundaries.
Configuration key: property.settings.split_by_time_since_first_event
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_TIME_SINCE_FIRST_EVENT
Default: 12h0m0s
--property-settings-split-by-user-id
When enabled, splits a session into multiple sessions when the user ID value changes between events. This ensures that events from different authenticated users are not grouped into the same session.
Configuration key: property.settings.split_by_user_id
Environment variable: PROPERTY_SETTINGS_SPLIT_BY_USER_ID
Default: true
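The property keys above can be combined as follows (the ID, name, and thresholds are illustrative):

```yaml
property:
  id: shop-eu
  name: EU storefront
  settings:
    split_by_campaign: true
    split_by_user_id: true
    split_by_max_events: 2000
    split_by_time_since_first_event: 6h
```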
--protocol
Protocol to use for tracking requests. Valid values are 'ga4' and 'd8a'.
Configuration key: protocol
Environment variable: PROTOCOL
Default: ga4
--queue-backend
Queue backend used between receiver and worker (filesystem or objectstorage)
Configuration key: queue.backend
Environment variable: QUEUE_BACKEND
Default: filesystem
--queue-object-storage-gcs-bucket
QUEUE_OBJECT_STORAGE GCS bucket name (only used when queue-object-storage-type=gcs)
Configuration key: queue.object_storage.gcs.bucket
Environment variable: QUEUE_OBJECT_STORAGE_GCS_BUCKET
--queue-object-storage-gcs-creds-json
QUEUE_OBJECT_STORAGE GCS credentials JSON (raw or base64); empty uses ADC (only used when queue-object-storage-type=gcs)
Configuration key: queue.object_storage.gcs.creds_json
Environment variable: QUEUE_OBJECT_STORAGE_GCS_CREDS_JSON
--queue-object-storage-gcs-project
QUEUE_OBJECT_STORAGE GCS project ID (optional; only used when queue-object-storage-type=gcs)
Configuration key: queue.object_storage.gcs.project
Environment variable: QUEUE_OBJECT_STORAGE_GCS_PROJECT
--queue-object-storage-prefix
Object storage prefix/namespace for QUEUE_OBJECT_STORAGE objects
Configuration key: queue.object_storage.prefix
Environment variable: QUEUE_OBJECT_STORAGE_PREFIX
Default: d8a/queue
--queue-object-storage-s3-access-key
QUEUE_OBJECT_STORAGE S3/MinIO access key (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.access_key
Environment variable: QUEUE_OBJECT_STORAGE_S3_ACCESS_KEY
--queue-object-storage-s3-bucket
QUEUE_OBJECT_STORAGE S3/MinIO bucket name (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.bucket
Environment variable: QUEUE_OBJECT_STORAGE_S3_BUCKET
--queue-object-storage-s3-create-bucket
QUEUE_OBJECT_STORAGE: create bucket on startup if missing (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.create_bucket
Environment variable: QUEUE_OBJECT_STORAGE_S3_CREATE_BUCKET
--queue-object-storage-s3-host
QUEUE_OBJECT_STORAGE S3/MinIO host (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.host
Environment variable: QUEUE_OBJECT_STORAGE_S3_HOST
--queue-object-storage-s3-port
QUEUE_OBJECT_STORAGE S3/MinIO port (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.port
Environment variable: QUEUE_OBJECT_STORAGE_S3_PORT
Default: 9000
--queue-object-storage-s3-protocol
QUEUE_OBJECT_STORAGE S3 endpoint protocol (http or https; only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.protocol
Environment variable: QUEUE_OBJECT_STORAGE_S3_PROTOCOL
Default: http
--queue-object-storage-s3-region
QUEUE_OBJECT_STORAGE S3 region (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.region
Environment variable: QUEUE_OBJECT_STORAGE_S3_REGION
Default: us-east-1
--queue-object-storage-s3-secret-key
QUEUE_OBJECT_STORAGE S3/MinIO secret key (only used when queue-object-storage-type=s3)
Configuration key: queue.object_storage.s3.secret_key
Environment variable: QUEUE_OBJECT_STORAGE_S3_SECRET_KEY
--queue-object-storage-type
QUEUE_OBJECT_STORAGE object storage type (s3 or gcs)
Configuration key: queue.object_storage.type
Environment variable: QUEUE_OBJECT_STORAGE_TYPE
--queue-objectstorage-interval-exp-factor
Exponential backoff factor for objectstorage queue consumer polling interval (only used for objectstorage backend)
Configuration key: queue.object_storage.interval_exp_factor
Environment variable: QUEUE_OBJECTSTORAGE_INTERVAL_EXP_FACTOR
Default: 1.5
--queue-objectstorage-max-interval
Maximum polling interval for objectstorage queue consumer exponential backoff (only used for objectstorage backend)
Configuration key: queue.object_storage.max_interval
Environment variable: QUEUE_OBJECTSTORAGE_MAX_INTERVAL
Default: 1m0s
--queue-objectstorage-max-items-to-read-at-once
Maximum number of items to read in one batch from objectstorage queue (only used for objectstorage backend)
Configuration key: queue.object_storage.max_items_to_read_at_once
Environment variable: QUEUE_OBJECTSTORAGE_MAX_ITEMS_TO_READ_AT_ONCE
Default: 1000
--queue-objectstorage-min-interval
Minimum polling interval for objectstorage queue consumer (only used for objectstorage backend)
Configuration key: queue.object_storage.min_interval
Environment variable: QUEUE_OBJECTSTORAGE_MIN_INTERVAL
Default: 5s
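A sketch of switching the queue to an S3-compatible (e.g. MinIO) object store, combining the backend, storage, and polling keys above (the host, bucket, and credentials are placeholders; prefer supplying secrets via environment variables):

```yaml
queue:
  backend: objectstorage
  object_storage:
    type: s3
    prefix: d8a/queue
    # Consumer polling: start at min_interval and back off by
    # interval_exp_factor up to max_interval.
    min_interval: 5s
    max_interval: 1m
    interval_exp_factor: 1.5
    max_items_to_read_at_once: 1000
    s3:
      host: minio.internal
      port: 9000
      protocol: http
      region: us-east-1
      bucket: d8a-queue
      create_bucket: true
      access_key: minioadmin   # placeholder; see QUEUE_OBJECT_STORAGE_S3_ACCESS_KEY
      secret_key: minioadmin   # placeholder; see QUEUE_OBJECT_STORAGE_S3_SECRET_KEY
```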
--receiver-batch-size
Maximum number of hits to accumulate before flushing to the queue storage. When this many hits are received, they are immediately flushed even if the timeout hasn't been reached.
Configuration key: receiver.batch_size
Environment variable: RECEIVER_BATCH_SIZE
Default: 5000
--receiver-batch-timeout
Maximum time to wait before flushing accumulated hits to the queue storage. Hits are flushed when either this timeout is reached or the batch size limit is exceeded, whichever comes first.
Configuration key: receiver.batch_timeout
Environment variable: RECEIVER_BATCH_TIMEOUT
Default: 1s
--receiver-max-hit-kbytes
Maximum size of a hit in kilobytes. Tracking requests are rejected if they contain a hit that exceeds this size.
Configuration key: receiver.max_hit_kbytes
Environment variable: RECEIVER_MAX_HIT_KBYTES
Default: 128
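The three receiver keys interact as follows (a sketch using the default values):

```yaml
receiver:
  batch_size: 5000      # flush once this many hits accumulate...
  batch_timeout: 1s     # ...or once this much time passes, whichever comes first
  max_hit_kbytes: 128   # reject requests containing a hit larger than this
```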
--server-host
Host to listen on for HTTP server
Configuration key: server.host
Environment variable: SERVER_HOST
Default: 0.0.0.0
--server-port
Port to listen on for HTTP server
Configuration key: server.port
Environment variable: SERVER_PORT
Default: 8080
--sessions-join-by-session-stamp
When enabled, the system will merge proto-sessions that share the same session stamp identifier, even if they have different client IDs. This allows tracking user sessions across different devices or browsers when they share a common session identifier, enabling cross-device session continuity for authenticated or identified users.
Configuration key: sessions.join_by_session_stamp
Environment variable: SESSIONS_JOIN_BY_SESSION_STAMP
Default: true
--sessions-join-by-user-id
When enabled, the system will merge proto-sessions that share the same user ID, even if they have different client IDs. This enables cross-device session tracking for authenticated users, allowing hits from different devices or browsers to be grouped into a single session when they share the same authenticated user identifier. Only hits that include a user ID value will participate in this joining behavior.
Configuration key: sessions.join_by_user_id
Environment variable: SESSIONS_JOIN_BY_USER_ID
Default: true
--sessions-timeout
Maximum time period of inactivity after which a proto-session is considered expired and ready to be closed. The system uses a timing wheel to schedule session closures based on each hit's server received time plus this duration. After this period elapses without new hits, the proto-session is finalized and written to the warehouse as a completed session.
Configuration key: sessions.timeout
Environment variable: SESSIONS_TIMEOUT
Default: 30m0s
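A sketch of the session keys together (default values shown):

```yaml
sessions:
  timeout: 30m                  # inactivity window before a proto-session is closed
  join_by_user_id: true         # merge proto-sessions sharing a user ID
  join_by_session_stamp: true   # merge proto-sessions sharing a session stamp
```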
--storage-bolt-directory
Directory path where BoltDB database files are stored. This directory hosts two databases: 'bolt.db' for proto-session data, identifier metadata, and timing wheel bucket information, and 'bolt_kv.db' for key-value storage. These databases persist session state across restarts and are essential for session management functionality.
Configuration key: storage.bolt_directory
Environment variable: STORAGE_BOLT_DIRECTORY
Default: .
--storage-queue-directory
Directory path where batched hits are stored in a filesystem-based queue before being processed by background workers. This directory acts as a persistent buffer between the receiver and the session processing pipeline.
Configuration key: storage.queue_directory
Environment variable: STORAGE_QUEUE_DIRECTORY
Default: ./queue
--storage-spool-directory
Directory path where sessions are stored in a filesystem-based spool before being written to the warehouse. This directory acts as a persistent buffer between the session writer and the warehouse.
Configuration key: storage.spool_directory
Environment variable: STORAGE_SPOOL_DIRECTORY
Default: ./spool
--storage-spool-enabled
Enable spooling of sessions to a filesystem-based spool before writing to the warehouse. This can improve performance by deferring writes to the warehouse.
Configuration key: storage.spool_enabled
Environment variable: STORAGE_SPOOL_ENABLED
Default: true
--storage-spool-write-chan-buffer
Capacity of the spool writer's input channel. Larger values reduce blocking of the session-close path while an L2 flush runs (improving close p99), at the cost of more sessions held only in memory on a crash. Zero means the channel is unbuffered.
Configuration key: storage.spool_write_chan_buffer
Environment variable: STORAGE_SPOOL_WRITE_CHAN_BUFFER
Default: 1000
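The storage keys can be grouped onto one persistent volume, for example (paths are illustrative):

```yaml
storage:
  bolt_directory: /var/lib/d8a          # hosts bolt.db and bolt_kv.db
  queue_directory: /var/lib/d8a/queue   # used when queue.backend=filesystem
  spool_directory: /var/lib/d8a/spool
  spool_enabled: true
  spool_write_chan_buffer: 1000
```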
--telemetry-url
Telemetry endpoint URL for sending usage events. Anonymous and non-invasive: collects only app version and runtime duration. Client ID (UUID) is generated per app start and not persisted, resetting on each restart. If empty, telemetry is disabled.
Configuration key: telemetry.url
Environment variable: TELEMETRY_URL
Default: https://global.t.d8a.tech/28b4fbc6-a4d0-49c4-883f-58314f83416e/g/collect
--warehouse-bigquery-creds-json
BigQuery service account JSON (raw or base64). Only applicable when warehouse-driver is set to 'bigquery'.
Configuration key: warehouse.bigquery.creds_json
Environment variable: WAREHOUSE_BIGQUERY_CREDS_JSON
--warehouse-bigquery-dataset-name
BigQuery dataset name. Only applicable when warehouse-driver is set to 'bigquery'.
Configuration key: warehouse.bigquery.dataset_name
Environment variable: WAREHOUSE_BIGQUERY_DATASET_NAME
--warehouse-bigquery-partition-expiration-days
BigQuery partition expiration in days; 0 (the default) means partitions do not expire.
Configuration key: warehouse.bigquery.partition_expiration_days
Environment variable: WAREHOUSE_BIGQUERY_PARTITION_EXPIRATION_DAYS
--warehouse-bigquery-partition-field
BigQuery partition field (a top-level TIMESTAMP or DATE column).
Configuration key: warehouse.bigquery.partition_field
Environment variable: WAREHOUSE_BIGQUERY_PARTITION_FIELD
Default: date_utc
--warehouse-bigquery-partition-interval
BigQuery partition interval (HOUR, DAY, MONTH, or YEAR).
Configuration key: warehouse.bigquery.partition_interval
Environment variable: WAREHOUSE_BIGQUERY_PARTITION_INTERVAL
Default: DAY
--warehouse-bigquery-project-id
BigQuery GCP project ID. Only applicable when warehouse-driver is set to 'bigquery'.
Configuration key: warehouse.bigquery.project_id
Environment variable: WAREHOUSE_BIGQUERY_PROJECT_ID
--warehouse-bigquery-query-timeout
BigQuery query timeout. Only applicable when warehouse-driver is set to 'bigquery'.
Configuration key: warehouse.bigquery.query_timeout
Environment variable: WAREHOUSE_BIGQUERY_QUERY_TIMEOUT
Default: 30s
--warehouse-bigquery-table-creation-timeout
BigQuery table creation timeout. Only applicable when warehouse-driver is set to 'bigquery'.
Configuration key: warehouse.bigquery.table_creation_timeout
Environment variable: WAREHOUSE_BIGQUERY_TABLE_CREATION_TIMEOUT
Default: 10s
--warehouse-bigquery-writer-type
BigQuery writer type (loadjob or streaming). Only applicable when warehouse-driver is set to 'bigquery'.
Configuration key: warehouse.bigquery.writer_type
Environment variable: WAREHOUSE_BIGQUERY_WRITER_TYPE
Default: loadjob
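A sketch of a BigQuery-backed warehouse (the project and dataset names are placeholders; credentials are typically supplied via the WAREHOUSE_BIGQUERY_CREDS_JSON environment variable rather than the file):

```yaml
warehouse:
  driver: bigquery
  table: events
  bigquery:
    project_id: my-gcp-project
    dataset_name: analytics
    writer_type: loadjob
    partition_field: date_utc
    partition_interval: DAY
    partition_expiration_days: 90   # 0 = partitions never expire
    query_timeout: 30s
    table_creation_timeout: 10s
```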
--warehouse-clickhouse-database
ClickHouse database name. Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.database
Environment variable: WAREHOUSE_CLICKHOUSE_DB
--warehouse-clickhouse-host
ClickHouse host. Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.host
Environment variable: WAREHOUSE_CLICKHOUSE_HOST
--warehouse-clickhouse-order-by
Comma-separated list of columns for ORDER BY clause (e.g., 'property_id,date_utc'). Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.order_by
Environment variable: WAREHOUSE_CLICKHOUSE_ORDER_BY
Default: property_id,date_utc,session_id
--warehouse-clickhouse-partition-by
Expression for PARTITION BY clause (e.g., 'toYYYYMM(date_utc)'). Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.partition_by
Environment variable: WAREHOUSE_CLICKHOUSE_PARTITION_BY
Default: toYYYYMM(date_utc)
--warehouse-clickhouse-password
ClickHouse password. Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.password
Environment variable: WAREHOUSE_CLICKHOUSE_PASSWORD
--warehouse-clickhouse-port
ClickHouse port. Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.port
Environment variable: WAREHOUSE_CLICKHOUSE_PORT
Default: 9000
--warehouse-clickhouse-username
ClickHouse username. Only applicable when warehouse-driver is set to 'clickhouse'.
Configuration key: warehouse.clickhouse.username
Environment variable: WAREHOUSE_CLICKHOUSE_USER
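A sketch of a ClickHouse-backed warehouse (the host and database names are placeholders; supply the password via the WAREHOUSE_CLICKHOUSE_PASSWORD environment variable):

```yaml
warehouse:
  driver: clickhouse
  clickhouse:
    host: clickhouse.internal
    port: 9000
    database: analytics
    username: d8a
    order_by: property_id,date_utc,session_id
    partition_by: toYYYYMM(date_utc)
```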
--warehouse-driver
Target warehouse driver (clickhouse, bigquery, files, console, or noop)
Configuration key: warehouse.driver
Environment variable: WAREHOUSE_DRIVER
Default: console
--warehouse-files-compression
Compression algorithm for warehouse files (gzip, or empty for none)
Configuration key: warehouse.files.compression
Environment variable: WAREHOUSE_FILES_COMPRESSION
--warehouse-files-compression-level
Compression level for warehouse files (-1 = default, 1 = fastest, 9 = best compression)
Configuration key: warehouse.files.compression_level
Environment variable: WAREHOUSE_FILES_COMPRESSION_LEVEL
Default: -1
--warehouse-files-filesystem-path
Destination directory for filesystem storage (required when warehouse-files-storage=filesystem)
Configuration key: warehouse.files.filesystem.path
Environment variable: WAREHOUSE_FILES_FILESYSTEM_PATH
--warehouse-files-format
File format for warehouse output (csv)
Configuration key: warehouse.files.format
Environment variable: WAREHOUSE_FILES_FORMAT
Default: csv
--warehouse-files-gcs-bucket
WAREHOUSE_FILES GCS bucket name (only used when warehouse-files-type=gcs)
Configuration key: warehouse.files.gcs.bucket
Environment variable: WAREHOUSE_FILES_GCS_BUCKET
--warehouse-files-gcs-creds-json
WAREHOUSE_FILES GCS credentials JSON (raw or base64); empty uses ADC (only used when warehouse-files-type=gcs)
Configuration key: warehouse.files.gcs.creds_json
Environment variable: WAREHOUSE_FILES_GCS_CREDS_JSON
--warehouse-files-gcs-project
WAREHOUSE_FILES GCS project ID (optional; only used when warehouse-files-type=gcs)
Configuration key: warehouse.files.gcs.project
Environment variable: WAREHOUSE_FILES_GCS_PROJECT
--warehouse-files-max-segment-age
Maximum segment age before sealing (default: 1h)
Configuration key: warehouse.files.max_segment_age
Environment variable: WAREHOUSE_FILES_MAX_SEGMENT_AGE
Default: 1h0m0s
--warehouse-files-max-segment-size
Maximum segment size in bytes before sealing (default: 1 GiB)
Configuration key: warehouse.files.max_segment_size
Environment variable: WAREHOUSE_FILES_MAX_SEGMENT_SIZE
Default: 1073741824
--warehouse-files-path-template
Path template for warehouse file uploads. Variables: Table, Schema, SegmentID, Extension, Year, Month, MonthPadded, Day, DayPadded
Configuration key: warehouse.files.path_template
Environment variable: WAREHOUSE_FILES_PATH_TEMPLATE
Default: table={{.Table}}/schema={{.Schema}}/y={{.Year}}/m={{.MonthPadded}}/d={{.DayPadded}}/{{.SegmentID}}.{{.Extension}}
--warehouse-files-prefix
Object storage prefix/namespace for WAREHOUSE_FILES objects
Configuration key: warehouse.files.prefix
Environment variable: WAREHOUSE_FILES_PREFIX
--warehouse-files-s3-access-key
WAREHOUSE_FILES S3/MinIO access key (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.access_key
Environment variable: WAREHOUSE_FILES_S3_ACCESS_KEY
--warehouse-files-s3-bucket
WAREHOUSE_FILES S3/MinIO bucket name (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.bucket
Environment variable: WAREHOUSE_FILES_S3_BUCKET
--warehouse-files-s3-create-bucket
WAREHOUSE_FILES: create bucket on startup if missing (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.create_bucket
Environment variable: WAREHOUSE_FILES_S3_CREATE_BUCKET
--warehouse-files-s3-host
WAREHOUSE_FILES S3/MinIO host (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.host
Environment variable: WAREHOUSE_FILES_S3_HOST
--warehouse-files-s3-port
WAREHOUSE_FILES S3/MinIO port (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.port
Environment variable: WAREHOUSE_FILES_S3_PORT
Default: 9000
--warehouse-files-s3-protocol
WAREHOUSE_FILES S3 endpoint protocol (http or https; only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.protocol
Environment variable: WAREHOUSE_FILES_S3_PROTOCOL
Default: http
--warehouse-files-s3-region
WAREHOUSE_FILES S3 region (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.region
Environment variable: WAREHOUSE_FILES_S3_REGION
Default: us-east-1
--warehouse-files-s3-secret-key
WAREHOUSE_FILES S3/MinIO secret key (only used when warehouse-files-type=s3)
Configuration key: warehouse.files.s3.secret_key
Environment variable: WAREHOUSE_FILES_S3_SECRET_KEY
--warehouse-files-seal-check-interval
How often to evaluate sealing triggers (default: 15s)
Configuration key: warehouse.files.seal_check_interval
Environment variable: WAREHOUSE_FILES_SEAL_CHECK_INTERVAL
Default: 15s
--warehouse-files-storage
Storage destination for warehouse files (s3, gcs, or filesystem)
Configuration key: warehouse.files.storage
Environment variable: WAREHOUSE_FILES_STORAGE
--warehouse-files-type
WAREHOUSE_FILES object storage type (s3 or gcs)
Configuration key: warehouse.files.type
Environment variable: WAREHOUSE_FILES_TYPE
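Tying the warehouse.files keys together, a sketch of writing gzip-compressed CSV segments to an S3-compatible store (the endpoint and bucket are placeholders):

```yaml
warehouse:
  driver: files
  files:
    storage: s3
    type: s3                       # object storage flavor (s3 or gcs)
    format: csv
    compression: gzip
    compression_level: -1          # library default
    max_segment_size: 1073741824   # seal at 1 GiB...
    max_segment_age: 1h            # ...or after one hour
    seal_check_interval: 15s
    prefix: d8a/warehouse
    s3:
      host: minio.internal
      port: 9000
      protocol: http
      bucket: d8a-warehouse
```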
--warehouse-table
Target warehouse table name.
Configuration key: warehouse.table
Environment variable: WAREHOUSE_TABLE
Default: events