Telemetry

Starting with version 1.1.1, StripePy supports telemetry collection.

This page outlines what information we are collecting and why. It also explains how telemetry collection can be disabled.

What information is being collected

stripepy is instrumented to collect general information about stripepy itself and the system where it is being run.

We do not collect any sensitive information that could be used to identify our users, the machine where stripepy is being run, or the datasets processed by stripepy.

This is the data we are collecting:

  • Information on how stripepy was installed (i.e., the package version and third-party dependency versions).

  • Information on the system where stripepy is being run (i.e., operating system, processor architecture, and Python version).

  • How stripepy is being invoked (i.e., the subcommand, input file format and parameters).

  • Information about stripepy execution (i.e., when it was launched, how long the command took to finish, and whether the command terminated with an error).
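Most of the system information listed above is available from the Python standard library. The snippet below is a minimal sketch, not stripepy's actual implementation (which relies on OpenTelemetry), showing how values similar to the host.*, os.*, process.runtime.*, and dependency.<name>.version fields in the table below could be gathered. The function names collect_host_info and collect_dependency_versions are hypothetical.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_host_info() -> Dict[str, str]:
    """Return OS, architecture, and interpreter details (hypothetical helper)."""
    return {
        "host.arch": platform.machine(),                            # e.g. "x86_64"
        "os.type": platform.system().lower(),                       # e.g. "linux"
        "os.version": platform.release(),                           # OS/kernel release string
        "process.runtime.name": platform.python_implementation(),   # e.g. "CPython"
        "process.runtime.version": platform.python_version(),       # e.g. "3.13.5"
        "process.runtime.description": platform.python_compiler(),  # e.g. "GCC 12.2.0"
    }


def collect_dependency_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Return installed versions of the given distributions, skipping missing ones."""
    versions: Dict[str, str] = {}
    for name in packages:
        try:
            versions[f"dependency.{name}.version"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed: nothing to report
    return versions
```

For example, collect_dependency_versions(["numpy", "pandas", "scipy"]) returns one entry per installed package. Note that neither helper reads file names, paths, or the contents of any dataset.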

The following table shows an example of the telemetry collected when running stripepy call ENCFF993FGR.hic 10000 -p 8:

Telemetry information collected when running stripepy call

Field                                Value
-----------------------------------  --------------------------------
dependency.h5py.version              3.14.0
dependency.hictkpy.version           1.3.0
dependency.numpy.version             2.3.1
dependency.packaging.version         25.0
dependency.pandas.version            2.3.1
dependency.scipy.version             1.16.0
dependency.structlog.version         25.4.0
duration_ms                          103074.812875
host.arch                            x86_64
library.name                         stripepy
meta.signal_type                     trace
name                                 call
os.type                              linux
os.version                           6.11.0-1015-azure
params.constrain_heights             false
params.contact_map_format            mcool
params.contact_map_raw_interactions  true
params.contact_map_resolution        20000
params.genomic_belt                  5000000
params.glob_pers_min                 0.04
params.k                             3
params.loc_pers_min                  0.33
params.loc_trend_min                 0.25
params.max_width                     100000
params.nproc                         4
process.runtime.description          GCC 12.2.0
process.runtime.name                 CPython
process.runtime.version              3.13.5
Sample Rate                          1
service.name                         stripepy
service.version                      1.1.1.dev73+g2e13fec
span.kind                            internal
span.num_events                      0
span.num_links                       0
status_code                          1
telemetry.sdk.language               python
telemetry.sdk.name                   opentelemetry
telemetry.sdk.version                1.34.1
trace.span_id                        4b6f5534c8aec420
trace.trace_id                       30b4804d781557f552d15dec8270fca8
type                                 internal
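Fields such as name, params.*, duration_ms, and status_code describe a single command invocation. The sketch below shows one way such a record could be assembled around a command. It assumes that status_code follows OpenTelemetry's StatusCode numbering (UNSET = 0, OK = 1, ERROR = 2), so the value 1 above would mark a successful run. It is a simplified stand-in, not the code stripepy uses; record_command, params_to_attributes, and the exclude argument (meant for parameters such as input paths that should never be reported) are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterable, Iterator, Mapping

# Numeric status codes as defined by OpenTelemetry's StatusCode enum
# (UNSET = 0, OK = 1, ERROR = 2).
STATUS_OK = 1
STATUS_ERROR = 2

# Attribute value types stored as-is; anything else is stored as a string.
_PRIMITIVES = (bool, int, float, str)


def params_to_attributes(
    params: Mapping[str, Any], exclude: Iterable[str] = (), prefix: str = "params."
) -> Dict[str, Any]:
    """Flatten CLI parameters into "params.<name>" attributes.

    Keys listed in `exclude` and parameters set to None are dropped.
    """
    skip = set(exclude)
    return {
        f"{prefix}{key}": value if isinstance(value, _PRIMITIVES) else str(value)
        for key, value in params.items()
        if key not in skip and value is not None
    }


@contextmanager
def record_command(
    name: str, params: Mapping[str, Any], exclude: Iterable[str] = ()
) -> Iterator[Dict[str, Any]]:
    """Yield a dict that is filled with status_code and duration_ms on exit."""
    record: Dict[str, Any] = {"name": name, **params_to_attributes(params, exclude)}
    start = time.perf_counter()
    try:
        yield record
    except BaseException:
        record["status_code"] = STATUS_ERROR
        raise  # telemetry must not swallow the command's error
    else:
        record["status_code"] = STATUS_OK
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000.0
```

For example, wrapping a run of stripepy call in record_command("call", {"k": 3, "nproc": 4}) would yield name, params.k, params.nproc, status_code, and duration_ms entries comparable to the corresponding rows above.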

Why are we collecting this information?

There are two main motivations behind our decision to start collecting telemetry data:

  1. To estimate the size of our user base: among other things, this will help us secure funding to maintain stripepy in the future.

  2. To better understand which stripepy features are used most: we intend to use this information to decide where to focus our development efforts.

How telemetry information is processed and stored

Telemetry is sent to an OpenTelemetry collector running on a virtual server hosted on the Norwegian Research and Education Cloud (NREC).

The virtual server and collector are managed by us, and traffic between stripepy and the collector is encrypted.

The collector processes incoming data continuously and forwards it to a dashboard for data analytics and a backup solution (both services are hosted in Europe). Communication between the collector, dashboard, and backup site is also encrypted. Data stored by the dashboard and backup site is encrypted at rest.

The analytics dashboard keeps telemetry data for up to 60 days, while the backup site is currently set up to store telemetry data indefinitely (although this may change in the future).

How to disable telemetry collection

To disable telemetry collection, define the STRIPEPY_NO_TELEMETRY environment variable before launching stripepy (e.g., STRIPEPY_NO_TELEMETRY=1 stripepy download).
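When stripepy is launched from a Python script (e.g., as part of a pipeline), the variable can be set for the child process only, leaving the parent environment untouched. The following is a minimal sketch using the standard subprocess module. The helper names are hypothetical, and telemetry_opted_out assumes, as the instructions above suggest, that defining the variable is sufficient regardless of its value.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence

TELEMETRY_OPT_OUT_VAR = "STRIPEPY_NO_TELEMETRY"


def env_without_telemetry(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the opt-out variable defined."""
    env = dict(os.environ if base is None else base)
    env[TELEMETRY_OPT_OUT_VAR] = "1"
    return env


def telemetry_opted_out(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumption: only the presence of the variable matters, not its value."""
    env = os.environ if env is None else env
    return TELEMETRY_OPT_OUT_VAR in env


def run_stripepy_without_telemetry(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Launch stripepy (must be on PATH) with telemetry disabled for this run only."""
    return subprocess.run(["stripepy", *args], env=env_without_telemetry(), check=True)
```

For example, run_stripepy_without_telemetry(["download"]) is equivalent to the shell command shown above.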

Where can I find the code used for telemetry collection?

All code concerning telemetry collection is defined in src/stripepy/cli/telemetry.py.