Telemetry
Starting with v1.1.1, StripePy supports telemetry collection.
This page outlines what information we are collecting and why. Furthermore, we provide instructions on how telemetry collection can be disabled.
What information is being collected?
stripepy is instrumented to collect general information about stripepy itself and the system where it is being run.
We do not collect any sensitive information that could be used to identify our users, the machine where stripepy is being run, or the datasets processed by stripepy.
This is the data we are collecting:
- Information on how stripepy was installed (i.e., the package version and third-party dependency versions).
- Information on the system where stripepy is being run (i.e., operating system, processor architecture, and Python version).
- How stripepy is being invoked (i.e., the subcommand, input file format and parameters).
- Information about stripepy execution (i.e., when it was launched, how long the command took to finish, and whether the command terminated with an error).
The following table shows an example of the telemetry collected when running stripepy call ENCFF993FGR.hic 10000 -p 8:
| Field | Value |
|---|---|
| dependency.h5py.version | 3.14.0 |
| dependency.hictkpy.version | 1.3.0 |
| dependency.numpy.version | 2.3.1 |
| dependency.packaging.version | 25.0 |
| dependency.pandas.version | 2.3.1 |
| dependency.scipy.version | 1.16.0 |
| dependency.structlog.version | 25.4.0 |
| duration_ms | 103074.812875 |
| host.arch | x86_64 |
| library.name | stripepy |
| meta.signal_type | trace |
| name | call |
| os.type | linux |
| os.version | 6.11.0-1015-azure |
| params.constrain_heights | false |
| params.contact_map_format | mcool |
| params.contact_map_raw_interactions | true |
| params.contact_map_resolution | 20000 |
| params.genomic_belt | 5000000 |
| params.glob_pers_min | 0.04 |
| params.k | 3 |
| params.loc_pers_min | 0.33 |
| params.loc_trend_min | 0.25 |
| params.max_width | 100000 |
| params.nproc | 4 |
| process.runtime.description | GCC 12.2.0 |
| process.runtime.name | CPython |
| process.runtime.version | 3.13.5 |
| Sample Rate | 1 |
| service.name | stripepy |
| service.version | 1.1.1.dev73+g2e13fec |
| span.kind | internal |
| span.num_events | 0 |
| span.num_links | 0 |
| status_code | 1 |
| telemetry.sdk.language | python |
| telemetry.sdk.name | opentelemetry |
| telemetry.sdk.version | 1.34.1 |
| trace.span_id | 4b6f5534c8aec420 |
| trace.trace_id | 30b4804d781557f552d15dec8270fca8 |
| type | internal |
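To get a feel for how little the system-related fields reveal, the following minimal sketch prints comparable values for your own machine using only the Python standard library. It is not the code used by stripepy: the function name collect_environment_info, the default package list, and the mapping between field names and standard-library calls are assumptions made for illustration.

```python
import platform
from importlib import metadata


def collect_environment_info(packages=("numpy", "pandas", "scipy")):
    """Gather non-identifying facts about the host and the Python runtime.

    Hypothetical helper: the field names mirror the table above, but the
    choice of standard-library call for each field is an assumption.
    """
    info = {
        "host.arch": platform.machine(),
        "os.type": platform.system().lower(),
        "os.version": platform.release(),
        "process.runtime.name": platform.python_implementation(),
        "process.runtime.version": platform.python_version(),
        "process.runtime.description": platform.python_compiler(),
    }
    for name in packages:
        try:
            info[f"dependency.{name}.version"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed: omit the field instead of guessing.
            pass
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_environment_info().items()):
        print(f"{key}: {value}")
```

None of these values include a hostname, user name, or file path.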
Why are we collecting this information?
There are two main motivations behind our decision to start collecting telemetry data:
- To get an idea of how big our user base is: this will help us, among other things, to secure funding to maintain stripepy in the future.
- To better understand which of the functionalities offered by stripepy are most used by our users: we intend to use this information to help us decide which features we should focus our development efforts on.
How is telemetry information processed and stored?
Telemetry is sent to an OpenTelemetry collector running on a virtual server hosted on the Norwegian Research and Education Cloud (NREC).
The virtual server and collector are managed by us, and traffic between stripepy and the collector is encrypted.
The collector processes incoming data continuously and forwards it to a dashboard for data analytics and a backup solution (both services are hosted in Europe). Communication between the collector, dashboard, and backup site is also encrypted. Data stored by the dashboard and backup site is encrypted at rest.
The analytics dashboard keeps telemetry data for up to 60 days, while the backup site is currently set up to store telemetry data indefinitely (although this may change in the future).
How to disable telemetry collection
To disable telemetry collection, define the STRIPEPY_NO_TELEMETRY environment variable before launching stripepy (e.g., STRIPEPY_NO_TELEMETRY=1 stripepy download).
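When stripepy is launched from a Python script or workflow rather than an interactive shell, the same variable can be set on the child process. The sketch below shows one way to do this with the standard library; the helper names telemetry_disabled_env and run_stripepy_without_telemetry are hypothetical and not part of stripepy.

```python
import os
import subprocess


def telemetry_disabled_env(base_env=None):
    """Return a copy of the environment with STRIPEPY_NO_TELEMETRY defined."""
    env = dict(os.environ if base_env is None else base_env)
    env["STRIPEPY_NO_TELEMETRY"] = "1"
    return env


def run_stripepy_without_telemetry(args):
    """Run the stripepy CLI with telemetry disabled (hypothetical helper).

    Assumes the stripepy executable is available on PATH.
    """
    return subprocess.run(
        ["stripepy", *args],
        env=telemetry_disabled_env(),
        check=True,
    )


# Example usage (requires stripepy to be installed):
# run_stripepy_without_telemetry(["download"])
```

Copying the environment instead of modifying os.environ keeps the setting scoped to the stripepy subprocess.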
Where can I find the code used for telemetry collection?
All code related to telemetry collection is located in the file src/stripepy/cli/telemetry.py.