Quickstart

StripePy is organized into a few subcommands, including download, call, view, and plot. Each of them is covered in the walkthrough below.

Walkthrough

The following is a synthetic example of a typical StripePy run. The steps in this section assume that StripePy is running on a UNIX system; some commands may need minor adjustments to run on Windows.

1) Download a sample dataset (optional)

To download the example matrix used in this walkthrough, run:

user@dev:/tmp$ stripepy download --name 4DNFI9GMP2J8

Feel free to use your own interaction matrix instead of 4DNFI9GMP2J8.mcool. Please make sure the matrix is in .cool, .mcool, or .hic format.

A more detailed description of the stripepy download subcommand can be found in Downloading sample datasets.

2) Detect architectural stripes

The stripepy call subcommand is the core of the analysis, designed to identify architectural stripes within contact maps. This process can be quite time-consuming, especially when working with large files.

Running the analysis requires the path to your contact map file and the desired resolution. For instance, to analyze the 4DNFI9GMP2J8.mcool file at 10,000 bp resolution, run:

user@dev:/tmp$ stripepy call 4DNFI9GMP2J8.mcool 10000

The command will output a single HDF5 file (e.g., 4DNFI9GMP2J8.10000.hdf5).

Additional information is provided in Detect architectural stripes.
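For scripted pipelines, the same invocation can be assembled from Python. The snippet below is only a minimal sketch and is not part of StripePy: build_call_command, expected_output_name, and run_call are hypothetical helpers, only the two positional arguments shown above are used, and the output naming pattern is an assumption inferred from the example file name 4DNFI9GMP2J8.10000.hdf5.

```python
import shlex
import subprocess
from pathlib import Path


def build_call_command(matrix, resolution):
    """Return the argument list for `stripepy call <matrix> <resolution>`."""
    return ["stripepy", "call", str(matrix), str(resolution)]


def expected_output_name(matrix, resolution):
    """Guess the name of the HDF5 file produced by `stripepy call`.

    Assumption: the name follows the <matrix stem>.<resolution>.hdf5 pattern
    seen in the walkthrough example. Check the actual output before relying on it.
    """
    return f"{Path(matrix).stem}.{resolution}.hdf5"


def run_call(matrix, resolution):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(build_call_command(matrix, resolution), check=True)


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(build_call_command("4DNFI9GMP2J8.mcool", 10000)))
```

Passing the arguments as a list avoids shell quoting issues with unusual file names, and check=True makes a failed run raise an exception rather than being silently ignored.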

3) Fetch stripes in BEDPE format

Stripe coordinates can be fetched from the .hdf5 file using stripepy view, as follows:

user@dev:/tmp$ stripepy view 4DNFI9GMP2J8.10000.hdf5 > stripes.bedpe

Further details can be found in Fetch architectural stripes.
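Once the stripes are in BEDPE format, they can be read back with a few lines of Python. The following is a minimal sketch that assumes the file is tab-separated and that its first six columns follow the standard BEDPE layout (chrom1, start1, end1, chrom2, start2, end2). Stripe and read_bedpe are hypothetical names, and the two records are made up for illustration; they are not StripePy output.

```python
import csv
import io
from typing import NamedTuple


class Stripe(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int


def read_bedpe(handle):
    """Parse the first six BEDPE columns from an open text file."""
    stripes = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comments, and a possible header line.
        if not row or row[0].startswith("#") or not row[1].isdigit():
            continue
        stripes.append(
            Stripe(row[0], int(row[1]), int(row[2]), row[3], int(row[4]), int(row[5]))
        )
    return stripes


# Made-up records used only to exercise the parser.
EXAMPLE_BEDPE = (
    "chr1\t1000000\t1050000\tchr1\t1000000\t1800000\n"
    "chr2\t5000000\t5600000\tchr2\t5550000\t5600000\n"
)

stripes = read_bedpe(io.StringIO(EXAMPLE_BEDPE))
for s in stripes:
    # Extent of the stripe along each axis of the matrix, in bp.
    print(s.chrom1, s.end1 - s.start1, s.end2 - s.start2)
```

To parse a real file, replace the in-memory example with an open file handle, for instance `with open("stripes.bedpe") as fh: stripes = read_bedpe(fh)`.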

4) Generate plots

StripePy comes with a plot subcommand that can be used to visualize architectural stripes overlaid on top of the Hi-C matrix. stripepy plot can also generate several graphs showing the general properties of the called stripes. See Generating plots for a complete overview.

For instance, running

user@dev:/tmp$ stripepy plot cm 4DNFI9GMP2J8.mcool 10000 /tmp/matrix_with_stripes.png --stripepy-hdf5 4DNFI9GMP2J8.10000.hdf5 --highlight-stripes

will generate the following plot

_images/4DNFI9GMP2J8_chr14_34mbp-cm_plot_highlight_stripes.png

Accessing stripes and descriptors from Python

If you are working in Python, you might want to carry out analysis on the stripes and their biodescriptors. The ResultFile class helps load and process HDF5 files (e.g., 4DNFI9GMP2J8.10000.hdf5) generated by StripePy.

The following code snippet loads the lower-triangular stripes for chromosome chr1. Pass chrom=None to load stripes over the whole genome instead:

In [1]: from stripepy.data_structures import ResultFile

In [2]: with ResultFile("4DNFI9GMP2J8.10000.hdf5") as f:
   ...:     df = f.get(
   ...:         chrom="chr1",     # Pass None to fetch data for all chromosomes
   ...:         field="stripes",  # See API docs for a complete list of supported fields
   ...:         location="LT",    # Use "UT" to fetch from the upper-triangle
   ...:     )
   ...:

In [3]: df
Out[3]:
       seed  top_persistence  left_bound  right_bound  top_bound  ...  outer_lmean  outer_rmean  outer_mean  rel_change  cfx_of_variation
0        93         0.398490          91           96         93  ...     0.180769     0.240014    0.210392   19.138436          0.563444
1       102         0.053084          99          105        102  ...     0.250077     0.246783    0.248430    1.276074          0.605748
2       108         0.082636         106          111        108  ...     0.251255     0.242434    0.246845    6.744239          0.629097
3       116         0.103803         114          119        116  ...     0.452872     0.395339    0.424105    3.394272          0.394917
4       130         0.073611         126          132        130  ...     0.235412     0.249025    0.242219    3.656868          0.608349
...     ...              ...         ...          ...        ...  ...          ...          ...         ...         ...               ...
1743  24693         0.057216       24687        24695      24693  ...     0.274141     0.284040    0.279090    5.741370          0.382488
1744  24708         0.048084       24706        24710      24708  ...     0.280574     0.322965    0.301770    7.036960          0.354274
1745  24720         0.044175       24718        24723      24720  ...     0.162981     0.155803    0.159392    5.192390          0.833381
1746  24733         0.054484       24730        24737      24733  ...     0.181836     0.191120    0.186478    0.238297          0.791300
1747  24793         0.052317       24790        24796      24793  ...     0.168377     0.219650    0.194013    7.811918          0.518017

[1748 rows x 22 columns]
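The resulting table is a regular pandas DataFrame, so the usual filtering and column arithmetic apply. The sketch below rebuilds a small DataFrame from the first five rows printed above (four columns only) and keeps the stripes whose rel_change exceeds a cutoff. Two things here are assumptions: the cutoff of 5.0 is an arbitrary value chosen for illustration, and left_bound and right_bound are treated as bin indices that can be converted to base pairs by multiplying by the resolution. select_strong_stripes is a hypothetical helper, not part of the StripePy API.

```python
import pandas as pd

RESOLUTION = 10_000  # bp, matching the resolution used in the walkthrough

# First five rows of the table shown above, restricted to four columns.
df = pd.DataFrame(
    {
        "seed": [93, 102, 108, 116, 130],
        "left_bound": [91, 99, 106, 114, 126],
        "right_bound": [96, 105, 111, 119, 132],
        "rel_change": [19.138436, 1.276074, 6.744239, 3.394272, 3.656868],
    }
)


def select_strong_stripes(df, min_rel_change, resolution):
    """Keep stripes with rel_change >= min_rel_change and add bp coordinates.

    Assumption: left_bound and right_bound are expressed in bins.
    """
    selected = df.loc[df["rel_change"] >= min_rel_change].copy()
    selected["left_bp"] = selected["left_bound"] * resolution
    selected["right_bp"] = selected["right_bound"] * resolution
    return selected.sort_values("rel_change", ascending=False).reset_index(drop=True)


strong = select_strong_stripes(df, min_rel_change=5.0, resolution=RESOLUTION)
print(strong[["seed", "left_bp", "right_bp", "rel_change"]])
```

The same function can be applied directly to the df returned by ResultFile.get(), provided the column names match those printed above.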