Python API Reference

stripepy.data_structures.SparseMatrix

alias of csr_matrix | csc_matrix

class stripepy.data_structures.Stripe(
seed: int,
top_pers: float | None,
horizontal_bounds: Tuple[int, int] | None = None,
vertical_bounds: Tuple[int, int] | None = None,
where: str | None = None,
)

A class used to represent architectural stripes. This class takes care of validating stripe coordinates and computing several descriptive statistics.

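The intended workflow is to construct a Stripe, set its bounds (either through the constructor or with set_horizontal_bounds() and set_vertical_bounds()), and then call compute_biodescriptors() with the sparse matrix from which the stripe originated.

The stripe properties and statistics can then be accessed through the attributes listed below.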

Attributes representing the descriptive statistics return negative values to signal that it was not possible to compute the statistics for the current Stripe instance.

__init__(
seed: int,
top_pers: float | None,
horizontal_bounds: Tuple[int, int] | None = None,
vertical_bounds: Tuple[int, int] | None = None,
where: str | None = None,
)
Parameters:
  • seed – the stripe seed position

  • top_pers – the topological persistence of the seed

  • horizontal_bounds – the horizontal bounds of the stripe

  • vertical_bounds – the vertical bounds of the stripe

  • where – the location of the stripe: should be “upper_triangular” or “lower_triangular”. When provided, this is used to validate the coordinates set when calling set_horizontal_bounds() and set_vertical_bounds().

property seed: int

The stripe seed

property top_persistence: float | None

The topological persistence of the seed

property lower_triangular: bool

True when the stripe extends in the lower-triangular portion of the matrix

property upper_triangular: bool

True when the stripe extends in the upper-triangular portion of the matrix

property triangular_undetermined: bool

True when the stripe has height 0

property left_bound: int

The left bound of the stripe

property right_bound: int

The right bound of the stripe

property top_bound: int

The top bound of the stripe

property bottom_bound: int

The bottom bound of the stripe

property inner_mean: float

The average number of interactions within the stripe

property inner_std: float

The standard deviation of the number of interactions within the stripe

property five_number: ndarray[tuple[Any, ...], dtype[float]]

A vector of five numbers corresponding to the 0, 25, 50, 75, and 100 percentiles of the number of within-stripe interactions
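To make this definition concrete, here is a minimal standard-library sketch of a five-number summary. It assumes linear interpolation between order statistics (the default of numpy.percentile, and what statistics.quantiles computes with method="inclusive"); whether stripepy uses the same interpolation, and exactly which entries count as within-stripe interactions, is not stated here. The helper name five_number_summary is hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def five_number_summary(values: Sequence[float]) -> List[float]:
    """Return the 0th, 25th, 50th, 75th, and 100th percentiles of values.

    Hypothetical helper: assumes linear interpolation between order
    statistics, which is what method="inclusive" computes.
    """
    if len(values) < 2:
        raise ValueError("at least two values are required")
    # quantiles() sorts its input and returns the three quartile cut points
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return [float(min(values)), q1, median, q3, float(max(values))]


# Interaction counts observed inside a toy stripe
print(five_number_summary([4, 1, 3, 2, 5]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```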

property outer_lsum: float

The sum of interactions in the band to the left of the stripe

property outer_rsum: float

The sum of interactions in the band to the right of the stripe

property outer_lsize: float

The number of entries in the band to the left of the stripe

property outer_rsize: float

The number of entries in the band to the right of the stripe

property outer_lmean: float

The average number of interactions in the band to the left of the stripe

property outer_rmean: float

The average number of interactions in the band to the right of the stripe

property outer_mean: float

The average number of interactions in the bands to the left and right of the stripe

property rel_change: float

The ratio between the average number of interactions within the stripe and the average number of interactions in the neighborhood outside the stripe
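The following standard-library sketch shows how the inner and outer statistics above relate to one another and to the window parameter of compute_biodescriptors(). It is a toy on a dense list-of-lists matrix, not stripepy's code: the half-open bounds, the pooled outer mean, and the band geometry (same rows as the stripe, up to window columns on each side) are all assumptions, and band_statistics is a hypothetical name. Following the convention stated for the Stripe class, -1.0 marks statistics that could not be computed.

```python
from typing import Dict, List


def band_statistics(
    matrix: List[List[float]],
    top: int,
    bottom: int,
    left: int,
    right: int,
    window: int = 3,
) -> Dict[str, float]:
    """Toy inner/outer statistics for a stripe in a dense matrix.

    Assumptions (not taken from stripepy): the stripe covers rows [top, bottom)
    and columns [left, right); each outer band spans the same rows and up to
    `window` columns immediately left or right of the stripe.
    """
    ncols = len(matrix[0])
    rows = range(top, bottom)

    def collect(cols: range) -> List[float]:
        return [float(matrix[i][j]) for i in rows for j in cols]

    inner = collect(range(left, right))
    lband = collect(range(max(0, left - window), left))
    rband = collect(range(right, min(ncols, right + window)))

    inner_mean = sum(inner) / len(inner) if inner else -1.0
    outer_size = len(lband) + len(rband)
    # Pooled mean over both bands (assumption)
    outer_mean = (sum(lband) + sum(rband)) / outer_size if outer_size else -1.0
    # Ratio of inner to outer mean; -1.0 when it cannot be computed
    rel_change = inner_mean / outer_mean if inner and outer_mean > 0 else -1.0
    return {
        "inner_mean": inner_mean,
        "outer_lsum": float(sum(lband)),
        "outer_rsum": float(sum(rband)),
        "outer_lsize": float(len(lband)),
        "outer_rsize": float(len(rband)),
        "outer_mean": outer_mean,
        "rel_change": rel_change,
    }


# A 3x7 toy matrix with a bright column (index 3) on a background of ones
matrix = [[1, 1, 1, 6, 1, 1, 1] for _ in range(3)]
stats = band_statistics(matrix, top=0, bottom=3, left=3, right=4, window=2)
print(stats["inner_mean"], stats["outer_mean"], stats["rel_change"])  # 6.0 1.0 6.0
```

Near the matrix edges the bands are simply clipped, so a stripe touching column 0 gets an empty left band.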

set_horizontal_bounds(left_bound: int, right_bound: int)

Set the horizontal bounds for the stripe. This function raises an exception when the coordinates have already been set or when the given coordinates are incompatible with the seed position.

Parameters:
  • left_bound – the left bound of the stripe

  • right_bound – the right bound of the stripe

set_vertical_bounds(top_bound: int, bottom_bound: int)

Set the vertical bounds for the stripe. This function raises an exception when the coordinates have already been set or when the given coordinates are incompatible with the seed position and/or the location given by where.

Parameters:
  • top_bound – the top bound of the stripe

  • bottom_bound – the bottom bound of the stripe
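The validation rules of set_horizontal_bounds() and set_vertical_bounds() can be pictured with the toy class below. ToyStripe is hypothetical and is not stripepy's Stripe; in particular, the rule that an upper-triangular stripe ends at the seed row while a lower-triangular one starts there is an assumption about what "compatible with the seed position" means.

```python
from typing import Optional, Tuple


class ToyStripe:
    """Hypothetical stand-in for the bound-validation rules described above.

    Assumption (not taken from stripepy): an upper-triangular stripe ends at the
    seed row (bottom_bound == seed), while a lower-triangular stripe starts at it
    (top_bound == seed).
    """

    def __init__(self, seed: int, where: Optional[str] = None) -> None:
        if where not in (None, "upper_triangular", "lower_triangular"):
            raise ValueError(f"invalid location: {where!r}")
        self.seed = seed
        self.where = where
        self.horizontal: Optional[Tuple[int, int]] = None
        self.vertical: Optional[Tuple[int, int]] = None

    def set_horizontal_bounds(self, left_bound: int, right_bound: int) -> None:
        if self.horizontal is not None:
            raise RuntimeError("horizontal bounds have already been set")
        # The seed must fall inside the horizontal extent of the stripe
        if not left_bound <= self.seed <= right_bound:
            raise ValueError("the seed must lie between left_bound and right_bound")
        self.horizontal = (left_bound, right_bound)

    def set_vertical_bounds(self, top_bound: int, bottom_bound: int) -> None:
        if self.vertical is not None:
            raise RuntimeError("vertical bounds have already been set")
        if top_bound > bottom_bound:
            raise ValueError("top_bound must not exceed bottom_bound")
        # Check compatibility with the declared location, if any
        if self.where == "upper_triangular" and bottom_bound != self.seed:
            raise ValueError("an upper-triangular stripe must end at the seed")
        if self.where == "lower_triangular" and top_bound != self.seed:
            raise ValueError("a lower-triangular stripe must start at the seed")
        self.vertical = (top_bound, bottom_bound)

    @property
    def triangular_undetermined(self) -> bool:
        # Height 0: both location rules are satisfied, so the side is ambiguous
        return self.vertical is not None and self.vertical[0] == self.vertical[1]


stripe = ToyStripe(seed=5, where="upper_triangular")
stripe.set_horizontal_bounds(4, 6)
stripe.set_vertical_bounds(1, 5)
try:
    stripe.set_horizontal_bounds(4, 7)
except RuntimeError as err:
    print(err)  # horizontal bounds have already been set
```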

compute_biodescriptors(
matrix: csr_matrix | csc_matrix,
window: int = 3,
)

Use the sparse matrix to compute various descriptive statistics. Statistics are stored in the current Stripe instance. This function raises an exception when it is called before the stripe bounds have been set.

Parameters:
  • matrix – the sparse matrix from which the stripe originated

  • window – window size used to compute statistics to the left and right of the stripe
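compute_biodescriptors() accepts either a csr_matrix or a csc_matrix (the SparseMatrix alias). As a rough illustration of what reading a stripe out of such a matrix involves, the sketch below works directly on the three arrays of the CSR layout (indptr, indices, data, the same attribute names SciPy's csr_matrix exposes), using only the standard library so that it runs without SciPy. The helper names dense_to_csr and stripe_values are hypothetical, the half-open bounds are an assumption, and the RuntimeError mirrors the documented requirement that bounds be set first.

```python
from typing import List, Optional, Sequence, Tuple

Csr = Tuple[List[int], List[int], List[float]]


def dense_to_csr(rows: Sequence[Sequence[float]]) -> Csr:
    """Build the (indptr, indices, data) triplet used by the CSR format."""
    indptr, indices, data = [0], [], []
    for row in rows:
        for j, value in enumerate(row):
            if value != 0:
                indices.append(j)
                data.append(float(value))
        indptr.append(len(indices))
    return indptr, indices, data


def stripe_values(
    csr: Csr,
    vertical: Optional[Tuple[int, int]],
    horizontal: Optional[Tuple[int, int]],
) -> List[float]:
    """Collect every entry (implicit zeros included) inside the stripe.

    Hypothetical helper; bounds are assumed to be half-open [start, end).
    """
    if vertical is None or horizontal is None:
        # Mirrors the documented behavior: bounds must be set first
        raise RuntimeError("stripe bounds have not been set")
    indptr, indices, data = csr
    (top, bottom), (left, right) = vertical, horizontal
    values: List[float] = []
    for i in range(top, bottom):
        row = [0.0] * (right - left)
        # Only the stored (non-zero) entries of row i are visited
        for k in range(indptr[i], indptr[i + 1]):
            if left <= indices[k] < right:
                row[indices[k] - left] = data[k]
        values.extend(row)
    return values


csr = dense_to_csr([[0, 2, 0], [1, 0, 3], [0, 4, 5]])
print(stripe_values(csr, vertical=(0, 3), horizontal=(1, 3)))
# [2.0, 0.0, 0.0, 3.0, 4.0, 5.0]
```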

class stripepy.data_structures.ResultFile(
path: Path,
mode: str = 'r',
_create_key: object | None = None,
)

A class used to read and write StripePy results to an HDF5 file.

There are 3 main use cases:

  • Open the file in read mode:

    with ResultFile("results.hdf5") as h5:
        ...
  • Open the file in write mode:

    • If all data will be written to the file before the file is closed:

      with ResultFile.create("results.hdf5", mode="w", ...) as h5:
          h5.write_descriptors(res1)
          h5.write_descriptors(res2)
          ...
      
    • If the data will be added progressively:

      with ResultFile.create("results.hdf5", mode="a", ...) as h5:
          h5.write_descriptors(res1)  # not mandatory, it is also possible to create the
                                      # file and close it immediately
      ...
      with ResultFile.append("results.hdf5") as h5:
          h5.write_descriptors(res2)
          h5.write_descriptors(res3)
      ...
      with ResultFile.append("results.hdf5") as h5:
          h5.write_descriptors(res4)
          h5.finalize()  # IMPORTANT!
                         # Without the above line you'll get an error when trying to open
                         # the file in read mode
      

When opening or creating a ResultFile in write or append mode, a context manager (i.e. a with statement) must be used.

__init__(
path: Path,
mode: str = 'r',
_create_key: object | None = None,
)
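The _create_key parameter in the signature above suggests the private-key idiom: __init__ only honors write modes when it receives a module-private sentinel, which forces callers through create() and append(). The sketch below illustrates that idiom under this assumption, together with one way to insist on a with statement. ToyResultFile and _CREATE_KEY are invented for illustration, the toy omits the chroms/resolution arguments of create(), and it writes nothing to disk; it is not stripepy's implementation.

```python
from pathlib import Path
from typing import List, Optional, Union

# Module-private sentinel: only the factory methods below pass it to __init__
_CREATE_KEY = object()


class ToyResultFile:
    """Hypothetical illustration of the private-key idiom; not stripepy code."""

    def __init__(
        self, path: Union[str, Path], mode: str = "r", _create_key: Optional[object] = None
    ) -> None:
        if mode != "r" and _create_key is not _CREATE_KEY:
            raise RuntimeError("use create() or append() to open a file for writing")
        self.path = Path(path)
        self.mode = mode
        self.results: List[object] = []
        self._in_context = False

    @staticmethod
    def create(path: Union[str, Path], mode: str) -> "ToyResultFile":
        if mode not in ("w", "a"):
            raise ValueError("mode should be 'w' or 'a'")
        return ToyResultFile(path, mode, _create_key=_CREATE_KEY)

    @staticmethod
    def append(path: Union[str, Path]) -> "ToyResultFile":
        return ToyResultFile(path, "a", _create_key=_CREATE_KEY)

    def __enter__(self) -> "ToyResultFile":
        self._in_context = True
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._in_context = False

    def write_descriptors(self, result: object) -> None:
        if self.mode == "r":
            raise RuntimeError("the file was opened in read mode")
        if not self._in_context:
            raise RuntimeError("writing requires a with statement")
        # A real implementation would serialize `result` to disk here
        self.results.append(result)


with ToyResultFile.create("results.hdf5", mode="w") as h5:
    h5.write_descriptors("res1")
print(h5.results)  # ['res1']
```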
static create(
path: Path,
mode: str,
chroms: Dict[str, int],
resolution: int,
normalization: str | None = None,
assembly: str = 'unknown',
metadata: Dict[str, Any] | None = None,
compression_lvl: int = 9,
)

Create a ResultFile using the provided information.

static create_from_file(
path: Path,
mode: str,
matrix_file: File,
normalization: str | None = None,
metadata: Dict[str, Any] | None = None,
compression_lvl: int = 9,
)

Create a ResultFile using information from the given matrix file.

static append(path: Path)

Append to an existing ResultFile.

IMPORTANT: the file must have been created by create() or create_from_file() with mode="a"

property assembly: str

The name of the reference genome assembly used to generate the file

property chromosomes: Dict[str, int]

The chromosomes associated with the opened file

property creation_date: datetime

The file creation date

property format: str

The file format string

property format_url: str

The URL where the file format is documented

property format_version: int

The format version of the file currently opened

property generated_by: str

The name of the tool used to generate the opened file

property metadata: Dict[str, Any]

The metadata associated with the file

property normalization: str | None

The name of the normalization used to generate the data stored in the opened file

property path: Path

The path to the opened file

property resolution: int

The resolution of the Hi-C matrix used to generate the file

finalize()

Finalize a file opened in append mode. A file created with mode="a" must be finalized before it can be opened in read mode.

__getitem__(chrom: str) → Result
get_min_persistence(chrom: str) → float

Get the minimum persistence associated with the given chromosome.

Parameters:

chrom – chromosome name

Returns:

the minimum persistence

get(
chrom: str | None,
field: str,
location: str,
) → DataFrame

Get the data associated with the given chromosome, field, and location.

Parameters:
  • chrom – chromosome name. When None, return data for the entire genome.

  • field

    name of the field to be fetched. Supported names:

    • pseudodistribution

    • all_minimum_points

    • persistence_of_all_minimum_points

    • all_maximum_points

    • persistence_of_all_maximum_points

    • geo_descriptors

    • bio_descriptors

    • stripes

  • location – location of the attribute to be fetched. Should be “LT” or “UT”

Returns:

the data associated with the given chromosome, field, and location
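ResultFile.get() returns a pandas DataFrame. The standard-library sketch below uses lists of dicts instead, only to illustrate the argument checks and the genome-wide behavior of chrom=None. toy_get, the nested store layout, and the added "chrom" key are hypothetical and say nothing about how stripepy lays out its HDF5 file.

```python
from typing import Dict, List, Optional

SUPPORTED_FIELDS = frozenset(
    {
        "pseudodistribution",
        "all_minimum_points",
        "persistence_of_all_minimum_points",
        "all_maximum_points",
        "persistence_of_all_maximum_points",
        "geo_descriptors",
        "bio_descriptors",
        "stripes",
    }
)

Row = Dict[str, object]
Store = Dict[str, Dict[str, Dict[str, List[Row]]]]  # chrom -> location -> field -> rows


def toy_get(store: Store, chrom: Optional[str], field: str, location: str) -> List[Row]:
    """Hypothetical lookup following the documented arguments of ResultFile.get().

    Rows are tagged with a "chrom" key (an assumption) so that genome-wide
    results remain distinguishable when chrom is None.
    """
    if field not in SUPPORTED_FIELDS:
        raise KeyError(f"unsupported field: {field!r}")
    if location not in ("LT", "UT"):
        raise ValueError("location should be 'LT' or 'UT'")
    # None selects every chromosome, in insertion order
    chroms = list(store) if chrom is None else [chrom]
    return [{"chrom": name, **row} for name in chroms for row in store[name][location][field]]


store: Store = {
    "chr1": {"LT": {"stripes": [{"seed": 5}]}, "UT": {"stripes": []}},
    "chr2": {"LT": {"stripes": [{"seed": 9}]}, "UT": {"stripes": []}},
}
print(toy_get(store, None, "stripes", "LT"))
# [{'chrom': 'chr1', 'seed': 5}, {'chrom': 'chr2', 'seed': 9}]
```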

write_descriptors(result: Result)

Read the descriptors from the given Result object and write them to the opened file.

Parameters:

result – results to be added to the opened file

class stripepy.data_structures.Result(chrom_name: str, chrom_size: int)

A class used to represent the results generated by the stripepy call subcommand.

__init__(chrom_name: str, chrom_size: int)
Parameters:
  • chrom_name (str) – chromosome name

  • chrom_size (int) – chromosome size

property chrom: Tuple[str, int]

The name and length of the chromosome to which the Result instance belongs

property empty: bool

True when no stripe has been registered with the Result instance

property min_persistence: float

The minimum persistence used during computation

property roi: Dict[str, List[int]] | None

The region of interest associated with the Result instance

get(
name: str,
location: str,
) → List[Stripe] | ndarray[int] | ndarray[float]

Get the value associated with the given attribute name and location.

Parameters:
  • name – name of the attribute to be fetched

  • location – location of the attribute to be fetched. Should be “LT” or “UT”

Returns:

the value associated with the given name and location.

get_stripes_descriptor(
descriptor: str,
location: str,
) → ndarray[int] | ndarray[float]

Get the stripe descriptor for the given location.

Parameters:
  • descriptor – name of the descriptor to be fetched

  • location – location of the attribute to be fetched. Should be “LT” or “UT”

Returns:

the value associated with the given descriptor and location.
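Conceptually, a stripe descriptor is one attribute read from every stripe at a given location and returned as a column. The sketch below shows that idea with a hypothetical StripeRecord dataclass and a hypothetical stripes_descriptor helper. It returns a plain list, whereas the documented method returns a NumPy array, and its descriptor names are borrowed from the Stripe properties rather than from the set of names get_stripes_descriptor() actually accepts, which is not listed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class StripeRecord:
    """Hypothetical minimal stripe record; not stripepy's Stripe class."""

    seed: int
    left_bound: int
    right_bound: int
    inner_mean: float


def stripes_descriptor(
    stripes: Dict[str, List[StripeRecord]], descriptor: str, location: str
) -> List[Union[int, float]]:
    """Collect one descriptor from every stripe registered at `location`."""
    if location not in ("LT", "UT"):
        raise ValueError("location should be 'LT' or 'UT'")
    # getattr raises AttributeError for unknown descriptor names
    return [getattr(stripe, descriptor) for stripe in stripes[location]]


stripes = {
    "LT": [StripeRecord(5, 4, 6, 2.5), StripeRecord(12, 11, 14, 1.0)],
    "UT": [],
}
print(stripes_descriptor(stripes, "seed", "LT"))  # [5, 12]
print(stripes_descriptor(stripes, "inner_mean", "LT"))  # [2.5, 1.0]
```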

get_stripe_bio_descriptors(location: str) → DataFrame

Fetch all biological descriptors at once.

Parameters:

location – location of the attribute to be fetched. Should be “LT” or “UT”

Returns:

the table with the biological descriptors associated with the Result instance

get_stripe_geo_descriptors(location: str) → DataFrame

Fetch all geometric descriptors at once.

Parameters:

location – location of the attribute to be fetched. Should be “LT” or “UT”

Returns:

the table with the geometric descriptors associated with the Result instance