scrollstats package

Copyright (c) 2025 Andrew Vanderheiden. All rights reserved.

scrollstats: An open-source python library to calculate and extract morphometrics from scroll bar floodplains

class scrollstats.BendDataExtractor(transects, bin_raster=None, dem=None, ridges=None, packets=None)

Bases: object

Responsible for extraction of ridge metrics across an entire bend.

Parameters:
calc_itx_metrics()

For each transect found in transects, calculate the intersection (itx) metrics.

Return type:

GeoDataFrame

calc_transect_metrics()
Return type:

GeoDataFrame

count_ridges(signal)

Counts ridges in binary waves.

Parameters:

signal (ndarray[float])

Return type:

int
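As a rough illustration of what counting ridges in a binary signal can look like, the sketch below counts 0 to 1 transitions. The function name count_ridges_sketch and the choice to treat NaN as swale are assumptions made for this example; the library's own implementation may differ.

```python
import numpy as np

def count_ridges_sketch(signal):
    """Count runs of 1s (ridges) in a binary signal; NaN is treated as swale (assumption)."""
    # Replace NaN with 0 and cast to int so neighbouring samples can be differenced
    binary = np.nan_to_num(signal, nan=0.0).astype(int)
    # Pad with a leading 0 so a ridge touching the start still yields a rising edge
    padded = np.concatenate(([0], binary))
    # Each 0 -> 1 transition marks the start of one ridge
    return int(np.sum(np.diff(padded) == 1))

example = np.array([0, 1, 1, 0, 0, 1, 0, 1, 1, 1], dtype=float)
n_ridges = count_ridges_sketch(example)
```

Here the example signal contains three separate runs of 1s, so n_ridges is 3.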

dense_sample(line, raster)

Sample an underlying in_bin_raster along a given LineString at a spacing of ~1 m.

Parameters:
Return type:

ndarray[float]

disqualify_coords(coord_array, raster)

Some coordinates may fall outside the in_bin_raster. This function disqualifies these coordinates and returns a boolean array marking the location of every disqualified coordinate.

Coordinates are checked to see if they are 1) negative, 2) too large in x, or 3) too large in y.

Parameters:
Return type:

ndarray[bool]
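The three checks described above can be sketched with plain NumPy comparisons. This example assumes the coordinates are integer (row, col) image indices and takes the raster shape directly instead of a raster dataset; both are simplifications for illustration, not the library's signature.

```python
import numpy as np

def disqualify_coords_sketch(coord_array, shape):
    """Return True for every (row, col) pair lying outside an array of the given shape."""
    rows, cols = coord_array[:, 0], coord_array[:, 1]
    n_rows, n_cols = shape
    negative = (rows < 0) | (cols < 0)   # check 1: negative index
    too_large_x = cols >= n_cols         # check 2: past the last column
    too_large_y = rows >= n_rows         # check 3: past the last row
    return negative | too_large_x | too_large_y

coords = np.array([[0, 0], [-1, 2], [3, 5], [2, 4], [4, 1]])
mask = disqualify_coords_sketch(coords, shape=(4, 5))
```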

dominant_wavelength(ridge_count, signal)

Identifies the dominant wavelength from an input binary signal

Parameters:
Return type:

float

sample_array(coord_array, raster)

Takes in an array of image coordinates, samples the image, and returns the sampled values. Assumes that the coord array and in_bin_raster dataset share the same crs.

Parameters:
Return type:

ndarray[float]
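A minimal sketch of sampling an image at an array of image coordinates is shown below. It assumes integer (row, col) indices and an in-memory 2D array, and it fills out-of-bounds samples with NaN; these choices are assumptions for the example and are not confirmed by the documentation above.

```python
import numpy as np

def sample_array_sketch(coord_array, image):
    """Sample a 2D array at (row, col) pairs; out-of-bounds samples become NaN (assumption)."""
    rows, cols = coord_array[:, 0], coord_array[:, 1]
    inside = (
        (rows >= 0) & (cols >= 0)
        & (rows < image.shape[0]) & (cols < image.shape[1])
    )
    values = np.full(len(coord_array), np.nan)
    # Fancy indexing reads all valid cells in one step
    values[inside] = image[rows[inside], cols[inside]]
    return values

image = np.arange(12).reshape(3, 4)
coords = np.array([[0, 0], [2, 3], [3, 0], [1, -1]])
values = sample_array_sketch(coords, image)
```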

trans_fft(signal)

Calculates the fast Fourier transform for a 1D signal.

If you wish to see the power spectrum, plot the sampled frequencies (x) against their measured amplitudes (y). The dominant wavelength within a given signal can be found with the function dominant_wavelength().

Parameters:

signal (ndarray[float])

Return type:

tuple[ndarray[float], ndarray[float]]
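The relationship between the FFT output and a dominant wavelength can be sketched as follows. The helper names, the spacing argument, and the choice to pick the single strongest frequency are assumptions for this example; note that dominant_wavelength() in the library takes (ridge_count, signal), so its logic likely differs.

```python
import numpy as np

def fft_sketch(signal, spacing=1.0):
    """Return (frequencies, amplitudes) for a 1D signal sampled every `spacing` units."""
    centered = signal - np.mean(signal)  # remove the constant offset
    amplitudes = np.abs(np.fft.rfft(centered))
    frequencies = np.fft.rfftfreq(len(signal), d=spacing)
    return frequencies, amplitudes

def dominant_wavelength_sketch(frequencies, amplitudes):
    """Wavelength (1 / frequency) of the strongest non-zero frequency."""
    peak = np.argmax(amplitudes[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / frequencies[peak]

# Binary square wave: 10 samples of ridge, 10 of swale, repeated 10 times
x = np.arange(200)
square_wave = (x % 20 < 10).astype(float)
freqs, amps = fft_sketch(square_wave)
wavelength = dominant_wavelength_sketch(freqs, amps)
```

Plotting freqs against amps gives the spectrum described above; for this synthetic signal the peak sits at a wavelength of about 20 samples.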

class scrollstats.LineSmoother(lines, spacing, window)

Bases: object

Smooth and densify rough, manually drawn LineStrings.

Smoothing is accomplished with a mean filter, and densifying with a piecewise cubic spline. The GeoDataFrame provided must contain only LineStrings; MultiLineStrings and other geometries are not supported. The vertex count of any ridge cannot be lower than the window size of the mean filter.

Values used for the Lower Brazos Ridges were:

window = 5 (vertices), spacing = 1 (meters)

Parameters:
calc_cubic_spline(line, spacing)

Fit a cubic spline function to a LineString then sample that function at the given spacing

Parameters:
Return type:

LineString

calc_dist(x, y)

Calculate the cumulative distance along the line.

Parameters:
Return type:

list[float]
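Distance along a line is the running sum of its segment lengths. The sketch below shows one way to compute it from separate x and y sequences; the function name and the leading 0.0 for the first vertex are assumptions for this example.

```python
import numpy as np

def calc_dist_sketch(x, y):
    """Cumulative distance from the first vertex to each vertex of a line."""
    # Length of every segment between consecutive vertices
    segment_lengths = np.hypot(np.diff(x), np.diff(y))
    # The first vertex sits at distance 0; later vertices accumulate segment lengths
    return np.concatenate(([0.0], np.cumsum(segment_lengths))).tolist()

dist = calc_dist_sketch([0, 3, 3], [0, 4, 10])
```

Distances like these are a natural parameter for fitting a spline through the x and y coordinates separately.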

check_geometry_type()

Check that all geometries are of type LineString

Return type:

None

check_vertex_count()

Check that all ridges have at least as many vertices as the smoothing window is long

Return type:

None

execute()

Apply the mean filter and cubic spline to each line in the GeoDataFrame. Return a new GeoDataFrame with the smoothed lines.

Return type:

GeoDataFrame

meanfilt(line, w)

Use a mean filter to smooth the xy points of the line. This is done by passing a moving window of size w over the x and y coordinates separately and replacing the central value of the window with the mean value of the window. This particular method appends the first and last coordinates to the new line to compensate for the vertices lost at each end during convolution.

Parameters:
Return type:

LineString
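The moving-window smoothing described above can be sketched with np.convolve. In this example a plain (n, 2) coordinate array stands in for the LineString, and the name meanfilt_sketch is hypothetical; it is an illustration under those assumptions, not the library's code.

```python
import numpy as np

def meanfilt_sketch(coords, w):
    """Smooth an (n, 2) array of xy coordinates with a moving mean of width w."""
    coords = np.asarray(coords, dtype=float)
    kernel = np.ones(w) / w
    # 'valid' mode keeps only windows that fit entirely inside the line,
    # which shortens the result by w - 1 vertices
    xs = np.convolve(coords[:, 0], kernel, mode="valid")
    ys = np.convolve(coords[:, 1], kernel, mode="valid")
    smoothed = np.column_stack((xs, ys))
    # Re-attach the original endpoints so the line keeps its full extent
    return np.vstack((coords[0], smoothed, coords[-1]))

rough = [(0, 0), (1, 2), (2, 0), (3, 2), (4, 0), (5, 2), (6, 0)]
smooth = meanfilt_sketch(rough, w=3)
```

The zigzag in y is damped while the first and last vertices stay where they were drawn.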

class scrollstats.MultiTransect(coord_list, centerline, ridges, shoot_distance, search_distance, dev_from_90, user_direction=None, verbose=1)

Bases: object

Creates multiple instances of H74Transect from a given centerline, ridge dataset, and other parameters.

The create_transects method is used to generate a GeoDataFrame of transects. The return_all_geometries method returns the transects from create_transects as well as other intermediate geometries used in the creation of the transects, which is useful for debugging and plotting.

This class is used in the create_transects convenience function in the public API.

Parameters:
  • coord_list (list[Point]) – List of starting coordinates for each transect.

  • centerline (GeoDataFrame) – GeoDataFrame containing the centerline geometry.

  • ridges (GeoDataFrame) – GeoDataFrame containing the ridge geometries.

  • shoot_distance (float) – Distance for each shot.

  • search_distance (float) – Buffer distance for the search area on r2.

  • dev_from_90 (float) – Allowed deviation from 90 degrees for p2 shots.

  • user_direction (int | None) – User-specified initial shot direction from centerline.

  • verbose (int) – Verbosity level for user feedback

coord_list

List of starting coordinates for each transect.

Type:

list of Point

centerline

GeoDataFrame containing the centerline geometry.

Type:

GeoDataFrame

ridges

GeoDataFrame containing the ridge geometries.

Type:

GeoDataFrame

shoot_distance

Distance for each shot.

Type:

float

search_distance

Buffer distance for the search area on r2.

Type:

float

dev_from_90

Allowed deviation from 90 degrees for p2 shots.

Type:

float

user_direction

User-specified initial shot direction from centerline.

Type:

int or None

verbose

Verbosity level for user feedback

Type:

int

crs

Coordinate reference system for all geometries. Read from centerline

Type:

CRS

transect_list

List of generated transects.

Type:

list of H74Transect

transect_df

GeoDataFrame containing transect geometries.

Type:

GeoDataFrame

point_df

GeoDataFrame containing point geometries used in transect creation.

Type:

GeoDataFrame

search_area_df

GeoDataFrame containing search area polygons for p2 points.

Type:

GeoDataFrame

ridge_clip_df

GeoDataFrame containing ridge geometries clipped with search area polygons.

Type:

GeoDataFrame

create_point_df()

Creates a GeoDataFrame of all points used to create transects.

Parameters:

None

Returns:

GeoDataFrame containing points from all transects.

Return type:

GeoDataFrame

create_ridge_clip_df()

Creates a GeoDataFrame of all ridge sections searched to create transects.

Parameters:

None

Returns:

GeoDataFrame containing ridge sections.

Return type:

GeoDataFrame

create_search_area_df()

Creates a GeoDataFrame of all search areas used to create transects.

Parameters:

None

Returns:

GeoDataFrame containing search areas from all transects.

Return type:

GeoDataFrame

create_transect_df()

Creates a GeoDataFrame of transects from all transects which successfully left the centerline.

Parameters:

None

Returns:

GeoDataFrame containing all successful transects.

Return type:

GeoDataFrame

create_transect_list()

Creates a set of transects and aux geometries for a bend.

Parameters:

None

Returns:

List of generated transects.

Return type:

list[H74Transect]

return_all_geometries()

Return all geometries created for a set of transects

Parameters:

None

Returns:

Tuple containing the transect, point, search area, and ridge clip GeoDataFrames.

Return type:

tuple[GeoDataFrame, GeoDataFrame, GeoDataFrame, GeoDataFrame]

class scrollstats.RidgeDataExtractor(geometry, position, ridges, dem_signal=None, bin_signal=None)

Bases: object

Responsible for calculating ridge metrics at each intersection of a ridge and transect. The geometry for this class is a 3-vertex LineString.

Parameters:
add_point_geometries(gdf, line)

Add the vertices from the 3-vertex line as point geometries.

Parameters:
Return type:

GeoDataFrame

boolify_mask()

Simplifies the bin_sig (which may contain nans) to a pure boolean array

Return type:

ndarray[bool] | None

calc_every_ridge_amp()

Calculates the average amplitude of each observed ridge in the units of the DEM.

Return type:

ndarray[float] | list[None]

calc_relative_vertex_distance(gdf, line)

Calculate the relative distance of each vertex along the transect.

Parameters:
Return type:

GeoDataFrame

calc_ridge_coms()

Find the center of mass for each ridge in the input binary signal.

Return type:

ndarray[bool] | None

calc_ridge_width_px()

Calculate the width of the single ridge in pixels

Return type:

float | None
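Both the ridge centers and the ridge width in pixels can be derived from the runs of True values in a boolean mask. The sketch below is one possible approach; the helper name ridge_runs_sketch is hypothetical, and using the run midpoint as the center of mass assumes every ridge pixel carries equal weight.

```python
import numpy as np

def ridge_runs_sketch(mask):
    """Return (start, stop) index pairs for each run of True values; stop is exclusive."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1)   # 1 -> 0 transitions
    return list(zip(starts.tolist(), stops.tolist()))

mask = [False, True, True, True, False, False, True, True]
runs = ridge_runs_sketch(mask)
widths_px = [stop - start for start, stop in runs]
centers = [(start + stop - 1) / 2 for start, stop in runs]
```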

calc_values_from_ridge_info(gdf)

Calculates the migration time, distance, and rate both before and after the center ridge. If the ridge does not have values for the deposit year, then mig_rate will be NaN.

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

calc_vertex_indices(gdf, signal_length)

Calculate the array index of all vertices. If self.signal_length is nan, then return array of nans

Parameters:
Return type:

GeoDataFrame

coerce_dtypes(gdf)

Coerce the ‘object’ dtypes into their proper numeric types.

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

create_point_gdf()

Create a 3 point GeoDataFrame to contain all relevant info for other methods.

Return type:

GeoDataFrame

determine_metric_confidence()

Assign a metric confidence score based on the boolean mask.

Return type:

int

determine_ridge_amp()

Select the correct ridge amplitude calculated by calc_every_ridge_amp() based on the number of ridges present

Return type:

float | None

determine_signal_length()

Return length of dem/bin signal if provided

Return type:

float

dq_first_swale()

If the ridge position of the signal is 0, then remove the first chunk of false values

Return type:

ndarray[bool] | None

dump_data()

Dump all the relevant info for the middle point.

Return type:

dict[str, Any]

find_closest_ridge()

The bin_signal may have more than two ridges present. This method identifies which ridge is closest to the transect-ridge intersection point.

Return type:

ndarray[float]

join_ridge_info(gdf, ridges)

Get ridge ids, time, distance, and migration rates via spatial join from the ridge features

Parameters:
Return type:

GeoDataFrame

class scrollstats.TransectDataExtractor(transect_id, geometry, dem_signal=None, bin_signal=None, ridges=None)

Bases: object

Responsible for extracting ridge metrics along a transect.

TransectDataExtractor will ultimately return a GeoDataFrame where each row is an eligible intersection between the transect and a ridge. An eligible intersection is one that has a vertex before and after it, so that the raster underneath can be sampled along the full width of the ridge. If a transect contains no eligible intersections, the gdf will be empty.

Parameters:
add_point_geometry(gdf)

Add the intersection (middle) point of the 3 vertex substring as its own point

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

add_relative_vertex_distances(gdf)

Calculate the distance between the substring coordinates relative to the length of the whole line.

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

add_substring_geometry(gdf)

Adds the 3 vertex substring that corresponds to each itx.

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

add_transect_id(gdf)

Add the transect id as a column

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

calc_cumulative_dist(coords)

Calculate the cumulative distances along a coordinate series

Parameters:

coords (Sequence[tuple[float, float]])

Return type:

ndarray[float]

calc_relative_vertex_distances(ls, start_dist)

Calculate the relative distance of each vertex along the transect.

Parameters:
Return type:

ndarray[float]

calc_ridge_metrics()

Calculate ridge width and amplitude at every transect-ridge intersection. Return a GeoDataFrame with Point geometries.

Return type:

GeoDataFrame

calc_vertex_indices(gdf)

Calculates the corresponding signal index of each of the substring vertices

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

create_itx_gdf()

Create the gdf that will contain all the ridge data for each intersection.

Return type:

GeoDataFrame

create_substrings(ls)

Create substrings starting from the eligible coordinates of the given linestring

Parameters:

ls (LineString)

Return type:

list[LineString]

determine_eligible_coords(ls)

Determine coordinates in the transect linestring that are eligible to be a start of a substring. Because the substrings are all 3 vertices long, the last two are not eligible. These eligible coords are defined because multiple functions need to use these coordinates.

Parameters:

ls (LineString)

Return type:

list[tuple[float, float]]
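The relationship between eligible start coordinates and 3-vertex substrings can be sketched with plain coordinate tuples. The function names below are hypothetical and a list of tuples stands in for the LineString, so this is only an illustration of the idea.

```python
def eligible_coords_sketch(coords):
    """All vertices except the last two can start a 3-vertex substring."""
    return list(coords[:-2])

def substrings_sketch(coords):
    """Every run of three consecutive vertices, as a list of coordinate triples."""
    return [tuple(coords[i:i + 3]) for i in range(len(coords) - 2)]

transect = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.0), (4.0, 1.0)]
starts = eligible_coords_sketch(transect)
triples = substrings_sketch(transect)
```

A five-vertex transect yields three substrings, and the middle vertex of each triple is the intersection being measured.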

determine_substring_starts(gdf)

Determine the along-transect distance of the points of each substring

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

slice_bin_signal(gdf)

Slice the binary signal between the two end vertices of the substrings

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

slice_dem_signal(gdf)

Slice the DEM between the two end vertices of the substrings

Parameters:

gdf (GeoDataFrame)

Return type:

GeoDataFrame

scrollstats.calc_ridge_amps(dem_sig, bin_sig)

Calculate the ridge amplitudes from a DEM profile using the boolean mask signal.

Different strategies are used to calculate the ridge amplitude based on the ridge and swale count found within the boolean mask signal.

Parameters:
Return type:

ndarray[float]

scrollstats.calculate_ridge_metrics(in_transects, in_ridges, in_bin_raster=None, in_dem=None, in_packets=None)

Main function to calculate scroll metrics.

If in_packets is specified, then all metrics for the rich_transects will be calculated for the transect fragment within each packet.

All arguments can be provided as a file path or as an in-memory object (vector: GeoDataFrame, raster: rasterio dataset).

Parameters:
Return type:

tuple[GeoDataFrame, GeoDataFrame]

scrollstats.create_ridge_area_raster(dem_ds, geometry, classifier_funcs=(<function profile_curvature_classifier>, <function residual_topography_classifier>), denoiser_funcs=(<function binary_closing>, <function binary_opening>, <function remove_small_feats_w_flip>), no_data_value=None, **kwargs)

Main processing function to create the ridge area raster.

This function uses the provided classifier_funcs and denoiser_funcs to classify the ridge and swale areas within the input DEM.

Return type:

tuple[ndarray, ndarray, dict[Any, Any]]

Ridge Area Classification:

By default, scrollstats uses profile curvature (a measure of ridge convexity) and residual topography (a measure of ridge prominence) to classify ridge areas. Each classifier function is applied to the DEM, then the union of all the resulting binary arrays will be used for denoising. This means that the more classifier functions you use, the more conservative, but ideally more accurate, your ridge areas will be.

If the user desires, they can provide their own classifier functions so long as the functions follow the pattern below

classifier_func(ElevationArray2D, **kwargs) -> BinaryArray2D

See scrollstats/delineation/raster_classifiers.py for the DEFAULT_CLASSIFIERS list of functions and their definitions.
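As an illustration of the classifier pattern, the sketch below defines a hypothetical user-supplied classifier. The name above_mean_classifier and its threshold keyword are invented for this example and are not part of scrollstats; the function only demonstrates the expected shape of the input and output.

```python
import numpy as np

def above_mean_classifier(elevation, threshold=0.0, **kwargs):
    """Hypothetical classifier: mark cells higher than the DEM mean by more than `threshold`."""
    # nanmean ignores nodata cells stored as NaN
    residual = elevation - np.nanmean(elevation)
    # Comparisons with NaN evaluate to False, so nodata cells are classed as swale
    return residual > threshold

dem = np.array([[1.0, 2.0, 3.0],
                [2.0, 5.0, 2.0],
                [1.0, 2.0, np.nan]])
ridge_mask = above_mean_classifier(dem, threshold=0.5)
```

Accepting **kwargs lets the function ignore keyword arguments intended for other classifiers or denoisers.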

Clip Ridge and Swale Topography:

In order to avoid edge-effects from the classifier functions, the area corresponding to the ridge and swale topography will be clipped from a larger DEM. The nodata value for the input DEM will be used unless no_data_value is specified.

Image Denoising:

Once the ridge areas are classified within the DEM as a binary array (1=ridge, 0=swale), scrollstats uses a series of denoising algorithms to clean up the result. By default, scrollstats uses binary closing and binary opening operations to efficiently remove small objects from the binary image, then applies another filter to remove any remaining image object smaller than a certain size (measured in px). Each denoiser function is applied to the binary array in sequence, meaning that the output of the first denoiser function is the input of the second, and so on. Therefore, a different ordering of the same list of denoiser functions may yield a different result.

If the user desires, they can provide their own denoiser functions so long as the functions follow the pattern below:

denoiser_func(BinaryArray2D, **kwargs) -> BinaryArray2D

See scrollstats/delineation/raster_denoisers.py for the DEFAULT_DENOISERS list of functions and their definitions.
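To see why the order of denoiser_funcs matters, consider the sketch below. The two toy denoisers and the apply_in_sequence helper are hypothetical, written only for this example; they are much simpler than binary closing and opening and are not the library's defaults.

```python
import numpy as np

def fill_single_gaps(binary, **kwargs):
    """Toy denoiser: set a cell to True when both row neighbours are True."""
    out = binary.copy()
    out[:, 1:-1] = binary[:, 1:-1] | (binary[:, :-2] & binary[:, 2:])
    return out

def drop_single_pixels(binary, **kwargs):
    """Toy denoiser: clear a True cell when neither row neighbour is True."""
    out = binary.copy()
    out[:, 1:-1] = binary[:, 1:-1] & (binary[:, :-2] | binary[:, 2:])
    return out

def apply_in_sequence(binary, denoiser_funcs, **kwargs):
    """Feed the output of each denoiser into the next one."""
    for func in denoiser_funcs:
        binary = func(binary, **kwargs)
    return binary

noisy = np.array([[0, 1, 0, 1, 0, 0, 1, 0, 0]], dtype=bool)
fill_then_drop = apply_in_sequence(noisy, (fill_single_gaps, drop_single_pixels))
drop_then_fill = apply_in_sequence(noisy, (drop_single_pixels, fill_single_gaps))
```

Filling first merges the two nearby pixels into one ridge that survives the second step, while dropping first removes every pixel before anything can be merged.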

Keyword Arguments for Image Processing Functions:

Any additional arguments required by the classifier_funcs or denoiser_funcs can be provided to this function as keyword arguments. Any keyword argument provided to this function will be passed to a given classifier or denoiser function if the provided keyword matches a keyword in the function’s signature.
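One common way to route keyword arguments by signature is inspect.signature from the standard library. The sketch below shows that idea; the helper matching_kwargs and the two placeholder functions are hypothetical, and the library may implement the matching differently.

```python
import inspect

def matching_kwargs(func, **kwargs):
    """Keep only the keyword arguments that appear by name in func's signature."""
    params = inspect.signature(func).parameters
    return {key: value for key, value in kwargs.items() if key in params}

# Placeholder functions standing in for a classifier and a denoiser
def example_classifier(dem, window=5, **kwargs):
    return dem

def example_denoiser(binary, min_size=10, **kwargs):
    return binary

user_kwargs = {"window": 9, "min_size": 25, "unused": True}
classifier_kwargs = matching_kwargs(example_classifier, **user_kwargs)
denoiser_kwargs = matching_kwargs(example_denoiser, **user_kwargs)
```

Each function receives only the keywords it names, and keywords that match nothing are silently dropped in this sketch.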

Parameters:
  • dem_ds (DatasetReader)

  • geometry (Polygon)

  • classifier_funcs (tuple[Callable[..., ndarray], ...])

  • denoiser_funcs (tuple[Callable[..., ndarray], ...])

  • no_data_value (Any | None)

  • kwargs (Any)

scrollstats.create_ridge_area_raster_fs(dem_path, geometry_path, out_dir, bend_id_dict=None, **kwargs)

File system interface for create_ridge_area_raster

Parameters:
Return type:

tuple[Path, Path]

scrollstats.create_transects(centerline, ridges, step, shoot_distance, search_distance, dev_from_90)

Convenience function to create a series of transects from a given centerline, set of ridges, and the necessary parameters.

Transects are created at the step provided by the user (e.g., every nth vertex along the centerline). The centerline is assumed to have a vertex spacing of ~1 m.

Parameters:
  • centerline (GeoDataFrame) – GeoDataFrame containing the centerline geometry.

  • ridges (GeoDataFrame) – GeoDataFrame containing the ridge geometries.

  • step (int) – Number of centerline vertices between each transect.

  • shoot_distance (float) – How far each point will shoot from the origin in a given direction.

  • search_distance (float) – Buffer distance for the search area on r2.

  • dev_from_90 (float) – Allowed deviation from 90 degrees for p2 shots.

Returns:

GeoDataFrame containing the transects generated.

Return type:

GeoDataFrame
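The effect of step can be sketched with simple slicing over the centerline vertices. The helper below is hypothetical and assumes that selection starts at the first vertex, which the documentation above does not state.

```python
def starting_vertices_sketch(centerline_coords, step):
    """Pick every `step`-th centerline vertex as a transect origin."""
    return centerline_coords[::step]

# A straight 1000 m centerline with a vertex every 1 m
centerline_coords = [(float(i), 0.0) for i in range(1001)]
origins = starting_vertices_sketch(centerline_coords, step=100)
```

With a vertex spacing of about 1 m, a step of 100 places transect origins roughly every 100 m along the centerline.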

scrollstats.map_amp_values(amp_series, width_series)

Map the ridge amplitude values to their assumed location along the transect. The assumed location is the approximate midpoint of the ridge.

Parameters:
Return type:

ndarray[float]

Subpackages