Hydrogen Orientation Analysis

Analysis of hydrogen orientations is used to classify conserved waters into three types: Fully Conserved Water (FCW), Half Conserved Water (HCW), and Weakly Conserved Water (WCW). See Theory, Background, and Methods for more information.

Overview

ConservedWaterSearch.hydrogen_orientation.hydrogen_orientation_analysis

Determines if the water cluster is conserved and of what type.

ConservedWaterSearch.hydrogen_orientation.find_fully_conserved_orientations

Check if orientations belong to FCW.

ConservedWaterSearch.hydrogen_orientation.find_half_conserved_orientations

Checks if given orientations belong to HCW.

ConservedWaterSearch.hydrogen_orientation.find_weakly_conserved_orientations

Checks if given orientations belong to WCW.

Details

ConservedWaterSearch.hydrogen_orientation.hydrogen_orientation_analysis(orientations: ndarray, pct_size_buffer: float = 0.85, kmeans_ang_cutoff: float = 120, kmeans_inertia_cutoff: float = 0.4, FCW_angdiff_cutoff: float = 5, FCW_angstd_cutoff: float = 17, min_samp_data_size_pct: float = 0.15, nonFCW_angdiff_cutoff: float = 15, HCW_angstd_cutoff: float = 17, WCW_angstd_cutoff: float = 20, weakly_explained: float = 0.7, xiFCW: tuple[float] | list[float] = (0.03,), xiHCW: tuple[float] | list[float] = (0.05, 0.01), xiWCW: tuple[float] | list[float] = (0.05, 0.001), njobs: int = 1, verbose: int = 0, debugH: int = 0, plotreach: bool = False, which: tuple[str] | list[str] = ('FCW', 'HCW', 'WCW'), normalize_orientations: bool = True) -> list

Determines if the water cluster is conserved and of what type.

High-level function that performs the hydrogen orientation analysis. It checks whether the water cluster belongs to one of the following groups by analyzing hydrogen orientations:

  • FCW (Fully Conserved Water): hydrogens are strongly oriented in two directions separated by an angle of 104.5°

  • HCW (Half Conserved Water): one set (cluster) of hydrogens is oriented in a certain direction and the others are spread over different orientations at an angle of 104.5° to it

  • WCW (Weakly Conserved Water): several orientation combinations exist that satisfy the water angle

See [TFJB22] and Theory, Background, and Methods for more information on water classification. If the orientations do not satisfy the criteria for any of the water types, an empty list is returned.
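The water-angle criterion used in all three classes can be illustrated with a minimal sketch. The helper names `angle_deg` and `has_water_angle` are hypothetical and not part of the package, and reading the cutoff as a maximum deviation from 104.5° is an assumption made for illustration only.

```python
import math

WATER_ANGLE_DEG = 104.5  # H-O-H angle quoted in the text above


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def has_water_angle(u, v, angdiff_cutoff=5.0):
    """Assumption: the cutoff is the allowed deviation from 104.5 degrees."""
    return abs(angle_deg(u, v) - WATER_ANGLE_DEG) <= angdiff_cutoff


# Two unit vectors separated by 104.5 degrees in the xy-plane.
theta = math.radians(WATER_ANGLE_DEG)
h1 = (1.0, 0.0, 0.0)
h2 = (math.cos(theta), math.sin(theta), 0.0)

print(has_water_angle(h1, h2))               # directions at the water angle
print(has_water_angle(h1, (0.0, 1.0, 0.0)))  # 90 degrees is 14.5 degrees off
```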

Parameters:
  • orientations (np.ndarray) – array of hydrogen orientations in space

  • pct_size_buffer (float, optional) – Minimum allowed size of the hydrogen orientation cluster. Defaults to 0.85.

  • kmeans_ang_cutoff (float, optional) – Maximum value of angle (in deg) allowed for FCW in kmeans clustering to be considered correct water angle. Defaults to 120.

  • kmeans_inertia_cutoff (float, optional) – upper limit allowed on kmeans inertia (measure of spread of data in a cluster). Defaults to 0.4.

  • FCW_angdiff_cutoff (float, optional) – Maximum value of angle (in deg) allowed for FCW in OPTICS clustering to be considered correct water angle. Defaults to 5.

  • FCW_angstd_cutoff (float, optional) – Maximal standard deviation of angle distribution of orientations of two hydrogens allowed for water to be considered FCW. Defaults to 17.

  • min_samp_data_size_pct (float, optional) – Minimum samples to choose for OPTICS clustering as percentage of number of water molecules considered for HCW and WCW. Defaults to 0.15.

  • nonFCW_angdiff_cutoff (float, optional) – Maximum standard deviation of angle allowed for HCW and WCW to be considered correct water angle. Defaults to 15.

  • HCW_angstd_cutoff (float, optional) – Maximum standard deviation cutoff for HCW angles to be considered correct water angles. Defaults to 17.

  • WCW_angstd_cutoff (float, optional) – Maximum standard deviation cutoff for WCW angles to be considered correct water angles. Defaults to 20.

  • weakly_explained (float, optional) – percentage of explained hydrogen orientations for water to be considered WCW. Defaults to 0.7.

  • xiFCW (tuple, optional) – Xi value for OPTICS clustering for FCW. Don’t touch this unless you know what you are doing. Defaults to (0.03,).

  • xiHCW (tuple, optional) – Xi value for OPTICS clustering for HCW. Don’t touch this unless you know what you are doing. Defaults to (0.05, 0.01).

  • xiWCW (tuple, optional) – Xi value for OPTICS clustering for WCW. Don’t touch this unless you know what you are doing. Defaults to (0.05, 0.001).

  • njobs (int, optional) – how many cpu cores to use for clustering. Defaults to 1.

  • verbose (int, optional) – verbosity of output. Defaults to 0.

  • debugH (int, optional) – debug level for orientations. Defaults to 0.

  • plotreach (bool, optional) – whether to plot the reachability plot for OPTICS when debugging. Defaults to False.

  • which (tuple, optional) – tuple of strings denoting which water types to search for. Any combination of FCW (fully conserved waters), HCW (half conserved waters) and WCW (weakly conserved waters) is allowed. Defaults to (“FCW”, “HCW”, “WCW”).

  • normalize_orientations (bool, optional) – whether to normalize the orientation vectors to unit length. Defaults to True.
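As an illustration of what normalizing orientation vectors means, here is a small sketch in plain Python. The function `normalize` is a hypothetical helper and is not claimed to match how the package performs this step internally.

```python
import math


def normalize(vectors):
    """Scale each 3D vector to unit length (hypothetical helper)."""
    unit_vectors = []
    for v in vectors:
        length = math.sqrt(sum(c * c for c in v))
        if length == 0.0:
            raise ValueError("cannot normalize a zero-length orientation")
        unit_vectors.append(tuple(c / length for c in v))
    return unit_vectors


# Orientations of different lengths pointing along z and within the xy-plane.
unit = normalize([(0.0, 0.0, 2.0), (3.0, 4.0, 0.0)])
print(unit)
```

After this step only the direction of each vector matters, so angles and cluster spreads are not affected by differing vector lengths.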

Returns:

A list containing the two hydrogen orientations and the water classification string (“FCW”, “HCW”, or “WCW”); an empty list if the water is not conserved.

Return type:

list

ConservedWaterSearch.hydrogen_orientation.find_fully_conserved_orientations(orientations: ndarray, pct_size_buffer: float = 0.85, kmeans_ang_cutoff: float = 120, kmeans_inertia_cutoff: float = 0.4, angdiff_cutoff: float = 5, angstd_cutoff: float = 17.0, xi: float = 0.03, njobs: int = 1, verbose: int = 0, debugH: int = 0, plotreach: bool = False) -> list

Check if orientations belong to FCW.

Checks whether the given oxygen cluster can be considered a fully conserved water based on hydrogen orientations. A fully conserved water is one whose hydrogen orientations fall into two distinct, well-defined groups (i.e. both hydrogens are strongly hydrogen bonded). To check this, k-means clustering of the hydrogen orientations is first tested for two distinct clusters with low inertia and the required angle between them. A more rigorous check is then carried out with OPTICS clustering, where the spread of orientations and the angle are considered again.
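The first, k-means stage of this check can be sketched with a tiny two-cluster k-means in plain Python. This is an illustration only: the function names are hypothetical, inertia is taken as the plain sum of squared distances to the nearest centroid, and whether the package scales inertia or applies the cutoffs in exactly this way is an assumption.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index (0 or 1) of its nearest centroid."""
    return [0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points]


def two_means(points, n_iter=50):
    """Minimal 2-means; returns (centroids, inertia)."""
    # Deterministic start: the first point and the point farthest from it.
    centroids = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, inertia


def passes_kmeans_prefilter(points, ang_cutoff=120.0, inertia_cutoff=0.4):
    """Assumed reading: centroid angle and inertia must both be below cutoff."""
    centroids, inertia = two_means(points)
    return angle_deg(*centroids) <= ang_cutoff and inertia <= inertia_cutoff


# Two tight groups of orientations roughly 104.5 degrees apart.
tight = [(1.0, 0.0, 0.0), (0.99, 0.05, 0.0), (0.99, -0.05, 0.0), (0.99, 0.0, 0.05),
         (-0.25, 0.97, 0.0), (-0.30, 0.95, 0.0), (-0.20, 0.98, 0.0), (-0.25, 0.96, 0.05)]
# Orientations spread evenly over all six axis directions.
spread = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

print(passes_kmeans_prefilter(tight))
print(passes_kmeans_prefilter(spread))
```

Only candidates that pass such a cheap pre-filter would need the more expensive OPTICS stage described above.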

Parameters:
  • orientations (np.ndarray) – array of hydrogen orientations in space

  • pct_size_buffer (float, optional) – Minimum allowed size of the hydrogen orientation cluster. Defaults to 0.85.

  • kmeans_ang_cutoff (float, optional) – Maximum value of angle (in deg) allowed for FCW in kmeans clustering to be considered correct water angle. Defaults to 120.

  • kmeans_inertia_cutoff (float, optional) – upper limit allowed on kmeans inertia (measure of spread of data in a cluster). Defaults to 0.4.

  • angdiff_cutoff (float, optional) – Maximum value of angle (in deg) allowed for FCW in OPTICS clustering to be considered correct water angle. Defaults to 5.

  • angstd_cutoff (float, optional) – Maximal standard deviation of angle distribution of orientations of two hydrogens allowed for water to be considered FCW. Defaults to 17.

  • xi (float, optional) – Xi value for OPTICS clustering for FCW. Don’t touch this unless you know what you are doing. Defaults to 0.03.

  • njobs (int, optional) – how many cpu cores to use for clustering. Defaults to 1.

  • verbose (int, optional) – verbosity of output. Defaults to 0.

  • debugH (int, optional) – debug level for orientations. Defaults to 0.

  • plotreach (bool, optional) – whether to plot the reachability plot for OPTICS when debugging. Defaults to False.

Returns:

A list containing the two hydrogen orientations and the water classification string “FCW”; an empty list if the water is not FCW.

Return type:

list

ConservedWaterSearch.hydrogen_orientation.find_half_conserved_orientations(orientations: ndarray, pct_size_buffer: float = 0.85, min_samp_data_size_pct: float = 0.35, angdiff_cutoff: float = 15, angstd_cutoff: float = 17.0, xi: float = 0.01, njobs: int = 1, verbose: int = 0, debugH: int = 0, plotreach: bool = False) -> list

Checks if given orientations belong to HCW.

Checks whether the given oxygen cluster can be considered a half conserved water based on hydrogen orientations. A half conserved water is one that has one well-defined hydrogen orientation (i.e. one strongly hydrogen-bonded hydrogen). To check this, OPTICS clustering of the hydrogen orientations is computed. The clusters are then looped over in an attempt to find a hydrogen orientation cluster that is the size of the oxygen cluster, whose angle to all other orientations is the correct water angle, and whose spread of orientations is sufficiently low.
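The size criterion in this loop can be sketched as follows. The sketch assumes that each water contributes two hydrogen orientations, that noise points carry the label -1 (the scikit-learn convention for OPTICS), and that `pct_size_buffer` acts as a lower bound on cluster size relative to the number of waters. The function name `size_matched_clusters` is hypothetical.

```python
from collections import Counter


def size_matched_clusters(labels, n_waters, pct_size_buffer=0.85):
    """Return labels of clusters holding at least pct_size_buffer * n_waters points."""
    counts = Counter(lab for lab in labels if lab != -1)  # ignore noise
    minimum_size = pct_size_buffer * n_waters
    return sorted(lab for lab, size in counts.items() if size >= minimum_size)


# 10 waters give 20 hydrogen orientations: one dominant cluster (9 points),
# two small clusters (4 and 3 points) and 4 noise points.
labels = [0] * 9 + [1] * 4 + [2] * 3 + [-1] * 4

print(size_matched_clusters(labels, n_waters=10))
```

A cluster that passes this size test would still have to satisfy the angle and spread criteria before the water is reported as HCW.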

Parameters:
  • orientations (np.ndarray) – array of hydrogen orientations in space

  • pct_size_buffer (float, optional) – Minimum allowed size of the hydrogen orientation cluster. Defaults to 0.85.

  • min_samp_data_size_pct (float, optional) – Minimum samples to choose for OPTICS clustering as percentage of number of water molecules considered for HCW and WCW. Defaults to 0.35.

  • angdiff_cutoff (float, optional) – Maximum standard deviation of angle allowed for HCW to be considered correct water angle. Defaults to 15.

  • angstd_cutoff (float, optional) – Maximum standard deviation cutoff for HCW angles to be considered correct water angles. Defaults to 17.

  • xi (float, optional) – Xi value for OPTICS clustering for HCW. Don’t touch this unless you know what you are doing. Defaults to 0.01.

  • njobs (int, optional) – how many cpu cores to use for clustering. Defaults to 1.

  • verbose (int, optional) – verbosity of output. Defaults to 0.

  • debugH (int, optional) – debug level for orientations. Defaults to 0.

  • plotreach (bool, optional) – whether to plot the reachability plot for OPTICS when debugging. Defaults to False.

Returns:

A list containing the two hydrogen orientations and the water classification string “HCW”; an empty list if the water is not HCW.

Return type:

list

ConservedWaterSearch.hydrogen_orientation.find_weakly_conserved_orientations(orientations: ndarray, pct_size_buffer: float = 0.85, lower_bound_pct_buffer: float = 0.35, min_samp_data_size_pct: float = 0.15, pct_explained: float = 0.7, angdiff_cutoff: float = 15, angstd_cutoff: float = 20.0, xi: float = 0.01, njobs: int = 1, verbose: int = 0, debugH: int = 0, plotreach: bool = False) -> list

Checks if given orientations belong to WCW.

Checks whether the given oxygen cluster can be considered a weakly conserved water based on hydrogen orientations. A weakly conserved water is one that has no well-defined hydrogen orientation (i.e. no strongly hydrogen-bonded hydrogen) but still has distinct hydrogen orientation clusters. To check this, OPTICS clustering of the hydrogen orientations is computed. The clusters are then looped over in an attempt to find a pair of hydrogen orientation clusters that are of the same size, have the correct water angle between them, and have a sufficiently low spread of orientations. Additionally, triplets are checked in the same way, comparing one cluster against two other clusters combined.
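The pair and triplet enumeration can be sketched on cluster sizes alone. This is a hedged illustration: the names `similar_size` and `size_matched_groups` are hypothetical, and treating two sizes as "the same" when the smaller is at least `pct_size_buffer` times the larger is an assumption, not the package's documented rule. The angle and spread checks are left out.

```python
from itertools import combinations


def similar_size(a, b, pct_size_buffer=0.85):
    """Assumed criterion: the smaller size is within the buffer of the larger."""
    return min(a, b) >= pct_size_buffer * max(a, b)


def size_matched_groups(sizes, pct_size_buffer=0.85):
    """Find cluster pairs, and one-vs-two triplets, with matching sizes.

    sizes maps a cluster label to its number of orientations.
    """
    cluster_labels = sorted(sizes)
    pairs = [(i, j) for i, j in combinations(cluster_labels, 2)
             if similar_size(sizes[i], sizes[j], pct_size_buffer)]
    triplets = []
    for i in cluster_labels:
        others = [k for k in cluster_labels if k != i]
        for j, k in combinations(others, 2):
            # Compare cluster i against clusters j and k combined.
            if similar_size(sizes[i], sizes[j] + sizes[k], pct_size_buffer):
                triplets.append((i, (j, k)))
    return pairs, triplets


sizes = {0: 10, 1: 9, 2: 5, 3: 5}
pairs, triplets = size_matched_groups(sizes)
print(pairs)
print(triplets)
```

Each candidate found this way would then be tested for the water angle and for a sufficiently low spread of orientations, as described above.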

Parameters:
  • orientations (np.ndarray) – array of hydrogen orientations in space

  • pct_size_buffer (float, optional) – Minimum allowed size of the hydrogen orientation cluster. Defaults to 0.85.

  • lower_bound_pct_buffer (float, optional) – Minimum allowed size of the hydrogen orientation cluster. Defaults to 0.35.

  • min_samp_data_size_pct (float, optional) – Minimum samples to choose for OPTICS clustering as percentage of number of water molecules considered for HCW and WCW. Defaults to 0.15.

  • pct_explained (float, optional) – percentage of explained hydrogen orientations for water to be considered WCW. Defaults to 0.7.

  • angdiff_cutoff (float, optional) – Maximum standard deviation of angle allowed for WCW to be considered correct water angle. Defaults to 15.

  • angstd_cutoff (float, optional) – Maximum standard deviation cutoff for WCW angles to be considered correct water angles. Defaults to 20.

  • xi (float, optional) – Xi value for OPTICS clustering for WCW. Don’t touch this unless you know what you are doing. Defaults to 0.01.

  • njobs (int, optional) – how many cpu cores to use for clustering. Defaults to 1.

  • verbose (int, optional) – verbosity of output. Defaults to 0.

  • debugH (int, optional) – debug level for orientations. Defaults to 0.

  • plotreach (bool, optional) – whether to plot the reachability plot for OPTICS when debugging. Defaults to False.

Returns:

A list containing the two hydrogen orientations and the water classification string “WCW”; an empty list if the water is not WCW.

Return type:

list