ANNchor is a Python library that constructs approximate k-nearest neighbour (k-NN) graphs for slow metrics.
The k-NN graph is an extremely useful data structure that appears in a wide variety of applications, for example: clustering, dimensionality reduction, visualisation and exploratory data analysis (EDA).
However, if we want to use a slow metric, these k-NN graphs can take an exceptionally long time to compute.
Typical slow metrics include the Wasserstein metric (Earth Mover’s distance) applied to images, and Levenshtein (edit) distance on long strings, each of which takes significantly longer to compute than a typical Euclidean distance.
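To make "slow" concrete, here is a minimal pure-Python sketch of Levenshtein distance using the standard two-row dynamic programme. It is an illustration only, not code taken from ANNchor; the function name `levenshtein` is our own. Its cost grows with the product of the two string lengths, whereas a Euclidean distance is linear in the number of dimensions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

Every pair of characters is visited once, so comparing two strings of 1,000 characters each takes about a million inner-loop steps.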
ANNchor uses machine learning methods to infer the true distances between points in a data set from features derived from anchor points (also known as landmarks or waypoints).
In practice, this means that ANNchor makes fewer calls to the underlying metric than other state-of-the-art k-NN graph generation techniques.
This translates to quicker run times, especially when the metric is slow.
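The following sketch shows, under simplifying assumptions, why anchor points can save metric calls. It is not ANNchor's actual algorithm: it uses only the triangle inequality to bound a distance from precomputed anchor distances, which is one plausible kind of anchor-derived feature. Choosing anchors at random, the helper name `anchor_bounds`, and the use of `math.dist` as a stand-in for a slow metric are all assumptions made for illustration.

```python
import math
import random


def anchor_bounds(i, j, anchor_dists):
    """Triangle-inequality bounds on d(X[i], X[j]) from anchor distances.

    anchor_dists[a][k] holds the distance from anchor a to point k.
    """
    lower = max(abs(row[i] - row[j]) for row in anchor_dists)
    upper = min(row[i] + row[j] for row in anchor_dists)
    return lower, upper


random.seed(0)
X = [(random.random(), random.random()) for _ in range(200)]
distance = math.dist  # stand-in for a slow metric

# Assumption: anchors are picked at random for this sketch.
anchors = random.sample(range(len(X)), 5)

# One metric call per (anchor, point) pair: 5 * 200 = 1000 calls,
# compared with 200 * 199 / 2 = 19900 calls for every pair of points.
anchor_dists = [[distance(X[a], x) for x in X] for a in anchors]

lower, upper = anchor_bounds(3, 7, anchor_dists)
true = distance(X[3], X[7])  # computed here only to check the bounds
```

For any metric, `lower <= true <= upper` holds, so pairs whose lower bound is already large can be ruled out as near neighbours without calling the metric on them.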
Results from ANNchor can easily be combined with other popular libraries in the Data Science community.
In the docs we give examples of how to use ANNchor in an EDA pipeline alongside UMAP and HDBSCAN.
Install directly from GitHub with pip:
pip install git+https://github.com/gchq/annchor.git
import numpy as np
import annchor

X = ...         # your data: a list or np.array of items
distance = ...  # your distance function: distance(X[i], X[j]) returns d

# Build the approximate k-NN graph with 15 anchor points and 15 neighbours per point.
ann = annchor.Annchor(X,
                      distance,
                      n_anchors=15,
                      n_neighbors=15,
                      p_work=0.1)
ann.fit()
print(ann.neighbor_graph)
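To sanity-check an approximate graph on a small sample, it can help to have an exact baseline. The sketch below is a plain brute-force k-NN graph and is not part of ANNchor; the function name `brute_force_knn` and its `(indices, distances)` return shape are our own choices. How you compare it with `ann.neighbor_graph` depends on that attribute's format, which is not described here.

```python
def brute_force_knn(X, distance, n_neighbors):
    """Exact k-NN graph by calling the metric on every ordered pair.

    Returns (indices, distances): row i lists the n_neighbors nearest
    points to X[i], excluding X[i] itself, nearest first.
    """
    indices, distances = [], []
    for i, x in enumerate(X):
        row = sorted((distance(x, y), j) for j, y in enumerate(X) if j != i)
        nearest = row[:n_neighbors]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


# Tiny usage example with absolute difference as the metric.
idx, dist = brute_force_knn([0, 1, 3, 7], lambda a, b: abs(a - b), 2)
```

This makes n * (n - 1) metric calls for n points, so it is only practical on small subsets when the metric is slow.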