This Python script clusters keywords based on the similarity of their associated URLs using MinHash and MinHashLSH. The clustering process helps identify keywords that return similar search engine result pages (SERPs), which can be useful for SEO and content optimization strategies.
git clone https://github.com/dartseoengineer/keyword-clustering-minhash.gitThis Python script identifies keyword clusters by analyzing the similarity of their associated search engine result pages (SERPs). It uses MinHash to create similarity sketches of URLs and Locality Sensitive Hashing (LSH) to group keywords that return comparable results. The tool helps SEO professionals discover keywords with overlapping search results, enabling more strategic content optimization and competitive analysis. By processing CSV input files with configurable thresholds, it outputs clustered keywords that share similar SERP landscapes, streamlining keyword research workflows.
Install dependencies with `pip install pandas tqdm datasketch`. Run the script via command line: `python minhash-cluster-cli.py input.csv output.csv` with optional parameters for file separator, column names, and similarity threshold (default 0.6). The script outputs a CSV with Group IDs and clustered keywords.
Identify keyword opportunities with overlapping SERPs for content consolidation
Discover competitor keywords ranking in similar search results
Prioritize content optimization by grouping related keyword clusters
Analyze keyword difficulty by clustering based on search result similarity
No install command available. Check the GitHub repository for manual installation instructions.
git clone https://github.com/dartseoengineer/keyword-clustering-minhashCopy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
I need to cluster these keywords based on their associated URLs using MinHash and MinHashLSH. Here's the [DATA] in CSV format: [PASTE DATA]. Can you provide the Python code to perform this clustering and explain the results?
# Keyword Clustering Results ## Cluster 1: E-commerce Platforms - **Keywords**: online shopping, best deals, discount codes - **Top URLs**: amazon.com, ebay.com, walmart.com - **Similarity Score**: 0.85 ## Cluster 2: Digital Marketing Tools - **Keywords**: SEO tools, marketing automation, email campaigns - **Top URLs**: hubspot.com, mailchimp.com, moz.com - **Similarity Score**: 0.78 ## Cluster 3: Health and Wellness - **Keywords**: fitness tips, healthy recipes, mental health - **Top URLs**: healthline.com, webmd.com, verywellfit.com - **Similarity Score**: 0.82 ### Analysis The clustering algorithm identified three distinct groups of keywords that share similar SERPs. This suggests that search engines are grouping these keywords together in their ranking algorithms. Marketers can use this information to optimize content for these clusters and improve their SEO strategies.
Take a free 3-minute scan and get personalized AI skill recommendations.
Take free scan