Surreal Crawler
Mapping my website with a budget of 1000 on 8/26/2024 took 1m9s. The budget limits the crawl to 1000 sites, so many more links are discovered than are crawled.
This time includes the crawl, loading the results into the database, and linking the sites, using a locally hosted SurrealDB instance.
The run created 4299 sites with 23286 links between them. (It found this git site, which really bolsters those numbers.)
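The gap between a budget of 1000 and 4299 discovered sites comes from the budget limiting pages fetched, not URLs seen. Below is a minimal sketch of that idea, assuming a breadth-first crawl. The in-memory `graph` stands in for HTTP requests and the database, and the names `crawl` and `CrawlStats` are hypothetical; this is not the crawler's actual code.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Result of a budget-limited crawl over an in-memory link graph.
struct CrawlStats {
    fetched: usize,         // pages actually crawled (never exceeds the budget)
    sites: HashSet<String>, // every distinct URL seen, crawled or not
    links: usize,           // distinct (from, to) links recorded
}

/// Breadth-first crawl that stops fetching once `budget` pages have been visited.
/// `graph` is an assumption for illustration: it maps a URL to the URLs it links to.
fn crawl(graph: &HashMap<&str, Vec<&str>>, root: &str, budget: usize) -> CrawlStats {
    let mut sites: HashSet<String> = HashSet::new();
    let mut edges: HashSet<(String, String)> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut fetched = 0;

    sites.insert(root.to_string());
    queue.push_back(root.to_string());

    while let Some(url) = queue.pop_front() {
        if fetched >= budget {
            break; // budget spent: already discovered URLs stay recorded but unvisited
        }
        fetched += 1;
        // "Fetch" the page: look up its outgoing links; unknown pages have none.
        let targets = graph.get(url.as_str()).cloned().unwrap_or_default();
        for target in targets {
            edges.insert((url.clone(), target.to_string()));
            // `insert` returns true only the first time a URL is seen.
            if sites.insert(target.to_string()) {
                queue.push_back(target.to_string());
            }
        }
    }
    CrawlStats { fetched, sites, links: edges.len() }
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert("a", vec!["b", "c", "d"]);
    graph.insert("b", vec!["a", "e"]);
    graph.insert("c", vec!["f"]);
    let stats = crawl(&graph, "a", 2);
    println!(
        "fetched {} pages, found {} sites and {} links",
        stats.fetched,
        stats.sites.len(),
        stats.links
    );
}
```

With a budget of 2, only `a` and `b` are fetched, yet five sites and five links are recorded, which mirrors how a budget of 1000 can still produce thousands of sites.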
Install / Build
- You will need Rust to compile the crawler (rustup.rs).
- You need python3 (it comes installed on most Linux distros) and poetry for dependency management.
  - Install pipx and python3
  - Then: pipx install poetry
  - Then: poetry install to install the project dependencies
- You need to install SurrealDB.
 
Use
Just run ./crawl.sh {url} and it will start crawling. You can tweak the budget inside crawl.sh if you want.
You can also prefix the command with time to benchmark the system, such as: time ./crawl.sh https://discord.com.