Web crawler + storage + visualization (soon)

# Surreal Crawler

Mapping with a budget of 1000 (crawl 1000 sites; many more links are actually discovered in the process) on my website on 8/26/2024 took 1m9s.

This includes the crawl, loading into the database, and linking sites. (Locally hosted SurrealDB instance.)
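The linking step can be sketched in SurrealQL. This is a minimal illustration, not the repository's actual schema (which lives in `schema.surql`); the table, edge, and field names here (`site`, `links_to`, `url`) are assumptions:

```sql
-- Hypothetical shape: one record per crawled page.
CREATE site:one SET url = "https://example.com/";
CREATE site:two SET url = "https://example.com/about";

-- Each discovered hyperlink becomes a graph edge between site
-- records; these edges are where the link counts come from.
RELATE site:one->links_to->site:two;

-- Out-degree per site: count the links_to edges leaving each record.
SELECT url, count(->links_to->site) AS outbound FROM site;
```

With this shape, "sites" are nodes and "links between the sites" are `RELATE` edges, so the two totals reported below count records and edges respectively.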

This run created 4299 site records with 23286 links between the sites. (It found this git site, which really bolsters those numbers.)