Web crawler + storage + visualization (soon)

Surreal Crawler

Mapping with a budget of 1000 (crawl 1000 sites; many more links are discovered along the way) took 1m9s on my website on 8/26/2024.

This includes the crawl, loading the results into the database, and linking the sites together (on a locally hosted SurrealDB instance).

This run created 4299 site records with 23286 links between the sites. (It found my git site, which really bolsters those numbers.)
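To make the budget semantics concrete, here is a minimal, self-contained Rust sketch of a budget-limited breadth-first crawl over a toy in-memory link graph. It is illustrative only: the real crawler fetches pages over HTTP and stores everything in SurrealDB, and the function and graph names here are made up.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Budgeted breadth-first crawl over a toy "web": we only *visit* up
// to `budget` pages, but every link on a visited page is still
// *discovered* -- which is why the discovered count can far exceed
// the budget, as in the run described above.
fn crawl(web: &HashMap<&str, Vec<&str>>, start: &str, budget: usize) -> (usize, usize) {
    let mut visited = HashSet::new();
    let mut discovered = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);

    while let Some(url) = queue.pop_front() {
        if visited.len() >= budget {
            break; // budget exhausted: stop visiting, keep what we found
        }
        if !visited.insert(url) {
            continue; // already visited
        }
        for &link in web.get(url).into_iter().flatten() {
            if discovered.insert(link) {
                queue.push_back(link); // newly discovered, schedule a visit
            }
        }
    }
    (visited.len(), discovered.len())
}

fn main() {
    let web = HashMap::from([
        ("a", vec!["b", "c"]),
        ("b", vec!["c", "d", "e"]),
        ("c", vec!["f"]),
    ]);
    // With a budget of 2 we visit only 2 pages but discover more.
    let (visited, discovered) = crawl(&web, "a", 2);
    println!("visited {visited}, discovered {discovered}");
}
```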

Install / Build

  • You will need Rust to compile the crawler: rustup.rs
  • You need Python 3 (preinstalled on most Linux distros) and Poetry for dependency management.
    • Install pipx and python3
    • Then: pipx install poetry
    • Then: poetry install to install the project dependencies
  • You need to install SurrealDB
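The steps above, collected into shell commands. This assumes a Debian-like system; the rustup and SurrealDB one-liners are the upstream installers, but check each project's docs for current instructions before piping anything to sh.

```shell
# Rust toolchain (from rustup.rs)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Python 3 and pipx, then Poetry via pipx
sudo apt install python3 pipx
pipx install poetry
poetry install   # run inside the repo to install the Python dependencies

# SurrealDB (official install script)
curl -sSf https://install.surrealdb.com | sh
```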

Use

Just run ./crawl.sh {url} and it will start crawling. You can tweak the budget inside crawl.sh if you want.
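For reference, a hypothetical sketch of what a wrapper like crawl.sh might contain, showing where a budget setting would live. The actual script ships with the repo, and the arguments passed to the crawler binary here are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch only -- use the crawl.sh that ships with the repo.
BUDGET=1000                               # edit this to change the crawl budget
URL="${1:?usage: ./crawl.sh <url>}"       # first argument is the start URL

echo "crawling $URL with a budget of $BUDGET"
cargo run --release -- "$URL" "$BUDGET"   # illustrative flags, not the real CLI
```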

You can also prefix the command with time to benchmark the system, such as: time ./crawl.sh https://discord.com.