diff --git a/README.md b/README.md
index a223089..7ec5c6d 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,35 @@
 Crawls sites saving all the found links to a surrealdb database. It then proceeds to take batches of 100 uncrawled links until the crawl budget is reached. It saves the data of each site in a minio database.
 
+## How to use
+
+1. Clone the repo and `cd` into it.
+2. Build the repo with `cargo build -r`.
+3. Start the Docker containers:
+   1. `cd` into the docker folder: `cd docker`
+   2. Bring up the containers: `docker compose up -d`
+4. From the project's root, edit the `Crawler.toml` file to your liking.
+5. Run with `./target/release/internet_mapper`.
+
+You can view stats for the crawl at `http://<host>:3000/dashboards`.
+
+```bash
+# Untested script, but it probably works
+git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
+cd internet_mapper
+
+cargo build -r
+
+cd docker
+docker compose up -d
+cd ..
+
+$EDITOR Crawler.toml
+
+./target/release/internet_mapper
+
+```
+
 ### TODO
 
 - [x] Domain filtering - prevent the crawler from going on alternate versions of wikipedia.
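Step 4 above has you edit `Crawler.toml`, the crawler's configuration file. The copy shipped in the repo has the real option names; purely as an illustrative sketch (every key below is a guess inferred from the description at the top of the README, not the project's actual schema), the kind of settings involved look like this:

```toml
# Hypothetical sketch only - key names are assumptions, not the real Crawler.toml schema.

# Seed URL and stopping condition for the crawl (assumed keys).
start_url = "https://en.wikipedia.org/"
crawl_budget = 10000   # stop once this many pages have been crawled
batch_size = 100       # uncrawled links are pulled in batches of 100, per the description above

# SurrealDB connection used to store discovered links (assumed keys).
[surreal]
url = "127.0.0.1:8000"
username = "root"
password = "root"

# MinIO (S3-compatible) store for the downloaded page data (assumed keys).
[minio]
url = "http://127.0.0.1:9000"
access_key = "minioadmin"
secret_key = "minioadmin"
```

Whatever the real keys are called, those are the three pieces of information the crawler needs: where to start and when to stop, where to record links (SurrealDB), and where to put page data (MinIO).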