From a9465dda6e32d86a92d0b14e23a1513db9272bf4 Mon Sep 17 00:00:00 2001
From: Rushmore75
Date: Mon, 31 Mar 2025 15:05:18 -0600
Subject: [PATCH] add instructions

---
 README.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/README.md b/README.md
index a223089..7ec5c6d 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,35 @@
 Crawls sites, saving all the found links to a surrealdb database. It then takes batches of 100 uncrawled links until the crawl budget is reached. The data of each crawled site is saved to a minio instance.
 
+## How to use
+
+1. Clone the repo and `cd` into it.
+2. Build the project with `cargo build -r`.
+3. Start the Docker containers:
+   1. `cd` into the docker folder: `cd docker`
+   2. Bring up the containers: `docker compose up -d`
+4. From the project's root, edit the `Crawler.toml` file to your liking.
+5. Run with `./target/release/internet_mapper`.
+
+You can view stats of the crawl at `http://<host>:3000/dashboards`.
+
+```bash
+# Untested script, but it should work
+git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
+cd internet_mapper
+
+cargo build -r
+
+cd docker
+docker compose up -d
+cd ..
+
+$EDITOR Crawler.toml
+
+./target/release/internet_mapper
+```
+
 ### TODO
 - [x] Domain filtering - prevent the crawler from going on alternate versions of wikipedia.
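For orientation, the stack that step 3 brings up is roughly the one sketched below: surrealdb for the link graph, minio for the per-site data, and a dashboard on port 3000 (assumed here to be Grafana, given the `/dashboards` path). This is an illustrative sketch, not the repo's actual `docker/docker-compose.yml`; the images are real, but the ports and options are assumptions.

```yaml
# Illustrative sketch; the authoritative file is docker/docker-compose.yml in the repo.
services:
  surrealdb:            # stores the link graph
    image: surrealdb/surrealdb:latest
    command: start
    ports:
      - "8000:8000"

  minio:                # stores each crawled site's data
    image: minio/minio:latest
    command: server /data
    ports:
      - "9000:9000"

  grafana:              # assumed to serve http://<host>:3000/dashboards
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```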
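Step 4 has you edit `Crawler.toml` before running. As a minimal sketch of what such a config might hold, based only on the description above (a starting point, a crawl budget, and connections to surrealdb and minio), the key names here are illustrative assumptions rather than the project's actual schema; start from the `Crawler.toml` checked into the repo.

```toml
# Hypothetical sketch: key names are assumptions; the repo's Crawler.toml is the real reference.

# Where the crawl starts and when it stops.
start_url = "https://en.wikipedia.org"
crawl_budget = 1000              # stop after this many crawled pages

# surrealdb connection (link graph).
surreal_url = "localhost:8000"
surreal_username = "root"
surreal_password = "root"

# minio connection (per-site data).
s3_url = "http://localhost:9000"
s3_access_key = "minioadmin"
s3_secret_key = "minioadmin"
```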