Crawls sites, saving every discovered link to a SurrealDB database. It then takes batches of 100 uncrawled links until the crawl budget is reached, and stores each site's data in MinIO.
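
Roughly what that loop looks like, sketched with in-memory collections standing in for SurrealDB and MinIO (the `fetch` stub, seed URL, and constants are made up for illustration):

```rust
use std::collections::{HashSet, VecDeque};

const BATCH_SIZE: usize = 100;
const CRAWL_BUDGET: usize = 1_000;

// Stub standing in for a real HTTP client: returns the page body and
// the links discovered on it.
fn fetch(url: &str) -> (String, Vec<String>) {
    (format!("<html>{url}</html>"), vec![format!("{url}/a"), format!("{url}/b")])
}

fn main() {
    // In the real crawler the frontier lives in SurrealDB, not in memory.
    let mut frontier = VecDeque::from([String::from("https://example.com")]);
    let mut seen: HashSet<String> = frontier.iter().cloned().collect();
    let mut crawled = 0;

    while crawled < CRAWL_BUDGET && !frontier.is_empty() {
        // Take the next batch of up to 100 uncrawled links.
        let take = frontier.len().min(BATCH_SIZE);
        let batch: Vec<String> = frontier.drain(..take).collect();

        for url in batch {
            let (body, links) = fetch(&url);
            let _ = body; // the real crawler writes this to MinIO
            crawled += 1;

            // Queue each newly discovered link exactly once.
            for link in links {
                if seen.insert(link.clone()) {
                    frontier.push_back(link);
                }
            }
            if crawled >= CRAWL_BUDGET {
                break;
            }
        }
    }
    println!("crawled {crawled} pages");
}
```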
## How to use
1. Clone the repo and `cd` into it.
2. Build the project with `cargo build -r`.
3. Start the Docker containers:
    1. `cd` into the `docker` folder: `cd docker`
    2. Bring up the containers: `docker compose up -d`
4. From the project's root, edit the `Crawler.toml` file to your liking (a sketch of loading such a config follows the script below).
5. Run with `./target/release/internet_mapper`.

You can view crawl statistics at `http://<your-ip>:3000/dashboards`.
```bash
# Untested script but probably works
git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
cd internet_mapper
cargo build -r
cd docker
docker compose up -d
cd ..
$EDITOR Crawler.toml
./target/release/internet_mapper
```
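
Step 4 leaves the contents of `Crawler.toml` up to you. For the curious, here is roughly how a Rust program reads such a file with the `serde` and `toml` crates; the field names below are invented for illustration, and the real keys are whatever `Crawler.toml` in the repo defines:

```rust
use serde::Deserialize;

// Illustrative shape only; check the repo's Crawler.toml for the real keys.
#[derive(Debug, Deserialize)]
struct Config {
    start_urls: Vec<String>,
    crawl_budget: usize,
    batch_size: usize,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the TOML file from the project root and deserialize it.
    let raw = std::fs::read_to_string("Crawler.toml")?;
    let config: Config = toml::from_str(&raw)?;
    println!("{config:?}");
    Ok(())
}
```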
### TODO
- [x] Domain filtering - prevent the crawler from wandering onto alternate versions of Wikipedia (a sketch of the idea follows).
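
A toy version of that filter, using the `url` crate (an illustration, not the project's actual code):

```rust
use url::Url;

/// Keep a link only if its host is on the allow list, so e.g.
/// `de.wikipedia.org` is skipped when only `en.wikipedia.org` is allowed.
fn allowed(link: &str, allow: &[&str]) -> bool {
    Url::parse(link)
        .ok()
        .and_then(|u| u.domain().map(|d| allow.contains(&d)))
        .unwrap_or(false)
}

fn main() {
    let allow = ["en.wikipedia.org"];
    assert!(allowed("https://en.wikipedia.org/wiki/Rust", &allow));
    assert!(!allowed("https://de.wikipedia.org/wiki/Rust", &allow));
}
```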