Crawls sites, saving all the links it finds to a SurrealDB database. It then takes batches of 100 uncrawled links at a time until the crawl budget is reached, and saves the data of each site to a MinIO object store.
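The batching loop described above can be sketched in plain Rust. This is a minimal sketch, not the project's actual code. The in-memory `Frontier` type is a hypothetical stand-in for the SurrealDB link table, the `fetch_links` closure is a hypothetical stand-in for downloading a page (and storing it in MinIO) and extracting its links, and `BATCH_SIZE` and `crawl_budget` are illustrative names. The sketch pulls up to 100 uncrawled links at a time, records newly found links, and stops once the budget is spent or nothing is left to crawl.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical batch size, matching the "batches of 100" described above.
const BATCH_SIZE: usize = 100;

/// Hypothetical in-memory stand-in for the link table kept in SurrealDB.
struct Frontier {
    seen: HashSet<String>,       // every link ever discovered
    uncrawled: VecDeque<String>, // discovered but not yet crawled
}

impl Frontier {
    fn new(seed: &str) -> Self {
        let mut f = Frontier { seen: HashSet::new(), uncrawled: VecDeque::new() };
        f.add(seed.to_string());
        f
    }

    /// Record a link; links that were already seen are ignored.
    fn add(&mut self, url: String) {
        if self.seen.insert(url.clone()) {
            self.uncrawled.push_back(url);
        }
    }

    /// Take up to `n` uncrawled links as one batch.
    fn next_batch(&mut self, n: usize) -> Vec<String> {
        let take = n.min(self.uncrawled.len());
        self.uncrawled.drain(..take).collect()
    }
}

/// Crawl in batches until `crawl_budget` pages have been fetched or the
/// frontier is empty. Returns the number of pages crawled.
/// `fetch_links` is a hypothetical stand-in for fetching, storing, and parsing a page.
fn crawl<F>(seed: &str, crawl_budget: usize, mut fetch_links: F) -> usize
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut frontier = Frontier::new(seed);
    let mut crawled = 0;
    while crawled < crawl_budget {
        // Never take more links than the remaining budget allows.
        let batch = frontier.next_batch(BATCH_SIZE.min(crawl_budget - crawled));
        if batch.is_empty() {
            break; // nothing left to crawl
        }
        for url in batch {
            for link in fetch_links(url.as_str()) {
                frontier.add(link);
            }
            crawled += 1;
        }
    }
    crawled
}

fn main() {
    // Fake "web": page i links to pages 2i and 2i+1, so new links never run out.
    let fake_fetch = |url: &str| -> Vec<String> {
        let i: u64 = url.trim_start_matches("page/").parse().unwrap_or(0);
        vec![format!("page/{}", 2 * i), format!("page/{}", 2 * i + 1)]
    };
    let crawled = crawl("page/1", 250, fake_fetch);
    println!("crawled {} pages", crawled);
}
```

Capping each batch at the remaining budget is a design choice of this sketch: it keeps the loop from overshooting `crawl_budget` even when a full batch of 100 would exceed it.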

## How to use

1. Clone the repo and `cd` into it.
2. Build the repo with `cargo build -r`.
3. Start the Docker containers:
    1. `cd` into the docker folder: `cd docker`
    2. Bring up the Docker containers: `docker compose up -d`
4. From the project's root, edit the `Crawler.toml` file to your liking.
5. Run with `./target/release/internet_mapper`.

You can view stats of the project at `http://<your-ip>:3000/dashboards`.

```bash
# Untested script but probably works
git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
cd internet_mapper

cargo build -r

cd docker
docker compose up -d
cd ..

$EDITOR Crawler.toml

./target/release/internet_mapper
```

### TODO

- [x] Domain filtering - prevent the crawler from wandering onto alternate versions of Wikipedia.
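
The domain filtering item can be illustrated with a small sketch. It assumes that "alternate versions of Wikipedia" means other language subdomains, and that filtering works as an allow-list of exact hosts (here the hypothetical choice `en.wikipedia.org`), so links to hosts such as `de.wikipedia.org` are rejected. The `host_of` and `is_allowed` helpers are hypothetical and not taken from the project; the host parsing is intentionally naive, and a real implementation would more likely use a proper URL parser such as the `url` crate's `Url::host_str`.

```rust
/// Extract the host from an absolute http(s) URL.
/// Deliberately minimal: does not handle userinfo, IPv6 literals, or percent-encoding.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The host ends at the first '/', '?', '#', or ':' (start of a port).
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .unwrap_or(rest.len());
    let host = &rest[..end];
    if host.is_empty() { None } else { Some(host) }
}

/// Returns true only if the link's host exactly matches one of the allowed
/// hosts (case-insensitively). Any other host, including other language
/// subdomains of the same site, is rejected.
fn is_allowed(url: &str, allowed_hosts: &[&str]) -> bool {
    match host_of(url) {
        Some(host) => allowed_hosts.iter().any(|h| h.eq_ignore_ascii_case(host)),
        None => false,
    }
}

fn main() {
    // Hypothetical allow-list for illustration.
    let allowed = ["en.wikipedia.org"];
    for url in [
        "https://en.wikipedia.org/wiki/Rust_(programming_language)",
        "https://de.wikipedia.org/wiki/Rust_(Programmiersprache)",
    ] {
        println!("{} -> {}", url, is_allowed(url, &allowed));
    }
}
```

Exact host matching is the simplest rule that keeps a crawl on one language edition; note that it also rejects related hosts such as mobile subdomains, so a real filter may need a broader allow-list.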