add instructions
Crawls sites, saving all the found links to a SurrealDB database. It then takes batches of 100 uncrawled links until the crawl budget is reached. The data of each site is saved in a MinIO database.

## How to use

1. Clone the repo and `cd` into it.
2. Build the repo with `cargo build -r`.
3. Start the docker containers:
	1. cd into the docker folder: `cd docker`
	2. Bring up the docker containers: `docker compose up -d`
4. From the project's root, edit the `Crawler.toml` file to your liking.
5. Run with `./target/release/internet_mapper`.

You can view stats of the project at `http://<your-ip>:3000/dashboards`

```bash
# Untested script but probably works
git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
cd internet_mapper

cargo build -r

cd docker
docker compose up -d
cd ..

$EDITOR Crawler.toml

./target/release/internet_mapper
```

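Step 4 above edits `Crawler.toml`; the real keys ship with the repo, but as a rough illustration of the kind of settings involved (every field name below is an assumption, not the project's actual schema), a config might look like:

```toml
# Hypothetical sketch — open the repository's Crawler.toml for the real keys.

# Where the crawl starts and when it stops.
start_url = "https://en.wikipedia.org"
crawl_budget = 1000    # stop after roughly this many pages

# Uncrawled links are fetched in batches (the README mentions batches of 100).
batch_size = 100

# SurrealDB stores the link graph; MinIO stores each site's data.
surreal_url = "localhost:8000"
minio_url = "http://localhost:9000"
```

Whatever the actual field names are, the values must match the services started by `docker compose up -d` in the `docker` folder.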
### TODO

- [x] Domain filtering - prevent the crawler from going on alternate versions of wikipedia.