Crawls sites, saving all the found links to a SurrealDB database. It then takes batches of 100 uncrawled links until the crawl budget is reached, saving the data of each site in MinIO object storage.
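The crawl loop described above can be sketched as follows. This is a minimal, std-only sketch under stated assumptions, not the project's actual code: the hypothetical `LinkStore` stands in for the SurrealDB link table, a `HashMap` stands in for the MinIO bucket, and `fetch` is a caller-supplied placeholder for the HTTP request plus link extraction. It also assumes the crawl budget counts fetched pages.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Batch size described in the README.
const BATCH_SIZE: usize = 100;

/// HYPOTHETICAL in-memory stand-in for the SurrealDB link table:
/// remembers every URL ever seen and queues the ones not yet crawled.
struct LinkStore {
    seen: HashSet<String>,
    uncrawled: VecDeque<String>,
}

impl LinkStore {
    fn new() -> Self {
        LinkStore { seen: HashSet::new(), uncrawled: VecDeque::new() }
    }

    /// Record a found link; only links not seen before are queued.
    fn add(&mut self, url: &str) {
        if self.seen.insert(url.to_string()) {
            self.uncrawled.push_back(url.to_string());
        }
    }

    /// Remove and return up to `n` uncrawled links, oldest first.
    fn take_batch(&mut self, n: usize) -> Vec<String> {
        let k = n.min(self.uncrawled.len());
        self.uncrawled.drain(..k).collect()
    }
}

/// Crawl from `seed` until `budget` pages are fetched or no uncrawled
/// links remain. `fetch` is a HYPOTHETICAL placeholder returning
/// (page body, links found on the page). The returned map stands in
/// for the MinIO bucket (URL -> page body).
fn crawl<F>(seed: &str, budget: usize, mut fetch: F) -> HashMap<String, String>
where
    F: FnMut(&str) -> (String, Vec<String>),
{
    let mut store = LinkStore::new();
    let mut pages: HashMap<String, String> = HashMap::new();
    store.add(seed);

    while pages.len() < budget {
        // Take a batch of at most 100, capped at the remaining budget.
        let want = BATCH_SIZE.min(budget - pages.len());
        let batch = store.take_batch(want);
        if batch.is_empty() {
            break; // nothing left to crawl
        }
        for url in batch {
            let (body, links) = fetch(&url);
            for link in &links {
                store.add(link);
            }
            pages.insert(url, body);
        }
    }
    pages
}
```

Capping each batch at the remaining budget means the budget is never overshot; whether the real crawler trims its last batch this way is an assumption.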
## How to use

1. Clone the repo and `cd` into it.
2. Build the repo with `cargo build -r`.
3. Start the Docker containers:
    1. `cd` into the docker folder: `cd docker`
    2. Bring up the containers: `docker compose up -d`
4. From the project's root, edit the `Crawler.toml` file to your liking.
5. Run with `./target/release/internet_mapper`.

You can view stats of the project at `http://<your-ip>:3000/dashboards`.

The same steps as a single script:

```bash
# Untested script but probably works
set -e  # stop at the first command that fails

git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
cd internet_mapper

cargo build -r

cd docker
docker compose up -d
cd ..

# Fall back to vi if $EDITOR is not set
"${EDITOR:-vi}" Crawler.toml

./target/release/internet_mapper
```

### TODO
- [x] Domain filtering: prevent the crawler from going onto alternate versions of Wikipedia.
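One way such a domain filter could work is sketched below. It is a hypothetical illustration, not the project's implementation: it assumes "alternate versions of Wikipedia" means sibling hosts such as `fr.wikipedia.org` or `en.m.wikipedia.org`, and keeps a link only if its host exactly matches an allowed host. `host_of` and `is_allowed` are made-up names, and the hand-rolled host parsing stands in for a proper URL parser.

```rust
/// HYPOTHETICAL helper: extract the lower-cased host of an absolute
/// http(s) URL. Deliberately minimal: no userinfo, IPv6, or IDN handling.
fn host_of(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The host ends at the first '/', '?', '#' or ':' (port separator).
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#' | ':'))
        .unwrap_or(rest.len());
    let host = &rest[..end];
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// HYPOTHETICAL filter: keep a link only if its host is exactly one of
/// `allowed_hosts`. Non-http(s) links are rejected.
fn is_allowed(url: &str, allowed_hosts: &[&str]) -> bool {
    match host_of(url) {
        Some(host) => allowed_hosts.iter().any(|a| a.eq_ignore_ascii_case(&host)),
        None => false,
    }
}
```

Exact host matching is deliberately stricter than a suffix check on `wikipedia.org`, which would let every language edition through.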