add instructions
README.md
@@ -2,6 +2,35 @@
Crawls sites, saving all the links it finds to a SurrealDB database. It then takes batches of 100 uncrawled links until the crawl budget is reached. The data of each crawled site is stored in MinIO.
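
A rough sketch of that loop is below. Everything in it is an in-memory stand-in for illustration only (the real crawler keeps its link frontier in SurrealDB, writes page data to MinIO, and takes its settings from `Crawler.toml`), and the names used here are made up:

```rust
use std::collections::{HashSet, VecDeque};

// Stand-in for an HTTP fetch: returns the page body and the links found on it.
fn fetch(url: &str) -> (String, Vec<String>) {
    (format!("<html>{url}</html>"), Vec::new())
}

fn main() {
    let crawl_budget = 1_000; // the budget is configurable in the real crawler
    let mut frontier: VecDeque<String> = VecDeque::from([String::from("https://example.com")]);
    let mut seen: HashSet<String> = HashSet::new();
    let mut crawled = 0;

    while crawled < crawl_budget && !frontier.is_empty() {
        // Take the next batch of up to 100 uncrawled links
        // (the real crawler pulls these from SurrealDB).
        let batch: Vec<String> = (0..100).filter_map(|_| frontier.pop_front()).collect();
        for url in batch {
            if !seen.insert(url.clone()) {
                continue; // skip links that were already crawled
            }
            let (body, links) = fetch(&url);
            let _ = body; // the real crawler stores the page data in MinIO
            crawled += 1;
            // Newly found links go back into the frontier for a later batch.
            frontier.extend(links);
            if crawled >= crawl_budget {
                break;
            }
        }
    }
    println!("crawled {crawled} pages");
}
```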
## How to use
1. Clone the repo and `cd` into it.
2. Build the repo with `cargo build -r`
3. Start the Docker containers:
	1. `cd` into the docker folder: `cd docker`
	2. Bring up the containers: `docker compose up -d`
4. From the project's root, edit the `Crawler.toml` file to your liking.
5. Run with `./target/release/internet_mapper`

You can view stats of the project at `http://<your-ip>:3000/dashboards`

```bash
# Untested script but probably works
git clone https://git.oliveratkinson.net/Oliver/internet_mapper.git
cd internet_mapper

cargo build -r

cd docker
docker compose up -d
cd ..

$EDITOR Crawler.toml

./target/release/internet_mapper
```
### TODO
- [x] Domain filtering - prevent the crawler from wandering onto alternate versions of Wikipedia.
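
A sketch of what domain filtering means here (illustration only, not the crawler's actual code): a link is kept only when its host matches one of the allowed domains.

```rust
// Hypothetical illustration of host-based filtering; names and logic are not from this repo.
fn host_allowed(url: &str, allowed_hosts: &[&str]) -> bool {
    // Take the part between "://" and the next "/" as the host.
    let host = url
        .split("://")
        .nth(1)
        .and_then(|rest| rest.split('/').next())
        .unwrap_or("");
    allowed_hosts.iter().any(|allowed| host == *allowed)
}

fn main() {
    let allowed = ["en.wikipedia.org"];
    // The English article passes the filter...
    assert!(host_allowed("https://en.wikipedia.org/wiki/Web_crawler", &allowed));
    // ...while the German edition of Wikipedia is rejected.
    assert!(!host_allowed("https://de.wikipedia.org/wiki/Webcrawler", &allowed));
}
```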