# Surreal Crawler
Crawls sites, saving every discovered link to a SurrealDB database. It then takes batches of 100 uncrawled links and crawls them until the crawl budget is reached. The raw data for each site is stored in MinIO.
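
The batch-crawl loop described above looks roughly like the following. This is a minimal sketch, assuming hypothetical `LinkDb` and `PageStore` traits in place of the real SurrealDB and MinIO clients; the names, signatures, and budget handling are illustrative, not taken from the actual code.

```rust
const BATCH_SIZE: usize = 100;

/// Hypothetical view of the link table kept in SurrealDB.
trait LinkDb {
    fn take_uncrawled(&mut self, n: usize) -> Vec<String>;
    fn save_links(&mut self, links: &[String]);
    fn mark_crawled(&mut self, url: &str);
}

/// Hypothetical handle to the MinIO bucket holding raw page data.
trait PageStore {
    fn put(&mut self, url: &str, body: &[u8]);
}

/// Placeholder for the HTTP fetch + link-extraction step.
fn fetch_and_extract(_url: &str) -> (Vec<u8>, Vec<String>) {
    (Vec::new(), Vec::new())
}

fn crawl(db: &mut impl LinkDb, store: &mut impl PageStore, budget: usize) {
    let mut crawled = 0;
    while crawled < budget {
        // Pull the next batch of up to 100 links that have not been visited yet.
        let batch = db.take_uncrawled(BATCH_SIZE.min(budget - crawled));
        if batch.is_empty() {
            break; // frontier exhausted before the budget was spent
        }
        for url in batch {
            let (body, found_links) = fetch_and_extract(&url);
            store.put(&url, &body);       // raw page data goes to MinIO
            db.save_links(&found_links);  // newly discovered links go back to SurrealDB
            db.mark_crawled(&url);
            crawled += 1;
        }
    }
}
```
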
### TODO
- [ ] Domain filtering - prevent the crawler from wandering onto alternate versions of Wikipedia.
- [ ] Conditionally save content - based on filename or file contents
- [ ] GUI / TUI ?
- [ ] Better asynchronous fetching of the sites. Currently everything happens serially (see the sketch below this list for one possible approach).
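
For the last item, one possible approach (an assumption, not the project's current code) is to fan each batch out through `futures::StreamExt::buffer_unordered`, which keeps a bounded number of requests in flight while preserving the batch semantics. The sketch below assumes the `reqwest`, `futures`, and `tokio` crates.

```rust
use futures::{stream, StreamExt};

/// Fetch a batch of URLs concurrently, with at most `concurrency` requests
/// in flight at once. Hypothetical sketch, not the project's current code.
async fn fetch_batch(
    urls: Vec<String>,
    concurrency: usize,
) -> Vec<(String, Option<Vec<u8>>)> {
    let client = reqwest::Client::new();
    stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move {
                // Fetch the page body; errors simply yield `None` in this sketch.
                let body = match client.get(url.as_str()).send().await {
                    Ok(resp) => resp.bytes().await.ok().map(|b| b.to_vec()),
                    Err(_) => None,
                };
                (url, body)
            }
        })
        .buffer_unordered(concurrency)
        .collect()
        .await
}
```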