once the crawl is 'done' #19

Open
opened 2025-08-13 14:01:36 +00:00 by Oliver · 0 comments
Owner

After the crawl is done (`crawled = true` on every record), something like this can clean up stragglers by flagging them to be crawled again:

find | grep crawl_temp > temp
# Edit the 'temp' file and remove '.crawl_temp' from each path, then paste the paths into the following SurrealQL statement:
$arry = [
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/headers/d45110/d4511077_ne_hdr.txt',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/headers/d45110/d4511072_se_hdr.txt',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/headers/d46105/d4610565_se_hdr.txt',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/spc/sid/d45110/d4511025.txt',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/spc/sid/d47109/d4710974.sid',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/spc/sid/d46106/d4610678.sid',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/spc/sid/d48106/d4810683.sdw',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/spc/sid/d48111/d4811182.txt',
'ftpgeoinfo.msl.mt.gov/Data/Spatial/MSDI/Imagery/DOQQ_BW/spc/sid/d48113/d4811372.txt',
    
];

for $i in $arry {
    update website set crawled = false where site ~ $i;
};
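
The manual edit of `temp` could probably be scripted. This is a minimal sketch, not something the crawler ships with: the function name `crawl_temp_to_entries` is hypothetical, and it assumes `.crawl_temp` is a filename suffix, that `find` is run from the crawl output directory, and that no path contains a single quote.

```bash
#!/usr/bin/env bash
# Sketch: turn leftover *.crawl_temp paths into quoted SurrealQL array entries.
# Assumptions: '.crawl_temp' is a suffix; paths contain no single quotes.

# Reads paths on stdin (one per line) and prints one quoted entry per line.
crawl_temp_to_entries() {
    # 1. drop the leading './' that 'find .' emits
    # 2. drop the trailing '.crawl_temp'
    # 3. wrap the result in single quotes and append a comma
    sed -e 's|^\./||' -e 's|\.crawl_temp$||' -e "s|.*|'&',|"
}

# Example usage, run from the crawl output directory:
#   find . -type f -name '*.crawl_temp' | crawl_temp_to_entries > temp
```

The resulting `temp` lines can be pasted between the `[` and `];` of the array above.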
Reference: Oliver/internet_mapper#19