5b0a9bfbe2 Git pushed after crawling #1
main
Hieuhuy Pham
2022-04-25 20:19:40 -0700
8d5a669d9e Added some trap detection for really bad links
Hieuhuy Pham
2022-04-25 15:54:57 -0700
c1b7a50460 Locks are not racing anymore and multi-threading works; changed how some information is stored so it is more readable; added some new regex, but it will need to be trimmed later because it does not do its job
Hieuhuy Pham
2022-04-23 18:49:24 -0700
9c31a901b7 another attempt at robots, merged regex as well
traps
Lacerum
2022-04-23 14:44:47 -0700
74063e5d00 Fixed a lot of racing issues; there could potentially still be reader-writer confusion, but it should not matter much. As long as the server is healthy we can let this bad boi loose
Hieuhuy Pham
2022-04-23 02:13:12 -0700
90a5d16456 Load balancer installed, have not been able to test yet
Hieuhuy Pham
2022-04-22 16:51:32 -0700
8b96a7c9f7 More refinement of frontier and worker for delicious multi-threading
Hieuhuy Pham
2022-04-21 21:08:23 -0700
809b3dc820 moved robots ok to other file like datacollect
Lacerum
2022-04-20 13:29:18 -0700
ab39c4b8c6 changed elif to if to speed up regex in is_vaild
Lacerum
2022-04-20 12:18:25 -0700
af26611ef4 hopeful fixes for issues #2, #3
Lacerum
2022-04-20 11:11:43 -0700
58d15918d5 Change more syntax to get data collection working, check extracturl and sort links into sets instead of lists to significantly reduce url extractions
Hieuhuy Pham
2022-04-20 04:03:58 -0700
d0dde4a4db Fixes error in syntax for new merged code from data collection branch, fixed 'infinite loop', added timers to measure performance of functions.
Hieuhuy Pham
2022-04-20 03:52:14 -0700