Commit Graph

17 Commits

Author SHA1 Message Date
inocturnis
fb88efd510 Implemented all necessary indexer informations 2022-05-27 06:29:48 -07:00
inocturnis
c43d6aa0a9 Fully changed indexer and worker classes with properly indexing 2022-05-27 05:11:01 -07:00
inocturnis
53c7b49806 Massive changes to indexer and created merge 2022-05-27 03:08:56 -07:00
inocturnis
c4b3512df7 Changed tf_idf model into the new one, try it on the current dataset 2022-05-12 15:00:09 -07:00
iNocturnis
c8640001c7 Merge branch 'tf_idf' 2022-05-12 14:30:22 -07:00
Lacerum
f5610eaa62 tf-idf ngrams and now returns dict rather than score 2022-05-11 14:46:32 -07:00
inocturnis
f1fe3b26ac Merged with weighting but cannot implement due to tokens being messy and some comparison error 2022-05-06 20:45:52 -07:00
iNocturnis
5c703b6471 Merge remote-tracking branch 'origin/posting' 2022-05-06 20:26:03 -07:00
inocturnis
c892bbac03 Changed counter for tf to one doing O(n) instead of O(n^2), included multi-threading to speed up processing speed 2022-05-06 20:22:52 -07:00
unknown
c616b37432 added important tokens 2022-05-06 17:18:34 -07:00
iNocturnis
8e7013e840 Merge branch 'main' into tf_idf 2022-05-06 14:58:48 -07:00
inocturnis
c05b4c7b09 Changed some files and tf_idf, added data storage, and finish the loop for indexing 2022-05-06 14:58:03 -07:00
Lacerum
b82516ec85 attempted fix for if-idf 2022-05-06 14:03:49 -07:00
Lacerum
b833afbfa3 filled out get_tf_idf, added test file for it 2022-05-06 04:04:04 -07:00
inocturnis
81da17de93 Stemmed done 2022-05-04 15:30:01 -07:00
inocturnis
fbb1a1ab2c Implemented a starting point for the project, run indexer.py, it will stop after 1 single file, a very rudimentary tokenzier implemented. 2022-05-04 13:26:18 -07:00
Hieuhuy Pham
1fb8fef7a3 First pushed, setup all the stuff we need, no launcher yet. So test your code in another place for now, because they are all codepended on each others ... 2022-05-04 12:22:20 -07:00