Re-Scanning for Accuracy
Recently, I re-scanned the SEP entries using a substantially more accurate regex for finding the InPhO terms within the entries. In addition, I recorded each entry's file size and word count for future normalization. Having worked so long to get scans of this scale going, it made sense to make the SEP class more robust and to have the iterative scan give more feedback on its progress through the set of all SEP entries. I also needed to add the ability to auto-scan non-consecutive entries, since in the first attempt I missed about 180 entries because the script timed out at 60 seconds (the new regex is more accurate but much slower).
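To make the idea concrete, here is a minimal Python sketch of that kind of scan. It is not the actual SEP class: the function names, the in-memory `entries` dict, and the word-boundary pattern are all assumptions made for illustration. It shows whole-term matching (so "logic" does not fire on "logical"), per-entry file size and word count, a progress message per entry, and an `only` argument for re-scanning an arbitrary, non-consecutive list of entries such as the ones a timed-out run missed.

```python
import re

def build_term_pattern(term):
    # Hypothetical matching rule: whole words only, case-insensitive,
    # with any run of whitespace allowed between the words of a term.
    words = [re.escape(w) for w in term.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

def scan_entry(text, terms):
    # Count occurrences of each term; keep only terms that actually occur.
    counts = {}
    for term in terms:
        n = len(build_term_pattern(term).findall(text))
        if n:
            counts[term] = n
    return {
        "filesize": len(text.encode("utf-8")),  # size in bytes
        "wordcount": len(text.split()),         # whitespace-separated words
        "counts": counts,
    }

def scan_entries(entries, terms, only=None, report=print):
    # `only` lets a run target a non-consecutive subset of entry ids,
    # e.g. the entries skipped by an earlier run that timed out.
    ids = list(only) if only is not None else sorted(entries)
    results = {}
    for i, entry_id in enumerate(ids, start=1):
        results[entry_id] = scan_entry(entries[entry_id], terms)
        report(f"[{i}/{len(ids)}] scanned {entry_id}")
    return results
```

Calling `scan_entries(entries, terms, only=missed_ids)` would then re-scan just the missed entries, where `missed_ids` is whatever list of identifiers the earlier run failed to reach.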
I like the new non-grid layout of the activation table: the sparse matrix is now represented compactly, since only the SEP-term combinations that have activations appear in the table.
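As a rough illustration of that layout, the Python sketch below flattens a dense entry-by-term grid into one row per non-zero combination. The function name and the nested-dict input shape are assumptions for the example, not the table's actual schema.

```python
def to_sparse_rows(dense):
    # `dense` maps entry id -> {term: activation count}, zeros included.
    # Emit one (entry, term, count) row per non-zero cell and skip the rest.
    rows = []
    for entry_id, term_counts in dense.items():
        for term, count in term_counts.items():
            if count > 0:
                rows.append((entry_id, term, count))
    return rows
```

For a grid in which most cells are zero, the number of rows tracks the number of activations rather than the number of entries times the number of terms.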