CatDan Dev

Re-Scanning for Accuracy

Posted in dev by gwyant on March 14, 2010

Recently, I re-scanned the SEP entries using a substantially more accurate regex for finding the InPhO terms within the entries. While I was at it, I also recorded each entry's file size and word count for future normalization. After working so long to get scans of this scale going, it made sense to make the SEP class more robust and to have the iterative scan give more feedback on its progress through the full set of SEP entries. I also needed to add the ability to auto-scan non-consecutive entries, since the first attempt missed about 180 entries when the script timed out at 60 seconds (the new regex is more accurate but much slower).
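To make the idea concrete, here is a rough Python sketch of that kind of scan. It is not the actual SEP class or the real regex: the function names, the dictionary of entry texts, and the word-boundary pattern are all assumptions made for illustration. It counts term matches, records word count and file size, and scans an arbitrary (possibly non-consecutive) list of entry IDs while reporting progress.

```python
import re

def build_term_pattern(terms):
    # Hypothetical pattern: longest terms first so "free will" is tried
    # before "will"; \b keeps matches from landing inside other words.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)

def scan_entry(text, pattern):
    # Count each matched term (lowercased) and grab size stats
    # that can be used later for normalization.
    counts = {}
    for match in pattern.finditer(text):
        term = match.group(0).lower()
        counts[term] = counts.get(term, 0) + 1
    return {
        "counts": counts,
        "wordcount": len(text.split()),
        "filesize": len(text.encode("utf-8")),
    }

def scan_entries(entries, entry_ids, pattern, report=print):
    # entry_ids can be any list, e.g. only the entries a previous,
    # timed-out run never reached.
    results = {}
    total = len(entry_ids)
    for i, entry_id in enumerate(entry_ids, start=1):
        results[entry_id] = scan_entry(entries[entry_id], pattern)
        report(f"[{i}/{total}] scanned {entry_id}")
    return results
```

Passing the list of missed IDs as `entry_ids` re-runs only those entries, which is the point of supporting non-consecutive scans: a timed-out run can be topped up instead of restarted.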

I like the new non-grid layout of the activation table: the sparse matrix is now represented efficiently, since only the SEP-term combinations that have activations appear in the table.
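As an illustration of the same idea in code, a sparse table can be sketched as a dictionary keyed by (entry, term) pairs that stores only nonzero activations. This is a minimal sketch under my own assumptions, not the actual table schema; the function names and the shape of the input (per-entry term counts) are hypothetical.

```python
def build_activation_table(scan_counts):
    # scan_counts: {entry_id: {term: count}} (assumed input shape).
    # Only pairs with a nonzero count get a row, so the table grows with
    # the number of activations rather than entries x terms.
    table = {}
    for entry_id, counts in scan_counts.items():
        for term, n in counts.items():
            if n > 0:
                table[(entry_id, term)] = n
    return table

def activation(table, entry_id, term):
    # A missing pair simply means "no activation".
    return table.get((entry_id, term), 0)
```

The trade-off is that absent pairs are implicit zeros, so anything that needs the full grid has to fill them in on the way out.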
