Darth Autocrat (Lyndon NA)

@darth_na

4 Tweets Jan 13, 2023
@schachin @cemper ?
They are based on language models,
derived from corpora of content:
things like Wikipedia (factual), news (factual/perspective), etc.
Break away from those, and the distributions change,
due to vocabulary (which is influenced by topic and language level)
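
A rough way to see that shift, as a minimal sketch rather than how any real detector works: fit a unigram model on one kind of text, then measure the out-of-vocabulary rate and the average bits per token on in-domain versus out-of-domain text. The sentences and the helper names (`train_unigram`, `oov_rate`, `cross_entropy`) are made up for illustration.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Return per-token counts and the total token count."""
    counts = Counter(tokens)
    return counts, sum(counts.values())

def oov_rate(counts, tokens):
    """Fraction of tokens that never appeared in the training text."""
    return sum(1 for t in tokens if t not in counts) / len(tokens)

def cross_entropy(counts, total, tokens):
    """Average bits per token under add-one smoothing.

    All unseen words are pooled into a single extra <unk> slot.
    """
    vocab = len(counts) + 1
    bits = sum(math.log2((counts[t] + 1) / (total + vocab)) for t in tokens)
    return -bits / len(tokens)

# Toy, invented data: "news-like" training text vs. casual text.
train = "the city council approved the new budget on monday".split()
in_domain = "the council approved the budget".split()
out_domain = "lol that new phone is sick tbh".split()

counts, total = train_unigram(train)
for name, sample in [("in-domain", in_domain), ("out-of-domain", out_domain)]:
    print(name, oov_rate(counts, sample), cross_entropy(counts, total, sample))
```

With this toy data the casual sample has a much higher out-of-vocabulary rate and costs more bits per token, which is the vocabulary-driven distribution change described above.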
@schachin @cemper They hit similar issues back in the '80s when attempting to make PoS taggers and chunkers,
based on the Brown/WSJ corpora etc.
The algos worked fantastically for those types of pieces,
but started to suffer when they tackled other topics and/or styles.
@schachin @cemper Even systems that automatically generated heuristic rules (such as Brill's) found their error rates tended to worsen when they tackled other topics (and those systems generated syntactic templates, which are broader than rules tied to specific word pairs/triples!).
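
The domain drop can be illustrated with a much simpler tagger than any of those systems. This is a hedged sketch of a most-frequent-tag baseline, not Brill's rule learner and not trained on Brown/WSJ. The tiny hand-tagged samples, the tag names, and the `NOUN` fallback for unknown words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_baseline(tagged):
    """Map each word to the tag it received most often in training."""
    by_word = defaultdict(Counter)
    for word, tag in tagged:
        by_word[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in by_word.items()}

def tag_words(model, words, default="NOUN"):
    """Tag known words from the lexicon; unknown words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]

def accuracy(model, gold):
    """Share of tokens whose predicted tag matches the gold tag."""
    pred = tag_words(model, [w for w, _ in gold])
    return sum(p[1] == g[1] for p, g in zip(pred, gold)) / len(gold)

# Toy, invented training data in a "newswire" style.
train = [("the", "DET"), ("council", "NOUN"), ("approved", "VERB"),
         ("the", "DET"), ("budget", "NOUN"), ("shares", "NOUN"),
         ("rose", "VERB"), ("sharply", "ADV")]
in_domain = [("the", "DET"), ("budget", "NOUN"), ("rose", "VERB")]
out_domain = [("omg", "INTJ"), ("that", "DET"),
              ("phone", "NOUN"), ("slaps", "VERB")]

model = train_baseline(train)
print("in-domain accuracy:", accuracy(model, in_domain))
print("out-of-domain accuracy:", accuracy(model, out_domain))
```

On the in-domain sample every word is in the lexicon. On the casual sample every word is unknown, so the tagger is only right where the fallback tag happens to match.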
@schachin @cemper The next wave will be far better,
as they are going to be utilising larger semantic/syntactic units,
which permits greater variance (more branches/options).
