Dennis,
Just a note on the seq query, in case you consider using something like that in future development:
On current machines, a "seq" query run against half a million records should take somewhere between 0 and 25 milliseconds if the phrase exists (and possibly a bit more if it does not). The way you build those "40" queries can let SQL Server benefit significantly from cached (sub)queries, so some of the subsequent queries might take almost no time. In addition, with certain algorithms you may be able to skip some queries based on the results of previous ones.
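To illustrate the "skip some queries" idea, here is a minimal sketch in Python. It is not your actual query code: `phrase_exists` is a hypothetical stand-in for one "seq" lookup, and the pruning rule assumes that whenever a phrase is stored, all of its shorter contiguous sub-phrases are stored too. If your table does not have that property, the pruning would miss phrases.

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def find_known_phrases(
    words: List[str],
    phrase_exists: Callable[[Tuple[str, ...]], bool],  # hypothetical: one "seq" lookup
    max_len: int = 4,
) -> Set[Span]:
    """Return the spans of `words` whose phrase exists, skipping hopeless lookups.

    ASSUMPTION: if a phrase is stored, every contiguous sub-phrase is stored too.
    Under that assumption, a phrase can only exist if both of its sub-phrases
    that are one word shorter were already found.
    """
    known: Set[Span] = set()
    n = len(words)
    for length in range(1, max_len + 1):
        for start in range(n - length + 1):
            end = start + length
            if length > 1:
                left_ok = (start, end - 1) in known
                right_ok = (start + 1, end) in known
                if not (left_ok and right_ok):
                    continue  # skip the query: a sub-phrase is already known to be missing
            if phrase_exists(tuple(words[start:end])):
                known.add((start, end))
    return known
```

For the four-word input "the cat sat down" with every sub-phrase of "the cat sat" stored and "down" unknown, this issues 7 lookups instead of the 10 a brute-force scan would need; the saving grows with longer inputs and more unknown words.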
Questions:
Doesn't your system have a problem with yes/no questions?
What exactly does the system do when it "activates concept"?
You mentioned learning from the Internet.
What if the text_to_learn_from contains something like:
"X is broken. [a few more sentences] X is blue. X cannot do Y."
Is the system going to understand that X (if not broken) might be able to do Y? Note the ambiguous "blue" and the possibility of drawing an incorrect relationship between "blue" and "cannot do Y".
DennisGorelik said:
"Find a meaning of a word means to find all concepts which are related to specified word."
Which (for your system) means finding all the words that relatively often appeared close to that word, right? How about concepts like "he" or "it"? You cannot copy the Internet into your DB. If you keep just the "top" 100000 phrases, it might be just a lot of junk data. You need to get the meaning and generalize most of the knowledge. It's not very clear to me how you can generalize with the level of "understanding" your system is capable of.
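To make concrete what I mean by "words which often appeared close to that word", here is a minimal sketch of windowed co-occurrence counting. This is only my reading of the approach, not your implementation; the function name and the `window` parameter are my own assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_counts(tokens: List[str], window: int = 2) -> Dict[str, Counter]:
    """For each word, count the words seen within `window` tokens of it.

    `window` is an assumed parameter; nothing in the discussion fixes its value.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # do not count a token as its own neighbour
                counts[word][tokens[j]] += 1
    return counts
```

Note that a word like "it" ends up with neighbours from every topic it was used in, which is exactly why I doubt that raw proximity counts amount to "meaning".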
DennisGorelik said:
"If words are used together, then the words probably are related to each other."
True, but not necessarily in the sense of cause/effect. Also note that there are many causes and effects which cannot be described in a few words. It often gets a lot more complex.
Your Google example (on the word counts) may not be very valuable, because it does not say how many times the words are used on those pages, or whether they are close enough to each other for your system to pick up the relationship.
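The difference between "both words are on the page" and "the words are close to each other" can be sketched like this. It is a hypothetical helper of my own, not something Google or your system exposes.

```python
from typing import List, Optional


def min_distance(tokens: List[str], a: str, b: str) -> Optional[int]:
    """Smallest token gap between any occurrence of `a` and any occurrence of `b`.

    Returns None if either word is missing from `tokens`.
    """
    last_a: Optional[int] = None
    last_b: Optional[int] = None
    best: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok == a:
            last_a = i
            if last_b is not None:
                gap = i - last_b
                best = gap if best is None else min(best, gap)
        if tok == b:
            last_b = i
            if last_a is not None and last_a != i:
                gap = i - last_a
                best = gap if best is None else min(best, gap)
    return best
```

A page-level hit count treats a gap of 2 and a gap of 2000 the same way, while a system that only looks at a small window would see the first pair and never see the second.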
I recommend that you do some experiments with reasoning (including deduction, induction, abduction, analogy analysis, etc.).
BTW, in order to think well, your system also needs to have some sense of things like time, space, the minds of other subjects, etc. And give some thought to the system's creativity.