Texai Project
Texai is a knowledge-based software project to create artificial intelligence.
Revised July 6, 2009
Introduction
Texai is a knowledge-based, open source project to create artificial intelligence. The project’s approach is to first construct a bootstrap English dialog system whose goals are to acquire linguistic and common sense skills to improve its own performance. Next, the system will acquire expertise in algorithms and in Java programming, in order to explicitly represent its own behavior in the knowledge base (KB). Thus it will understand, revise, test and automatically compose its own source code. In parallel, the system will acquire lexical and common sense knowledge from the glosses (word sense definitions) in the Texai lexicon, and begin to convert Wikipedia English text into KB statements, fleshing out the OpenCyc terms. In addition to scaling to many disparate users via Jabber chat from a single Texai instance, the system will be deployed as a virtual appliance to compute clusters and to a multitude of Internet users, where each instance hosts one or more nodes organized within an Albus Hierarchical Control System. These Albus nodes (i.e. agents) will be organized into agencies, many mirroring current human organizations, in which a node is a user’s proxy into the Albus hierarchy for some role. The artificial intelligence will then consist of a vast community of organizations whose members are Albus nodes, each quite intelligent with regard to its agency’s mission.
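A minimal sketch of such a hierarchy may make the organization concrete. The `AlbusNode` class and its `dispatch` method are hypothetical names for illustration, not Texai code; the sketch assumes only the general Albus idea that a task is decomposed downward through the hierarchy while status reports flow back up to the node that issued it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AlbusNode:
    """Hypothetical agent holding one role within an agency's hierarchy."""
    name: str
    role: str
    children: list[AlbusNode] = field(default_factory=list)

    def add(self, child: AlbusNode) -> AlbusNode:
        """Attach a subordinate node and return it for further nesting."""
        self.children.append(child)
        return child

    def dispatch(self, task: str) -> list[str]:
        """Decompose a task downward; leaf nodes act and their reports flow back up."""
        if not self.children:
            return [f"{self.name} ({self.role}) performs: {task}"]
        reports: list[str] = []
        for child in self.children:
            # Each subordinate receives the parent's task refined by its own role.
            reports.extend(child.dispatch(f"{task} / {child.role}"))
        return reports


agency = AlbusNode("agency", "mission")
planner = agency.add(AlbusNode("planner", "planning"))
planner.add(AlbusNode("proxy-alice", "user proxy"))
agency.add(AlbusNode("proxy-bob", "user proxy"))

for report in agency.dispatch("answer question"):
    print(report)
```

Here the two leaf nodes stand in for users’ proxies; a real deployment would distribute nodes across many Texai instances rather than hold them in one in-memory tree.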
Initial Deployment Plan
The initial deployment of the English bootstrap dialog system was planned for June 23, 2009, Alan Turing’s birthday. However, the release will be delayed by about six weeks, until after I return from vacation, so that I can provide the agile level of support that beta testing requires.
Texai will communicate, as an online chatbot, with volunteer mentors to acquire English noun plural word forms, and subsequently word sense meanings. Usage during the remainder of 2009 will confirm or deny the hypothesis that Texai will be able to figure out for itself a substantial portion of the WordNet word senses from their text definitions, after having learned the most frequently occurring definition word senses.
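The following sketch illustrates, under simple assumptions, how plural word forms might be acquired from mentors: the system guesses a plural with a few regular English spelling rules, and any mentor correction is stored as an exception that overrides the rules from then on. `PluralLearner` and `mentor_says` are hypothetical names, not Texai’s actual implementation.

```python
class PluralLearner:
    """Hypothetical learner: regular spelling rules plus mentor-supplied exceptions."""

    def __init__(self) -> None:
        self.learned: dict[str, str] = {}  # singular -> plural taught by a mentor

    def guess(self, noun: str) -> str:
        if noun in self.learned:  # a mentor's answer always wins over the rules
            return self.learned[noun]
        if noun.endswith(("s", "x", "z", "ch", "sh")):
            return noun + "es"
        if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
            return noun[:-1] + "ies"
        return noun + "s"

    def mentor_says(self, noun: str, plural: str) -> None:
        """Record a mentor's correction (or confirmation) of a plural form."""
        self.learned[noun] = plural


learner = PluralLearner()
print(learner.guess("box"), learner.guess("city"), learner.guess("mouse"))  # boxes cities mouses
learner.mentor_says("mouse", "mice")
print(learner.guess("mouse"))  # mice
```

In dialog, each guess would be put to a mentor as a question, so the rules cover the regular cases and mentors only need to supply the irregular ones.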
What is Available Now
The project’s knowledge base is stored in the Sesame RDF server. Because the initial knowledge base is large, it has been partitioned into separate Sesame repositories. These have been exported as RDF and released in the file download section here. The project’s domain objects are persisted in Sesame using the RDF Entity Manager and semantic annotations. The RDF Entity Manager is released as a separate component; see the file download section here.
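The RDF Entity Manager is a Java component that maps annotated fields of domain objects to RDF. As a rough illustration of that idea only (not its API), the sketch below attaches a predicate URI to each field of a Python dataclass and converts an object to and from (subject, predicate, object) triples. The `EX` namespace, the `WordSense` class, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, field, fields

EX = "http://example.org/texai#"  # hypothetical namespace, not Texai's real one


def rdf(predicate: str):
    """Annotation-like marker: record which RDF predicate stores this field."""
    return field(metadata={"rdf_predicate": EX + predicate})


@dataclass
class WordSense:
    uri: str                   # the RDF subject identifying this object
    word: str = rdf("word")
    gloss: str = rdf("gloss")


def to_triples(obj) -> list[tuple[str, str, str]]:
    """Persist: emit one (subject, predicate, object) triple per marked field."""
    return [
        (obj.uri, f.metadata["rdf_predicate"], getattr(obj, f.name))
        for f in fields(obj)
        if "rdf_predicate" in f.metadata
    ]


def from_triples(cls, subject: str, triples):
    """Load: rebuild an object from the triples about one subject.

    Raises KeyError if a marked field has no matching triple.
    """
    by_predicate = {p: o for s, p, o in triples if s == subject}
    values = {
        f.name: by_predicate[f.metadata["rdf_predicate"]]
        for f in fields(cls)
        if "rdf_predicate" in f.metadata
    }
    return cls(uri=subject, **values)


sense = WordSense(EX + "dog_n1", "dog", "a domesticated canine")
triples = to_triples(sense)
print(triples)
print(from_triples(WordSense, EX + "dog_n1", triples) == sense)  # True
```

In the real system the triples would be added to a Sesame repository rather than kept in a Python list.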



Stefano Bertolo on 25 May 2009 at 6:55 am
hi Steve,
can you give a small example of:
i) an RDF representation of a Java class + methods;
ii) an English sentence that would describe a modification to said class/methods
iii) the RDF translation of said sentence
iv) the mechanism that would integrate i) and iii) to yield a correct RDF representation of the new desired functionality for the class + methods?
thanks in advance,
stefano
Steve Reed on 25 May 2009 at 8:52 pm
Hi Stefano,
I made your great question into a blog post: Java Programming Via Dialog.
-Steve
Steve Reed on 12 Jun 2009 at 7:48 am
On the AGI-list, Matt Mahoney said:
I entirely agree with Matt’s comment above. The notion of bootstrapping in the Texai English dialog system is to learn the meanings of the most frequently occurring words in the definitions of its yet-to-be-learned vocabulary, and then by reading their definitions, learn the meanings of the remaining words with help from a multitude of volunteer mentors.
In particular Matt said:
An analysis of the word usage frequency in the Texai vocabulary definitions reveals that knowing perhaps only 10000 frequently occurring words should be enough to understand half of the whole lexicon of 85000 English words.
I acknowledge that there must be a very expensive process of encoding knowledge explicitly. Like Cycorp’s initial approach for DARPA’s Rapid Knowledge Formation project, for which I was the first project manager, Texai will use English dialog to rapidly acquire knowledge. I hypothesize that such dialog greatly reduces the expense of teaching new facts to the system, and also permits a vast multitude of volunteer mentors to divide the effort: “many hands make light work”.
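The bootstrapping idea can be sketched as a fixed-point computation over a toy lexicon, assuming (unrealistically) that a word is understood as soon as every word in its definition is understood, and that word senses are already disambiguated. The seed vocabulary is chosen as the most frequent definition words. The function names and the tiny lexicon are illustrative only, and the sketch omits the mentor dialog that Texai relies on to fill in what reading alone cannot.

```python
from collections import Counter


def most_frequent_definition_words(definitions: dict[str, list[str]], k: int) -> list[str]:
    """Pick the k words that occur most often across all definitions."""
    counts = Counter(w for gloss in definitions.values() for w in gloss)
    return [w for w, _ in counts.most_common(k)]


def bootstrap(definitions: dict[str, list[str]], seed: list[str]) -> set[str]:
    """Repeatedly learn any word whose definition uses only known words."""
    known = set(seed)
    changed = True
    while changed:
        changed = False
        for word, gloss in definitions.items():
            if word not in known and all(w in known for w in gloss):
                known.add(word)
                changed = True
    return known


lexicon = {
    "puppy": ["young", "dog"],
    "dog": ["domestic", "animal"],
    "kitten": ["young", "cat"],
    "cat": ["domestic", "animal"],
    "young": ["not", "old"],
}
seed = most_frequent_definition_words(lexicon, 3)
understood = bootstrap(lexicon, seed) & set(lexicon)
print(seed, sorted(understood))
```

Note that "dog" and "cat" must be learned before "puppy" and "kitten" can be, which is why the loop repeats until no new word is added.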
Steve Decato on 06 Nov 2009 at 1:53 pm
I haven’t read much about where this project is headed, but I had a similar idea for collecting knowledge: a simple web prompt that could parse English inputs as knowledge entries. The system would accept all reasonable and understandable entries as facts. Fact contributors would be required to establish an account, and a reliability rating would be established and maintained for each of them. Facts that are substantiated would increase a person’s reliability; disputed facts would reduce it. Such a system with many participants would be self-policing, in much the same way that eBay uses a feedback profile. A single knowledge database with reliable participants worldwide could collect the world’s knowledge over time: more or less a conversational Wikipedia.
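A minimal sketch of the reliability rating described above, assuming integer scores that start at 10, rise by 1 when a contributed fact is substantiated, and fall by 2 (never below 0) when it is disputed. The class name, method names, and scoring constants are arbitrary choices for illustration.

```python
class FactBase:
    """Hypothetical self-policing fact store with per-contributor reliability."""

    START, REWARD, PENALTY = 10, 1, 2  # arbitrary illustrative constants

    def __init__(self) -> None:
        self.reliability: dict[str, int] = {}
        self.contributor: dict[str, str] = {}  # fact -> user who contributed it

    def contribute(self, user: str, fact: str) -> None:
        self.reliability.setdefault(user, self.START)
        self.contributor[fact] = user

    def substantiate(self, fact: str) -> None:
        """Another source confirms the fact: reward its contributor."""
        self.reliability[self.contributor[fact]] += self.REWARD

    def dispute(self, fact: str) -> None:
        """The fact is challenged: penalize its contributor, flooring at zero."""
        user = self.contributor[fact]
        self.reliability[user] = max(0, self.reliability[user] - self.PENALTY)

    def trusted(self, user: str, threshold: int = 10) -> bool:
        return self.reliability.get(user, 0) >= threshold


fb = FactBase()
fb.contribute("ann", "Paris is the capital of France")
fb.substantiate("Paris is the capital of France")
fb.contribute("bob", "The Moon is made of cheese")
fb.dispute("The Moon is made of cheese")
print(fb.reliability)  # {'ann': 11, 'bob': 8}
```

Making the penalty larger than the reward is one way to keep a contributor from offsetting false facts with an equal number of true ones.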
Andrew H. on 05 Feb 2010 at 1:34 pm
Here’s an application that strongly reminded me of texai:
http://www.techcrunch.com/2010/02/04/siri-iphone-personal-assistant/
Steve Reed on 12 Feb 2010 at 2:14 pm
Hi Andrew,
Thanks for the link. I’ve kept track of this project over time, and you are right, it does have some things in common with what I’m trying to do with Texai.
-Steve
Brandon Bagwell on 18 Apr 2010 at 9:44 pm
Hey Steven,
Great project here. While I have a CS background, I unfortunately don’t have much of an AI one, though projects like this interest me greatly. I have to admit that I was kinda working on something with some friends a couple of years ago (though from a slightly different perspective). I was working on building a natural language processor for English, and we kept running into the problem of getting enough sample data for that processing.
Project Gutenberg (while full of rich, wonderful text) didn’t quite have the rich text I was hoping for, but Wikipedia was an excellent source of text (as well as an extensive, if muddlesome, KB).
For some fun with your raw data, check out the Carnegie Mellon University Language Model Toolkit. After you get a nice base going, the next step in evolution is to be able to talk to it well.
Handy Link: http://www.speech.cs.cmu.edu/SLM_info.html
MechanicalCrowds on 30 Apr 2010 at 12:03 pm
Fascinating stuff… I don’t know why more people don’t spend time on this and collaborate. It’s a tough question, but to me it’s like this: solve the seed AI problem and you’ll solve problems in many other fields (the AI would eventually answer them for you).
I have three questions for you:
1) Going back to basics of seed AI:
“The task is not to build an AI with some astronomical level of intelligence; the task is building an AI which is capable of improving itself, of understanding and rewriting its own source code. The task is not to build a mighty oak tree, but a humble seed.” ~http://www.singinst.org/ourresearch/publications/GISAI/GISAI.html#para_seedAI
It seems you are putting some focus on language acquisition and understanding. To me, this is more of a side issue. In my opinion, the main focus for a seed AI project should be making recursively self-improving code. Linguistics is part of that ‘oak tree’ and may be learned by the machine itself once it reaches a higher intelligence. Can you please explain your reasoning behind focusing on language first? Is it somehow a prerequisite to recursive self-improvement?
2) Will you be hard-coding any ethical values in the system or do you expect it to learn it?
3) Are you using any evolutionary programming techniques?
Thanks,
MechanicalCrowds
Steve Reed on 04 May 2010 at 7:38 am
Hi MC,
My hypothesis is that language (i.e. NLP) is actually the least-effort path to AGI, in that recursive self-improvement will be bootstrapped by the AGI’s ability to be taught by volunteer mentors.
Regarding ethical values, I am hard coding governance at low levels – for example by using a Java security manager to enforce a sandbox on each user’s computer, or eventually Android smartphone. Beyond that, I treat governance as a distinguished skill that will be taught like any other skill.
Evolutionary programming techniques will be part of the AGI’s toolbox for designing new behaviors and improving existing ones, to be used when rational analysis is not applicable. But my emphasis at first is on symbolic, logical algorithm design, e.g. as performed by a human software engineer.
-Steve
Jeff Zhuk on 07 May 2010 at 11:56 am
Hi Steve,
You might remember me, and I am definitely glad to see you online!
I found your name and Texai while looking for open source products with the keywords knowledge and conversation.
Your step-by-step approach and long term goals are very close to mine but the short term direction might be a bit different.
The scenario is very similar: a conversational interface that extends the knowledge of a small domain (not necessarily software-related), with occasional participation of a “third man”, a script writer, who writes the original questions and can insert or change the scripted questions at run time. This “third man” is more present at first and then helps the process less and less.
The initial domain knowledge is captured in OWL files. This ontology is to be extended, and it also supports the conversation.
Can your product be used in this scenario? OWL can be converted to RDF, and the opposite might be possible too.
I did some prototyping several years ago but your Texai might be a better way to accomplish this today.
What do you think?
Jeff
Steve Reed on 10 May 2010 at 3:46 am
Hi Jeff, I do indeed fondly remember our meeting – at an AAAI symposium I think.
By coincidence, my own bootstrap English dialog approach has evolved to depend upon conversation-directing scripts, in which the scripts themselves can be created and edited via English dialog. When I achieve sufficient progress in my current infrastructure work, I’ll return to migrating the Texai bootstrap dialog code to the new OSGi modular execution framework. Then we can see how well our respective ideas complement each other.
Cheers.
-Steve
Jeff Zhuk on 12 May 2010 at 8:19 am
Steve,
>scripts themselves can be created and edited via English dialog
Very good!
There is still a place for skeleton scripts that will grow and get more meat with the conversations. Skeleton scripts establish rules and a process for building knowledge acquisition scripts in specific domains.
>When I achieve sufficient progress
What timeframe do you have in mind?
>complement each other
You started a great project with very ambitious but, I believe, achievable goals. Reading the blog, I can see some features of the project, and I probably need to read more to better understand the architecture and the main components (diagrams would be helpful).
>fondly remember our meeting
It wasn’t at a symposium.
A couple of hints:
http://javaschool.com/about/publications.html
and
http://javaschool.com/school/public/web/books/
Your name is there in the book’s Acknowledgements
Jeff_Zhuk@yahoo.com
hertzel on 08 Jun 2010 at 4:13 am
Dear Steve
Is there any running online demo of Texai?
best regards
Alex on 13 Jun 2010 at 10:34 am
Hi,
I am doing a project on Cyc for my local college in Britain. Can you supply me with any reference sources on the construction of the project?
Steve Reed on 14 Jun 2010 at 6:01 am
Hi,
I recommend the OpenCyc and Cyc Foundation web sites. The latest version of OpenCyc that I downloaded is implemented in closed-source Java but has a Java API that I wrote about eight years ago while working for Cycorp.
-Steve