Pretty much everything in AI can be done easily except for these 3 factors:

1) Vision detection- There is currently no way to use a computer to determine what an object is.

2) Intelligence representation- There is currently no accepted standard for how a computer should represent all the intelligence that it can know and experience.

3) Intelligence gaining- There is no way for a computer to take experiences and learn from them or refine its knowledge.

1) Vision detection:
What I believe I have as original vision detection methods can be summed up in one word: Context. Remember when you were a little kid, and you looked at those pictures: Which one of these is not like the others? Well, we do that on a daily basis... We don't expect a car parked in someone's house unless we're in a garage. If we see a notebook, we'll expect a pen nearby. This is why it is so crucial to have a 3d engine built in. The world held in a 3d engine can be analyzed for contextual clues.
If we see a building, a row of shrubs, or a pile of scrap metal, then we don't classify every single window, every single leaf, or every small piece of scrap... We look at it as a solid wall or barrier that we can't pass through. Unless of course we were looking for a specific window, a specific leaf or berry, or a certain spare part. What we see really depends on what we're looking for.
Also, when we know a building has a certain architecture, then we know it's going to be the same when we come back. The furniture may have been rearranged... Changing your furniture triggers exploration in many house pets such as dogs. This is because they have to relearn their territory. Once everything is learned, the AI can assume, with some percentage of confidence, that nothing has changed since the last visit. If the AI then wanted to locate a specific object in the environment, it would have the object's last known location stored in memory as the first place to look. If the AI cannot find the object in the places it should be assumed to be, then the AI can consider the object lost... The AI would then have to choose to continue its goal without the object or initiate a find algorithm... People do this all the time when losing their wallet or keys. In some crisis situations, people will go to work without their driver's license. In a non-crisis situation, people may build up a list of places the object might be. And some people just randomly clean, because by cleaning they build a stronger representation of where objects are, find several objects they had lost along the way, and of course end up with a clean house.
I feel that if a 3d engine were built to represent the entire world that the AI can experience, then you could capture many of the above elements. First off: you would be able to build and store the information in a way that is easily analyzed. Secondly, you'd have coded up the imagination by simply cloning out the 3d world... The imagination is used to make guesses on what will occur if a choice is made. Also, a 3d world could be analyzed to make contextual guesses about the objects nearby. Objects in a 3d world can also be analyzed to see what they'd look like in a different orientation. It's for this reason that I believe many scientists may be short-sighted about vision detection. Without having an articulate 3d world in which to judge relevance, the problem of visually identifying an object becomes much more difficult.
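The remember-then-search behavior described above could be sketched roughly like this in Python. This is only a minimal sketch: the class name LocationMemory, the confidence numbers, and the 0.2 cutoff are all invented for illustration, and a plain dictionary stands in for the 3d world.

```python
class LocationMemory:
    """Remembers where objects were last seen, with a confidence per place."""

    def __init__(self):
        # object name -> {place: confidence that the object is there}
        self.places = {}

    def saw(self, obj, place, confidence=0.9):
        self.places.setdefault(obj, {})[place] = confidence

    def candidates(self, obj, cutoff=0.2):
        """Places worth checking, most likely first."""
        known = self.places.get(obj, {})
        ranked = sorted(known.items(), key=lambda item: item[1], reverse=True)
        return [place for place, conf in ranked if conf >= cutoff]

    def find(self, obj, look):
        """Check remembered places in order; look(place) returns True if found.

        Returns the place, or None, meaning the object is considered lost.
        """
        for place in self.candidates(obj):
            if look(place):
                self.saw(obj, place)        # refresh the memory
                return place
            self.places[obj][place] = 0.0   # it was not where we expected
        return None


memory = LocationMemory()
memory.saw("keys", "hallway hook", 0.8)
memory.saw("keys", "coat pocket", 0.5)

actual_location = "coat pocket"  # stand-in for the real world
found = memory.find("keys", lambda place: place == actual_location)
lost = memory.find("wallet", lambda place: False)
print(found, lost)
```

When find returns None, the AI is at the decision point from the paragraph above: carry on without the object, or start a wider search.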
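To make the idea of context a bit more concrete, here is a small Python sketch of how what we expect to see in a place could tip a vision guess. Everything in it is an assumption for illustration: the EXPECTED table, its made-up numbers, and the function name guess_object. A flat lookup table stands in for the much richer 3d world.

```python
# How strongly (made-up numbers) each object is expected in each kind of place.
EXPECTED = {
    "garage":  {"car": 0.9, "toolbox": 0.7, "sofa": 0.05},
    "kitchen": {"car": 0.0, "toolbox": 0.1, "sofa": 0.1},
}


def guess_object(vision_scores, place):
    """Weight raw vision scores by how expected each object is in `place`."""
    context = EXPECTED.get(place, {})
    combined = {
        label: score * context.get(label, 0.01)  # unknown labels get a tiny weight
        for label, score in vision_scores.items()
    }
    return max(combined, key=combined.get)


# The vision system alone slightly prefers "sofa" for a big boxy shape.
blurry_shape = {"sofa": 0.55, "car": 0.45}
in_garage = guess_object(blurry_shape, "garage")
in_kitchen = guess_object(blurry_shape, "kitchen")
print(in_garage, in_kitchen)
```

The same blurry shape gets a different label depending on where it is seen, which is the "car in a garage" point made earlier.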

2) Intelligence representation:
If we're in the forest, we may see a tree, but not know its scientific name. This is not in the realm of vision; it ties into the knowledge base of the AI. People have this problem too... Since a person's memory is much more limited than a computer's, many people don't even care to know the difference between one tree and another. I don't blame them either. I know my capacity, and I know I can't memorize a lot of boring facts... A computer, however, can memorize a large number of boring facts, and that's what it will do to represent its intelligence.
Let's assume we have an object oriented programming language, and we define an object with a dynamic set of variables assigned to it. We could define something like a ball, then have a variable that points to more specific objects like a basketball or tennis ball... Then we could have a variable that points to its shape: a sphere. We could have a variable that says it bounces (which, when applied to a 3d world, would have a definite result). Anything relevant to the object is stored with it... The information may never be used, but it's good to know more about an object in case the AI needs to use that specific aspect in the future. It's pretty straightforward... Just make fields that are relevant and can be used in calculations later.
Also, goals and subgoal instruction sets need to be understood. Take the way the android makes a sandwich... It goes to the bread drawer, opens it, and locates the bread of choice. Now if the bread drawer is jammed, it may enter a subroutine to unjam the bread drawer. If the bread is missing, it may look for bread or go buy a new loaf. If the android doesn't understand what to do, and doesn't have instructions... Then it could do something fancy, or just sit and wait for the instructor... Real people on a new job experience this problem. In solving the goal to make a sandwich, the android doesn't need to be concerned down to the level of atoms, and doesn't even need to know if the car is parked outside (unless it needs to go to the store). Goals and subgoals are easy to understand and represent.
Exceptions. A pen is like a pencil, except a pen has ink where a pencil uses graphite. If the difference is not much, an exception can be kept within one object... But as the differences compound (or even with just one difference, to make coding easier), the objects split and form two memory nodes instead of one.
Believe it or not, the AI now suddenly knows English. By storing information in English, having a 3d world, and being able to imagine what is happening, the AI understands English. "Spot is a dog. Spot runs." In the AI's mind, a new node is created that copies the dog's template. One field is added: name. The name Spot is then assigned to that field. When you tell the AI "Spot runs", in its mind it draws up a random environment in which a dog is likely to be present, and in that environment the dog is running. Or say I tell the AI: "I forcibly threw a super bounce ball off the mall by my place, and it almost came back and hit me in the head". The AI would then understand who the speaker is: me, James. If I had coded information about myself, it would understand my height and what a human looks like. If it knows where I live, and has been to the mall with me, it knows what mall I am talking about. If it has information about super bounce balls, it could understand that they bounce really well and that their angle of deflection is very random when thrown by a human. It could understand that a forcibly thrown ball would hurt if it hit a vital area of a human's body. From my explaining that it almost hit my head, the AI could understand that an event almost happened that was bad and/or possibly funny. Since the AI KNOWS what is happening in English, it could then translate to other languages without the context mistranslations that current machines make, but that's just an aside.
Events and experiences. Events are what happens during the day. Events can be stored directly in English format like a log from a chat. Events are disposable except for traumatic ones. If the AI makes a bad decision and severe consequences occur, then the AI really needs to think about that decision and work to make better decisions in the future... Real people do this all the time. Say you were in the finals of a tournament for a large sum of money, but failed. You may replay the event over and over, looking for a way not to fail in the future. You can't forget the event and many things related to it because it was so important. Events are compiled mainly in the sleep state to construct possible events (dreaming) and look into ways of understanding the world better.
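As a rough sketch of the ball example in Python (the class name Thing, the field names, and the approximate diameters are all just picked for illustration):

```python
class Thing:
    """A memory node: a name plus any fields that turn out to be relevant."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)
        self.kinds = []  # more specific objects, e.g. ball -> basketball

    def add_kind(self, name, **fields):
        """Create a more specific object that starts with a copy of our fields."""
        child = Thing(name, **{**self.fields, **fields})
        self.kinds.append(child)
        return child


ball = Thing("ball", shape="sphere", bounces=True)
basketball = ball.add_kind("basketball", diameter_cm=24)    # rough size
tennis_ball = ball.add_kind("tennis ball", diameter_cm=6.7)  # rough size

# Fields are dynamic: add one later when it becomes relevant.
basketball.fields["color"] = "orange"

print([kind.name for kind in ball.kinds])
print(basketball.fields)
```

A 3d engine could then read fields like shape and bounces when it simulates the object.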
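The sandwich example could be sketched in Python something like this. It is one possible layout, not the only one: the goal is assumed to be a list of (step, check, problem) entries, and run_goal, unjam_drawer, and buy_bread are hypothetical names.

```python
def run_goal(goal, world, handlers):
    """Run each step of `goal`; if a step is blocked, try a recovery subgoal."""
    log = []
    for step, works, problem in goal:
        if not works(world):
            recover = handlers.get(problem)
            if recover is None:
                # No instructions for this problem: sit and wait.
                log.append(f"stuck at '{step}', waiting for instructor")
                return log
            log.append(f"subgoal: {recover(world)}")
        log.append(step)
    return log


def unjam_drawer(world):
    world["drawer_jammed"] = False
    return "unjam the bread drawer"


def buy_bread(world):
    world["has_bread"] = True
    return "go buy a new loaf"


make_sandwich = [
    ("open the bread drawer", lambda w: not w["drawer_jammed"], "drawer jammed"),
    ("take out the bread", lambda w: w["has_bread"], "bread missing"),
    ("assemble the sandwich", lambda w: True, None),
]

handlers = {"drawer jammed": unjam_drawer, "bread missing": buy_bread}
world = {"drawer_jammed": True, "has_bread": False}
steps = run_goal(make_sandwich, world, handlers)
untrained = run_goal(make_sandwich, {"drawer_jammed": True, "has_bread": True}, {})
for line in steps:
    print(line)
```

Note that the world dictionary only holds what matters to this goal; nothing about atoms or where the car is parked.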
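Here is a small Python sketch of the keep-or-split decision. The threshold MAX_EXCEPTIONS, the function name learn_similar, and the field names are assumptions made up for the example.

```python
MAX_EXCEPTIONS = 1  # assumed threshold; past this the objects split


def learn_similar(memory, name, like, exceptions):
    """Store `name` as 'like `like`, except ...', splitting if it differs too much."""
    base = memory[like]
    if len(exceptions) <= MAX_EXCEPTIONS:
        # Small difference: keep one node and record the exception on it.
        base.setdefault("exceptions", {})[name] = exceptions
    else:
        # Differences have compounded: give the new object its own memory node.
        node = {k: v for k, v in base.items() if k != "exceptions"}
        node.update(exceptions)
        memory[name] = node


memory = {"pencil": {"writes": True, "marks_with": "graphite"}}
learn_similar(memory, "pen", "pencil", {"marks_with": "ink"})
learn_similar(memory, "marker", "pencil", {"marks_with": "ink", "tip": "felt"})
print(memory)
```

After this runs, the pen lives inside the pencil node as an exception, while the marker has split off into a node of its own.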
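The "Spot is a dog. Spot runs." example could look roughly like this in Python. It is a toy that only handles those two sentence shapes; the template fields and the function name hear are invented, and it picks the first likely place instead of a random one so the result is repeatable.

```python
import copy

# Assumed template for a dog; a real one would hold far more fields.
templates = {"dog": {"kind": "dog", "legs": 4, "likely_places": ["yard", "park"]}}
nodes = {}


def hear(sentence):
    """Handle two toy sentence shapes: '<Name> is a <kind>.' and '<Name> <verb>.'"""
    words = sentence.rstrip(".").split()
    if len(words) == 4 and words[1:3] == ["is", "a"]:
        name, kind = words[0], words[3]
        node = copy.deepcopy(templates[kind])  # copy the template
        node["name"] = name                    # the one added field
        nodes[name] = node
        return node
    if len(words) == 2 and words[0] in nodes:
        name, verb = words
        node = nodes[name]
        # "Imagine" a scene: a plausible place, with the named thing acting in it.
        return {"place": node["likely_places"][0], "actor": name, "action": verb}
    raise ValueError(f"cannot understand: {sentence}")


spot = hear("Spot is a dog.")
scene = hear("Spot runs.")
print(scene)
```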
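A minimal Python sketch of a disposable event log might look like this. The 0-to-1 severity scale, the 0.7 cutoff, and the names Event and sleep_cycle are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Event:
    text: str        # the event, stored as plain English
    severity: float  # 0.0 = routine, 1.0 = traumatic (assumed scale)


def sleep_cycle(log, keep_above=0.7):
    """Throw away routine events; keep the ones important enough to replay."""
    return [event for event in log if event.severity >= keep_above]


day = [
    Event("Made a sandwich.", 0.1),
    Event("Lost the tournament final.", 0.95),
    Event("Watered the plants.", 0.05),
]
kept = sleep_cycle(day)
print([event.text for event in kept])
```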
Looking at intelligence memory storage seems really straightforward... You have a 3d world that the AI knows of, and all the objects and actions therein. Through use of a 3d imagination, the AI can then understand a story told to it in English(More advanced AI could even make up their own stories). So the AI has a 3d world to store, objects, and actions.

3) Intelligence gaining: Now that you have storage in place, the next thing you need is learning:
Now remarkably this is the easiest section of the three. Basically you can learn by taking input from the environment. But more rigorously, you see that you can learn from your parent, learn from acquaintances, or learn from experiences.
Imprinting, the thing ducks have, and trust in humans... A kid is inclined to trust his parents more... A computer can trust its programmer 100%; in almost all programs today, the computer obeys the programmer 100%. One way of learning from the programmer is just hardcoding in storage. That is the most basic method. A stronger method would be allowing voice recognition, or direct English commands in the interface given. You can use commands to teach the AI new objects and actions, but you must describe them based on familiar objects and actions. You can also use commands to teach the AI new ways to achieve goals, which is like what a foreman does for his workers in a factory. At the most basic level, if the AI doesn't understand a word you are saying, it will directly ask you for a definition of the word in terms of words it knows... At a higher level, it can infer the meaning of a word from the context it's used in, so as not to break the conversation... There really is no need to build the higher level understanding at first though.
Much like learning from the parent, learning from others follows the same route. Sure, its acquaintances will not be able to hardcode the information directly into the machine, but they will be able to use a text interface or a voice recognition system (maybe backed by sight to watch facial reactions). But sometimes acquaintances tell the wrong information or lie. Acquaintances must be assigned trust values, or acquaintances can deceive the AI. Some children are like this too. Until you learn to distrust individuals you may fall for anything. One of my ex's likes to tell me how her best friend got her to drink water from a gutter by claiming it was magical fairy water. Trust can be calculated in an infinite number of ways, but a naive way to go about it is: If the person tells you lies, they are to be distrusted.
Finally, learning from experiences. Picture when you learned to play basketball. There had to be a time where you shot 3 balls that landed in the same place even though you were aiming for the hoop. You then aimed for a place that wasn't the hoop, hoping the 4th ball would go into the hoop... Basically it's just re-assessing how well goals are achieved and so on. It would be a small algorithm that allowed the AI to change up some coefficients here and there.
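The most basic level, asking for definitions of unknown words, could be sketched like this in Python. The starting vocabulary and the function names unknown_words and teach are hypothetical, and words are matched by exact spelling only.

```python
def unknown_words(sentence, vocabulary):
    """Return the words in `sentence` that are not in the vocabulary yet."""
    words = [word.strip(".,!?").lower() for word in sentence.split()]
    return [word for word in words if word and word not in vocabulary]


def teach(vocabulary, word, definition):
    """Accept a definition only if it is built from words already known.

    Returns the words the AI would have to ask about first (empty if accepted).
    """
    missing = unknown_words(definition, vocabulary)
    if not missing:
        vocabulary[word] = definition
    return missing


# Assumed starting vocabulary, hardcoded by the programmer.
vocabulary = {word: "(built in)" for word in ["a", "the", "small", "dog", "runs"]}

to_ask = unknown_words("The puppy runs.", vocabulary)
first_try = teach(vocabulary, "puppy", "a small young dog")
second_try = teach(vocabulary, "puppy", "a small dog")
print(to_ask, first_try, second_try)
```

The first definition is rejected because it leans on "young", which the AI would have to ask about in turn.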
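The naive trust rule could be sketched in Python along these lines. The starting score of 0.5, the step of 0.2, and the class name Trust are arbitrary choices for the example, not a recommended formula.

```python
class Trust:
    """Naive trust values: lies lower a person's score, truths raise it."""

    def __init__(self, start=0.5, step=0.2):
        self.start = start
        self.step = step
        self.scores = {}

    def report(self, person, was_true):
        """Update trust after checking one thing the person said."""
        score = self.scores.get(person, self.start)
        score += self.step if was_true else -self.step
        self.scores[person] = min(1.0, max(0.0, score))  # keep within 0..1

    def believes(self, person, threshold=0.5):
        return self.scores.get(person, self.start) >= threshold


trust = Trust()
trust.report("best friend", was_true=False)  # the magical fairy water
trust.report("parent", was_true=True)
print(trust.believes("best friend"), trust.believes("parent"))
```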
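The basketball example could be sketched as a tiny correction loop in Python. The one-dimensional court, the made-up BIAS value, the learning rate, and the name adjust_aim are all simplifying assumptions.

```python
def adjust_aim(aim, misses, rate=0.5):
    """Shift the aim point opposite to the average miss."""
    average_miss = sum(misses) / len(misses)
    return aim - rate * average_miss


HOOP = 0.0
BIAS = 0.4  # the shooter's consistent error, unknown to the learner

aim = HOOP
for _ in range(5):
    landings = [aim + BIAS] * 3  # three shots land in the same place
    misses = [landing - HOOP for landing in landings]
    aim = adjust_aim(aim, misses)

final_miss = aim + BIAS - HOOP
print(round(aim, 4), round(final_miss, 4))
```

Each round the shooter aims a little further from the hoop, and the ball lands a little closer to it.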

------------------------------
Well, basically that is all you need to make AI.

It's not overly tough to understand, it's just A LOT of work to code up.

Final specs of AI:

*A series of objects understanding a bunch of nouns

*A series of objects understanding a bunch of verbs

*A 3d imagination that uses nouns and verbs from a conversation in context.

*A series of senses that detect things in the real world as being the nouns and verbs in its database. It will then generate a 3d layout of its environment in its mind. Also coded in are several physics rules.

*A textual log of all the experiences the AI had interacting with people and its environment.

*A huge list of lists of instruction sets.

All those different data sets need to interface with each other, but when combined, you have basically made life into a computer game. Your AI can help you win life, it just needs to know the rules and goals.

If you want me to expand on or further explain any aspect because you don't believe this proof is tight enough, just email me: [email protected]