    Let’s Build our own Skynet – AI in a nutshell

    Jan 04, 2017

    Artificial Intelligence is on the rise - we cannot deny that. A huge amount of money is invested in the industry, especially in the US and China, and as we know, these two countries don’t like burning money for no reason.

    Some people call it “the electricity of the 21st century”, while others fear the rise of Skynet from the movie Terminator. Either way, there are many exciting projects going on right now: autonomous cars, psychological analysis, image and voice recognition, and even AI judges or policemen:

     

    Figure 1. Mr. Policeman and I at European Utility Week 2016 in Barcelona

     

    Let’s look at what it actually means and how important AI can be. So, what is AI? How powerful is it? Should we be afraid of creating a real-life threat like Skynet? In general, AI is the idea that a machine can “think” or react the way humans do. But what does that really mean?

    First, we should dive a little bit deeper into how we, humans, make decisions. There are many branches of philosophy trying to answer this question (decadentism, modernism), but since we are scientists (or at least interested in science) we can agree that empiricism suits our approach very well. We base our work on observation, experience and knowledge. The more experience you have, whether in science, sports or playing guitar, the better you are. Knowledge comes from experience. If we repeat a certain task over and over again, even with different parameters but with the same model, we can check whether our initial hypothesis holds, based on observations and assumptions.

    Can we apply such an approach to machines? Actually, we can.

    The technology that might enable us to create real AI in the (near) future is called Machine Learning. Would you believe it if I told you that the basic concepts were developed more than 50 years ago? I was surprised myself.

    Machine learning was a forgotten branch of science for a long time, mostly due to technological limitations: it requires a substantial amount of computing power. Nowadays, we can play with machine learning on the same laptops that we carry in our backpacks. This is an outstanding technological leap that lets us build on ideas that were not feasible 10 or 20 years ago.

    Before we dive into details, we should ask ourselves: what do we actually want to achieve when using Artificial Intelligence?

    There are three main branches of machine learning:

    • Supervised machine learning
    • Unsupervised machine learning
    • Reinforcement learning

    Supervised machine learning enables us to find patterns in data by comparing inputs and outputs.

    Please stay with me for a little experiment. I will give you a few inputs and outputs. Your task is to try to find a correlation.

    Input    Output
    1        2
    2        4
    3        6
    4        8

     

    It was easy, huh? Yes, the output is obtained by multiplying the input by 2. Piece of cake. Do we really need a computer for that? Obviously not, but let’s make the problem more complicated.
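
    For the curious, here is a minimal sketch of how a computer could “discover” the same rule. It assumes the relationship is a straight line and fits it with ordinary least squares; the helper name fit_line is my own invention for illustration, not part of any library.

```python
# Hypothetical helper: ordinary least squares for a line y = slope * x + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The four input/output pairs from the table above.
inputs = [1, 2, 3, 4]
outputs = [2, 4, 6, 8]

slope, intercept = fit_line(inputs, outputs)
prediction = slope * 5 + intercept  # what the learned rule says for an unseen input

print(slope, intercept, prediction)  # prints 2.0 0.0 10.0
```

    The learned slope of 2 is exactly the “multiply by 2” rule, and the model can now answer for inputs it has never seen.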

    Let’s challenge ourselves with the famous Iris dataset. Our task is to determine the species of a flower based on its petal width and petal length. Suddenly the problem gets much harder to solve in your head. What if I told you that there are many, many algorithms that will categorize the same data in various ways? On top of that, the parameters of each algorithm affect how it fits the data, as presented below.

     

    Figure 2. Iris data categorisation obtained by using the SVC algorithm with gamma = 0.2.

    Figure 3. Iris data categorisation obtained by using the SVC algorithm with gamma = 100
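
    Why does gamma change the picture so much? Here is a rough sketch, assuming the SVC in the figures used a radial basis function (RBF) kernel (this is the default in scikit-learn’s SVC, but the figures do not say so explicitly). With that kernel, the similarity of two points is exp(-gamma * squared distance). The two flowers below are made up for illustration and are not taken from the Iris data.

```python
import math


def rbf_similarity(a, b, gamma):
    """RBF kernel value: 1.0 for identical points, approaching 0.0 for distant ones."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Two hypothetical flowers: (petal length, petal width) in cm, 0.5 cm apart.
flower_a = (4.5, 1.5)
flower_b = (5.0, 1.5)

smooth = rbf_similarity(flower_a, flower_b, gamma=0.2)  # close to 1: they look alike
spiky = rbf_similarity(flower_a, flower_b, gamma=100)   # close to 0: they look unrelated

print(smooth, spiky)
```

    With a small gamma, nearby flowers influence each other and the boundary stays smooth. With a large gamma, every training point only “sees” its immediate surroundings, so the boundary wraps tightly around individual points.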

     

    But why is that? When we gather data, we deal not only with noise and measurement errors but also with uncertainty.

    What if I made the first thought experiment a little bit more complicated?

     

    Input    Output
    1        2
    2        4
    3        6
    4        8
    5        11

     

    With limited data, it is easy to misinterpret what you see: the first four rows suggest “multiply by 2”, but the fifth row breaks that rule. In reality we never have the whole picture. We have to accept some misclassifications and errors, but by carefully choosing and using a lot of data, we can approximate the optimal solution. Growing computing power and big data analysis enable us to solve many problems with supervised learning, with voice and image recognition as basic examples.
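
    To put a number on that error, here is a small sketch that scores the “multiply by 2” rule against the data. Mean squared error is my choice of measure for illustration; other error measures would work as well.

```python
def mean_squared_error(xs, ys, predict):
    """Average of the squared differences between predictions and observed outputs."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def double(x):
    # The rule we guessed from the first four rows.
    return 2 * x


# The five input/output pairs from the second table.
inputs = [1, 2, 3, 4, 5]
outputs = [2, 4, 6, 8, 11]

error_first_four = mean_squared_error(inputs[:4], outputs[:4], double)
error_all_five = mean_squared_error(inputs, outputs, double)

print(error_first_four, error_all_five)  # prints 0.0 0.2
```

    The rule is perfect on the first four rows and slightly wrong once the fifth arrives. Whether that last point is noise or a sign that the real rule is different is exactly what more data helps us decide.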

    Another branch of machine learning is reinforcement learning. The video below explains it in a short and efficient way: an MIT team built an autonomous RC car that adjusts its steering to the environment. It is very important to mention that the behaviour of the car is learned from experience, not hard-coded as a pure “if this then that” algorithm. Link: https://www.youtube.com/watch?v=opsmd5yuBF0

    In general, reinforcement learning is similar to learning how to play a guitar. If you put your fingers in an incorrect position, you will hear unpleasant sounds; if you do it better, you are rewarded with the pleasure of experiencing art. For humans, the reward is subjective, whether it be money, power, comfort or happiness from helping others, but for computers we can set a clear and consistent goal. If the result of the actions is far from the expected result, the cost (the difference between the outcome and the goal) will be high, which means that our actions are not satisfactory. By adjusting our actions over the course of the experiment, we learn. As a loose simplification, you can think of reinforcement learning as an experiment that continues forever; unlike in supervised learning, nobody hands the learner the correct answer, only a score for what it just tried. It is often said that “we learn our whole life” and that is true.
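
    The guitar analogy can be sketched in a few lines. This is a toy example of my own, not the method used by the MIT car: the “agent” chooses one of five finger positions, the reward is the negative distance from the correct position (so the cost described above is simply minus the reward), and an epsilon-greedy strategy balances trying new positions against repeating the best one found so far. All names and numbers here are assumptions for illustration.

```python
import random


def train(target, n_actions=5, steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the highest reward."""
    rng = random.Random(seed)
    values = [0.0] * n_actions  # current estimate of the reward for each action
    counts = [0] * n_actions    # how many times each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random finger position.
            action = rng.randrange(n_actions)
        else:
            # Exploit: use the position that looks best so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # Reward is zero at the target and more negative the further away we are.
        reward = -abs(action - target)

        # Update the running average of the rewards seen for this action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values


values = train(target=3)
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action)  # prints 3
```

    Nobody tells the agent that position 3 is correct. It only receives a score after each attempt, and that is enough for it to settle on the right answer.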

    The last branch of machine learning is unsupervised machine learning. Here we deal with unlabelled data and we have very little information about its structure. We have (or at least we should have) all the parameters, or features, of each data point. Some of them can be missing due to technical problems or software bugs, but for simplicity let’s assume that we have all the data we want.

    Let’s work on another example:

     

    Figure 4. Unlabeled data

     

    Obviously we, humans, can find patterns in this dataset. It looks like we have three big blobs. They represent something, but we do not know what. It might be the state of a system (stable, unstable, at risk of becoming unstable); it might be three breeds of dog (based on the length and width of the body, represented on the x and y axes). We don’t know. Computers, however, need an algorithm to solve this clustering challenge. To make the problem even more complicated, we do not know how many labels we should use. We see three blobs, but they might represent data about students who passed or failed a course, or people who are guilty of a crime or not. In those cases there are only two labels: pass or fail, guilty or innocent.

    Figure 5. Clustering using KMeans algorithm for 3 clusters

     

    Figure 6. Clustering using KMeans algorithm for 2 clusters

     

    The biggest struggle with unsupervised learning is deciding how many labels we should use. If we use two labels for a problem that has three possible outcomes, we lose information. If we choose too many labels, each one carries very little information and the outcome becomes hard to interpret.
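
    Here is a minimal, plain-Python sketch of the k-means idea, run once with three clusters and once with two. The twelve points are made up to mimic three blobs (they are not the data from the figures), and the deterministic “farthest point” starting centroids are a simplification I chose so that the result is reproducible; library implementations such as scikit-learn’s KMeans initialise differently.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def initial_centroids(points, k):
    """Start at the first point, then keep adding the point farthest from all chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: give every point the label of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # nothing changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical data: three blobs of four points each.
blob_a = [(1, 1), (1, 2), (2, 1), (2, 2)]
blob_b = [(8, 1), (8, 2), (9, 1), (9, 2)]
blob_c = [(4, 8), (4, 9), (5, 8), (5, 9)]
points = blob_a + blob_b + blob_c

labels3, _ = kmeans(points, k=3)
labels2, _ = kmeans(points, k=2)

print(labels3)
print(labels2)
```

    With k=3 every blob gets its own label. With k=2 the algorithm is forced to merge two of the blobs under one label, which is the loss of information described above.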

    I hope that this article gave you some basic insight into how we can build our own Skynet. Why would we do that? To automate boring activities or just to improve quality of life. Hopefully our creation will be less aggressive and violent than the Terminator.

    I am planning to make a series about the basics of machine learning, so any feedback is welcome. Are you interested in the topic? Follow me on the CommUnity platform and Twitter.

     

    Written by Wojciech Orzechowski

    The CommUnity Post


