The Virus
I really don't think so. I believe that microscopic life forms, especially the viral ones, are the most powerful species in the world.
The National Center for Health Statistics announced the top 10 causes of death in the United States for 2006 as:
According to some of the articles in the book The Next Fifty Years, edited by John Brockman, various sorts of long-running viral infections cause most cancer types, as well as some of the diseases for which we know no proven cause yet, like Alzheimer's, MS, etc.
Therefore, I hate those God damn micro life forms. They have been killing us cruelly since the beginning of time, and we cannot fight against them in any sufficient way. Remember tuberculosis, the Black Death, AIDS, SARS, Ebola, Crimean-Congo haemorrhagic fever (CCHF), and the fatal influenza strains appearing annually in recent years. What are we doing about them? We have beaten the historic forms of some viruses, but we are very vulnerable to the newly evolved viral forms.
Come on doctors, work harder! Save us!
Nowadays, the beautiful Charlize Theron is striving to beat a viral infection in her stomach. As far as I know, she spent 4-5 days in the intensive care unit of Cedars-Sinai Hospital in Los Angeles.
I am trying to understand the game of evolution... [Possibly, Homo sapiens can be considered the virus of the universe, if we look at the scene from an objective macro perspective. But I don't want to steer that way now...] What is the aim of this kind of natural selection? One ugly ribonucleic acid (RNA) chain covered by a protein shell is trying to conquer one of the most beautiful organisms in the world: Charlize Theron.
It is insane! Really unfair.
Pray for Ms. Theron, my friends, we are not ready to lose her yet!
Get up, girl! Get up and smile at us again.
Decision Tree
Today, I am going to write about creating a decision tree from a data table. Actually, for this simple operation, a lot of professionals pay huge amounts of money to other professionals, let's say external data mining consultants, to have them give meaning to their own enterprise data.
Here is the operation:
First, we are going to analyze the columns (attributes) that comprise the data table. The point of the analysis is to determine the classification attribute and the attributes whose values carry the highest level of information. The attributes carrying the highest level of information are going to be our decision nodes in the tree.
A sample data table is below (fig. 1). The table is about passing a course at a college. The pink column, term result, is the classification attribute, and the available classes are "Passed" and "Failed". We are trying to reveal the effect of "self studied", "team work", "relations with lecturer" and "course attendance" on passing the course.
Figure 1.
There are some key points to the attribute analysis. The value of an attribute with respect to the corresponding classification target is called information gain. The equation of information gain is as follows:
Equation 1.
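The equation image is not reproduced here; judging from the way it is applied in the calculations below (parent impurity minus the weighted impurity of the splits), Equation 1 is presumably of the form

    \mathrm{Gain} = Gini(\mathrm{parent}) - \sum_{k} P(k)\,Gini(k)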
Here P() denotes the probability and k stands for the labels of the distinct values of an attribute. To make this clear, please check the example below:
Figure 2.
Figure 2 shows the sorted values of the attribute "course attendance" and their corresponding classes; it is a subset of fig. 1. The P(k) in equation 1 will take the values P(Bad), P(Good), P(Moderate) and P(None). The value of P(Bad) is 0.3 because 3 out of the 10 tuples in fig. 2 are labeled Bad. Similarly, the value of P(None) will be 0.1 (1/10).
The term Gini in equation 1 is the measure of impurity, and it is applied to the attributes of our table. Some attributes might be homogeneous, which means that all the values of the attribute map to the same class. On the contrary, some attributes, like "course attendance" in fig. 2, have different values addressing different classes, and we call those attributes heterogeneous or impure. There are different measures of impurity, like classification error, entropy and the Gini index. I am giving the equations of all of them but will be using Gini in this text for measuring impurity.
Figure 3.
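Figure 3 itself is not reproduced here; the standard definitions of the three measures just mentioned, for a node with class probabilities P(k), are

    \mathrm{Classification\ error} = 1 - \max_k P(k)
    \mathrm{Entropy} = -\sum_k P(k)\,\log_2 P(k)
    Gini = 1 - \sum_k P(k)^2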
As an example, the P(k) values in the Gini equation for one of the course attendance subsets will be P(Failed) = 1/3 and P(Passed) = 2/3, and its Gini index will be 1 - [(1/3)^2 + (2/3)^2] = 1 - 5/9 ≈ 0.44, where ^2 denotes the power of 2, in other words the square operator.
As you can see, we will first compute the impurity for each distinct value of every attribute. Second, we will calculate the information gain of every attribute by using those impurity degrees. Third, we will compare the information gains of the attributes and pick the one with the highest level as the decision node of our tree. That will be the end of one node-description cycle. Similar cycles may follow in a recursive manner if further splits in the tree are needed.
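To make the cycle concrete, here is a minimal Python sketch of the two measures, assuming the table is kept as a list of dicts keyed by the column names of fig. 1; the helper names gini and info_gain are mine, not something taken from the figures.

    from collections import Counter

    def gini(labels):
        # Gini impurity of a list of class labels, e.g. ["Failed", "Passed", ...]
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def info_gain(rows, attribute, target):
        # information gain of splitting `rows` on `attribute` with respect to the `target` column
        parent = gini([r[target] for r in rows])
        groups = {}
        for r in rows:
            groups.setdefault(r[attribute], []).append(r[target])
        n = len(rows)
        weighted = sum(len(g) / n * gini(g) for g in groups.values())
        return parent - weighted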
Let's start the procedure... The Gini of our initial table will be calculated by using our classification column, term result. There are 5 Failed and 5 Passed classes in the table. So
P(Failed)=0.5
P(Passed)=0.5
Gini(Parent) = 1-[0.5^2+0.5^2] = 1 - 0.5 = 0.5
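With the gini helper sketched above, the same number drops out directly:

    gini(["Failed"] * 5 + ["Passed"] * 5)   # = 0.5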
Using that parent Gini value, the figure below shows the evaluation of our attributes with respect to their information gain.
Figure 4.
The winner is course attendance, with a gain of 0.23. So our decision tree has a root and some leaves now. The very first form of the tree is given below (Fig. 5).
Figure 5.
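In code, picking the root is just an argmax over the candidate columns, reusing the info_gain helper above (here rows stands for the full table of fig. 1, which is not reproduced in text):

    candidates = ["self studied", "team work", "relations with lecturer", "course attendance"]
    root = max(candidates, key=lambda a: info_gain(rows, a, "term result"))
    # -> "course attendance" for the data in fig. 1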
Now cycle 1 has ended, and the initial table in fig. 1 becomes shorter, as given in fig. 6 below. We filtered the rows with the condition that course attendance is Good or Moderate. We have to split the nodes for those values.
Figure 6.
We will use the first 3 rows of data while we are running the second node-description cycle, so in this cycle those 3 rows will become our parent table. The Gini(Good) and Gini(Moderate) values have already been calculated in cycle 1 and can be examined in fig. 4. Gini(Good) will be our Gini(Parent) in cycle 2. At this stage, course attendance will no longer be evaluated as a node candidate; we will focus on the other 3 columns: self studied, team work and relations with lecturer. The next figure (Fig. 7) shows the information gain values of those attributes for the first 3 rows of the table in fig. 6.
Figure 7.
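The numbers in fig. 7 would come from the same helper, applied only to the rows whose course attendance is Good (again assuming the list-of-dicts representation from the sketch above):

    good_rows = [r for r in rows if r["course attendance"] == "Good"]
    gains = {a: info_gain(good_rows, a, "term result")
             for a in ["self studied", "team work", "relations with lecturer"]}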
We have 2 attributes with an info gain of 0.44 (team work and self studied), so we can choose either of them as our second decision node in the tree. My choice will be team work. After cycle 2, the tree becomes as follows.
Figure 8.
It is time for cycle 3, where the situation of the Moderate course attendants will be settled. We will use the second group of 3 rows of the table in figure 6, Gini(Parent) will be 0.44, and our candidate attributes will be self studied, team work and relations with lecturer. Let us see the calculation in figure 9:
Figure 9.
The winner of cycle 3 is the attribute self studied, with an information gain of 0.44. Finally, the decision tree is fully constructed. Here you are.
Figure 10.
According to the decision tree, relations with lecturer has no effect on passing this course.
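The three cycles above are just instances of one recursive node-description step. A compact sketch of that recursion, reusing the gini and info_gain helpers from earlier (my own illustration, not code from the original post), could look like this:

    def build_tree(rows, attributes, target):
        labels = [r[target] for r in rows]
        # stop when the node is pure or when there is nothing left to split on
        if gini(labels) == 0 or not attributes:
            return Counter(labels).most_common(1)[0][0]   # leaf labelled with the majority class
        best = max(attributes, key=lambda a: info_gain(rows, a, target))
        branches = {}
        for value in {r[best] for r in rows}:
            subset = [r for r in rows if r[best] == value]
            remaining = [a for a in attributes if a != best]
            branches[value] = build_tree(subset, remaining, target)
        return {best: branches}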
As you can see, a software specialist can easily write a program for creating a decision tree by using a data set. No need to pay extra for this :)
Formally, decision trees belong to classification, a vein of predictive data mining, which is in turn a branch of machine learning.
God bless mathematics ;)
Envy-Free Cake Division 2
Think of this!
Envy-Free Cake Division
- One of the guys, let's say A, cuts the cake into 3 pieces
- B chooses one of the pieces
- C chooses one of the pieces
- A picks the last piece on the table
- The pairs A-C, A-B and C-B try to agree that all the pieces are equal. If a pair cannot agree, they use the "I cut, you choose" procedure on their pieces.
Now, here is stage 1 of the Selfridge-Conway procedure (fig. 1):
- Party A cuts the cake into 3 pieces according only to his own feeling of equity (step II in fig. 1).
- Party B checks the pieces; if he thinks that at least two of the pieces are good for choosing (meaning those two are tied), he does nothing.
- Party C picks a piece.
- Party B picks a piece.
- Party A picks the remaining piece.
Everything is perfect. However, if party B thinks that one piece is clearly bigger than the other ones, then in the second bullet above he takes the knife and trims the biggest piece in order to produce at least 2 tied pieces (see step III in fig. 1). Then no one can pick the "trm" bit during stage 1, and the procedure goes:
- Party C chooses a piece.
- If party C did not choose the trimmed piece (piece number 3 in fig. 1), party B has to pick the trimmed piece.
- Party A picks the remaining one.
Stage 1 is done. No one envies anyone else. Now, it is time to divide the "trm" bit.
Figure 2. Selfridge & Conway stage 2.
- Of B and C, the party who did not pick the trimmed piece in stage 1 cuts the trm bit into 3 pieces with his knife.
- If the party who cuts the trimming into 3 is B, then the parties pick the pieces (t1, t2, t3 in fig. 2) in C, A, B order. If the party that cuts the trm bit is C then the picking order is B, A, C.
Stage 2 ends. It is an envy-free division process. We're all done with this.
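For the programmers in the room, here is a toy Python sketch of stage 1 only. The piece names and valuations are hypothetical, and it leans on one simplifying assumption that is not part of the procedure itself: when B trims a piece, every player's value of that piece shrinks by the same fraction.

    def stage_one(values_A, values_B, values_C):
        # values_X: piece name -> how much player X thinks that piece is worth
        pieces = list(values_A)

        # B checks the pieces: if his largest is strictly bigger than his second
        # largest, he trims the largest until the two are tied
        ranked = sorted(pieces, key=values_B.get, reverse=True)
        biggest, second = ranked[0], ranked[1]
        surplus = values_B[biggest] - values_B[second]
        trimmed_piece = None
        if surplus > 0:
            fraction = surplus / values_B[biggest]
            for vals in (values_A, values_B, values_C):   # proportional-shrink assumption
                vals[biggest] *= (1 - fraction)
            trimmed_piece = biggest

        allocation, remaining = {}, list(pieces)

        # C picks his favourite of the three pieces on the table
        allocation["C"] = max(remaining, key=values_C.get)
        remaining.remove(allocation["C"])

        # B must take the trimmed piece if C left it; otherwise he picks freely
        if trimmed_piece in remaining:
            allocation["B"] = trimmed_piece
        else:
            allocation["B"] = max(remaining, key=values_B.get)
        remaining.remove(allocation["B"])

        # A takes whatever is left
        allocation["A"] = remaining[0]
        return allocation, trimmed_piece

    # hypothetical valuations: A already cut the cake into pieces he values equally
    print(stage_one({"p1": 10, "p2": 10, "p3": 10},
                    {"p1": 12, "p2": 9,  "p3": 9},
                    {"p1": 8,  "p2": 11, "p3": 11}))

Dividing the trm bit (stage 2) is left out; it would follow the two bullets above.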
The problem, of course, gets much deeper when n parties share the cake :)) but let us leave the rest of the story to the mathematicians.