Showing newest 2 of 4 posts from July 2009. Show older posts
Showing newest 2 of 4 posts from July 2009. Show older posts

Thread Programming in C, Java and Python with mutual exclusion/synchronization/locks

Wednesday, July 29, 2009

The concept of thread programming at times intimidates the programmers, this is due to the way its been implemented with different constructs, but the purpose is the same, to efficiently use system resources with concurrent execution, with that being said, this simple writing gives you an introduction to programming with threads and how it affects program execution with and without the use of locks or mutual exclusion or synchronization, whatever you may call it, lets start.

The below three programs are similar, they use threads without and with locks, there will be four threads which would increment a shared counter variable (or a global variable), sleep for a second and tries to print the incremented value, we will see the case without the use of the locks to see how the processor switches between threads and the effect of using locks to synchronize code so that only one thread can access a portion of the synchronized block and runs into completion.

As always the languages used are C, java and python, lets start with C language for change (Note: gcc compiler is used here).

1. Thread Programming in C with and without mutual exclusion
2. Thread Programming in java with and without synchronization
3. Thread Progamming in Python with and without synchronization

1. Thread Programming in C with and without mutual exclusion

Listing 1.1 threads.c (Threads without mutual exclusion)



1.  /* threads.c
2. Example of multithreaded programming in C without mutual exclusion/locks
3. Author: S.Prasanna
4. */
5.
6. #include <pthread.h>
7. #include <stdio.h>
8.
9. int count = 0;
10.
11. void* run_thread()
12. {
13. pthread_t thread_id = pthread_self();
14. printf("Thread %u: Current value of count = %d\n", thread_id, count);
15. printf("Thread %u incrementing count ...\n");
16. count++;
17. sleep(1);
18. printf("Value of count after incremented by thread %u = %d\n", thread_id, count);
19. pthread_exit(NULL);
20. }
21.
22. int main (int argc, char *argv[])
23. {
24. pthread_t thread_array[4];
25. int i = 0, ret, thread_num = 4;
26.
27. for (i = 0; i < thread_num; i++) {
28. if ((ret = pthread_create(&thread_array[i], NULL, run_thread, NULL)) == -1) {
29. printf("Thread creation failed with return code: %d", ret);
30. exit(ret);
31. }
32. }
33. pthread_exit(NULL);
34. }
Output:

Thread 2: Current value of count = 0
Thread 2 incrementing count ...
Thread 3: Current value of count = 1
Thread 3 incrementing count ...
Thread 4: Current value of count = 2
Thread 4 incrementing count ...
Thread 5: Current value of count = 2
Thread 5 incrementing count ...
Value of count after incremented by thread 4 = 4
Value of count after incremented by thread 3 = 4
Value of count after incremented by thread 5 = 4
Value of count after incremented by thread 2 = 4

Explanation:

In the above code, we use Pthreads interface for implementing the thread functionality in C, although I won't be going through the Pthread methods here, the above code creates four threads (line 27 - 31) which invokes the function run_thread function whose return type is a pointer to void (line 11) where all threads increment the global count value (line 16), sleeps for a second and prints the value of count after its incremented by that thread (lines 11 - 20), since context switch occurs when threads sleep, the other threads increment the value by the time a thread sleeps and so on, as a result the threads couldn't print the value of count correctly after its incremented by them, this problem can be solved using mutual exclusion construct as shown below.

Listing 1.2 threads_lock.c (Threads with mutual exclusion)

1.  /* threads_lock.c
2. Example of multithreaded programming in C with mutual exclusion/locks
3. Author: S.Prasanna
4. */
5.
6. #include <pthread.h>
7. #include <stdio.h>
8.
9. int count = 0;
10. pthread_mutex_t thread_lock;
11.
12. void* run_thread()
13. {
14. pthread_mutex_lock(&thread_lock);
15. pthread_t thread_id = pthread_self();
16. printf("Thread %u: Current value of count = %d\n", thread_id, count);
17. printf("Thread %u incrementing count ...\n");
18. count++;
19. sleep(1);
20. printf("Value of count after incremented by thread %u = %d\n", thread_id, count);
21. pthread_mutex_unlock(&thread_lock);
22. pthread_exit(NULL);
23. }
24.
25. int main (int argc, char *argv[])
26. {
27. pthread_t thread_array[4];
28. int i = 0, ret, thread_num = 4;
29.
30. for (i = 0; i < thread_num; i++) {
31. if ((ret = pthread_create(&thread_array[i], NULL, run_thread, NULL)) == -1) {
32. printf("Thread creation failed with return code: %d", ret);
33. exit(ret);
34. }
35. }
36. pthread_exit(NULL);
37. }
38.
Output:

Thread 2: Current value of count = 0
Thread 2 incrementing count ...
Value of count after incremented by thread 2 = 1
Thread 5: Current value of count = 1
Thread 5 incrementing count ...
Value of count after incremented by thread 5 = 2
Thread 4: Current value of count = 2
Thread 4 incrementing count ...
Value of count after incremented by thread 4 = 3
Thread 3: Current value of count = 3
Thread 3 incrementing count ...
Value of count after incremented by thread 3 = 4

Explanation:

The above code is similar to the Listing 1.1 with mutual exclusion construct to ensure exclusive access to the portion of code which would be executed only by a single thread to completion without context switching, a thread acquires a lock with the call to pthread_mutex_lock (line 14) and releases the lock using pthread_mutex_unlock (line 21), both methods take a pthread_mutex_t member thread_lock on which the lock is acquired or released (line 10), thus mutual exclusion ensures that all atomic operations run to completion without context switching.

2. Thread Programming in java with and without synchronization

Listing 2.1 Counter.java (Threads without synchronization)
1.  /*
2. * An example of threads without lock (synchronized block)
3. * Author: S.Prasanna
4. *
5. */
6.
7. public class Counter implements Runnable {
8.
9. int count = 0;
10.
11. public Counter() {
12. }
13.
14. public void run() {
15. System.out.println(Thread.currentThread().getName() + ": " + "Current Value of count = " + count);
16. System.out.println(Thread.currentThread().getName() + " increments count....");
17. count ++;
18. try {
19. Thread.currentThread().sleep(1000);
20. } catch (InterruptedException e) {
21. System.out.println("Thread " + Thread.currentThread().getName() + "interrupted.");
22. }
23. System.out.println("Value of count after incremented by thread "
24. + Thread.currentThread().getName() + " = " + count);
25. }
26.
27. public static void main(String args[]) {
28. Counter c = new Counter();
29.
30. Thread t1 = new Thread(c);
31. Thread t2 = new Thread(c);
32. Thread t3 = new Thread(c);
33. Thread t4 = new Thread(c);
34.
35. t1.setName("Thread 1");
36. t2.setName("Thread 2");
37. t3.setName("Thread 3");
38. t4.setName("Thread 4");
39.
40. t1.start();
41. t2.start();
42. t3.start();
43. t4.start();
44.
45. try {
46. t1.join();
47. t2.join();
48. t3.join();
49. t4.join();
50. }
51. catch (InterruptedException e) {
52. System.out.println("Main thread Interrupted .....");
53. }
54.
55. System.out.println("Main thread exits.....");
56. }
57. }
58.
Output:

C:\>java Counter
Thread 1: Current Value of count = 0
Thread 1 increments count....
Thread 2: Current Value of count = 1
Thread 2 increments count....
Thread 3: Current Value of count = 2
Thread 3 increments count....
Thread 4: Current Value of count = 3
Thread 4 increments count....
Value of count after incremented by thread Thread 1 = 4
Value of count after incremented by thread Thread 2 = 4
Value of count after incremented by thread Thread 3 = 4
Value of count after incremented by thread Thread 4 = 4
Main thread exits.....

Explanation:

The main method creates four threads (lines 30 - 33) created by extending the Runnable interface which increments the Counter's instance variable count (line 17), sleeps for a second and prints the incremented value (lines 14 - 25), since context switch occurs when a thread sleeps, other threads increment the count value by the time a thread sleeps and when the sleeping threads resumes execution, it read a wrong count value, this can be fixed with the synchronized construct as seen in Listing 2.2.

Listing 2.2 Counter_locks.java (Threads with synchronization)

1.  /*
2. * An example of threads without lock (synchronized block)
3. * Author: S.Prasanna
4. *
5. */
6.
7. public class Counter_locks implements Runnable {
8.
9. int count = 0;
10.
11. public Counter_locks() {
12. }
13.
14. public synchronized void run() {
15. System.out.println(Thread.currentThread().getName() + ": " + "Current Value of count = " + count);
16. System.out.println(Thread.currentThread().getName() + " increments count....");
17. count ++;
18. try {
19. Thread.currentThread().sleep(1000);
20. } catch (InterruptedException e) {
21. System.out.println("Thread " + Thread.currentThread().getName() + "interrupted.");
22. }
23. System.out.println("Value of count after incremented by thread "
24. + Thread.currentThread().getName() + " = " + count);
25. }
26.
27. public static void main(String args[]) {
28. Counter_locks c = new Counter_locks();
29.
30. Thread t1 = new Thread(c);
31. Thread t2 = new Thread(c);
32. Thread t3 = new Thread(c);
33. Thread t4 = new Thread(c);
34.
35. t1.setName("Thread 1");
36. t2.setName("Thread 2");
37. t3.setName("Thread 3");
38. t4.setName("Thread 4");
39.
40. t1.start();
41. t2.start();
42. t3.start();
43. t4.start();
44.
45. try {
46. t1.join();
47. t2.join();
48. t3.join();
49. t4.join();
50. }
51. catch (InterruptedException e) {
52. System.out.println("Main thread Interrupted .....");
53. }
54.
55. System.out.println("Main thread exits.....");
56. }
57. }
58.
Output:

C:\>java Counter_locks
Thread 1: Current Value of count = 0
Thread 1 increments count....
Value of count after incremented by thread Thread 1 = 1
Thread 4: Current Value of count = 1
Thread 4 increments count....
Value of count after incremented by thread Thread 4 = 2
Thread 3: Current Value of count = 2
Thread 3 increments count....
Value of count after incremented by thread Thread 3 = 3
Thread 2: Current Value of count = 3
Thread 2 increments count....
Value of count after incremented by thread Thread 2 = 4
Main thread exits.....

Explanation:


Listing 2.2
differs from Listing 2.1 in a single line, it synchronizes the run method (line 14) executed by threads to ensure that only a single thread can access the synchronized block at any point of time, therefore a thread runs through the synchronized block of code without context switching, resulting in printing the correct count value after it increments it.

3. Thread Programming in python with and without synchronization

For a comprehensive introduction on python thread programming, do read my article on Introduction to python thread programming, I am listing that portion of the code here for the sake of completeness.

Listing 3.1 pythtreads.py (Threads without locks)

1.  #
2. # Invoking multiple threads from python without locking mechanism
3. # pythreads.py
4. # Author: S.Prasanna
5. #
6.
7. import thread
8. import threading
9. import time
10.
11. def run_thread (threadname, sleeptime):
12. """This is the thread function to be invoked."""
13.
14. global threadcount, activethreads
15.
16. print "%s: Current value of threadcount %s" % (threadname, threadcount)
17. print "%s incrementing threadcount" % (threadname)
18. threadcount = threadcount + 1
19. time.sleep(sleeptime)
20. print "Value of threadcount after incremented by %s = %s" % (threadname, threadcount)
21. activethreads = activethreads - 1
22. print "%s exiting...." % (threadname)
23.
24. if __name__ == "__main__":
25.
26. threadcount=0
27. activethreads = 4
28.
29. thread.start_new_thread(run_thread, ("Thread1", 1))
30. thread.start_new_thread(run_thread, ("Thread2", 1))
31. thread.start_new_thread(run_thread, ("Thread3", 1))
32. thread.start_new_thread(run_thread, ("Thread4", 1))
33.
34. while activethreads > 0:
35. pass
36.
Output:

C:\>python pythreads.py
Thread1: Current value of threadcount 0
Thread1 incrementing threadcount
Thread2: Current value of threadcount 1
Thread2 incrementing threadcount
Thread3: Current value of threadcount 2
Thread3 incrementing threadcount
Thread4: Current value of threadcount 3
Thread4 incrementing threadcount
Value of threadcount after incremented by Thread1 = 4
Thread1 exiting....
Value of threadcount after incremented by Thread2 = 4
Thread2 exiting....
Value of threadcount after incremented by Thread3 = 4
Thread3 exiting....
Value of threadcount after incremented by Thread4 = 4
Thread4 exiting....

Explanation:

Thread programming python is quite simple to say the least, the main creates four threads using the thread module's start_new_thread() method (lines 29 - 32), each invokes the run_thread() (line 11) method which increments a shared variable count, sleeps for a second and prints the incremented value, since no lock mechanism is used ensure exclusive access, context switch occurs when the threads sleep as a result they read a wrong value upon resuming (after its incremented by other threads), the solution, Listing 3.2 with locks.

Listing 3.2 pythtreads_lock.py (Threads with locks)

1.  #
2. # Invoking multiple threads from python with locks
3. # pythreads_lock.py
4. # Author: S.Prasanna
5. #
6.
7. import thread
8. import threading
9. import time
10.
11. def run_thread (threadname, sleeptime):
12. """This is the thread function to be invoked."""
13.
14. global threadcount, activethreads, threadlock
15.
16. threadlock.acquire()
17.
18. print "%s: Current value of threadcount %s" % (threadname, threadcount)
19. print "%s incrementing threadcount" % (threadname)
20. threadcount = threadcount + 1
21. time.sleep(sleeptime)
22. print "Value of threadcount after incremented by %s = %s" % (threadname, threadcount)
23. activethreads = activethreads - 1
24. print "%s exiting...." % (threadname)
25.
26. threadlock.release()
27.
28. if __name__ == "__main__":
29.
30. threadcount=0
31. activethreads = 4
32. threadlock = thread.allocate_lock()
33.
34. thread.start_new_thread(run_thread, ("Thread1", 1))
35. thread.start_new_thread(run_thread, ("Thread2", 1))
36. thread.start_new_thread(run_thread, ("Thread3", 1))
37. thread.start_new_thread(run_thread, ("Thread4", 1))
38.
39. while activethreads > 0:
40. pass
41.
Output:

C:\>python pythreads_lock.py
Thread1: Current value of threadcount 0
Thread1 incrementing threadcount
Value of threadcount after incremented by Thread1 = 1
Thread1 exiting....
Thread2: Current value of threadcount 1
Thread2 incrementing threadcount
Value of threadcount after incremented by Thread2 = 2
Thread2 exiting....
Thread3: Current value of threadcount 2
Thread3 incrementing threadcount
Value of threadcount after incremented by Thread3 = 3
Thread3 exiting....
Thread4: Current value of threadcount 3
Thread4 incrementing threadcount
Value of threadcount after incremented by Thread4 = 4
Thread4 exiting....

Explanation:

The thread.allocate_lock() (line 32) method creates a lock object which a thread acquires using its acquire() method (line 16) before performing an atomic operation (lines 17 - 25), releases the lock using the release() method (line 26) so that other threads can access the mutually exclusive code block.

The concept of Naive Page rank and its significance

Tuesday, July 14, 2009

Every time I see the Google toolbar to see the check the page rank of a web-site (which is a measure of the importance of that page to Google, a score between 1 - 10), I was curious to know how page rank was calculated, given the huge number of web pages indexed by Google, fortunately I got a chance to explore page rank in-depth during my Stanford classes, how it resulted in the birth of Google, the first search engine ever to rank web-pages based on a reputation score.

Page rank of a web page is simple words means its reputation among other web-pages indexed by Google, here reputation is defined as how other highly reputed people think of you, therefore the more you get rated by other highly reputed people, the better your rank will be, in other words, if a web-page is linked by pages with high page rank, then its page rank will increase substantially, its essentially a concept sharing reputation, in other words, if a highly ranked web-page links to your web-page, then your reputation or rank will be increased as you get a high share of reputation.

One of the important problem addressed by page rank system is it answers the question "How to determine the highly reputed web-pages initially", let me give a very simple example of what is called as Naive Page rank with a simple setup and its significance.

Figure 1: A simple Naive Page rank setup with four nodes

In the above setup, there are four nodes (or web-pages), each with an incoming and an outgoing link, to determine the Naive page rank of each node, we have the following conditions.

Page rank of a node = Sum of Page ranks of nodes linking it

Also, if a node has two links, then its page rank is shared between two nodes, or each outgoing link from this node would have a page rank = Page rank of linking node / 2, i.e if a page A has 'n' links, then the page rank of the 'n' web-pages to which node 'A' links will be increased by Page-Rank(A) / n.

Another constraint in the Naive page rank system is that the sum of the page ranks of all nodes = 1, from which one can determine the relative weights of individual web-pages in a normalized form.

Therefore in the above setup, in accordance with the above conditions,

Page-Rank(A) = Page-Rank(D)
Page-Rank(B) = Page-Rank(A)
Page-Rank(C) = Page-Rank(B)
Page-Rank(D) = Page-Rank(C)

Also, we have Page-Rank(A) + Page-Rank(B) + Page-Rank(C) + Page-Rank(D) = 1, hence

Page-Rank(A) = Page-Rank(B) = Page-Rank(C) = Page-Rank(D) = 1/4.

Hope you got the basics of page rank from the above scenario, even though the Naive Page rank is quite important in understanding how web-pages are assigned reputation scores, its has an inherent weakness as well, i.e its vulnerable to link-spam or collusion where a group of nodes link to each other to improve their page rank which will drastically bring down the page rank of other nodes.

For example, in the above scenario, if node B links to node A instead of node C, then A and B can increase their page rank with circular links to each other, in which case we have,

Figure 2: Link spamming or collusion in the above Naive Page rank setup

Page-Rank(A) = Page-Rank(D) + Page-Rank(B)
Page-Rank(B) = Page-Rank(A)
Page-Rank(C) = 0 (Since no page links to C)
Page-Rank(D) = Page-Rank(C) = 0

Also, Page-Rank(A) + Page-Rank(B) + Page-Rank(C) + Page-Rank(D) = 1

Therefore, Page-Rank(A) = Page-Rank(B), hence Page-Rank(A) + Page-Rank(B) = 1, from which we get

Page-Rank(A) = 1/2
Page-Rank(B) = 1/2
Page-Rank(C) = 0
Page-Rank(D) = 0

Hence two nodes linking to each other can drastically bring down the page rank of the other two nodes in the above setup while doubling their page rank, resulting in what is known as link-spam or collusion due to circular links, this problem is addressed by the Page-rank system developed by Larry Page and Sergey Brin which we will explore in a future section, however the Naive-Page rank gives a basic overview of how web-pages are assigned a reputation score.


Copyright © 2008 Prasanna Seshadri, www.prasannatech.net, All Rights Reserved.
No part of the content or this site may be reproduced without prior written permission of the author.