This blog is subject to the DISCLAIMER below.

Saturday, January 31, 2009

dotNetWork.org CodeCamp'09

This time, dotNetwork is preparing for a "bigger-than-the-usual-gatherings" event. This is supposed to be dotNetwork's first major event, with around 18 sessions in 3 tracks, split over 2 days, which are intended to be on February 19th and 20th, God willing.

The agenda covers a wide range of topics, including Azure, Velocity, Scrum, BizTalk, SharePoint, C# 4, VS2010, Silverlight, and many others.

The speakers list includes Stephen Forte, Remi Caron, Marianne Van Wanrooij, Mohamed Meligy, Mohamed Samy, Yasser Makram, Mohammad Yousry, and Hossam Kamel.

I'll be updating you with more soon, but for now check the event page here:
http://codecamp09.eventbrite.com/
and subscribe to its feeds for updates. Also check the Facebook event:
http://www.facebook.com/event.php?eid=63066020983

This is not yet a formal announcement, so I'll be updating you with any news here too.. so stay tuned..


Thursday, January 15, 2009

Intro to Caching, Caching Algorithms and Caching Frameworks, Part 2

Introduction:

In this part we are going to show how to implement some of the famous replacement algorithms mentioned in part 1. The code in this article is just for demonstration purposes, which means you will have to put in some extra effort if you want to use it in your application (that is, if you are going to build your own implementation rather than use an existing caching framework).

The Leftover policy:

After programmer 1 read the article, he proceeded to review the comments on it. One of these comments was talking about a leftover policy, named "Random Cache".

Random Cache:

I am Random Cache; I replace any cache entry I want (you could call it the unlucky entry), and no one can complain about that. By doing this I remove any overhead of tracking references and the like. I am better than the FIFO policy, and in some cases I perform even better than LRU, but in general LRU is better than me.

It is comment time:

While programmer 1 was reading the rest of the comments, he found a very interesting comment about implementations of some of the famous replacement policies. It was actually a link to the commenter's site, which had the actual implementation, so programmer 1 clicked the link and here is what he got:

Meet the Cache Element:

public class CacheElement {

    private Object objectValue;

    private Object objectKey;

    private int index;

    private int hitCount;

    // getters and setters
}

This is the cache entry that will be used to hold the key and the value; it will be used in all the cache algorithm implementations.

Common Code for All Caches:

public final synchronized void addElement(Object key, Object value) {

    int index;
    Object obj;

    // get the entry from the table
    obj = table.get(key);

    // If we have the entry already in our table,
    // then get it and replace only its value.
    if (obj != null) {
        CacheElement element;

        element = (CacheElement) obj;
        element.setObjectValue(value);
        element.setObjectKey(key);

        return;
    }
}

The above code will be common to all our implementations; it checks whether the CacheElement already exists in our cache. If so, we just need to replace its value and nothing else. But what if we didn't find it? Then we will have to dig deeper and see what happens below.
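One note before going on: the addElement snippets in this article refer to a few fields and helpers (table, cache, numEntries, random, isFull() and, for FIFO, a current pointer) that belong to the surrounding cache class. As an assumption on my part (it is not part of the original code), a minimal skeleton holding those members might look like this:

import java.util.Hashtable;
import java.util.Random;

public abstract class BasicCache {

    // maps a key to its CacheElement for fast lookup
    protected final Hashtable table = new Hashtable();

    // fixed-size array of pre-allocated cache slots
    protected final CacheElement[] cache;

    // number of slots currently occupied
    protected int numEntries = 0;

    // position of the next entry to replace (used by the circular FIFO cache)
    protected int current = 0;

    // used by the Random Cache to pick a victim slot
    protected final Random random = new Random();

    public BasicCache(int capacity) {
        cache = new CacheElement[capacity];
        for (int i = 0; i < capacity; i++) {
            cache[i] = new CacheElement();
        }
    }

    protected boolean isFull() {
        return numEntries >= cache.length;
    }

    public abstract void addElement(Object key, Object value);
}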

The Talk Show:

Today's episode is a special one. We have special guests; they are in fact competitors. We are going to hear what each of them has to say, but first let's introduce our guests:

Random Cache, FIFO Cache

Let’s start with the Random Cache.

Meet Random Cache implementation:

public final synchronized void addElement(Object key, Object value) {

    int index;
    Object obj;

    obj = table.get(key);

    if (obj != null) {
        CacheElement element;

        // Just replace the value.
        element = (CacheElement) obj;
        element.setObjectValue(value);
        element.setObjectKey(key);

        return;
    }

    // If we haven't filled the cache yet, put it at the end.
    if (!isFull()) {
        index = numEntries;
        ++numEntries;
    } else {
        // Otherwise, replace a random entry.
        index = (int) (cache.length * random.nextFloat());
        table.remove(cache[index].getObjectKey());
    }

    cache[index].setObjectValue(value);
    cache[index].setObjectKey(key);
    table.put(key, cache[index]);
}

Analyzing Random Cache Code (Talk show):

In today's show, Random Cache is going to explain the code line by line, so here we go.
I will go straight to the main point; if I am not full then I will place the new entry that the client requested at the end of the cache (in case there is a cache miss).

I do this by taking the number of entries that reside in the cache and assigning it to index (which will be the index of the entry the client is adding); after that I increment the number of entries.

if (!isFull()) {
    index = numEntries;
    ++numEntries;
}

If I don’t have enough room for the current entry, I will have to kick out a random entry (totally random, bribing isn’t allowed).

In order to get the random entry, I will use the Random utility shipped with Java to generate a random index and ask the cache to remove the entry whose index equals the generated index.

else {
    // Otherwise, replace a random entry.
    index = (int) (cache.length * random.nextFloat());
    table.remove(cache[index].getObjectKey());
}

At the end I just place the entry in the cache, whether the cache was full or not.

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
table.put(key, cache[index]);

Magnifying the Code:

It is said that looking at things from up close makes them easier to understand, so that's why we have a magnifying glass: we are going to magnify the code to get closer to it (and maybe understand it better).

Cache entries in the same voice: hi ho, hi ho, into cache we go.

New cache entry: excuse me; I have a question! (Asking a singing old cache entry near to him)

Old cache entry: go ahead.

New cache entry: I am new here and I don’t understand my role exactly, how will the algorithm handle us?

Old cache entry: cache! (instead of "man!"), you remind me of myself when I was new (the first time I was added to the cache). I used to ask questions like that; let me show you what will happen.






Meet FIFO Cache Implementation:

public final synchronized void addElement(Object key, Object value) {

    int index;
    Object obj;

    obj = table.get(key);

    if (obj != null) {
        CacheElement element;

        // Just replace the value.
        element = (CacheElement) obj;
        element.setObjectValue(value);
        element.setObjectKey(key);

        return;
    }

    // If we haven't filled the cache yet, put it at the end.
    if (!isFull()) {
        index = numEntries;
        ++numEntries;
    } else {
        // Otherwise, replace the entry at the current pointer with the new one,
        // in order to make a circular FIFO.
        index = current;
        if (++current >= cache.length)
            current = 0;

        table.remove(cache[index].getObjectKey());
    }

    cache[index].setObjectValue(value);
    cache[index].setObjectKey(key);
    table.put(key, cache[index]);
}

Analyzing FIFO Cache Code (Talk show):

After Random Cache finished, the audience went crazy for it, which made FIFO a little bit jealous, so FIFO started talking and said:

When there is no more room for the new cache entry, I will have to kick out the entry at the front (the one that came first), as I work in a circular-queue-like manner. By default the current position is at the beginning of the queue (it points to the beginning of the queue).

I assign the current value to index (the index of the current entry) and then check whether the incremented current is greater than or equal to the cache length (because I want to reset the current pointer to the beginning of the queue). If so, I set current to zero again. After that, I just kick out the entry at the index position (which is the first entry in the queue now) and place the new entry.

else {
    // Otherwise, replace the entry at the current pointer,
    // which takes care of FIFO in a circular fashion.
    index = current;

    if (++current >= cache.length)
        current = 0;

    table.remove(cache[index].getObjectKey());
}

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
table.put(key, cache[index]);

Magnifying the Code:

Back to our magnifying glass, we can observe the following actions happening to our entries:


Conclusion:

In this article we have seen how to implement the FIFO replacement policy and also the Random replacement policy. In the upcoming articles we will take our magnifying glass and magnify the LFU and LRU replacement policies; till then, stay tuned ;)


Sunday, January 11, 2009

STOP!!!!! Please, I need to know what BizTalk is!

This is what I heard from someone when I told him the famous definition of BizTalk: "BizTalk is a business process management (BPM) server that enables companies to automate and optimize business processes. This includes powerful and familiar tools to design, develop, deploy, and manage business processes."

He said,"haaaaa????!!!!!"


Trying to explain, I talked to him about the worldwide need of many organizations to integrate their enterprise applications. I told him about concepts like Service-Oriented Architecture (SOA) and Message-Oriented Middleware (MOM), methodologies like Business Process Management (BPM) and Business Process Reengineering, and standards like XML, XSD and RosettaNet, as well as the advent of emerging technologies such as the Web Services stack of protocols (WS-*), the Enterprise Service Bus (ESB) and others. I told him that vendors have released their own solutions, and that organizations have thus achieved their challenging goals of automating business processes by integrating their information systems, and that in this arena we fundamentally find two important products: Microsoft BizTalk Server 2004/2006 and Oracle SOA Suite.

I imagined him saying to himself, "Ummmm... seems that someone will be killed tonight."


I told him, "Wait man , I got a very simple definition of BizTalk Server which is Microsoft's central platform for Enterprise Application Integration (EAI) and Business Process Management (BPM) that embodies the integration and automation capabilities of XML and Web Services technologies. "

He screamed and said, "STOP!!!!!"

And then continued in a weak voice, "Please, I need to know what BizTalk is!"

After I told him I knew how to explain this, he calmed down. I found that the most important terms that BizTalk depends on are: Business Process Management (BPM), Enterprise Application Integration (EAI) and finally Service Oriented Architecture (SOA).

I told him, "Man, what would you say if the organization needed a disciplined approach to identify, design, execute, document, monitor, control, and measure both automated and non-automated business processes, to achieve targeted results consistent with the organization's strategic goals?"

He said, "Of course this approach will help them to better manage their business processes"

I told him, "This is the BPM, beside there are many vendors who created application suites which enable organizations to better manage their business processes. These technologies typically involve tools to visually design and model business processes; simulate and test business processes; automate, control and measure business processes; and provide feedback and reporting on process performance. Some vendors have combined these functions into business process management suites that provide a complete integrated BPM platform, commonly referred to as a BPMS"


I GOT A POINT

I told him again, "And what if I told you that many organizations have a large number of legacy systems, typically designed to support specific functions such as manufacturing or sales? In order to manage the end-to-end work involved in business processes, a BPMS must be able to integrate with legacy systems across the organization in order to control work, get information or measure performance. A variety of new technologies have emerged to simplify integration efforts, and the technology industry appears to be standardizing on a specific set of open technologies, commonly referred to as Web Services. By leveraging web services, organizations can build and manage end-to-end business processes across organizational silos and their legacy systems."

He said, "Oahu…this is a good architecture ".

Note: he used the term architecture.

I told him that a common framework for how these technologies are deployed is also being adopted, most often referred to as a Service Oriented Architecture (SOA).

I GOT ANOTHER POINT


Finally, I told him again, "And what if I told you that EAI is the unrestricted sharing of data and business processes throughout the networked applications or data sources in an organization? You know that early software programs in areas such as inventory control, human resources, sales automation and database management were designed to run independently, with no interaction between the systems. They were custom built in the technology of the day for the specific need being addressed, and were often proprietary systems. As enterprises grow and recognize the need for their information and applications to be transferred across and shared between systems, companies are investing in EAI in order to streamline processes and keep all the elements of the enterprise interconnected."

He said, "Well this is a good thing to help my enterprise applications to run independently"

Congratulations sir!!!! You now understand BizTalk Server.

Conclusion:

After this story we can say that BizTalk is a product developed by Microsoft to enable companies to automate and optimize business processes. It includes powerful, familiar tools to design, develop, deploy, and manage those processes (BPM). BizTalk is not only used to integrate enterprise applications with each other (SOA and EAI), but also to automate, control and measure business processes, provide feedback and reporting on process performance, and define and run your business rules dynamically.


dotNetWork.org Jan'09 gathering

Date:

Saturday, January 24th 2009,
12:00 - 16:30

Attendance is FREE =)

Speakers
Mohamed Meshref
SDT | SQL Server Team
Microsoft

Agenda
Inside SQL Server Engine!
Agenda:
12:00 - 14:00
SQL Server Architecture
◦ Protocols
◦ Relational Engine
◦ Storage Engine
◦ SQL OS
◦ Memory

14:30-16:30
Query Processing
◦ Iterators
◦ Rows Access Methods
◦ Joins
◦ Aggregation


Location:
Canadian International College, @ "El-Tagamo3 El-5ames"

Buses will be available at: Nady El-Sekka (11:00 AM - 11:30 AM)
Please be there before 11:30 because we will leave on time..

For those who want to stay tuned for further news about this event and other upcoming events of the dotnetwork.org user group, please check the following links:

Yahoo!Group:
http://tech.groups.yahoo.com/group/dotnetworkorg/

Facebook Event:
http://www.facebook.com/event.php?eid=60432032253
You'd better join the above event if you have a Facebook account so we can roughly estimate the attendee count..

Facebook Fan Page:
http://www.facebook.com/pages/netWorkorg/13135685545

Facebook Group:
http://www.facebook.com/group.php?gid=2409268236


Tuesday, January 06, 2009

Lisp, the ultimate programming experience

This is not a yet-another-programming-language article. Lisp is very different from other "normal" programming languages. One of the core differences is that it has "no syntax".

Of course that claim “no syntax” has a double meaning. There is no programming language without a syntax. To get the trick you have to know how compilers work first.

A compiler first groups the input character stream into tokens (lexical analysis) and then parses the token stream into an abstract syntax tree intermediate representation. Then it either executes the statements (interpreted language) or generates machine instructions (compiled language)*.

The difference is that in Lisp, you directly write the abstract syntax tree. You can also manipulate it ("macros"), and you can even affect how the compiler does the lexical analysis ("reader macros")!

Since you directly type the abstract syntax tree, there is no need for operator precedence rules. There is also no need for predefined/fixed operators; you can change or add any operator you want. There is also no limit on the number of the parameters of an operator (in Lisp the operator '+' is a function name and can take any number of arguments, not just two).

In Lisp, since you write the syntax tree directly, there is no need for restrictions on identifier names; identifiers ("symbols") can be made up of any sequence of characters (given that you provide appropriate escape sequences when necessary, mostly only for spaces and parentheses). Any number not valid in the current reader radix is a valid symbol: 1027F is a valid identifier under base 10 but not under base 16 (you can change the current radix from code). %@!$^%^@$%^&** is a valid identifier too. In fact +, -, *, / are all function names, not "hardcoded operators", and you can rebind them to any other function.

Imagine the possibilities with access to the abstract syntax tree. You can build your own programming language on top of Lisp. Hence Lisp is called the "programmable programming language". Even CLOS (the Common Lisp Object System), Lisp's object-oriented implementation, is implemented in pure Common Lisp macros.

Lisp can parse and evaluate Lisp expressions if seven fundamental operations are provided by an underlying system. In fact, Lisp was originally a mathematical model of computation at a very high abstraction level, the counterpart of the Turing machine model of computation. It was originally designed for writing algorithms on paper and was never meant to be implemented. It was only when a student implemented those seven fundamental operations in machine code that Lisp first became executable, around the late 1950s. When the student was done, the professor who invented Lisp wasn't entirely happy, because Lisp wasn't meant to be executed; it was very much like executing mathematics.

Lisp was very popular in the research field, especially for its fast prototyping and dynamic programming. At one point in time, programming language researchers were trying a new compiler every day using Lisp as a development platform. Lisp was so popular that it had its own hardware that executed Lisp instead of machine language (look up Lisp Machines on Wikipedia). But that hardware lost the battle to the much faster general-purpose computers.

Lisp has had, for many decades, features that didn't appear elsewhere until the late 90s and early 21st century. One of those features is "closures", which are being discussed at this moment for addition to C++0x and OpenJDK 7.0. Lisp is also the first language to ever incorporate garbage collection (back around the 1960s). But on the hardware available back then it took hours to run the GC once, so Lisp was often considered impractical. One funny thing about Lisp GC: people used to use the machine by day and leave it to do GC all night until they came back :D :D

Lisp is so fundamental that any language that ever achieves the power of Lisp is only yet another Lisp dialect. Another way Lisp is different is that it has nothing called "statements"; it is all composed of expressions. (An expression always evaluates to something, unlike a statement**.)

The current Lisp parenthesized notation is the same for code and data (of any structure). One of the funny things is that this parenthesized notation was meant only for the underlying representation, and another, FORTRAN-like syntax was going to be invented for humans; but surprisingly, the underlying representation won the race!

+ A point of interest: Lisp is somehow related to the lambda calculus, because it is a highly functional-oriented language, although it is not a purely functional language because it allows destructive operations.

++ Another point of interest: I don't consider "Lisp" a language per se but a class of languages that contains all the Lisp dialects. Some parts of this article only apply to the Common Lisp dialect. Scheme is another Lisp dialect.

+++ Most Lisp implementations are compilers, not interpreters. Please don't comment that Lisp is slow.

* There is another type which generates an intermediate, machine-independent language that is later converted into specific machine-dependent instructions using a JIT (Just-In-Time) compiler, or it can be interpreted further, like Java bytecode or .NET MSIL.

** if(x) 1; 0; is a statement; we can't say y = if(x) 1; 0;. Unlike y = x?1:0. So x?1:0 is an expression.

Sources: Paul Graham's articles (search Google). Also read his great book: On Lisp. For the compiler-every-day thing, look up "the Lambda papers" to see what the research was about (http://library.readscheme.org/page1.html).

Some people have noticed that, by incorporating dynamic programming, closures (lexical scopes), GC, and functional programming, all languages are eventually turning into Lisp (which was made in the 50s). It's an interesting debate, but I personally think that being a Lisp requires such transparent access to the syntax tree that it is not usually achievable unless you do something similar to the current Lisp syntax, which makes it yet another Lisp dialect. Here you go: the famous chicken-and-egg paradox :D

If you are a geek like me, you'd enjoy this "blah blah" programming language history article :)


Sunday, January 04, 2009

Intro to Caching, Caching Algorithms and Caching Frameworks, Part 1

Introduction:

A lot of us have heard the word cache, and when you ask about caching they give you a perfect answer, but they don't know how it is built, or on which criteria they should favor one caching framework over another, and so on. In this article we are going to talk about caching, caching algorithms and caching frameworks, and which is better than the other.

The Interview:

"Caching is a temp location where I store data in (data that I need it frequently) as the original data is expensive to be fetched, so I can retrieve it faster. "

That is what programmer 1 answered in the interview (one month ago he had submitted his resume to a company that wanted a Java programmer with strong experience in caching, caching frameworks and extensive data manipulation).

Programmer 1 had made his own cache implementation using a hashtable, and that is all he knows about caching. His hashtable contains about 150 entries, which he considers extensive data (caching = hashtable, load the lookups into the hashtable and everything will be fine, nothing else). So let's see how the interview goes.

Interviewer: Nice, and based on what criteria do you choose your caching solution?

Programmer 1: huh, (thinking for 5 minutes), mmm, based on, on, on the data (coughing…)

Interviewer: excuse me! Could you repeat what you just said again?

Programmer 1: data?!

Interviewer: oh I see, ok list some caching algorithms and tell me which is used for what

Programmer 1: (staring at the interviewer and making strange expressions with his face, expressions that no one knew a human face could make :D )

Interviewer: OK, let me ask it another way: how will a cache behave if it reaches its capacity?

Programmer 1: capacity? Mmm (thinking… a hashtable is not limited by capacity, I can add whatever I want and it will extend its capacity) (that was in programmer 1's mind; he didn't say it)

The interviewer thanked programmer 1 (the interview only lasted 10 minutes). After that a woman came and said: oh, thanks for your time, we will call you back, have a nice day.
This was the worst interview programmer 1 ever had (he hadn't read the part of the job description which stated that the candidate should have a strong caching background; in fact he had only seen the line talking about the excellent package ;) ).

Talk the talk and then walk the walk

After programmer 1 left, he wanted to know what the interviewer had been talking about and what the answers to his questions were, so he started to surf the net. Programmer 1 didn't know anything about caching except: when I need a cache, I will use a hashtable.
After using his favorite search engine he was able to find a nice caching article and started to read it.

Why do we need cache?

A long time ago, before the caching age, a user would request an object, and this object would be fetched from a storage place. As objects grew bigger and bigger, the user had to spend more time waiting for his request to be fulfilled. This really made the storage place suffer a lot, because it had to be working the whole time. This made both the user and the storage angry, and there was one of 2 possibilities:

1- The user would get upset, complain and even stop using the application (that was always the case).

2- The storage place would pack its bags and leave your application, and that caused big problems (no place to store data) (happened in rare situations).

Caching is a godsend:

A few years later, researchers at IBM (in the 60s) introduced a new concept and named it "cache".

What is Cache?

Caching is a temporary location where I store data that I need frequently, since the original data is expensive to fetch, so that I can retrieve it faster.

A cache is made of a pool of entries; these entries are copies of real data that live in storage (a database, for example), and each is tagged with a tag (key identifier) value for retrieval.
Great, so programmer 1 already knows this, but what he doesn't know is caching terminology, which is as follows:



Cache Hit:

When the client invokes a request (let's say he wants to view product information) and our application gets the request, it will need to access the product data in our storage (database); it first checks the cache.

If an entry can be found with a tag matching that of the desired data (say the product Id), the entry is used instead. This is known as a cache hit (the cache hit rate is the primary measurement of caching effectiveness; we will discuss that later on).
The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.

Cache Miss:

On the contrary, when the tag isn't found in the cache (no match is found), this is known as a cache miss. A hit to the back storage is made, the data is fetched back, and it is placed in the cache so that future requests will find it and result in a cache hit.

If we encounter a cache miss, there can be one of two scenarios:

First scenario: there is free space in the cache (the cache hasn't reached its limit), so the object that caused the cache miss will be retrieved from our storage and inserted into the cache.

Second scenario: there is no free space in the cache (the cache has reached its capacity), so the object that caused the cache miss will be fetched from storage, and then we will have to decide which object in the cache we need to evict in order to place the newly retrieved object. This is done by the replacement policy (caching algorithm), which decides which entry to remove to make more room, and which will be discussed below.
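To make the two scenarios concrete, here is a minimal lookup sketch; it is only an illustration, and loadFromStorage and evictOneEntry are hypothetical helpers standing in for the back-storage call and the replacement policy:

public Object getElement(Object key) {
    CacheElement element = (CacheElement) table.get(key);
    if (element != null) {
        return element.getObjectValue();   // cache hit: serve straight from the cache
    }
    // cache miss: fetch the value from the back storage (database, file, ...)
    Object value = loadFromStorage(key);
    if (isFull()) {
        evictOneEntry();                   // second scenario: the replacement policy picks a victim
    }
    addElement(key, value);                // first scenario (or after eviction): insert the entry
    return value;
}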

Storage Cost:

When a cache miss occurs, the data will be fetched from the back storage, loaded and placed in the cache. But how much space does the data we just fetched take up in the cache memory? This is known as the storage cost.

Retrieval Cost:

And when we need to load the data, we need to know how much it costs to load it. This is known as the retrieval cost.

Invalidation:

When an object that resides in the cache is updated in the back storage, its cached copy needs to be updated too; keeping the cache up to date is known as invalidation.
The entry will be invalidated in the cache and fetched again from the back storage to get an updated version.

Replacement Policy:

When a cache miss happens, the cache ejects some other entry in order to make room for the previously uncached data (in case we don't have enough room). The heuristic used to select the entry to eject is known as the replacement policy.

Optimal Replacement Policy:

The theoretically optimal page replacement algorithm (also known as OPT or Belady's optimal page replacement policy) is an algorithm that tries to achieve the following: when a cached object needs to be placed in the cache, the cache algorithm should replace the entry which will not be used for the longest period of time.

For example, a cache entry that is not going to be used for the next 10 seconds will be replaced by an entry that is going to be used within the next 2 seconds.

Thinking about the optimal replacement policy, we can say it is impossible to achieve in practice (we cannot know the future), but some algorithms come near to it based on heuristics.
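Just as an illustration (this is not from the article), an offline simulation of Belady's policy could pick the victim whose next use lies farthest in the future, assuming the whole future request sequence is known:

import java.util.List;
import java.util.Set;

// Given the keys currently cached and the full future request sequence,
// return the key whose next use is farthest away (or never comes).
static Object chooseBeladyVictim(Set<Object> cachedKeys, List<Object> futureRequests) {
    Object victim = null;
    int farthest = -1;
    for (Object key : cachedKeys) {
        int next = futureRequests.indexOf(key);   // -1 means the key is never used again
        if (next == -1) {
            return key;                           // never reused: the perfect victim
        }
        if (next > farthest) {
            farthest = next;
            victim = key;
        }
    }
    return victim;
}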
So everything is based on heuristics. What makes one algorithm better than another, and what do they use for their heuristics?

Nightmare at Java Street:

While reading the article, programmer 1 fell asleep and had a nightmare (the scariest nightmare one can ever have).

Programmer 1: nihahha I will invalidate you. (Talking in a mad way)

Cached Object: no no please let me live, they still need me, I have children.

Programmer 1: all cached entries say that before they are invalidated, and since when do you have children? Never mind, now vanish forever.

Buhaaahaha, laughed programmer 1 in a scary way. Silence took over the place for a few minutes, and then a police siren broke that silence. The police caught programmer 1, he was accused of invalidating an entry that was still needed by a cache client, and he was sent to jail.

Programmer 1 woke up really scared; he started to look around and realized that it was just a dream. Then he continued reading about caching and tried to get rid of his fears.

Caching Algorithms:

No one can talk about caching algorithms better than the caching algorithms themselves

Least Frequently Used (LFU):

I am Least Frequently used; I count how often an entry is needed by incrementing a counter associated with each entry.

I remove the entry with the lowest usage counter first. I am not that fast, and I am not that good at adaptive actions (being adaptive means keeping the entries that are really needed and discarding the ones that aren't needed, based on the access pattern, or in other words the request pattern).
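A minimal sketch of this idea (the class and field names here are my own, not from the article) could pair each key with a usage counter and evict the key with the smallest counter:

import java.util.HashMap;
import java.util.Map;

class SimpleLfuCache {
    private final int capacity;
    private final Map<Object, Object> values = new HashMap<Object, Object>();
    private final Map<Object, Integer> counts = new HashMap<Object, Integer>();

    SimpleLfuCache(int capacity) { this.capacity = capacity; }

    Object get(Object key) {
        if (!values.containsKey(key)) return null;            // miss
        counts.put(key, counts.get(key) + 1);                  // count every access
        return values.get(key);
    }

    void put(Object key, Object value) {
        if (values.containsKey(key)) {                         // existing key: just replace the value
            values.put(key, value);
            return;
        }
        if (values.size() >= capacity) {
            Object victim = null;
            int min = Integer.MAX_VALUE;
            for (Map.Entry<Object, Integer> e : counts.entrySet()) {
                if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
            }
            values.remove(victim);                             // evict the least frequently used entry
            counts.remove(victim);
        }
        values.put(key, value);
        counts.put(key, 1);                                    // new entries start with one use
    }
}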

Least Recently Used (LRU):

I am the Least Recently Used cache algorithm; I remove the least recently used items first, the ones that weren't used for the longest time.

I require keeping track of what was used when, which is expensive if one wants to make sure that I always discard the least recently used item.
Web browsers use me for caching. New items are placed at the top of the cache. When the cache exceeds its size limit, I discard items from the bottom. The trick is that whenever an item is accessed, I move it to the top.

So items which are frequently accessed tend to stay in the cache. There are two ways to implement me: either an array or a linked list (which will have the least recently used entry at the back and the most recently used at the front).

I am fast and I am adaptive, in other words I can adapt to the data access pattern. I have a large family which completes me, and they are even better than me (I do feel jealous sometimes, but it is OK); some of my family members are LRU2 and 2Q (they were implemented in order to improve LRU caching).
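The linked-list idea maps almost directly onto Java's LinkedHashMap, which can keep entries in access order; a minimal sketch (assuming a fixed capacity, and not taken from the article) could be:

import java.util.LinkedHashMap;
import java.util.Map;

class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    SimpleLruCache(int capacity) {
        // accessOrder = true moves every accessed entry to the "most recently used" end
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once the capacity is exceeded
        return size() > capacity;
    }
}

With this, every put that grows the map past its capacity automatically discards the entry that was accessed longest ago.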

Least Recently Used 2(LRU2):

I am Least Recently Used 2; some people call me Least Recently Used Twice, which I like more. I add entries to the cache the second time they are accessed (it takes two accesses to place an entry in the cache); when the cache is full, I remove the entry whose second most recent access is oldest. Because of the need to track the two most recent accesses, access overhead increases with cache size; if I am applied to a big cache, that would be a problem, which can be a disadvantage. In addition, I have to keep track of some items that are not yet in the cache (they haven't been requested twice yet). I am better than LRU and I am also adaptive to access patterns.

-Two Queues:

I am Two Queues; I add entries to an LRU cache as they are accessed. If an entry is accessed again, I move it to a second, larger LRU cache.

I remove entries so as to keep the first cache at about 1/3 the size of the second. I provide the advantages of LRU2 while keeping cache access overhead constant, rather than having it increase with cache size, which makes me better than LRU2. Like the rest of my family, I am adaptive to access patterns.

Adaptive Replacement Cache (ARC):

I am Adaptive Replacement Cache; some people say that I balance between LRU and LFU to improve the combined result. Well, that's not 100% true; actually I am made of 2 LRU lists. One list, say L1, contains entries that have been seen only once "recently", while the other list, say L2, contains entries that have been seen at least twice "recently".

Items that have been seen twice within a short time have a low inter-arrival rate and, hence, are thought of as "high-frequency". Hence, we think of L1 as capturing "recency" while L2 captures "frequency", so most people think I am a balance between LRU and LFU, but that is OK, I am not angry about that.

I am considered one of the best-performing replacement algorithms: self-tuning and a low-overhead replacement cache. I also keep a history of entries equal to the size of the cache; this is to remember the entries that were removed, and it allows me to see whether a removed entry should have stayed and another one should have been removed instead (I really have a bad memory). And yes, I am fast and adaptive.

Most Recently Used (MRU):

I am Most Recently Used; in contrast to LRU, I remove the most recently used items first. You will surely ask me why. Well, let me tell you something: when access is unpredictable, and determining the least recently used entry in the cache system is a high-time-complexity operation, I am the best choice; that's why.

I am quite common in database memory caches: whenever a cached record is used, I move it to the top of the stack. And when there is no room, the entry on the top of the stack, guess what? I replace the topmost entry with the new entry.

First in First out (FIFO):

I am First In First Out; I am a low-overhead algorithm that requires little effort for managing the cache entries. The idea is that I keep track of all the cache entries in a queue, with the most recent entry at the back and the earliest entry at the front. When there is no space and an entry needs to be replaced, I remove the entry at the front of the queue (the oldest entry) and replace it with the newly fetched entry. I am fast, but I am not adaptive.

-Second Chance:

Hello, I am Second Chance, a modified form of the FIFO replacement algorithm, known as the Second Chance replacement algorithm. I am better than FIFO at little cost for the improvement. I work by looking at the front of the queue as FIFO does, but instead of immediately replacing the cache entry (the oldest one), I check whether its referenced bit is set (I use a bit that tells me whether this entry has been used or requested before). If it is not set, I replace this entry. Otherwise, I clear the referenced bit and then insert this entry at the back of the queue (as if it were a new entry), and I keep repeating this process. You can think of this as a circular queue. The second time I encounter the same entry, whose bit I cleared before, I replace it, as it now has its referenced bit cleared. I am better than FIFO, at a little cost in speed.

-Clock:

I am Clock, and I am a more efficient version of FIFO than Second Chance, because I don't push the cached entries to the back of the list like Second Chance does, but I perform the same general function as Second Chance.

I keep a circular list of the cached entries in memory, with the "hand" (something like an iterator) pointing to the oldest entry in the list. When a cache miss occurs and no empty place exists, I consult the R (referenced) bit at the hand's location to decide what to do. If R is 0, I place the new entry at the hand's position; otherwise I clear the R bit, advance the hand (iterator), and repeat the process until an entry is replaced. I am even faster than Second Chance.
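A rough sketch of the hand movement (only an illustration, with assumed names) could look like this:

// referenced[i] is the R bit of slot i; hand is the current position of the clock hand.
// Returns the slot to replace; the caller places the new entry there and advances the hand.
static int findClockVictim(boolean[] referenced, int hand) {
    while (true) {
        if (!referenced[hand]) {
            return hand;                           // R bit is 0: this slot is the victim
        }
        referenced[hand] = false;                  // R bit was 1: clear it (second chance)
        hand = (hand + 1) % referenced.length;     // advance the hand and keep looking
    }
}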

Simple time-based:

I am simple time-based caching; I invalidate entries in the cache based on absolute time periods. I add items to the cache, and they remain in the cache for a specific amount of time. I am fast, but not adaptive to access patterns.
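As a small sketch (assumed names, not from the article), such a cache could record an insertion timestamp per entry and refuse to serve an entry once its time-to-live has passed:

import java.util.HashMap;
import java.util.Map;

class SimpleTimedCache {
    private final long timeToLiveMillis;
    private final Map<Object, Object> values = new HashMap<Object, Object>();
    private final Map<Object, Long> insertedAt = new HashMap<Object, Long>();

    SimpleTimedCache(long timeToLiveMillis) { this.timeToLiveMillis = timeToLiveMillis; }

    void put(Object key, Object value) {
        values.put(key, value);
        insertedAt.put(key, System.currentTimeMillis());   // remember when it went in
    }

    Object get(Object key) {
        Long addedTime = insertedAt.get(key);
        if (addedTime == null) return null;                 // never cached
        if (System.currentTimeMillis() - addedTime > timeToLiveMillis) {
            values.remove(key);                             // expired: invalidate the entry
            insertedAt.remove(key);
            return null;
        }
        return values.get(key);
    }
}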

Extended time-based expiration:

I am the extended time-based expiration cache; I invalidate items in the cache based on relative time periods. I add items to the cache and they remain there until I invalidate them at certain points in time, such as every five minutes, or each day at 12:00.

Sliding time-based expiration:

I am sliding time-based expiration; I invalidate entries in the cache by specifying the amount of time an item is allowed to stay idle in the cache after its last access time; after that time I invalidate it. I am fast, but not adaptive to access patterns.

OK, now that we have listened to some (famous) replacement algorithms talking about themselves, note that some other replacement algorithms take other criteria into consideration, such as:

Cost: if items have different costs, keep those items that are expensive to obtain, e.g. those that take a long time to get.

Size: If items have different sizes, the cache may want to discard a large item to store several smaller ones.

Time: Some caches keep information that expires (e.g. a news cache, a DNS cache, or a web browser cache). The computer may discard items because they are expired. Depending on the size of the cache no further caching algorithm to discard items may be necessary.

The E-mail!

After programmer 1 read the article, he thought for a while and decided to send an e-mail to its author. He felt like he had heard the author's name before, but he couldn't remember who the person was. Anyway, he sent him an e-mail asking: what if I have a distributed environment? How will the cache behave?

The author of the caching article got his mail, and ironically it was the man who had interviewed programmer 1 :D. The author replied and said:

Distributed caching:

* Cached data can be stored in a separate memory area from the caching directory itself (which handles the caching entries and so on), across the network or on disk, for example.

* Distributing the cache allows an increase in the cache size.

* In this case the retrieval cost will also increase, due to network request time.

* This will also lead to an increase in the hit ratio, due to the larger size of the cache.

But how will this work?

Let's assume that we have 3 servers: 2 of them will handle the distributed caching (hold the caching entries), and the 3rd one will handle all the incoming requests (which ask about cached entries):

Step 1: the application requests keys entry1, entry2 and entry3. After the hash values for these entries are resolved, the hash value decides which server each request is forwarded to.

Step 2: the main node sends parallel requests to all relevant servers (which has the cache entry we are looking for).

Step 3: the servers send their responses to the main node (which sent the request in the first place, asking for the cached entries).

Step 4: the main node sends the responses to the application (cache client).

* And in case the cache entry is not found: the hash value for the entry will still be computed and will direct us to, say, server 1 or server 2; in this case our entry won't be found on server 1, so it will be fetched from the DB and added to server 1's caching list.
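The "resolving the hash value" step above can be pictured, in its simplest form, as a modulo over the list of cache servers; this is only an illustration (serverFor and serverAddresses are made-up names, and real distributed caches typically use consistent hashing so that adding a server does not remap every key):

// Pick which cache server owns a given key.
static String serverFor(Object key, String[] serverAddresses) {
    int bucket = Math.abs(key.hashCode() % serverAddresses.length);
    return serverAddresses[bucket];
}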


Measuring Cache:

Most caches can be evaluated by measuring the hit ratio and comparing it to the theoretical optimum; this is usually done by generating a list of cache keys with no real data. However, this hit ratio measurement assumes that all entries have the same retrieval cost, which is not always true; in web caching, for example, the number of bytes the cache can serve is more important than the hit ratio (I can replace one big entry with 10 small entries, which is more effective on the web).
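The basic measurement mentioned here is simply hits divided by total lookups; a tiny counter sketch (my own names, not from the article):

class CacheStats {
    private long hits;
    private long misses;

    void recordHit()  { hits++; }
    void recordMiss() { misses++; }

    // hit ratio = hits / (hits + misses); reported as 0 if nothing has been looked up yet
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}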

Conclusion:

We have seen some of the popular algorithms used in caching; some of them are based on time or cache object size, and some are based on frequency of usage. In the next part we are going to talk about caching frameworks and how they make use of these caching algorithms, so stay tuned ;)
