
Monday, September 26, 2005

Using Math Tools for Web Management



A mathematician, Alan Turing, is considered the father of modern computer science. Anything to do with computers and the Internet is going to have a lot of mathematics behind it.

Here are some mathematical tools and concepts that are extremely useful for managing web traffic.

Probability Theory

Google and the other major search engines all use the concept of page rank pioneered by Google founders Sergey Brin and Lawrence Page. Page rank is determined using probability theory.

Search Engine Results and Probability Theory.




Zipf Distribution of Web Traffic


Traffic from search engines follows a Zipf distribution. This property can be used to predict traffic to a web site based on that site's rank in search engine results pages (SERPs). A site ranked number 100, for example, will get 10 times more traffic than a site ranked number 1,000. (In general, site N gets M/N times the traffic of site M.)

Zipf Curves and Website Popularity.

Fractal Mathematics

Internet traffic follows a fractal distribution. This property can be used to make some predictions about future traffic.

Fractal Mathematics as a Risk Analysis Tool

Fibonacci And The Golden Ratio

Many Internet related statistics are mathematically similar to stock prices and other financial data and can be analyzed with the same mathematical tools.

The Golden Ratio and the Internet

Search Engine Results and Probability Theory



Search Engine Rank

Search engines are the main source of traffic on the web. In other words, most people who are using the web at any given time use a search engine such as Google or Yahoo to find pages.

All the major search engines (Google, Yahoo and MSN) use a complex algorithm to determine in what order to display results. All three are now using the concept of page rank pioneered by Google founders Sergey Brin and Lawrence Page.

Page rank is determined by a probability distribution. Here is a simplified version of how a search engine determines page rank:

  • Look at this page and determine how many other pages link to it
  • Use probability theory to determine how likely a web surfer would be to hit a page by randomly clicking links
  • Repeat the process for as many pages as possible, taking into account that links from more important pages are more likely to result in more random traffic than links from less important pages
This means that a link from a page that has many links to it (cnn.com for example) passes on more page rank than a link from a page with few links to it (bob-jones-home-page.com for example).
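The steps above can be sketched in a few lines of Python. This is only a toy version of the random-surfer idea, not the algorithm any search engine actually runs. The function name `page_rank`, the four-page `toy_web` and the damping value of 0.85 (the chance that the surfer follows a link instead of jumping to a random page) are assumptions made for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Estimate page rank for a small web.

    links maps each page name to the list of pages it links to.
    damping is an assumed probability that the surfer follows a link
    rather than jumping to a random page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: everything links to "hub", nothing links to "c".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = page_rank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web the page with the most incoming links ends up with the highest rank, and the page nobody links to ends up with the lowest.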

Here is the page that Google uses to describe how they use page rank now. Here is the original paper on their search engine theory by Sergey Brin and Lawrence Page.

Page rank is not the only factor that search engines use to determine search results but it is an important part of the process in all three.

Obviously, sites that place high in search engine rankings get more traffic than sites that place lower. Just how much more traffic is described by a Zipf Distribution.

Zipf Curves and Website Popularity


A very important thing to understand about web site traffic is that it follows a Zipf distribution.

Originally the term Zipf's law meant the observation of Harvard linguist George Kingsley Zipf that the frequency of use of the nth-most-frequently-used word in any natural language is approximately inversely proportional to n - for example, the tenth most frequent word is seen about twice as often as the 20th most frequent word, and ten times more often than the 100th most frequent word.



This property can be used to predict traffic to a web site based on that site's rank in search engine results pages (SERPs). A site ranked number 100, for example, will get 10 times more traffic than a site ranked number 1,000. (In general, site N gets M/N times the traffic of site M.)
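Here is the M/N rule as a small Python sketch. It assumes traffic is exactly inversely proportional to rank, which real sites only follow approximately. The function names and the visitor numbers are made up for illustration.

```python
def zipf_traffic_ratio(rank_n, rank_m):
    """How many times more traffic the site at rank_n gets than the site at rank_m."""
    return rank_m / rank_n


def predicted_traffic(known_rank, known_visits, target_rank):
    """Scale a known traffic figure at one rank to another rank."""
    return known_visits * known_rank / target_rank


# The example from the text: rank 100 versus rank 1,000.
print(zipf_traffic_ratio(100, 1000))

# Hypothetical: a site at rank 1,000 gets 500 visits a day.
# What would it get at rank 100 under a pure Zipf model?
print(predicted_traffic(1000, 500, 100))
```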

Actual results from real world sites follow this prediction closely. However, improving a site's description in search results can improve that site's relative performance.



The Golden Ratio and the Internet


Internet traffic follows a fractal distribution. This property can be used to make some predictions about future traffic.

One set of mathematical tools used to make predictions is based on the golden ratio.



From Fibonacci And The Golden Ratio:



Mathematicians, scientists, and naturalists have known this ratio for years. It's derived from something known as the Fibonacci sequence, named after its Italian founder, Leonardo Fibonacci (whose birth is assumed to be around 1175 AD and death around 1250 AD). Each term in this sequence is simply the sum of the two preceding terms (1, 1, 2, 3, 5, 8, 13, etc.).


But this sequence is not all that important; rather, it is the quotient of the adjacent terms that possesses an amazing proportion, roughly 1.618, or its inverse 0.618. This proportion is known by many names: the golden ratio, the golden mean, PHI and the divine proportion, among others. So, why is this number so important? Well, almost everything has dimensional properties that adhere to the ratio of 1.618, so it seems to have a fundamental function for the building blocks of nature.
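The claim that the quotient of adjacent terms settles near 1.618 is easy to check. The short Python sketch below builds the sequence and prints the ratios; the function name `fibonacci_ratios` and the choice of 20 terms are just for illustration.

```python
def fibonacci_ratios(count):
    """Return the first `count` ratios of adjacent Fibonacci terms."""
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        ratios.append(b / a)
        a, b = b, a + b  # each term is the sum of the two preceding terms
    return ratios


PHI = (1 + 5 ** 0.5) / 2  # closed form of the golden ratio

ratios = fibonacci_ratios(20)
for ratio in ratios[:8]:
    print(round(ratio, 4))
print("last ratio:", ratios[-1], "inverse:", 1 / ratios[-1])
```

The early ratios bounce above and below the golden ratio, and the later ones agree with it to many decimal places.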




Many Internet related statistics are mathematically similar to stock prices and other financial data that can be analyzed with the help of these tools.





Fractal Mathematics as a Risk Analysis Tool


This tool can be applied to analyzing web site traffic, click-throughs, ad performance and many other things. The mathematics of stock prices and traffic at a web site are closely related.

Portfolio managers try to maximize returns for a given level of risk. The problem is that the mathematical model that most portfolio managers use can easily be demonstrated to be very inaccurate.


The risk-reducing formulas behind traditional portfolio theory and risk management theory rely on unfounded premises that cause them to consistently underestimate risk:

  • Price changes and other events are statistically independent of one another.
  • All price changes and other events are distributed in a pattern that conforms to the standard bell curve. The width of the bell shape (as measured by its sigma, or standard deviation) depicts how far price changes diverge from the mean; events at the extremes are considered extremely rare.

Stock prices, interest rates, currency exchange rates, real estate prices, Internet traffic and any other economic and business data that you care to chart are not distributed on a normal curve. They are all fractals. If you are not familiar with this branch of mathematics, this site has an introduction to fractals.

The problem with using bell curves to predict risks is that they vastly underestimate the probability of large fluctuations that frequently occur in real data. By vastly I mean a factor of 1,000,000 or more. For example, the fluctuations in stock price observed when the stock market crashes are predicted to happen perhaps once in 100 billion days by using a normal curve. In real life they occur about once in 37,000 days.
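To see where numbers like these come from, here is a rough Python sketch. It computes how rare a move of a given size is under a normal curve and then divides the two figures quoted above to get the size of the underestimate. The function names are made up for illustration, and the sketch only covers the bell curve side; it does not fit a fractal model.

```python
import math


def normal_tail(k_sigma):
    """Chance that a normally distributed value lands more than k_sigma
    standard deviations above the mean (one-sided)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))


def days_between_events(daily_probability):
    """Average number of days between events with the given daily chance."""
    return 1.0 / daily_probability


# Under a bell curve, large moves get rare very quickly.
for k in (3, 5, 7):
    print(k, "sigma: about once in", round(days_between_events(normal_tail(k))), "days")

# The two figures quoted in the text.
normal_days = 100e9      # once in 100 billion days (normal curve prediction)
observed_days = 37_000   # once in 37,000 days (real life)
underestimate = normal_days / observed_days
print("bell curve underestimates the risk by a factor of about", round(underestimate))
```

Dividing the two quoted figures gives a factor of a few million, which fits the "1,000,000 or more" statement above.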

Fractal mathematics and chaos theory can't predict the price of stocks next month or whether your website will be down tomorrow but they can accurately predict the risk associated with investing money in stocks or the chance that your site will be down 50 hours in the next 6 months.

The reason that portfolio managers aren't wild about using fractal mathematics is that many investors would be frightened away if they had an accurate picture of the risks involved. An investment that seems very conservative using traditional risk assessment methods often becomes a reckless gamble when using fractal mathematics.


How can you use this in real life?


When you use this approach as a tool to help make decisions, you are risking money and resources and looking for a return in money and resources.


  • Risks of redundant failures of equipment and services are underestimated using traditional models. For example, web server failures follow a fractal distribution, not a normal curve. This means that two web servers set up to redundantly host a site are far more likely to fail at the same time in real life than traditional models would predict. The chance of two failing at once is still small, but if it results in a total disaster scenario for your business it might be unacceptably high.

  • There is much less difference between "risky" investments and decisions (such as investing in a startup company) and "safe" investments (such as investing in a mutual fund) than traditional methods suggest.

  • Traditional models underpredict the risk of "safer" decisions, so they overestimate the relative risk of "risky" decisions. This explains why there are so many successful risk-takers in the Internet business. Nothing is worse from an investment point of view than taking a risk that is disproportionately high for the return. Underestimating risks or overestimating potential return (or both) results in decisions that cause risk to be high compared with potential returns.

  • Traditionally "risky" ventures usually have a very large potential return. The fact that the risk of any venture is large makes "risky" ventures more attractive. Risk is managed by taking many large risks, each with a large return, instead of one "safe" risk with a small return.
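The redundant server point in the first bullet can be illustrated with a small Python sketch. This is not a fractal model. It is a much simpler common-cause model that I am swapping in to show the same effect: if some shared event (a power cut, a bad software update) can take down both servers at once, the chance of a double failure is far higher than the "independent failures" arithmetic suggests. The function names and the probabilities are assumptions chosen for illustration.

```python
def both_fail_independent(p):
    """Chance that two servers fail together if failures are independent."""
    return p * p


def both_fail_common_cause(p, shared):
    """Chance that two servers fail together when a shared event
    (probability `shared`) takes both down, and otherwise each
    server fails independently with probability p."""
    return shared + (1.0 - shared) * p * p


p = 0.01        # assumed chance that one server is down on a given day
shared = 0.001  # assumed chance of an event that takes down both

independent = both_fail_independent(p)
correlated = both_fail_common_cause(p, shared)
print("independent model:", independent)
print("common-cause model:", correlated)
print("ratio:", correlated / independent)
```

Even a shared event ten times rarer than a single server failure makes the double failure roughly ten times more likely than the independent model predicts.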


Many "experts" are using this to push traditionally risky investments such as shorting stocks. Just because traditionally "safe" investments are worse than advertised does not make "risky" investments better (they are even riskier than advertised). For example, shorting stocks is still very risky, just less risky compared with mutual funds than traditional analysis indicates.

So where do you invest your money? The simple answer is, "only invest in things that have a positive expected return." I'd recommend finding a real mathematician to help you decide but here are things that are likely to have a positive expected return:

  1. Investments and other decisions that are super conservative (like CDs and savings accounts).
  2. Investments and other decisions with a very large potential return (like startup companies).
Things in between tend to have the worst of both worlds: relatively low returns with relatively high risk. Most stock investments, including 401(k)s, fall into this category. Fractal analysis predicts the type of losses that Enron employees and others who were heavily invested in 401(k)s experienced in 2000 and 2001. Learn from their experience and Mandelbrot's brilliant analysis.
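"Positive expected return" has a simple definition: multiply each possible return by its probability and add the results. The Python sketch below shows the arithmetic. The function name and all the probabilities and returns are invented for illustration and are not estimates of any real investment.

```python
def expected_return(outcomes):
    """Expected return for a list of (probability, return) pairs.

    A return of -1.0 means losing the whole stake; 0.03 means a 3% gain.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(p * r for p, r in outcomes)


# Hypothetical numbers only.
savings_account = [(1.0, 0.03)]           # certain 3% gain
startup = [(0.9, -1.0), (0.1, 15.0)]      # usually a total loss, rarely a 15x gain

print("savings account:", expected_return(savings_account))
print("startup:", expected_return(startup))
```

With these made-up numbers both choices have a positive expected return, but the startup only pays off if you can afford to make many such bets.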


Tuesday, September 13, 2005

Conversion Measurement and Tracking

A conversion occurs when a visitor to a web site takes a desired action. What this action is varies with the purpose of the site, but some examples are:

  • Buys a product

  • Fills out a loan application

  • Joins an organization


Good webmasters use tracking software to measure conversions accurately. Accurate conversion information is key to determining how much a visitor is worth to a website owner. Conversions are usually expressed per thousand visitors. A good conversion rate is 5 conversions per 1,000 visitors (sometimes abbreviated 5 CPM, although CPM more commonly means cost per thousand ad impressions).

Click Through Rates

One important thing for website owners to measure and monitor is click through rate (CTR). Click through rate is the percentage of visitors who click on some desired link. This could be a link to a page with a loan application or a banner ad that takes the visitor out of the site. A good CTR is about 5%.

Good click through rate measurement and monitoring is key to tracking the success of a web site. High CTR is one of the main goals of search engine optimization (SEO).
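Both metrics, conversions per thousand visitors and click through rate, are simple ratios. Here is a minimal Python sketch of the arithmetic. The function names and the sample counts are made up, and real tracking software will have its own definitions of a "visitor" and a "view".

```python
def conversions_per_thousand(conversions, visitors):
    """Conversions per 1,000 visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return 1000.0 * conversions / visitors


def click_through_rate(clicks, views):
    """Click through rate as a percentage of views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return 100.0 * clicks / views


# Hypothetical month: 12,000 visitors, 60 sales, 600 clicks on the key link.
print(conversions_per_thousand(60, 12000), "conversions per 1,000 visitors")
print(click_through_rate(600, 12000), "% click through rate")
```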

Monday, September 12, 2005

What is a Blog? What are Blogs?

I have gotten this question too many times to count in the last year.


A weblog (now more commonly known as a blog) is a web-based publication consisting primarily of periodic articles (normally in reverse chronological order). Although most early weblogs were manually updated, tools to automate the maintenance of such sites made them accessible to a much larger population, and the use of some sort of browser-based software is now a typical aspect of "blogging".

Blogs range in scope from individual diaries to arms of political campaigns, media programs, and corporations. They range in scale from the writings of one occasional author, to the collaboration of a large community of writers. Many weblogs enable visitors to leave public comments, which can lead to a community of readers centered around the blog; others are non-interactive. The totality of weblogs or blog-related websites is often called the blogosphere. When a large amount of activity, information and opinion erupts around a particular subject or controversy in the blogosphere, it is sometimes called a blogstorm or blog swarm.

The format of weblogs varies, from simple bullet lists of hyperlinks, to article summaries or complete articles with user-provided comments and ratings. Individual weblog entries are almost always date and time-stamped, with the newest post at the top of the page, and reader comments often appearing below it. Because incoming links to specific entries are important to many weblogs, most have a way of archiving older entries and generating a static address for them; this static link is referred to as a permalink. The latest headlines, with hyperlinks and summaries, are frequently offered in weblogs in the RSS or Atom XML format, to be read with a feed reader.

The tools for editing, organizing, and publishing weblogs are variously referred to as "content management systems", "publishing platforms", "weblog software", and simply "blogware".

Click here for the full article from Wikipedia, the free encyclopedia.

Monday, August 22, 2005

Amyloid precursor protein involved in brain protection



By studying fruit flies, Belgian investigators have discovered that the normal physiologic function of the amyloid precursor protein (APP) is to stimulate the growth of nerve paths in the brain. This may aid in recovery after brain injury but may also contribute to the development of Alzheimer's disease-like pathology.

"In an ironic twist of evolution," Dr. Hassan concluded, "a protective protein, which probably allows the brain to survive and function robustly as we go through the stressful, productive phases of our lives, produces, as an inevitable consequence of its necessary function, a toxic peptide that impairs us in old age."

Friday, May 27, 2005

New Real Estate Site

Friday, April 29, 2005

Great Information Technology Site

Monday, April 11, 2005

New Medical Site

I am currently working on a large project to publish and organize hundreds of medical articles. The goal of the site is to add rich links between articles and therefore to increase the total amount of information.

The site connects relationships between the following medical topics:
Diabetes can damage the kidneys and nerves in the urinary and digestive systems. This causes problems such as overactive bladder (urinary problem) and gastroparesis (digestive problem).

High blood pressure can damage the kidneys and can also be caused by kidney disease.

There are articles explaining treatments for high blood pressure, treatments for kidney failure, and treatments for diabetes and much more.