Saturday, March 25, 2006

How long is a piece of string? (i.e. the inner workings of the search engine)

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites.

Building the Index

Once the spiders have completed the task of finding information on Web pages (and we should note that this is a task that is never actually completed -- the constantly changing nature of the Web means that the spiders are always crawling), the search engine must store the information in a way that makes it useful. There are two key components involved in making the gathered data accessible to users:
The information stored with the data
The method by which the information is indexed
In the simplest case, a search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether the word was used once or many times or whether the page contained links to other pages containing the word. In other words, there would be no way of building the ranking list that tries to present the most useful pages at the top of the list of search results.
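To make the "simplest case" concrete, here is a minimal sketch of an index that stores only the word and the URL where it was found. The function name and the example.com pages are made up for illustration; no real engine is this simple.

```python
from collections import defaultdict


def build_simple_index(pages):
    """Map each word to the set of URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


# Hypothetical pages, invented for this example.
pages = {
    "http://example.com/a": "pumpkin pumpkin pumpkin recipes",
    "http://example.com/b": "giraffe facts and one pumpkin",
}

simple_index = build_simple_index(pages)
# Both pages come back for "pumpkin", with nothing to say which one matters more.
print(sorted(simple_index["pumpkin"]))
```

Page a uses the word three times and page b only once, but this index cannot tell them apart, which is exactly the limitation described above.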
To make for more useful results, most search engines store more than just the word and URL. An engine might store the number of times that the word appears on a page. The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page. Each commercial search engine has a different formula for assigning weight to the words in its index. This is one of the reasons that a search for the same word on different search engines will produce different lists, with the pages presented in different orders.
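A rough sketch of the weighting idea might look like the following. The weights (5 for the title, 3 for a heading, 1 for body text) are numbers I made up; each commercial engine has its own formula, so treat this as an illustration only.

```python
from collections import defaultdict

# Hypothetical weights: a word in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}


def build_weighted_index(pages):
    """Map word -> {url: score}, adding the field weight for every occurrence."""
    index = defaultdict(lambda: defaultdict(float))
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[word][url] += FIELD_WEIGHTS[field]
    return index


def search(index, word):
    """Return the URLs containing the word, highest score first."""
    hits = index.get(word.lower(), {})
    return sorted(hits, key=lambda url: hits[url], reverse=True)


# Hypothetical pages, invented for this example.
weighted_pages = {
    "http://example.com/a": {
        "title": "Giant pumpkin records",
        "body": "the pumpkin weighed a lot",
    },
    "http://example.com/b": {
        "title": "Garden diary",
        "body": "planted one pumpkin today",
    },
}

weighted_index = build_weighted_index(weighted_pages)
print(search(weighted_index, "pumpkin"))
```

Page a ranks first because "pumpkin" appears in its title as well as its body. Change the numbers in FIELD_WEIGHTS and the order can change, which is one way two engines end up with different lists for the same word.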
Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space. For example, the original Google paper describes using 2 bytes, of 8 bits each, to store information on weighting -- whether the word was capitalized, its font size, position, and other information to help in ranking the hit. Each factor might take up 2 or 3 bits within the 2-byte grouping (8 bits = 1 byte). As a result, a great deal of information can be stored in a very compact form. After the information is compacted, it's ready for indexing.
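As a sketch of how several factors can share 2 bytes, the example below packs a capitalization flag (1 bit), a font size (3 bits) and a word position (12 bits) into one 16-bit number. The split is modelled on the "plain hit" layout in the original Google paper, but the function names are my own and this is only an illustration, not the real implementation.

```python
def pack_hit(capitalized, font_size, position):
    """Pack three ranking factors into a single 16-bit integer."""
    if not 0 <= font_size < 8:
        raise ValueError("font_size must fit in 3 bits (0-7)")
    if not 0 <= position < 4096:
        raise ValueError("position must fit in 12 bits (0-4095)")
    # Layout, high bit to low bit: 1 bit capitalized | 3 bits font size | 12 bits position
    return (int(capitalized) << 15) | (font_size << 12) | position


def unpack_hit(hit):
    """Recover (capitalized, font_size, position) from a packed hit."""
    return bool(hit >> 15), (hit >> 12) & 0b111, hit & 0xFFF


packed = pack_hit(capitalized=True, font_size=3, position=42)
as_bytes = packed.to_bytes(2, "big")  # exactly 2 bytes of storage
print(len(as_bytes), unpack_hit(packed))
```

Three separate facts about a word end up in the space of a single short number, which is the compact form the paragraph above is talking about.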
An index has a single purpose: It allows information to be found as quickly as possible. There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness.
In English, there are some letters that begin many words, while others begin fewer. You'll find, for example, that the "M" section of the dictionary is much thicker than the "X" section. This inequity means that finding a word beginning with a very "popular" letter could take much longer than finding a word that begins with a less popular one. Hashing evens out the difference, and reduces the average time it takes to find an entry. It also separates the index from the actual entry. The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently. The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.
What are my favourite SE? (http://computer.howstuffworks.com/search-engine2.htm accessed May 18th 2006)
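The hashing description can be sketched as a tiny hash table. Everything here is an assumption for illustration: the 8 divisions, the multiply-by-31 formula and the HashIndex class are my own choices, not what any particular search engine uses.

```python
NUM_BUCKETS = 8  # the "predetermined number of divisions" (made-up size)


def bucket_for(word):
    """Apply a simple formula that turns a word into a bucket number."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % NUM_BUCKETS
    return value


class HashIndex:
    """Buckets hold (word, pointer) pairs; the actual data lives in a separate list."""

    def __init__(self):
        self.records = []
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def add(self, word, data):
        self.records.append(data)
        pointer = len(self.records) - 1  # position of the data in self.records
        self.buckets[bucket_for(word)].append((word, pointer))

    def lookup(self, word):
        # Only one bucket is scanned, whatever letter the word starts with.
        for stored_word, pointer in self.buckets[bucket_for(word)]:
            if stored_word == word:
                return self.records[pointer]
        return None


hash_index = HashIndex()
hash_index.add("mango", ["http://example.com/fruit"])
hash_index.add("xylophone", ["http://example.com/music"])
print(hash_index.lookup("mango"), hash_index.lookup("xylophone"))
```

The bucket a word lands in depends on all of its letters, not just the first one, so "M" words do not pile up in one place the way they do in a dictionary. The pointer is what keeps the index separate from the actual entry.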
(Note: It was incredibly difficult to summarize this whole process so I used the website "How Stuff Works". It is incredibly helpful.)
Some of the latest search engine news is:
Google is facing a criminal investigation in Brazil because the country claims that it is circulating child pornography. Also, a New York Times internet writer was jailed by China. Many other journalists and writers are up for charges amid a crackdown that is tightening controls on media and freedom of speech. (http://www.topix.net/business/search-engines accessed 18th May 2006)

Posting activities

I love reading the forums and now love writing in them. But it was a dark and lonely road to posting savvy-ness. I got really excited in week 1 or 2 and jumped straight into posting... Big mistake. Naive, I wandered into a zone of opinions and judgment and was absolutely destroyed by a fellow poster-er. This was fine, because it made me more conscious of what I was actually writing... And then I had an epiphany. I started to provoke with what I said. Subtle little things that I hope people will post about. Perhaps in the near future I'll post something incredibly controversial and see what kind of response they can conjure up. hahahahaha! I'm so evil! So post on fellow posters. Live long and post away!

Alphaville review

Adam was right, "Alphaville is not The Matrix". Indeed, far from it. Of course, only in a film-savvy context. The whole concept of a computer controlling a city is amazing considering that the film was released in 1965. It is a great concept even for today, yet the acting/cinematics are very 1960s. There is perhaps a resonance between Godard's view of the future and us today. Probably not as elaborate, but we are more or less run by computers. Just one example of us relying on computers is university life and, in particular, this course. Without a computer I couldn't post to the forums and would therefore probably fail. However, it's not just students who rely on/can't live without computers. It is also business people the world over. Stock exchanges, up-to-the-minute market updates, databases: all rely on computers and technology to exist. So our emotions, feelings and soul are our own, but in one way or another, we all rely on computers and technology to live... (comfortably anyway). So Godard didn't miss the mark by much... And how long before we do rely on computers to tell us how to feel? When to laugh or to cry? When to love or hate?

Tuesday, March 21, 2006

Scavenger hunt

1. What is the weight of the world's biggest pumpkin? I asked "Jeeves" and his answer was: Another record for the world's largest pumpkin. Oregon farmer STEVE DOLETAS grew a pumpkin that weighed 1,180 pounds. Mooter.com offered me no help at all. Seeing as 'relevance' is their motto...
2. What is the best way (quickest, most reliable) to contact Grant Hackett? Miami Olympic heated pool, 80 Pacific Ave, Miami, QLD, 4220. I looked into Altavista.com and another blog from this course in the past came up. Sorry Lenny_G. Actually, I'll reference it: Lenny_G (http://www.users.on.net/~lenus/nct/2005/04/nct-week-6.html) accessed 21/3/06
3. What is the length of a giraffe's tongue? A giraffe's tongue is 18 to 20 inches (46 to 50 centimeters) long and blue-black. Thank you a9.com http://a9.com/length%20of%20giraffes%20tongue
4. How would you define the word "ontology"? What does it really mean? An ontology is a specification of a conceptualization. In philosophy it refers to the subject of existence. Basically: knowing about knowing... ujiko.com was little help but we got there in the end. http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
5. What was David Cronenberg's first feature film? Shivers... *ba ba boow* (scary horror music) Icerocket.com. Cool name. Cool search engine. http://horrorlibrary.net/MovieRJC100104.htm
6. When was the original "Hacker's Manifesto" written? Originally written in 1989. hotbot.com isn't useless http://www.hotbot.com/default.asp?query=hackers+manifesto+originally+written&ps=&loc=searchbox&tab=web&provKey=Ask+Jeeves
7. Why do all phone numbers in Hollywood films start with "555"? They do it on purpose so that no-one can molest the unsuspecting owner of a number - 555-numbers do not exist. a9.com once again. http://www.bbc.co.uk/dna/h2g2/A812800#footnote5 (this page is very funny)
8. What is the cheapest form of travel from Crete to Rhodes? To walk and swim. Or fly by yourself (check out my blog)... I didn't need a search engine for that one.
9. What song was top of the Australian Pop Charts this week in 1965? Over The Rainbow Billy Thorpe & The Aztecs. http://www.onmc.iinet.net.au/top/1965.htm
10. Which Brisbane band was (still is?) Stephen Stockwell a member of? Brisbane punk band Black Assassins http://www.brisbanewritersfestival.com.au/2005/content/standard.asp?name=StockwellS

Thursday, March 16, 2006

Mailing Lists

The Association of Internet Researchers: www.aoir.org/ "The Association of Internet Researchers is an academic association dedicated to the advancement of the cross-disciplinary field of Internet studies. It is a member-based support network promoting critical and scholarly Internet research independent from traditional disciplines and existing across academic borders. The association is international in scope." (The Association of Internet Researchers, www.aoir.org/ accessed 16/3/2006) With over 1000 subscribers, AOIR organises an annual Internet Research conference and also publishes the Internet Research Annual. Submissions from its users are sometimes very general and cover topics that aren't as relevant to my essay.

Institute for New Media Studies: www.inms.umn.edu/ "The University of Minnesota Institute for New Media Studies is a center for creation, innovation, and examination of content and messages and the affects of new media technologies and techniques on their forms and functions.
The goal is the imagining and testing of innovative forms, development of new knowledge about functions, and generation of greater understanding of the impacts of these changes in the media landscape."('Institute for new media studies' www.inms.umn.edu/ accessed 16/3/2006) Subscribers to this mailing list are offered a variety of research and information. Just one example is the GRAVEL (Game Research And Virtual Environment Plan) project. GRAVEL looks at exploring the structure of game and virtual reality environments. They do this to advance research and better understand the cultural, communicative, aesthetic, technical, and social implications and opportunities these structures provide. This particular mailing list is great for me because the focus of my essay will be Internet video games and their effect on society.

Tuesday, March 14, 2006

Spammy McSpamster: My Spam Dilemma

I got busted by AOL when I first signed up with them a year ago. I was setting up my email account and sent myself 500 test emails by accident. They shut me down instantly. Whoops! So I rang up the customer helpline and got it sorted out, and an hour after being accused of being a spam mastermind, everything was back to normal. It was interesting to note a particular article in the reader about junk email and spam laws. Authentication seems to be the way of the future. In terms of new spamming laws, check out www.caube.org.au/. Apparently as of the 10th of April 2004 it was "...illegal to send even one unsolicited commercial email..." (Coalition Against Unsolicited Bulk Email www.caube.org.au/ accessed 16/3/2006). However, even after said date, I have been receiving more spam e-mail than I know what to do with. Hotmail accounts are particularly vulnerable (only from personal experience) to unsolicited emails. So much so that I rarely ever check mine because I'm sick of seeing "Make your 'manly bit' larger... All natural!". Hotmail and other email accounts that are bombarded with spam, in my opinion, will probably be abandoned unless something is done to minimise the amount of 'junk' we receive. But I'm getting off the soap box now 'cause I gotta go to work. Fare thee well

Wednesday, March 08, 2006

Blog ON!

Hey everyone, I thought I'd get excited and change the font and colour for this posting. Cool huh? Yay me. So, I'm still having fun, getting my head into study mode. Working 40 hours isn't helping but I'm doing my best. Hey, if there's other disgruntled full-time-worker-students, let me know how ur coping (or not). But enough chit-chat, let's get right down to business. "New comm tech", I hear you ask. "Is it fun?". The answer, simply, is yes. Dear reader, I was almost computer illiterate before commencing this course. Now look at me! I am the Blog king! Being techno savvy is the coolest... and new comm tech has broadened my horizons. Any who, I best make like a banana and split, cos those readers ain't going to read themselves. You stay classy, San Diego ;)

Tuesday, March 07, 2006

Virgin Blogger!

Gosh, well here I am. A virgin of the blogger world. Breaking the metaphorical hymen. Too far? Probably. Sorry if I offended anyone... Don't sue me. Hey, let's get on topic. So, new comm tech is fun (listen to me - I've got into the lingo and everything. I abbreviate all of my classes now :) ) Alphaville... mmmmmmm, interesting. Toward the end I was starting to get where the movie was going. And Adam's right, it ain't no Matrix (but let's be grown up: that's not the point, is it?) Ooh, everyone's leaving so I guess I better follow suit (I'm in my first tute). Make like a tree and leave! Do yourself a favour and hug random people. You just gotta dig the warm fuzzy feeling it gives. BYE!!!