Planet Larry

July 03, 2008

Jürgen Geuter

Decentralization of services

Identi.ca is creating quite the buzz today and, since I am in the middle of moving my email to a new server, some parallels pop into the mind ("the" meaning "my", but it sounds way less subjective that way ;-) ).

Identi.ca is a microblogging service like the omnipresent Twitter, but instead of being yet another clone trying the same thing (just with a sane uptime, probably) they did change a few things: Identi.ca is based on laconi.ca, an open source software, and when signing up at Identi.ca you agree to make all your data (except your personal login data of course) usable under a Creative Commons license.

This is a change from most other social networks, where locking the user in is the main goal (make leaving so expensive or annoying that people don't, no matter what you do).

Another change is that Identi.ca/laconi.ca tries to establish a standard for decentralized microblogging called the OpenMicroBlogging Specification, which would allow all the different services to interact: You could answer someone on Twitter from your own laconi.ca installation if Twitter implemented the standard (considering that it's against their interest and they can't even keep their service up for an hour, that chance ain't big ;-) ).

I'm a huge fan of decentralized services because they allow me to keep my data where I can control it. I can keep my mails on a computer that I control, where I know that no one looks into the data I want no one to look into. It also solves the "single point of failure" problem: When one of the nodes goes down it doesn't kill the whole network. Those are good principles and the basis of many of the services on the internet that we rely on. But decentralization can also bring problems with it.

The good thing is that we have a lot of experience with decentralized services that are super popular and we know many of the problems that emerge. Let's look at examples.

The most widely known and used service is probably the "web". It's decentralized because everyone can set up their own server and publish their stuff to the world. The problems we are having are mostly ones of impersonation (phishing): People make their page look like a big bank's login page and gather valid login credentials for that bank. It's hard to know whether some page is what it claims to be (especially in connection with bugs in browsers or problems with the protocol that make the actual origin of a page hard to discover).

Another very popular decentralized service is "email". I myself administer a bunch of email servers and many, many people and companies do the same (I never understood how people could have their confidential company email on some remote server they have no real access to and no control over). The downside is known as "spam": unwanted email, usually with advertising in it. Since everyone can set up an email server and everyone can send mails, it's easy to turn the big botnet (Windows computers) into one huge spam sender.

Then there's XMPP/Jabber which is decentralized instant messaging: Everybody can run his own Jabber server but can still communicate with people from other servers. Spam is not yet a big problem (probably because Jabber is still too small and because many people don't accept messages from people they have not added to their lists).

Decentralization always has to deal with the problem of people abusing the system to get noise in. Whether that noise is just some 15 year old kid harassing people or some 30 year old guy trying to sell penis-pills doesn't matter: A decentralized system has to face the fact that unwanted entities will enter the net.

With services that work IM-ish the problem can be dealt with easily: Just accept data only when the recipient has actively whitelisted that sender, by adding him to his list or whatever. This doesn't really work with email, since getting mails from people you have not yet whitelisted is a big part of the whole thing: You get inquiries about jobs or stuff you posted on your blog, or a hint about a bug in software you wrote, or maybe just a mail from a relative or old friend who stumbled on your email address by accident. Making email require whitelisting would pretty much kill the whole thing.
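The IM-style rule can be sketched in a few lines of Python. This is only an illustration of the whitelist idea, not how any particular Jabber server does it; the function name, the roster and the addresses are all made up.

```python
def accept_message(sender, roster):
    """Return True only if the recipient has whitelisted the sender."""
    return sender.lower() in roster

# Hypothetical roster and incoming messages, for illustration only.
roster = {"alice@example.org", "bob@example.net"}
inbox = [
    ("alice@example.org", "lunch?"),
    ("pills@spam.example", "special offer"),
]

# Everything from an unknown sender is dropped before the user sees it.
delivered = [(sender, body) for sender, body in inbox
             if accept_message(sender, roster)]
print(delivered)
```

Applied to email, the same filter would also throw away the job inquiry and the bug report from a stranger, which is exactly the trade-off described above.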

The problem of authentication is still pretty much unsolved. We have SSL certificates and enhanced certificates that are hard to get and that certify ... well, often not much. Having an https connection is cool and all because of the encryption, but what do those certificates really mean? They give me a green bar if some weird certificate authority says that the certs are good. Who certifies that the certificate authority did its job properly? I can sign whatever SSL cert I want and so can anyone else. The problem of trust is still unsolved.

Those are the issues that Identi.ca will have to deal with. How do they see their service? Like email where you pretty much accept anything cause it might be valid? Or like IM where you keep a whitelist?

On the other hand it's good to see someone tackle the whole microblogging thingy with a different focus, all those Twitter clones were really starting to get boring.

On a different topic: Has anyone made a deep analysis of spam in the Jabber network? How much you can send out, how hard it gets and how successful it is (as in click rate on the spam)? With AOL opening their services via Jabber gateways that topic might come up sooner than we thought it would. How is the Jabber protocol built? If I have set my client to refuse messages from people who are not on my list, does the server itself reject those messages, or is that just a client thing? How long till Jabber contacts will start offering to enlarge my penis?

July 03, 2008 01:12 PM :: Germany  

Alex Bogak

Opera 9.5 is 66% faster than IE7 - and I have numbers to prove it!

Hi All

I just found an interesting study that compares the cache efficiency of Opera, IE7, FF3 and Safari (on Windows) and finds that Opera is the best one, leading IE7 by 66%. From the site:
As it is clear from the results, Opera 9.5 caches web content most effectively, performing 3-times less disk operations that Internet Explorer 7. FireFox 3 coming on the second place with a minor -12.87% disadvantage. Safari 3.1.2 is on the third place with 6,991 accumulated disk operations. Internet Explorer 7 is coming on the last place with a huge -66.63% disadvantage relative to the Opera web browser.
Read full post here
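The "3-times" and "-66.63%" figures in the quote describe the same gap, if the percentage is taken relative to the slower browser. A small Python sketch of that arithmetic follows; the operation counts are hypothetical, since the quote only gives Safari's total.

```python
def relative_difference(best_ops, other_ops):
    """Percentage difference of the best result, relative to the other one."""
    return (best_ops - other_ops) / other_ops * 100.0

# Hypothetical counts, chosen only so that one browser does 3x the disk operations.
best, other = 1000, 3000
print(round(relative_difference(best, other), 2))
```

Note that this measures disk operations, not overall speed.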

Maybe I should consider using Opera now?

July 03, 2008 01:09 PM :: Israel  

Brian Carper

Still no news from Westinghouse

I filed a complaint against Westinghouse to the BBB, but two weeks later they haven't responded to it yet. Can't say I'm surprised. The good thing is that if they don't respond to the BBB, they get a big fat stinking black mark on their BBB profile. Might not do that much good, but it hopefully will let others know to avoid them.

I've gotten some posts from some people in the same situation of Westinghouse blowing them off and not sending them replacement items when their crappy merchandise breaks. Some people want to file a class-action lawsuit. I don't know if I'd go that far. I may take them to small claims court though. Before that, I'm going to write a letter to their corporate office, pointing out the existence of my website.

In the meantime, conclusions:

  • Westinghouse doesn't give half a crap about its customers.
  • Never ship anything expensive via UPS. They're happy to leave $500 merchandise sitting on your front porch (assuming it even made it to my front porch and the UPS guy didn't just keep it, how would I know? I never saw it).
  • Beware buying things online. When they break, your options for getting them fixed or covered by warranty are limited, impractical, and prone to months of aggravation. Is it worth the time you save buying things online, when you have to spend six months going through the kind of bullcrap I went through over this monitor?

This post is just a friendly reminder to everyone reading this: don't buy anything from Westinghouse, and tell everyone you know the same thing. I wonder if Westinghouse cares about all of the people whose business they flushed down the toilet by ripping me off?

(Read the whole crappy story of Westinghouse's dishonesty and horrible customer service: The beginning, Update 1, Update 2, Update 3, Update 4, Update 5, Update 6.)

July 03, 2008 06:46 AM :: Pennsylvania, USA  

Zeth

Python and TCL

I have had to use the TCL programming language recently. I don't know it well yet, and I have found that the quickest way at the moment is to prototype in Python and then edit it into TCL code. This way I know the logic is sound, and therefore logic errors are not mixed in with syntax errors.

In the following example, I had a sequential list of numbers in TCL (which were unique ids of XML elements), and for a given number I had to find the nearest numbers on either side.

"""Nearest Neighbours in a list of numbers."""

def nearestneighbours(numlist, number):
    """For a given number, find the nearest lower and higher numbers in
    a given (ordered) list of numbers."""
    # Sentinels: every real number is above -inf and below +inf.
    left = float('-inf')
    right = float('inf')
    for i in numlist:
        # Keep the largest value that is still below number.
        if left < i < number:
            left = i
        # Keep the smallest value that is still above number.
        if number < i < right:
            right = i

    return (left, right)

def main():
    """Demo when called directly."""
    mylist = [58163, 62140, 66139, 70280, 74371,
              78525, 82426, 86584, 90650, 94749]

    number = 67000
    lower, higher = nearestneighbours(mylist, number)
    print("Lower:", lower)
    print("Higher:", higher)

if __name__ == "__main__":
    main()

We have the function working as we want to, so now we can try to rewrite the code into TCL:

# Nearest Neighbours in a list of numbers.

proc nearestneighbours {numlist number} {
    # For a given number, find the nearest lower and higher numbers in
    # a given (ordered) list of numbers.
    set left 0
    set right 1000000000

    foreach i $numlist {
        if {$i < $number} {
            if {$i > $left} {set left $i}
        } elseif {$i > $number} {
            if {$i < $right} {set right $i}
        }
    } ;# end foreach

    set nearest [list $left $right]
    return $nearest
    } ;# end proc nearestneighbours

proc main {} {
    # Demo when called directly.
    set mylist [list 58163 62140 66139 70280 74371 \
                     78525 82426 86584 90650 94749]
    set number 67000
    set highlow [nearestneighbours $mylist $number]

    puts "Lower: [lindex $highlow 0]"
    puts "Higher: [lindex $highlow 1]"
    } ;# end proc main

main

This works great.

However, I wrote the above Python code in a verbose way because I was sure I could replicate it in TCL. In a Python program, I can just use the Python list's sort method to find the neighbours.

def nearestneighbours(numlist, number):
    """For a given number, find the nearest lower and higher numbers in
    a given (ordered) list of numbers."""

    numlist.append(number)
    numlist.sort()
    return(numlist[numlist.index(number)-1],
           numlist[numlist.index(number)+1])

For the example above this gives the same result as the much more long-winded version at the start of this post (though note that it also appends the number to the list it is given). How does one do this in TCL? Well, rewriting the Python gives us:

proc nearestneighbours {numlist number} {
    # For a given number, find the nearest higher and lower numbers in
    # a given (ordered) list of numbers.

    lappend numlist $number
    set numlist [lsort -integer $numlist]
    # Look the position up once, then pick the items on either side.
    set idx [lsearch -exact $numlist $number]
    return [list [lindex $numlist [expr {$idx - 1}]] \
                 [lindex $numlist [expr {$idx + 1}]]]

   } ;# end proc nearestneighbours

This seems to work fine too. Which is the preferred TCL way? I'm not sure.
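On the Python side there is a third option worth sketching. The sort-based version appends to the caller's list, wraps around to the last item when the number is smaller than everything in the list, and raises an IndexError when it is larger than everything. Since the list is already ordered, the standard library's bisect module can find the neighbours without modifying it. The function name and the use of None for a missing neighbour are my own choices here, not part of the code above.

```python
from bisect import bisect_left, bisect_right

def nearest_neighbours(sorted_nums, number):
    """Return (lower, higher) neighbours of number in an ascending list.

    None stands in for a missing neighbour; the input list is not modified.
    """
    lo = bisect_left(sorted_nums, number)   # index of the first item >= number
    hi = bisect_right(sorted_nums, number)  # index of the first item > number
    lower = sorted_nums[lo - 1] if lo > 0 else None
    higher = sorted_nums[hi] if hi < len(sorted_nums) else None
    return lower, higher

mylist = [58163, 62140, 66139, 70280, 74371,
          78525, 82426, 86584, 90650, 94749]
print(nearest_neighbours(mylist, 67000))
```

It also behaves sensibly when the number itself is already in the list, returning the items on either side of it.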

Discuss this post - Leave a comment

July 03, 2008 03:24 AM :: West Midlands, England  

July 02, 2008

TopperH

A smarter way to hibernate

While I like the way linux handles memory caches, it can become a pain in the ass when you need to hibernate (suspend to disk) your laptop.

The memory cache grows the more you work and tends to fill up all your free RAM. While this is good, because it speeds up your system when doing repetitive jobs, it can considerably increase the hibernate time.

Along with that, it writes and reads lots of data every time you hibernate/resume, stressing your disks.

For example, my new laptop has 3 GB of RAM and sometimes the hibernate process has to write more than 1 GB to the swap partition.

I wrote a little script that (hopefully) safely wipes the memory cache and hibernates my machine.

I called it /usr/local/sbin/freeze.

#! /bin/sh
sync
echo 1 > /proc/sys/vm/drop_caches
sync
hibernate
I have been using this instead of hibernate for a couple of days and haven't encountered any issues yet.
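To get an idea of how much cache there is to drop, one can look at the "Cached" line of /proc/meminfo before and after running the script. Here is a minimal Python sketch of reading that value; the helper name is made up and the sample numbers are invented.

```python
def cached_kib(meminfo_text):
    """Return the 'Cached' value in kB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        # "SwapCached:" does not start with "Cached:", so it is skipped.
        if line.startswith("Cached:"):
            return int(line.split()[1])
    return None

# Sample text in the /proc/meminfo format; the numbers are made up.
sample = (
    "MemTotal:        3098112 kB\n"
    "MemFree:          204800 kB\n"
    "Cached:          1258291 kB\n"
    "SwapCached:            0 kB\n"
)
print(cached_kib(sample))
```

On a real Linux machine the text would come from open("/proc/meminfo").read() instead of the sample string.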

July 02, 2008 06:16 PM :: Italy  

Jürgen Geuter

Counting google searches might be a statistic but it's not really saying what you think it does

I stumbled on a page today that used data from Google trends to determine Which Linux Distributions Are Dying?. (Google trends is a Google application that allows you to plot how often a certain phrase has been searched on Google's search engine in a certain time.)

He shows that searches for "Debian", "Red Hat", "Fedora", "Suse", "OpenSuse" and "Slackware" are all decreasing, while "Ubuntu" is rising and actually almost overtaking "linux" as a search term. His interpretation of the numbers is that the "new kids in town" (Ubuntu, OpenSuse and Fedora) are overtaking the old distros and making them less and less important. He concludes that Ubuntu is on its way to becoming the "face of linux" and (while he's not making it explicit) he's hinting at the point of view that the old distros will lose importance and at some point, if not vanish, at least completely shift out of the focus of the average user: "People will not use Debian, everybody will use Ubuntu" so to speak.

I have talked about the problems with Ubuntu, and about the problematic fact that the hype around it might give some people the impression that Ubuntu actually is Linux, so I see the problem he is hinting at (though I'm not sure he sees it as a problem). Still, I think his approach to interpreting the data is very flawed.

Google trends looks at searches. When you search for "Ubuntu firefox bug" Ubuntu gets a point. When you search for "Debian codec package" debian gets one. Considering you are someone who has used Linux for a longer time, think about how your searches changed:

In the beginning you have a hard time keeping the different parts of linux apart: What's the X-Server, what's a kernel, what's a DE? So all you know is your distribution name (a fact that Ubuntu's branding policy actually intensifies), which leads to you including the distro name in the search.

But the more you use linux and search engines to solve your problems the more your searches change: If you find a bug when mounting NTFS drives you will probably search for "ntfs-3g $MY_ERROR_MESSAGE" to have a bigger chance of finding something to help you: You move away from considering your distribution as the main thing towards looking at the actual program that might be messed up.

Those more generic searches do include all the problems that are specific to your distro (like all the patches that make Ubuntu just not really work) but also include other distros that might have the same issue. Your chances of getting a result are just better that way but they do not add up in Google trends.

My hypothesis is that Google trends does not capture anything about the installed base, it just shows general interest which might also hint at new users. Now you could argue that you need new users to survive and that is actually true but let's look at the Ubuntu example.

Ubuntu is hyped to death and whenever some site has nothing real to report it claims that this year is the year of the linux desktop and reports about the last Ubuntu release. This drives searches of course: Media buzz drives internet searches (which is the first thing you learn from looking at google's data). It does not show how many people googled for Ubuntu, slapped it on their computer and removed it as soon as the game they torrented wouldn't run.

It's not that the google trends data has absolutely no value, it's just that the interpretation he has given is overly simplistic and naive. I give you another example:

One person searches for "Ubuntu wine world of warcraft", which is probably a very popular search term for an average desktop user. 1 point for Ubuntu. Another person searches for "Debian mirror howto" (which is something I searched for a few days ago for a job). 1 point for Debian. Now the Debian search already hints at the fact that that person might be in charge of more than just one desktop machine (in my case we are talking about 50 to 100 DebianEdu/Skolelinux workstations). What does that say about installed base? If someone deploys Ubuntu on his desktop, does that really count as much as someone deploying a whole network of workstations? Not that I consider the home desktop irrelevant; it's just that counting Google searches is especially unfair to distributions with more professional users: If someone uses Red Hat Enterprise Linux he'll probably have a support contract, so he just won't spam Google with searches. If someone has hours upon hours of Debian administration under his belt he won't search as much as someone new to linux.

Google trends is a nice toy, but deriving conclusions from it in this way (a way that might even look plausible at first sight) leads to wrong results and might (in case the article gets enough publicity) even damage some distributions. The people using them might know the truth, but managers often don't know a lot about that kind of matter: They have to rely on what they can quickly gather on the internet or other media. An article with seemingly right conclusions from Google's data (which looks like a good source) can do a lot of damage.

"There are three kinds of lies: lies, damned lies, and statistics." (Benjamin Disraeli, link)

July 02, 2008 12:18 PM :: Germany  

Sean Potter

Re-Install. Still sick of Vista.

I hate having to reinstall Windows. I try to avoid it like death, but sometimes it's just inevitable. Asus sent a new motherboard, and as I've written before, new motherboards usually mean new chipsets, which mean new drivers, which mean Blue Screens of Death for Windows.

Thankfully, there were no blue screens of death. However, one of my hard drives stopped working shortly after this, but I think it was already dying. Regardless, my installation of Vista had become overly bloated, so I reinstalled anyways. Things seem a little more responsive, and I don't have a shitload of unneeded baggage trying to start when Windows does.

I eventually got the broken drive working again. Now I just need to get my other broken drive working so I can retrieve my pictures.

Eventually, I will switch over to pure Linux. I'm just waiting on the Linux port of Unreal Tournament 3.

On the Agenda...

(A) Re-do my audio cables that run from computers to my surround sound receiver. I didn't insulate them properly, so there's constant buzzing in the background. It's faint, but noticeable. Ideally, I'd like to replace the receiver and run an HDMI cable to it for digital audio. For now, however, I need to purchase new jacks.

(B) Finalize my Linux installation on my MacBook. I still need to get the touchpad, 3D acceleration, and wireless working properly. I can't wait to be able to play Unreal Tournament 2004 on it.

(C) On top of all that, I need to finish a ton of reviews. I'm still looking for more PC cases for review as well.

(D) Did I mention Pittco is quickly approaching? August 16th-17th doesn't seem all that far away. And yet there's so much to do!

July 02, 2008 07:42 AM

Christoph Bauer

“The server thinks the ICQ client you are using is too old”

Somehow AOL doesn’t brighten up my day. In fact, making me angry would describe it way better. As I am using Linux, I am trying to avoid proprietary software - that’s why you won’t find me that often using Skype. The Instant messenger of my choice would be Jabber, but as my contacts don’t use it, I have to use ICQ. My Client here is Kopete.

AOL has now bumped the protocol version to lock out old clients and push the ICQ6 protocol, judging by the error message and the bug report here.

However, this is quite nasty, and I guess you don’t want to read rantings, you might want to see a solution - here it is. Add the following lines to your ~/.kde/share/config/kopeterc and you should be able to connect again:

[ICQVersion]
Build=0x17AB
ClientId=0x010A
ClientString=ICQ Client
Country=us
Lang=en
Major=0x0006
Minor=0x0000
Other=0x00007535
Point=0x0000


July 02, 2008 07:11 AM :: Vorarlberg, Austria  

July 01, 2008

Martin Matusiak

can hidden complexity be good?

The intuitive answer would be: no. Complexity is the huge cross we have to bear, the great weight that squashes our systems and makes them unmaintainable. We fight complexity tooth and nail, and hidden complexity is the worst kind, because it breaks things for reasons we don’t understand. So the least you can achieve with a system is grok the full complexity of it, even if it’s too much of a mess to do anything with.

On the other hand… if you take a step back and think about what programming really *is* then you might have to rethink that conclusion. It is, plainly, finding ways to solve problems of data processing in one way or another. And to be a coder that’s really all you need to know. It does mean that you risk ending up on thedailywtf, however. So now, how is “good code” different from “just code”? What characterizes code that wins our approval? In a word: simplicity. The smartest way of doing something is the simplest way, without missing any of the requirements. A simple solution is an elegant solution, isn’t it? Simplest often means shortest, too. The principle of simplicity also relates directly to the issue of complexity in a technical sense. High performance code is efficient because it’s the most lazy way to do the job. Inefficient code, conversely, does too much work — it’s for suckers.

What “good programming” is

This ingrained characteristic of programming is reflected very clearly in just about any discussion of code that is “too slow”. People critique the code for being awkward, for doing things in a round about manner. Eventually you arrive at a solution that is typically both shorter, and clearer. On the other end of the spectrum you have concepts like “beautiful code” and “code that stands the test of time”. And when you look at the code they’re raving about, the same observation transpires. It’s simple. It’s both blindingly obvious (once you get around to thinking about the problem in that particular way) and impressively simple for something *that* hard. It is the ultimate optimization of problem vs effort.

Our product smacks somewhat of mathematical proofs. In mathematics you score points for simple solutions, but it’s not strictly necessary. All it takes is for someone to read your proof and verify that it all fits together. No one is gonna run the proof on their machine a thousand times per second.

And that is what I’m driving at here. Programming is the activity of solving a problem such that the solution exerts the least amount of effort. It’s kind of funny that we of all people are dedicated to this particular discipline. Us, with the expensive silicon that can perform more operations than any other machine.

Set in those terms, programming is the art of doing as little as possible. And the great pieces of code are great not in what they *do* but in what they *don’t do*. In other words, if you write good code, it’s because you’ve found a way to take an input and do the least amount of work to produce the output. Baked into that is the secret of choosing your input very carefully in the first place. So if you can gain something from the form that the input is in, then you’re achieving something without writing any code for it. This is step 1 toward your brilliant piece of code.

Hidden complexity

But this is also when you start introducing complexity. Even if it’s external to your program, it is still an assumption that must hold. Is this good or bad? From your classical software engineering perspective, you badly want to minimize the amount of text you have to read to look at a piece of code and make sense of it. But then again, you need to know everything, because ignorance will bite you.

For example, in spiderfetch I spider a web page for urls. And it can run in a mode that will just output the urls and stop there. Now, if the urls are in the same order that they appear on the page, this is a big advantage, because the average web page can easily yield 50 urls and the user won’t be able to easily recognize them if they come up in some random order. But this is also too cosmetic an issue to be an explicit requirement. I certainly didn’t think of this particular issue when I started working on it. If you really wanted to document this kind of behavior down to the smallest detail, your documentation would be enormous. Pragmatically speaking, this behavior probably would not be documented.

Why is this an issue? Because giving a unique list of urls *is* a requirement which will influence this particular issue. (Hence, list(set(urls)) won’t do the trick).
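The reason list(set(urls)) won't do is that a set does not keep the order in which the urls were found. A minimal Python sketch of a unique list that does keep page order might look like this; it is not necessarily how spiderfetch does it, and the function name and urls are made up.

```python
def unique_in_order(urls):
    """Drop duplicate urls while keeping the order of first appearance."""
    seen = set()
    result = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result

# Hypothetical urls, in the order they appear on a page.
found = [
    "http://example.org/c",
    "http://example.org/a",
    "http://example.org/c",
    "http://example.org/b",
    "http://example.org/a",
]
print(unique_in_order(found))
```

The ordering falls out of the loop for free, which makes it a small instance of the incidental behavior discussed here: nothing in the function says "preserve page order", yet a refactoring back to a plain set would silently lose it.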

Suppose you find a way to produce the desired behavior without doing any work (or doing very little). Should this be documented in order to make this bit of complexity explicit? If this added bit of complexity doesn’t affect the working of the function much at all, then it’s quite peripheral anyway. What are the risks? If you break the function then obviously the peripheral complexity doesn’t affect you. If you refactor you might break the peripheral bit, because it wasn’t written down anywhere. On the other hand, if all such peripheral bits were to be specified, it would take you that much longer to grok the code at all.

The message we send

The question we like to ask ourselves is: what happens to the person who inherits the code? Will he notice the “hidden” (or more precisely: incidental) desirable behavior? If not, is it really important enough to document it? And if yes, will he understand why it works that way? If you lose this behavior you haven’t broken the program. You have degraded it in a cosmetic way, but it still works well enough. So does it really need to be explicit?

The so-called clever coder that every middle manager is blogging his heart out about hiring will, obviously, notice the hidden complexity. And know both how and why it’s there. The less clever coder might not notice. Or he might notice, but not understand the thought process behind it. What do we want to say to him? It’s okay if you mess this up, it’s not that important -or- Pay close attention to the detailed documentation or you might break something?

July 01, 2008 08:13 PM :: Utrecht, Netherlands  

Jürgen Geuter

PHP should not be your template engine

PHP is probably the language that most people trying to do something dynamic in the web start with. It's easy to learn (and hard to master) and you get results really quick which is a lot more motivating than having to write many lines of boilerplate code that looks like black magic.

PHP suffers from many design flaws that can't be easily repaired without turning it into something non-PHP but that is a completely different story that I'm not talking about here. You can be productive in PHP and if you know what you are doing you can write good software that scales well and that many people can continue developing (which is a huge benefit that some people sometimes forget).

When you are writing a non-trivial software that generates HTML you'll need some kind of templating (If you don't do templating you are doing it wrong. Seriously. This is not just some opinion, that's just how it is.) to separate logic from what is really sent to the browser. Your "real" program does the calculations, the database queries and data transformations which are then given to the template that "knows" how the HTML is supposed to look.

The advantages are obvious: You can change the look of a page without changing the actual code which allows you to work better in teams. Your designer who can't really code can still work with the data you've given him and that you have set up properly. No need to worry about breaking code while trying to add an image. It also makes your business logic, the code that actually does something, simpler, leaner, easier to understand and all in all better. It's just easier to fix a bug when your code is not cluttered with "<a href=\"" stuff in it.

Now especially in PHP land you sometimes get people who claim that PHP does not need a templating system because PHP is just a big templating engine. Since PHP is built to mix PHP code and HTML, you basically find HTML files with some processing sections at the top of the file and some variables and loops thrown into the HTML code. It might sound smart, since you get rid of an abstraction layer, which might make stuff faster, but is saving that little overhead really worth it?

The point of a templating engine is that it's just doing formatting, no calculations or database queries (which is all stuff that the business logic has already taken care of). This is the reason that many templating engines just offer a small subset of the features that "real" programming languages have: The templates don't need all those features and actually shouldn't have them.
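To make the split concrete, here is a minimal sketch in Python using the standard library's string.Template, which is about as reduced as a templating language gets. It is only an illustration of the layering, not a recommendation of this particular module; the function names and the data are made up.

```python
from html import escape
from string import Template

# Templates: markup with placeholders, no calculations and no queries.
PAGE = Template("<h1>$title</h1>\n<ul>\n$items\n</ul>")
ITEM = Template('  <li><a href="$url">$label</a></li>')

def build_context(rows):
    """Business logic: decide what is shown and in which order."""
    visible = sorted((r for r in rows if r["published"]),
                     key=lambda r: r["label"])
    return [{"url": r["url"], "label": r["label"]} for r in visible]

def render(title, links):
    """Presentation: fill the prepared data into the templates."""
    items = "\n".join(
        ITEM.substitute(url=escape(link["url"], quote=True),
                        label=escape(link["label"]))
        for link in links
    )
    return PAGE.substitute(title=escape(title), items=items)

# Hypothetical data, standing in for the result of a database query.
rows = [
    {"url": "/b", "label": "Beta", "published": True},
    {"url": "/d", "label": "Draft", "published": False},
    {"url": "/a", "label": "Alpha & co", "published": True},
]
print(render("Links", build_context(rows)))
```

The designer only ever touches PAGE and ITEM, and there is nothing in them that could run a query or filter the data.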

The fact that templating languages are that reduced helps you with the separation of layers of your application: If you come to a point where your template is not really able to do what you wanna do with the data, you have probably stumbled on a point where the business logic should do more heavy lifting (or where your programmer writes a filter to extend the templating system with a specialized function). Having an "all powerful" templating engine introduces the opportunity to break more things, it is harder to learn for designers who are not programmers and it can lead to you not separating your layers properly.

Of course you can still violate the layering even with templating engines, but the danger is smaller and everything that makes your code easier to understand makes the web a possibly safer and therefore better place.

PHP is particularly at risk because the language itself is built upon mixing code and HTML, but it's true for other languages, too. If you put "<img src" into one of your django views you are doing it wrong. The limitations of templating engines are a blessing in disguise; your results are better exactly because of the limitations they enforce.

July 01, 2008 06:00 PM :: Germany  

Alex Bogak

The new computer

Hi all

I purchased a new computer a couple of weeks ago. I did some research on my local market and decided on the following configuration:
  1. CPU: Intel Q9450
  2. MB: Gigabyte GA-EP35-DS3R
  3. Memory: Mushkin CL4 4Gb
  4. HD: Western Digital WD5001ABYS
  5. Case: Antec Sonata Plus 550
  6. Graphics: Gigabyte GeForce 8500GT Silent 512Mb

All in all, it came to about $1630.
Currently, my main work requires me to work with lots of different Windows configurations. That led me to the following installation:

Main OS: CentOS 5.2 (updated 2 days ago) 64bit
Windows OS: XP machines in VirtualBox.

This configuration was chosen to provide as versatile an environment as possible.

CentOS Linux was chosen as an enterprise-grade OS, providing me with as stable an environment as possible.

It is a great, quiet machine. I'm happy.

July 01, 2008 01:54 PM :: Israel  

TopperH

Ebuild for SMILE

I needed a slideshow application to make a DVD of my daughter's pictures for my grandparents. I always knew, even though I had never tried it, that there is an app called "manslide" for this.

After some Google searches I found out that this app is broken and unmaintained, and the kde-apps page has been removed.

Smile (Slideshow Maker In Linux Environment) seems to be a nice qt4 rewrite of manslide. I haven't been able to find an ebuild for it, neither in bugzilla nor in any overlay, so I wrote one myself. I've never been good at writing ebuilds, but this one just works:

# Copyright 1999-2008 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

inherit eutils qt4

DESCRIPTION="Slideshow Maker In Linux Environment"
HOMEPAGE="http://www.kde-apps.org/content/show.php/SMILE?content=83276"
SRC_URI="http://www.mandrivalinux-online.eu/manslide/${P}.tar.gz"

SLOT="0"
LICENSE="GPL-2"
KEYWORDS="~amd64"
IUSE=""

DEPEND=">=media-sound/sox-14.0.1
	media-video/mplayer
	( $(qt4_min_version 4.2.0) )
"
RDEPEND="${DEPEND}"

src_compile() {
	# The qmake project lives in the smile/ subdirectory of the tarball
	cd smile || die "cd smile failed"
	eqmake4 smile.pro
	emake || die "emake failed"
}

src_install() {
	# Install the binary and a desktop menu entry
	dodir /usr/bin
	exeinto /usr/bin
	doexe smile/smile
	make_desktop_entry Smile Smile smile "Qt;AudioVideo;Video"
}

July 01, 2008 01:39 PM :: Italy  

June 29, 2008

Brian Carper

Lispforum.com

Ten days ago I complained that there were no good Lisp equivalents of ruby-forum or perlmonks. It looks like someone went and made one. What good timing.

I hope it's a success, and I hope it stays newb-friendly. The amount of fake watch and shoe spam on comp.lang.lisp has reached critical mass.

Speaking of mailing lists, maybe it's just me but I've never found mailing lists to be all that enjoyable to use. They have the benefit of being a sort of lowest common denominator (everyone has email, and you can slap an HTML interface on top of one). They also have the benefit of being distributed to some degree, because everyone who gets the email serves as an archive, and if the main server dies maybe you can recover things. And mailing lists do have less overhead than MBs when it comes to running one, especially a high-traffic one, I would imagine.

But the bad things about mailing lists vs. message boards:

  • Message boards are accessible from anywhere that you have a web browser, which is everywhere. Email isn't necessarily accessible from everywhere, unless you use webmail or SSH home and use mutt or something, which not everyone can or wants to do. Or if there's a good web interface on the mailing list.
  • You can't do anything more than plaintext, which isn't entirely a bad thing, HTML email is pure evil, but being able to cleanly post images or clickable links or formatted text on a message board is a nice feature.
  • Threading never quite works correctly on mailing lists, because eventually someone will hit the wrong button in their mail client and break the thread; whereas on a message board it always works fine.
  • You can move threads around between forums on an MB, you can edit threads, you can close threads, you can delete a post if you make a mistake; but mailing lists are write-only, and once you send a message off into the ether it's posted for everyone to see forever, and no one has much control over a list beyond moderating the messages that end up getting through.
  • Avatars. Personal profiles. These things make people seem more like people and less like a nameless entity. It's friendlier and more inviting.
  • The HTML interfaces people slap on top of mailing list archives are pretty horrible 95% of the time. Probably because most people are using email clients anyways so no one cares. Message boards generally look nice and have nice interfaces for reading and posting.
  • Email sucks. Spam filters and bounced messages mean you never quite know if what you just wrote actually made it to the list. Reply to list vs. reply to sender vs. reply to all, etc. are all needless complications. How many times have you seen "UNSUBSCRIBE" sent to everyone on a list? The interface to mailing lists is not intuitive. Whereas you can always see immediately if an MB post worked or not.

And so on. I likes me my message boards.

June 29, 2008 05:51 AM :: Pennsylvania, USA  

June 28, 2008

Niel Anthony Acuna

linux raw spin locks

apparently, there are some issues with linux 2.6 and pearpc.. so i still can't get the gentoo ppc to start properly. i thought i’d do some studying in the meantime..

typedef struct {
        volatile unsigned int slock;
} raw_spinlock_t;

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
        asm volatile("\n1:\t"
                     LOCK_PREFIX " ; decb %0\n\t"
                     "jns 3f\n"
                     "2:\t"
                     "rep;nop\n\t"
                     "cmpb $0,%0\n\t"
                     "jle 2b\n\t"
                     "jmp 1b\n"
                     "3:\n\t"
                     : "+m" (lock->slock) : : "memory");
}

interpretation of the extended inline asm:

1) : "+m" (lock->slock)
"+m" means that the memory operand 'lock->slock' is both input and output.

2) : "memory"
from gcc manual:
If your assembler instructions access memory in an unpredictable
fashion, add `memory' to the list of clobbered registers. This will
cause GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.

3) LOCK_PREFIX is defined at /usr/src/linux/include/asm/alternative.h:

#define LOCK_PREFIX \
                ".section .smp_locks,\"a\"\n"   \   ; push current section and change section to .smp_locks
                "  .align 4\n"                  \   ; align to 4 bytes
                "  .long 661f\n" /* address */  \   ; create an entry in the up/smp alternative locking table
                ".previous\n"                   \   ; pop back the old section
                "661:\n\tlock; "                    ; assert processor LOCK# signal

LOCK_PREFIX is supposed to be used within the body of a procedure (executable code), which ends up in a .text section.

i haven’t dug deep into the internals of ‘alternatives’ yet, but a quick look at kernel/alternative.c :: alternatives_smp_unlock() shows that it employs self-modifying code that patches the “lock” prefix with an architecture-specific NOP opcode.

without all the extended asm semantics above, the asm roughly translates to:

1:      lock       ; decrement atomically
        decb    %0 ; really decrement now..
        jns     3f ; successful in acquiring lock?
2:      rep        ; "rep; nop" together encode the PAUSE instruction,
        nop        ; a hint to the cpu that this is a spin-wait loop
        cmpb    $0, %0 ; someone must release the lock first.
        jle     2b     ; busy wait...
        jmp     1b     ; lock available! try acquiring lock again.
3:     ; exit __raw_spin_lock()

and then comes the trylock variant

static inline int __raw_spin_trylock(raw_spinlock_t *lock)
{
        char oldval;
        asm volatile(
                "xchgb %b0,%1"
                :"=q" (oldval), "+m" (lock->slock)
                :"0" (0) : "memory");
        return oldval > 0;
}

interpretation of the extended inline asm:

1) : "=q" (oldval)
output oldval in any of the a, b, c or d registers.

2) : "+m" (lock->slock)
"+m" means that the memory operand 'lock->slock' is both input and output.

3) : "0" (0)
%0 matches the output register for oldval. initialize that with the value 0.

4) : "memory"
from gcc manual:
If your assembler instructions access memory in an unpredictable
fashion, add `memory' to the list of clobbered registers. This will
cause GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.

__raw_spin_trylock() basically exchanges the current value in raw_spinlock_t->slock with zero and returns true if locking was successful and false if acquiring of lock didn’t succeed.

lastly, we have the raw unlock procedure:

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
        char oldval = 1;

        asm volatile("xchgb %b0, %1"
                     : "=q" (oldval), "+m" (lock->slock)
                     : "0" (oldval) : "memory");
}

i haven’t been able to understand the significance of the “%b” in the assembler template, but looking at the asm listing of the template with and without the %b yielded the same instructions, so could anyone shed some light on this one?

basically looks like __raw_spin_trylock(), but the difference is that oldval is now initialized to 1 instead of 0.
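
To make the exchange semantics of trylock and unlock concrete, here is a toy, single-threaded model in Python. It is only a sketch under stated assumptions: the class name RawSpinlockModel is made up, the initial value 1 for a free lock is inferred from the listings above (unlock stores 1, trylock succeeds when the old value is greater than 0), and the atomicity that xchgb provides on real hardware is not modelled at all.

```python
class RawSpinlockModel:
    """Toy model of the slock field; not atomic, for illustration only."""

    def __init__(self):
        # Assumption: 1 means "free", 0 or less means "held".
        self.slock = 1

    def _xchg(self, new_value):
        """Swap new_value into slock and return the previous value (like xchgb)."""
        old_value, self.slock = self.slock, new_value
        return old_value

    def trylock(self):
        """Store 0 unconditionally; we own the lock only if it was free before."""
        return self._xchg(0) > 0

    def unlock(self):
        """Store 1 unconditionally; the previous value is simply discarded."""
        self._xchg(1)


lock = RawSpinlockModel()
print(lock.trylock())  # lock was free
print(lock.trylock())  # lock is already held
lock.unlock()
print(lock.trylock())  # free again after unlock
```

A failed trylock writes 0 over a value that was already 0 or less, so it does not disturb the holder; that is why the exchange can be done without first checking the value.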

there are many variants stemming from the idea of this implementation: raw reader/writer locks, which allow multiple readers but only one writer (with no concurrent readers), and then there’s the spinlock layer above the raw spinlock implementation.. but those are for another post.

’til then.

June 28, 2008 04:04 AM :: Zamboanga, Philippines  

Martin Matusiak

spiderfetch, now in python

Coding at its most fun is exploratory. It’s exciting to try your hand at something new and see how it develops, choosing a route as you go along. Some people like to call this “expanding your ignorance”, to convey that you cannot decide on things you don’t know about, so first you have to become aware - and ignorant - of them. Then you can tackle them. If you want a buzzword for this I suppose you could call this “impulse driven development”.

spiderfetch was driven completely by impulse. The original idea was to get rid of awkward, one-time grep/sed/awk parsing to extract urls from web pages. Then came the impulse “hey, it took so much work to get this working well, why not make it recursive at little added effort”. And from there on countless more impulses happened, to the point that it would be a challenge to recreate the thought process from there to here.

Eventually it landed on a 400 line ruby script that worked quite nicely, supported recipes to drive the spider and various other gimmicks. Because the process was completely driven by impulse, the code became increasingly dense and monolithic as more impulses were realized. And it got to the point where the code worked, but was pretty much a dead end from a development point of view. Generally speaking, the deeper you go into a project, the smaller an idea has to be for it to be realized without major changes.

Introducing the web

The most disruptive new impulse was that since we’re spidering anyway, it might be fun to collect these urls in a graph and be able to do little queries on them. At the very least things like “what page did I find this url on” and “how did I get here from the root url” could be useful.

spiderfetch introduces the web, a local representation of the urls the spider has seen, either visited (spidered) or matched by any of the rules. Webs are stored, quite simply, in .web files. Technically speaking, the web is a graph of url nodes, with a hash table frontend for quick lookup and duplicate detection. Every node carries information about incoming urls (locations where this url was found) and outgoing urls (links to other documents), so the path from the root to any given url can be traced.
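
The structure described above can be sketched roughly like this. It is not spiderfetch's actual code: the class name Web, the method names and the example urls are all hypothetical, and a plain dict stands in for the hash table frontend.

```python
from collections import deque


class Web:
    """Graph of url nodes with a dict frontend for lookup and duplicate detection."""

    def __init__(self, root):
        self.root = root
        self.nodes = {}  # url -> {"incoming": set of urls, "outgoing": set of urls}
        self._node(root)

    def _node(self, url):
        # setdefault makes adding the same url twice a no-op (duplicate detection)
        return self.nodes.setdefault(url, {"incoming": set(), "outgoing": set()})

    def add_link(self, source, target):
        """Record that target was found on the page at source."""
        self._node(source)["outgoing"].add(target)
        self._node(target)["incoming"].add(source)

    def path_from_root(self, url):
        """Breadth-first search from the root; returns a list of urls or None."""
        parents = {self.root: None}
        queue = deque([self.root])
        while queue:
            current = queue.popleft()
            if current == url:
                path = []
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for target in sorted(self.nodes[current]["outgoing"]):
                if target not in parents:
                    parents[target] = current
                    queue.append(target)
        return None


web = Web("http://example.com/")
web.add_link("http://example.com/", "http://example.com/a")
web.add_link("http://example.com/a", "http://example.com/b")
print(web.nodes["http://example.com/b"]["incoming"])  # where was this url found?
print(web.path_from_root("http://example.com/b"))     # how did I get here?
```

Keeping both incoming and outgoing sets on every node costs some memory, but it answers both example queries without scanning the whole graph.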

Detecting file types

Aside from the web impulse, the single biggest flaw in spiderfetch was the lack of logic to deal with filetypes. Filetypes on the web work pretty much as well as they do on your local computer, which means if you rename a .jpg to a .gif, suddenly it’s not a .jpg anymore. File extensions are a very weak form of metadata and largely useless. Just the same with spidering, if you find a url on a page you have no idea what it is. If it ends in .html then it’s probably that, but it can also not have an extension at all. Or it can be misleading, which when taken to perverse lengths (eg. scripts like gallery), does away with .jpgs altogether and encodes everything as .php.

In other words, file extensions tell you nothing that you can actually trust. And that’s a crucial distinction: what information do I have vs what can I trust. In Linux we deal with this using magic. The file command opens the file, reads a portion of it, and scans for well known content that would identify the file as a known type.

For a spider this is a big roadblock, because if you don’t know what urls are actual html files that you want to spider, you have to pretty much download everything. Including potentially large files like videos that are a complete waste of time (and bandwidth). So spiderfetch brings the “magic” principle to spidering. We start a download and wait until we have enough of the file to check the type. If it’s the wrong type, we abort. Right now we only detect html, but there is a potential for extending this with all the information the file command has (this would involve writing a parser for “magic” files, though).
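
A rough sketch of the “check the type, then abort or continue” idea, assuming the download is available as a file-like byte stream. The marker list, the function names and the 512 byte sniffing window are made up for this illustration and are far cruder than what the file command does; a real network stream may also return fewer bytes per read than requested.

```python
import io

# Hypothetical, very small "magic" list: byte sequences that suggest html.
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")


def looks_like_html(prefix):
    """Guess from the first bytes of a file whether it is html."""
    head = prefix.lower()
    return any(marker in head for marker in HTML_MARKERS)


def fetch_if_html(stream, sniff_bytes=512, chunk_size=4096):
    """Read a small prefix first; give up early unless it looks like html."""
    prefix = stream.read(sniff_bytes)
    if not looks_like_html(prefix):
        return None  # wrong type: abort before downloading the rest
    chunks = [prefix]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


page = io.BytesIO(b"<html><body>hello</body></html>")
video = io.BytesIO(b"\x00\x00\x00\x18ftyp" + b"\x00" * 100000)
print(fetch_if_html(page) is not None)
print(fetch_if_html(video) is None, video.tell())
```

In the second call only the first 512 bytes are ever read, which is the whole point: the large file is never downloaded in full.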

A brand new fetcher

To make filetype detection work, we have to be able to do more than just start a download and wait until it’s done. spiderfetch has a completely new fetcher in pure python (no more calling wget). The fetcher is actually the whole reason why the switch to python happened in the first place. I was looking through the ruby documentation in terms of what I needed from the library and I soon realized it wasn’t cutting it. The http stuff was just too puny. I looked up the same topic in the python docs and immediately realized that it would support what I wanted to do. In retrospect, the python urllib/httplib library has covered me very well.

The fetcher has to do a lot of error handling on all the various conditions that can occur, which means it also has a much deeper awareness of the possible errors. It’s very useful to know whether a fetch failed on 404 or a dns error. The python library also makes it easy to customize what happens on the various http status codes.
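
To illustrate telling a 404 apart from a dns error, here is a small sketch. It uses the Python 3 modules urllib.request and urllib.error, which is an assumption: the fetcher described here predates them and was built on urllib/httplib. The function names and the labels are hypothetical.

```python
import socket
import urllib.error
import urllib.request


def describe_failure(exc):
    """Map an exception raised by urlopen to a short, human-readable label."""
    # HTTPError has to be checked first because it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return "http %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        # Name resolution failures show up as socket.gaierror in .reason
        if isinstance(exc.reason, socket.gaierror):
            return "dns error"
        return "connection error"
    return "unknown error"


def fetch(url, timeout=10):
    """Return (body, None) on success or (None, label) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.URLError as exc:
        return None, describe_failure(exc)
```

A spider can use the label to decide what to do next, for instance dropping a url for good on "http 404" but retrying later on "dns error".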

A modular approach

The present python code is a far cry from the abandoned ruby codebase. For starters, it’s three times larger. Python may be a little more verbose than ruby, but the increase is due to a new modularity and most of all, new features. While the ruby code had eventually evolved into one big chunk of code, the python codebase is a number of modules, each of which can be extended quite easily. The spider and fetcher can both be used on their own, there is the new web module to deal with webs, and there is spiderfetch itself. dumpstream has also been rewritten from shellscript to python and has become more reliable.

Grab it from github:

spiderfetch-0.4.0

June 28, 2008 01:18 AM :: Utrecht, Netherlands  

June 27, 2008

Matt Harrison

Why doesn't pushd popd work in my Ubuntu shell script?

I'm in the process of moving over to my new laptop. I run into an issue where a shell script (that worked under gentoo) now no longer works. Here's the error foo.sh: 18: pushd: not found Since I'm running this as sh foo.sh, I turn on debugging wit

June 27, 2008 04:42 AM :: Utah, USA  

New lappy, Lenovo T61p (linux edition)

My trusty r52 switches between a state of 100-200 free megs of memory and no memory on the 40 gig hard drive. Rather than investing in an expensive IBM harddrive (I've heard the BIOS on my machine won't take normal OEM drives), I got a new computer.

June 27, 2008 04:30 AM :: Utah, USA  

June 26, 2008

George Kargiotakis

rxvt-unicode 256 color support with vim

Following my previous post on minimizing the resources that urxvt needs on Gentoo, I tried applying some more patches to it that I found in Gentoo’s bugzilla. Since that happened a few days ago there was no ebuild for version 9.05 yet. So I created one and applied the patch for the 256 color support. Here’s my [...]

June 26, 2008 11:14 PM :: Greece  

Thomas Capricelli

Static link with cmake under windows (Qt and others)

One of the great things about Qt is that you can compile the code under esoteric OSs (mmh?) like Windows or MacOS. Although I’ve almost never used windows, I did the packaging of yzis. I used the Nullsoft Installer, which is ok.

I had several issues with static linking, and thought maybe some of you could help me with them. Don’t ask why I want static linking; that is not the issue here.

Problem #1:

The most important one is Qt itself. I expected that to be easy, but it is not. The FindQt4 shipped with cmake 2.4.8 or 2.6 is not really aware of static libs, or is it? Nor is the one from kde svn. I’ve tried using the one from quassel-irc, but it fails as well. If you can help me with this, the code is available.

It seems (according to #qt on irc) that it makes sense to have two different Qt installations, one static and one shared. I did that (double the space used…) and in one of them ran “configure -static” and recompiled Qt. Now the *.a files are much bigger, and though qyzis links against those *.a files, it still depends on the DLLs, according to “Dependency Walker”. Gr…

Problem #2:

Then, there is the problem of gnuwin32 tools. In the lib/ directory, you can find for example

  • libintl.dll.a
  • libintl.def
  • libintl.lib

And the DLL (libintl3.dll) is in the bin/ directory. I need to ship the DLL with qyzis, because cmake found the DLL and linked against libintl.dll.a. In a perfect world, I would like cmake to use libintl.lib and link statically, but I don’t know how to convince cmake to do that. Do you know?

Problem #3:

How do you find out what kind of library a file is? According to what I’ve found googling the web, libraries can have a wild number of different names (libX.dll, libX.dll.a, libX.lib, libX.a, all of these also without the lib prefix, and so on…). Even worse: libX.a, for example, could be either a static lib or the stub for the DLL. Do you have clarifications about this? Do you know of a reliable way to tell which kind of library it is (DLL, DLL stub, or purely static)?

June 26, 2008 10:23 PM

June 25, 2008

Thomas Capricelli

Release of Yzis-1.0-alpha1

Ok, here it is, things have finally settled and we are happy to bring you the first alpha release for Yzis. The foundations are stabilizing, and we are now focusing on fixing bugs and porting to different architectures, OSs, and interfaces.

What you have so far :

The most noteworthy missing feature is the KDE embeddable component, we are aware of this, and this is the main thing we shall work on after 1.0 is out.

At this point, we are not yet asking for broad testing, but developers not afraid of bleeding edge are encouraged to test it out. If you can help with porting/solving bugs, this is of course even better.

You can join us on #yzis on freenode to discuss all of this.

The web site has been updated, and you should find information about getting the source from source control (mercurial) and compiling on most platforms.

http://www.yzis.org

The tarballs and the windows installer can be downloaded from this url:

http://labs.freehackers.org/projects/list_files/yzis

Below are some screenshots showing the different interfaces (qt/linux, ncurses, qt/windows):

ncurses frontend

June 25, 2008 10:54 PM

Ray Booysen

Sumatro

I’ve been pushing some new photography to my photoblog over at Sumatro.com.  I have a new set of work lined up, all being published at 9am every morning.  To keep up to date, subscribe to the RSS feed.

Would love some comments/ideas/abuse on the photos if you have the time. :)

June 25, 2008 07:55 PM :: England  

Jürgen Geuter

Your automatically generated APIdoc is not a replacement for real documentation

When dealing with software you have not written yourself (or have written a year ago which can make it as alien as code someone else wrote) you have to rely on the software's documentation.

The quality of documentation varies a lot from one project to another: Some projects have really good and useful documentation, some just gather links to articles by third parties and some lack documentation completely. The quality of a piece of software's documentation should always be one of the most important arguments for or against using it.

One of the worst kinds of "documentation" is the "API docs". They are not useless at all: you need them if you use that one class from the software and really need to know in detail which methods its instances have. But some projects seem to think that they are a replacement for real documentation.

When I had to learn JAVA at university we had to do a so-called "software project" written in JAVA (it was with a group of around 10 people and ran for half a year). Back then JAVA's documentation was pretty much just their auto-generated "JAVAdoc" API documentation. (They had a few tutorials but those were written so badly that they were of no real use to someone trying to figure out how to do something.)

API docs help when you are in the middle of doing something and you need details about the entities you work with, when you have already picked your entities, but they do not help you when you are looking for something like "I need a container for a bunch of different objects that might or might not have the same type with the following properties...".

Let's look at a few projects' documentation and see how good it is:

Python



The Python documentation is separated into different segments: There's the Tutorial that's even marked with "Start here" so the beginner knows where to dive in. Then there's the library reference that explains what a certain built-in module can do and how it works before diving into the details of the classes and functions that module supplies.

The Python docs are very good, it's easy for a newbie to dive in while still offering people that need info about that one special function a direct and quick access. It's a good compromise but does lack something that other languages' docs have: User comments that can sometimes elaborate something that the docs don't make clear enough (you just cannot see any problem that people might have).

PHP



PHP's documentation is not as strictly separated as Python's, but the document is divided into chapters that somewhat mirror the segments that Python's docs have. It starts with a tutorial, then goes over the language syntax and finally covers the modules that PHP comes with.

PHP is easy to learn but that is also due to the really good docs: You go through the first two chapters of the docs and you know your way around. Then you can look up the special functionality that you need for whatever you do. The biggest plus of the PHP docs is that every documentation page has a comment section where users can contribute notes about quirks that a certain function has or offer examples for common use cases. The md5 documentation for example shows you why your calculated md5 hash might not match one you calculated in .net or mysql. This is really useful not only to new users but also to people that do know their way around but stumble on some weird quirk.

All in all PHP might be a badly designed language but the documentation is really good and helps you dive into it quickly and effectively. The user comments are (while there is of course the danger of spam) a very valuable tool to enrich your docs with information that might just not fit into the actual documentation article.

Ruby On Rails



Ruby on Rails's docs are separated in the API docs, a link to a bunch of books you can buy and a bunch of links to tutorials somewhere on the net.

This is an example of bad documentation: The team just supplies API docs that are not really all that helpful if you don't know exactly what you are looking for in the first place. Having the second aspect in your docs be advertising for books that are not available for free does make your thingy look like somewhat of a scam: the new user feels like he has to pay a tax to start using your stuff. Not good.

The fact that there are good tutorials on the net is neat and will help users, but what happens when you change something and the linked tutorial doesn't work anymore? Your users will try to use it and fail, and how should they be able to determine which tutorial is right and which ain't? If they have two tutorials that work but that work somewhat differently, how can they determine which can be considered "best practice"?

All in all the Ruby On Rails documentation is not good, while the framework might be opinionated (something I always like) it lacks tutorials by the devs that the user can be sure are correct and teaching best practices.

Django



Django's documentation is separated into a few main parts: The essential docs that include installation howtos, the tutorial walking you through creating an application with django, and the FAQ. The next segment called "Reference" goes into detail about the different parts of django and how to work with them. The docs end with a howto on deployment and a few howtos for common problems.

The best thing about the django docs is the style. It's not weird technobabble or random API docs: It's written in plain English with a clear focus on the audience. The template language documentation for example is separated into the part for "HTML authors" and the part for "Python Programmers", with the first part being completely read- and understandable by someone with no clue about python or django (like someone who's just a designer that builds the HTML for a django application) and the latter one telling you how to do powerful things by extending the built-in stuff.

All in all Django's docs lack the comment feature that PHP has but probably have the best writing style and are the easiest to work with and understand.

All documentation I reviewed here is for rather popular projects and it's already obvious how different the quality in those big projects is, when you come to even smaller projects the quality can diverge even more.

Some big projects I didn't have to review in detail (like JAVA's docs that are pretty much still a trainwreck) but what you always have to keep in mind: A software with bad documentation that does not empower you will require you to buy books where someone explains what all the fuss is about. But usually there's not "the one" book, look at how many JAVA books there are (JAVA is just one example, C, C++, C# ain't much better)! With so many books around you'll have to dig through book recommendations to see which one of the books is worth reading and which might help you, which is even more time you have to invest into something plus the books usually ain't free.

When you decide on the software you want to use for your next project, have a look at the documentation for the project. Read through it to see if you feel like it's really telling you how it works. If there are books available that is a plus (see, I don't hate books! ;-) ), but you shouldn't be forced to buy books just to be able to use something; books should provide something different than the official docs, like the O'Reilly "cookbooks" that offer recipes for common problems for example. A book can also wrap the documentation up in a different way which can help you understand it better.

Do you know other projects that have a particularly good or bad documentation? Please share them in the comments.

June 25, 2008 09:54 AM :: Germany  

Christoph Bauer

Static webpages

Are you ‘up to date’ if you are still using a static webpage today? I wouldn’t say that there is a global answer to that question and everybody’s got to find out for himself.

Usually I’m a big fan of automation and technology and its advantages - but is it really worth it? Let’s have a closer look at the topic:

A Content Management System (CMS) is a program, running on your webserver, generating the html output for the visitors browser, which is actually a good thing, if things are changing a lot.

But a CMS is just a piece of software that needs updates and patches too, disregarding new features and stuff as there might be undiscovered security holes which might turn out to be a rather serious problem. I don’t want to question any cms here. I just want to make you think if you really need one.

I’ll give you an example: My domain stargazer.at hosts a portal page, offering links to my services, my blog, whatever. As my services don’t change that often, there is no need for dynamically created content. If I think back, the last update on the portal page happened four months ago. Since then it hasn’t been touched. For a CMS, going that long without updates would be quite fatal regarding security. On the other hand, my blog here gets much more love and changes much more often. That’s why I am using Wordpress here and plain html on my portal page.


June 25, 2008 08:50 AM :: Vorarlberg, Austria  

Sean Potter

Slowly coming along...

Nick was home from college this past week, and I took a lot of time away from working on the blog and BIOSLEVEL to spend some time with friends. It was a good time for sure, but I always come to regret it when I realize how little work I've done.

Hopefully I get things back on track over the next few days. I'm way behind on several reviews for BIOSLEVEL, and some of the products were just released in the last two weeks or so.

I also rebuilt two of my computers into new housings, with a few new components that were sent to me for review. Let me say that I hate reviewing some CPU coolers. A certain company that will go unnamed for now sent two CPU coolers that seem impossible to install on an Intel machine. Stay tuned for a review, if I ever get them installed.

On top of all that, I somehow started playing World of Warcraft again.

June 25, 2008 07:59 AM

Brian Carper

Laptops at border crossings

There's an article on Slashdot about a US Senate hearing on laptop seizures at border crossings. This affects me, because I travel to Canada a lot and plan to move there within a year or so.

It's a problem because my job requires me to handle what amount to people's medical records as data files on my laptop. It's part of my job, and often I work from home. As of right now, I never take my laptop with me to Canada, partly because I don't know what would happen if a border agent decided to inspect or copy all of my data. I can get in very serious trouble for breaching patient confidentiality. On the other hand I could get in serious trouble if I refused to allow a search; at best I'd be turned away at the border, having wasted hundreds of dollars to travel there.

I really don't know what I'm going to do when I move. I'll probably have to wipe my computers clean before shipping them up there. Another option would be to encrypt all the data, upload it to the server that hosts my website, then download it all again after I move. It's insane that I'd have to do such a thing though. And shuffling sensitive data around to strangers' computers and servers isn't the safest thing in the world either.

How do lawyers and doctors and people with trade secrets and other people with classified or legally protected information handle border crossings? It's a bit of a conflict of interest.

June 25, 2008 06:08 AM :: Pennsylvania, USA  

June 24, 2008

Johannes Gilger

The GitHub

Ok, I hinted that I would do a more thorough review of GitHub, the new and easy-to-use git repository hosting site. Although I’m still no power-user I’ve come to know the features that make GitHub worth using and so far unique.
The “Fork” feature is probably the most important one. Instead of just cloning a repository and working on it locally, you can fork it on GitHub. When you fork a project everyone can see that you did, and gets a nice flashy graph-view so they can see where you branched and what commits you made that are not (yet) in the upstream. And if you have introduced changes that you think would benefit the project you can send people (e.g. the original project owner) a “pull-request”. The recipient can then easily fetch/merge your changes into his project. It really doesn’t get much easier to contribute to (open-source) projects. I certainly did for the first time ;)

Then there are feeds. You can watch projects, which means that your feed includes any commits/comments on the watched projects. It goes without saying that this can be quickly overwhelming for active projects. Comments can be made on specific lines in a commit (or on the whole commit), which is a great feature (think of it as the equivalent to patches being discussed line-by-line in mailing-lists). I still prefer mails though ;)
The syntax-highlighting looks pretty good. I’ve already mentioned the very pale interface (as in low-contrast) and it still has not changed. But I think that most people really browse the commit history in their local clones anyway.
Each project has an attached Wiki too, so you can add a few pages (or a lot if you wish).
To conclude I can say that GitHub is a great service, since it has a free plan for public projects. I would not need it for my personal projects, but to run or contribute to an open-source project it’s perfect. The amount you use its features is up to you. If you already have a Wiki, already have an active mailing-list and discuss patches there too then you can just use GitHub as the central source code repository (plus the forks of course). The real work is done on your local repository with git anyway, but y’already knew that I guess ;)

June 24, 2008 08:46 PM :: Germany  

Martin Matusiak

emacs that firefox!

So the other day I was thinking what a pain it is to handle text in input boxes on web pages, especially when you’re writing something longer. Since I started using vim for coding I’ve become aware of how much more efficient it is to edit when you have keyboard shortcuts to accelerate common input operations.

I discovered a while back that bash has input modes for both vi and emacs and ever since then editing earlier commands is so much easier. And not only does it work in bash, but just as well in anything else, like ipython, irb, whatever. :cap:

So now only Firefox remains of my most used applications that still has the problem of stoneage editing, and I’m stuck using the mouse way too much. It bugs me that I can’t do Ctrl+w to kill a word. Thus I went hunting for an emacs extension and what do you know, of course there is one: Firemacs. Turns out it works well, and it also has keyboard shortcuts for navigation. > gets you to the bottom of the page, no more having to hold down <space>. :thumbup:

June 24, 2008 08:20 PM :: Utrecht, Netherlands  

Jürgen Geuter

NVIDIA: Fuck you.

(Imagine the headline being sung to the melody of the "America: Fuck yeah!" song from the "Team America - World Police" soundtrack).

After a bunch of kernel developers issued a statement concerning closed source kernel drivers arguing that closed source kernel drivers are pretty much bad for everyone (the reasons are obviously that binary modules have a hard time keeping up with kernel development, that their crashes make the kernel look bad and that the community cannot fix issues within those blobs, in addition to the fact that many consider them to be illegal) NVIDIA replied:
Nvidia reiterated that it won’t provide open source drivers for Linux because the company claims there is no need for it.


So there is no need for it? What about all the problems people are having getting your blob to run because it takes ages for you to support new Xserver functionality or because it takes you quite some time to modify your blob to work with recent kernels? What about the problems many people with your graphics cards have with suspending their machines? What about the need to hack around so your graphics cards don't start with the fan pretending to be a helicopter? What about your "we write our own aiglx implementation that works a little differently than the default one just because we like to do that kind of stunt"?

If there's no need for free drivers, why does the nouveau project exist? Why do people keep on asking you for open drivers?

NVIDIA will probably stay strong on the Windows side of things where people don't care that the NVIDIA driver crashes their Windows all the time but I hope that the people using free operating systems will make their wallets do the talking. It might not mean a lot to NVIDIA but the extra business the competition gets is a good sign to support their efforts to create open and free drivers.

It's simple (especially for those who don't know a lot about linux but who wanna use it): Buy a card with free drivers (Intel or AMD) and things work out of the box. My laptop has the intel graphics chip which has free drivers and I have 3D acceleration on a liveCD without any setup. I install a system and everything is as fancy as it can be (though I don't use desktop effects cause they are more of an annoyance than a nice addition). With binary drivers you have to dig through howtos and forums and whatnot. With free drivers linux takes all that work away from you and you can focus on what you really want to do. Easy choice.

June 24, 2008 09:46 AM :: Germany  

Dan Ballard

Work around I probably shouldn’t need

I had to install amsn just so I could video chat with a friend on MSN. Really? On the plus side, at least I could do it :)

Also, figured out how to use skype on Ubuntu. Skype really wants /dev/dsp. And Ubuntu now uses pulseaudio. So basically if you've done anything with sound, Skype won't be able to get the sound. Which sucks. However, pulse audio ships with this handy utility 'pasuspender' which temporarily suspends pulse audio and its lock on /dev/dsp. So to actually use skype

pasuspender skype

And skype can then seize and monopolize the sound card. So you can use it, but no other sound till you shut Skype down. So it's 2008 and we still can't share the sound card :/.

June 24, 2008 08:39 AM :: British Columbia, Canada  

John Alberts

Turbotail and multitail

I just found a couple cool programs called turbotail and multitail while searching for rbot using eix.

Turbotail is just like tail, but it uses dnotify instead of auto refreshing every defined number of seconds.  I always thought it was kind of silly to keep refreshing the screen searching for new content with tail.  Turbotail just sits there until the kernel notifies it of a change in the file that you are tail’ing and then it updates what you see.

Multitail looks like a VERY robust way of viewing multiple files.  It can tail any number of files and supports text filtering and even syntax highlighting.

Turbotail works great, but unfortunately multitail crashes when I try to run it from my Yakuake console.  I get this:

--*- multitail 5.2.0 (C) 2003-2007 by folkert@vanheusden.com -*--
 
A problem occured at line 511 in function mynewwin (from file term.c):
 
Failed to create window with dimensions 55x9 at offset -27,-4 (terminal size: 167,19)

Seems to work just fine from a regular console though.  It will take me a while to actually learn all of the features of multitail.

June 24, 2008 02:32 AM :: Indiana, USA  

New Job!

After a while of looking for a new job, I finally got a new job. Well, actually, I’ve been working my new job for about 3 months now. So… I guess it’s not really a new job anymore.

I’m now a full time Linux administrator with ExLibris. Unfortunately, Red Hat is the preferred distribution. That’s to be expected. Most businesses want to make sure they use something that’s proven and has a clear line of support.

The new job is in Des Plaines, Il, which means I’ll have to sell my house and move a little closer. So far, the job seems pretty good. It’s doing something I like and the people are nice, and most of them seem pretty smart.

June 24, 2008 02:17 AM :: Indiana, USA  

June 23, 2008

Jürgen Geuter

Platform independence

You'll probably know this phrase either from some company trying to sell you their JAVA stuff or from some university students trying to look competent: "Our software is platform independent because we use technologyX which is platform independent."

Now this does not only look right, it also looks smart: The people chose a development platform that allows them to deploy their software on different operating systems and hardware architectures, which offers them the benefit of possibly bigger markets and target audiences. It also reduces the amount of code dealing with target platform details. Smart.

And wrong. Your code has to be platform independent or your choice of development environment means absolutely nothing.

If you hard-wire the paths in your application like this: ..\..\directory your code is completely dependent on the platform below because you actively worked around the possible platform independence.

Almost any amount of interesting and non-trivial code will include some things that can mess up your program on some platform. The most common ones are paths (because there are basically two ways to separate path components, the UNIX way and the Windows way) but there are other things:

Some people coming from a Windows background don't seem to know that for a UNIX system "Abcd" and "abcd" are different files. So if you write to the first variant but try reading from the latter, it might work under Windows but not under other systems.
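
Whether a given directory treats file names case-sensitively can be probed at runtime instead of assumed. The following is only a rough sketch: the function name is_case_sensitive is made up for this illustration, and the probe relies on Python's temporary file names using lowercase letters.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if file names in `directory` are matched case-sensitively.

    Hypothetical helper for illustration, not a standard library function.
    """
    # Create a temporary file with a lowercase prefix, then look for the
    # same name in upper case. That lookup can only succeed if the file
    # system ignores case when matching names.
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as handle:
        folder, name = os.path.split(handle.name)
        return not os.path.exists(os.path.join(folder, name.upper()))


if __name__ == "__main__":
    print(is_case_sensitive(tempfile.gettempdir()))
```

The safer habit is still to spell every file name exactly the same way everywhere in the code, so that the answer to this check never matters.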

If you look at your programming language's documentation you will probably see that some function or API does not work properly under one or another operating system or architecture. Some might return "dumb" or at least irritating values (because the low level call that they wrap does not actually work on that operating system for example).

Another thing is that in a 64bit environment your types might behave somewhat differently than what you are used to in your 32bit world.
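
As one possible illustration of the type-size trap (a sketch using Python's struct module, with function names invented for this example): the native size of a C long or a pointer depends on the platform, while the "standard" sizes selected with a byte-order prefix such as "<" are the same everywhere.

```python
import struct


def native_sizes():
    """Sizes in bytes as the local C compiler sees them; platform dependent."""
    return {
        "int": struct.calcsize("i"),
        "long": struct.calcsize("l"),
        "pointer": struct.calcsize("P"),
    }


def standard_sizes():
    """Sizes in bytes with the '<' prefix; fixed by the struct module itself."""
    return {
        "int": struct.calcsize("<i"),        # always 4
        "long": struct.calcsize("<l"),       # always 4
        "long long": struct.calcsize("<q"),  # always 8
    }


if __name__ == "__main__":
    print("native:  ", native_sizes())
    print("standard:", standard_sizes())
```

If binary data has to move between machines, packing it with an explicit prefix avoids depending on whatever the native sizes happen to be.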

There are many, many traps that you can fall into when writing code that aims to be platform independent. The most important thing to realize is that your chosen development environment, whether it is JAVA or Python or whatever else you might choose, can help you but does not do everything magically.

When writing platform independent code, use the programming language's facilities to abstract away from the operating system. In Python for example use os.path.join to build a path, don't try guessing whether "/" or "\" is the right delimiter.
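
A small sketch of what that looks like in practice. The helper relative_path and its flavour parameter are made up for this example; os.path, ntpath and posixpath are the real standard library modules, the latter two being the Windows and UNIX implementations that os.path maps to.

```python
import ntpath
import os
import posixpath


def relative_path(*parts, flavour=os.path):
    """Build "two directories up, then parts" without hard-coding a delimiter.

    Hypothetical helper; `flavour` defaults to the running platform's rules
    and is only exposed so both behaviours can be shown on one machine.
    """
    return flavour.join(flavour.pardir, flavour.pardir, *parts)


if __name__ == "__main__":
    print(relative_path("directory"))                     # platform's own style
    print(relative_path("directory", flavour=ntpath))     # ..\..\directory
    print(relative_path("directory", flavour=posixpath))  # ../../directory
```

The calling code never mentions a delimiter, so the same source yields a valid path on either kind of system.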

While today's programming platforms make it quite easy to get something starting on pretty much any system under the sun there are still many bugs that are hiding in the shadows. They can all be worked around but that requires awareness.

Don't be a platform dummy and claim that the technology you use makes your code something that it ain't, you might look less smart than you actually are ;-)

June 23, 2008 08:04 PM :: Germany  

George Kargiotakis

Euro 2008 open source tour

451 CAOS Theory has a mini review of what’s going on with open source among the countries that compete in Euro 2008. It’s quite interesting. Here’s the link about Greece. It has quite a point…Things don’t look very promising…

June 23, 2008 03:04 PM :: Greece  

Daniel de Oliveira

Crossover office 7 to support Microsoft Office 2007 and more


The following is the release announcement from Jeremy White of CodeWeavers (CrossOver):

Hi Folks,

I am pleased to announce that we have shipped CrossOver 7 for both Macintosh and Linux. New in Version 7 is support for Microsoft Office 2007, dramatically improved support for Outlook 2003 and Internet Explorer 6, and a broad range of improvements that should bring improvements to all Windows applications.

For our Linux customers, it also brings expanded support for most Adobe programs, with Photoshop CS and CS2 working particularly well.

For our Macintosh customers, this release also brings a change in our product mix. We are now providing “CrossOver Mac Standard” and “CrossOver Mac Professional”. The new Standard product will mirror the Linux Standard product, in that it will be a lower priced product with more basic support and no multiple user support. The new CrossOver Mac Professional product replaces the existing CrossOver Mac product. It continues to have our best support, support for multiple users, and, CrossOver Mac Pro continues to come with a complimentary copy of CrossOver Games. If you have purchased CrossOver Mac in the past, you have been automatically upgraded to a CrossOver Mac Pro license.

Finally, another major benefit of 7.0 is that it includes many of the elements of Wine 1.0, which was also released today. This is a major milestone for us, and for the Wine project. Our many years of work, and your many years of supporting our work, have enabled us to help bring Wine to this milestone. I am very proud to have been part of this, and very grateful for all the support of our customers, advocates, and fellow Wine developers.

If you are an existing CrossOver customer with an active support entitlement, you can visit our web site to download this latest version: www.codeweavers.com
You will need to log in with the email and password that you used when purchasing CrossOver. Please write to info@codeweavers.com if you need help with this process.

Thanks again for all your support, and I hope that you enjoy CrossOver 7!

Cheers,

Jeremy White
CEO
CodeWeavers


Version 7.0 Changelog:

New application support:

  • Office 2007 (Including Word, Excel, PowerPoint, and Outlook)
  • Adobe Photoshop CS and CS2
  • Added support for the “Compatibility Pack for the 2007 Office system” so that Office 2003 can open Office 2007 documents

Bug fixes:

  • Greatly improved online banking integration in Quicken 2007 and 2008
  • Greatly improved Outlook behavior, particularly with Exchange servers
  • Fixed service pack support for several versions of Office
  • Improved IE support in win2000 and winxp bottles (though win98 is still better)
  • Improved support for modern Linux distributions (especially Ubuntu)
  • Fixed a seriously horrible interaction with the Logitech Control Center documents from Office 2007
  • This version also includes countless Wine fixes and synchronizes with Wine 1.0.
  • Many small bugs should be fixed, and unsupported application behavior should be greatly improved.

Source: Wine Reviews

June 23, 2008 01:36 PM :: São Paulo, Brazil  

Leif Biberg Kristensen

Code prettification

Inevitably, as you’re learning a new skill, such as a programming language, you may want to revisit your old work and see if you can do it better. I had this pair of functions to fetch the previous and the next “page” of a source collection, based on the sort order of the source. In plain text, in order to get to the previous “page” I want the source with the maximum sort order smaller than the present one, and with the same parent id. Here is my brute-force approach from a couple of years ago:

CREATE OR REPLACE FUNCTION get_prev_page(INTEGER) RETURNS INTEGER AS $$
DECLARE
    self_page INTEGER;
    prev_page INTEGER;
    prev_src INTEGER;
    par_id INTEGER;
BEGIN
    SELECT parent_id FROM sources INTO par_id WHERE source_id = $1;
    SELECT sort_order FROM sources INTO self_page
        WHERE source_id = $1;
    SELECT MAX(sort_order) FROM sources INTO prev_page
        WHERE parent_id = par_id AND sort_order < self_page;
    SELECT source_id FROM sources INTO prev_src
        WHERE parent_id = par_id AND sort_order = prev_page;
    RETURN COALESCE(prev_src,0);
END;
$$ LANGUAGE plpgsql STABLE;

Ugly, ugly, ugly. I remember what was the main stumbling block here: You can’t use an aggregate function such as MAX() in a WHERE clause. The thing is that you don’t need all those assignments. A little code folding, replacing variables with sub-selects, takes you a long way:

CREATE OR REPLACE FUNCTION get_prev_page(INTEGER) RETURNS INTEGER AS $$
DECLARE
    pp INTEGER;
BEGIN
    SELECT source_id INTO pp FROM sources
        WHERE parent_id = (SELECT parent_id FROM sources WHERE source_id = $1)
        AND sort_order < (SELECT sort_order FROM sources WHERE source_id = $1)
        ORDER BY sort_order DESC LIMIT 1;
    RETURN COALESCE(pp, 0);
END
$$ LANGUAGE plpgsql STABLE;

The “ORDER BY sort_order DESC LIMIT 1” is a great idiom whenever you need to use an extreme value as part of a WHERE clause.
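
To try the idiom without a PostgreSQL server, here is a rough sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in here; the sample rows and the function names make_db and prev_page are invented for the illustration, and only the table and column names come from the functions above.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the sources table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sources ("
        "source_id INTEGER PRIMARY KEY, parent_id INTEGER, sort_order INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?, ?)",
        [(1, 10, 1), (2, 10, 2), (3, 10, 3), (4, 20, 1), (5, 20, 2)],
    )
    return conn


def prev_page(conn, source_id):
    """Return the previous source under the same parent, or 0 if there is none."""
    row = conn.execute(
        """
        SELECT source_id FROM sources
        WHERE parent_id = (SELECT parent_id FROM sources WHERE source_id = :id)
          AND sort_order < (SELECT sort_order FROM sources WHERE source_id = :id)
        ORDER BY sort_order DESC LIMIT 1
        """,
        {"id": source_id},
    ).fetchone()
    # Mirrors COALESCE(pp, 0): no matching row means there is no previous page.
    return row[0] if row else 0


if __name__ == "__main__":
    db = make_db()
    print(prev_page(db, 3))  # 2
    print(prev_page(db, 1))  # 0
```

Flipping DESC to ASC and "<" to ">" gives the matching lookup for the next page.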

Even if the two versions of the code above are functionally equivalent, and the PostgreSQL planner probably will rewrite the query to something like version 1, most programmers would prefer version 2. Why? I think there’s a lot to the concept that “coding is poetry”. Version 2 is more esthetically pleasing, because it conveys its inner meaning in a much more succinct way, as in the Merriam-Webster definition of the word succinct: “marked by compact precise expression without wasted words”. At least to me it does. And that’s probably some of the essence of poetry.

June 23, 2008 11:51 AM :: Norway  

Ow Mun Heng

Automatic Raid Array Rebuilding

Hi guys, long time no post. Last post was in March and it's now already June.

Been busy as usual, however, not been dabbling as much as I "should" as I've been busy with other NON-FOSS related stuffs. (psst: I'm now heavily into photography. Went to shoot some Japan GT queens!! Kawaaiii)

Anyway, since this is a (nearly) purely FOSS-based blog, I'm gonna talk about my automatic Raid Rebuilding script.

You see, what happens is this: my postgresql box (Celeron, 2x500GB in Raid 1) has a tendency to keep dying once in a while for X reasons. (I have, till now, been unable to locate the reason why it's dying so often.) I've tried the write-all, read-all test using dd but thus far have not seen any errors being thrown. So, it's been a manual instance of...

go to work. see the email : Your raid has Died!
log onto the box, do the rebuild.

After a while, this just becomes tiring and I decided to fsck it and make it automatic.

Here's the script

#!/bin/bash

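# Take the sixth column of any line that mdadm marks as faulty (empty if none).
FAIL_DRV=$(mdadm --detail /dev/md0 | grep faulty | awk '{print $6}')

if [ -n "$FAIL_DRV" ]
then
  echo "Detected degraded array : $FAIL_DRV"
  echo "Starting automated array rebuild process"
  # Mark the drive as failed, remove it, then re-add it so the array resyncs.
  mdadm /dev/md0 --fail $FAIL_DRV --remove $FAIL_DRV --add $FAIL_DRV
else
  echo "Nothing to do"
fi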


Simple eh..

So, now I don't have to come to work to see it all wonky because it'll automatically rebuild itself.

Some of you may ask, how come I don't just replace the drive? Because I can't find any replacement drive which is a PATA connection and at 500GB capacity! The largest I can find are 160GB.

Bummer

June 23, 2008 01:32 AM

Jason Jones

Toilets and Servers

This past Thursday, we finally decided to rip it up.  I