Principal Component Analysis 4 Philosophers: Identity, Nietzsche and Deleuze

It’s been almost 10 years since I published my post on PCA 4 Dummies, by far the most popular entry on this blog. It’s pretty much dormant here these days, but for old times’ sake I thought it’d be nice to dust off the cobwebs and revisit this topic from a different perspective.

Rather than focus on how PCA can be used as a tool in machine learning, I find it more fun to think about how dimension reduction can be used as an analogy for identity and how we connect with one another. None of these ideas are new; Nietzsche and Deleuze wrote about them much more eloquently and deeply than I ever could, but I find that using data science language can frame these ideas in ways that are useful to us simpleton Engineers. So stop bashing two rocks together, put a beret on, light a cigarette and let’s delve into the mind.

Continue reading

Chaos theory, non linear dynamics and peripheries of maths: Part 1 – Back to basics

‘The periphery of the circle of science has an infinite number of points, and while there is no telling how this circle can ever be completely measured, the noble person, before the middle of his life, inevitably comes into contact with those extreme points of the periphery where he stares at the inexplicable. When he here sees to his dismay how logic coils round itself at these limits and finally bites its own tail…’

Nietzsche, The Birth of Tragedy

Chaos theory, like relativity or quantum physics, enjoys a reputation that precedes it. There are countless pop culture references to it (the film ‘The Butterfly Effect’ starring Ashton Kutcher being a particularly notable one) and it sounds cool and mysterious. It’s also quite easy to understand conceptually: the classic example is that if you went back in a time machine, stepped on an ancient bug and came back to the present day, the world would look very different.
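The time-machine example is about sensitivity to initial conditions, and a few lines of Python can sketch the same idea. The excerpt does not name a particular system, so the logistic map used here (with r = 4) is an assumption chosen for illustration, as are the function name `logistic_orbit` and the starting values.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a standard toy chaotic system (illustrative choice)."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

# Two starting points that differ by one part in ten billion (the "stepped-on bug").
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

# Gap between the two orbits at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]

print("initial gap:", gaps[0])
print("largest gap:", max(gaps))
```

The two orbits start out practically identical, yet the gap between them grows until they bear no resemblance to each other.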

Continue reading

The Story of Computer Vision

What is computer vision? Where has it come from? 20 years ago only a few esoteric Engineers were working on it, but now mass-market products like the Microsoft Kinect, Google’s driverless cars and facial recognition systems are using computer vision in a fast, cheap and effective way. What happened?

One reason is that computer vision requires a lot of processing, and computers have only recently become fast enough to handle the workload. However, this is not the only reason why computer vision has gone from an academic dream to a consumer reality. Continue reading

An Engineer’s Guide to Cooking

I have recently delved into the world of cooking. But as an Engineer, simply following recipes doesn’t quite fill my appetite for culinary knowledge. What happens to food when it’s cooking? Why does a steak need to rest? What happens when something is ‘aged’? Most people want to learn how to make a good meal rather than the science behind it, so these sorts of questions don’t often appear in recipe books. However, Engineers aren’t most people…

Over the last 20 years there has been a big focus on the science of cooking and great literature has accompanied it. This short post is about the most important rule I have learnt so far: the temperature of the water inside the food is vital. Continue reading

Making sense of Internet Platforms: Network Effects and Two Sided Markets

It seems like the quickest way to make a billion dollars at the moment is to create a successful internet platform. Companies like Facebook, eBay, Airbnb, Twitter and PayPal are platforms that have gone from obscurity to internet giants in a matter of years. So what are these platforms and how are they making so much money? A lot of starry-eyed tech entrepreneurs wax lyrical with theories that equate the technology revolution to a revolution in business and economics. But the typical way an internet platform makes a profit is by acting as a two sided market, which is a type of business that existed long before the internet.

Two sided markets are naturally able to thrive at huge scales and platforms have been taking advantage of this, attaining unbelievable valuations. It is useful to view internet platforms through the lens of a two sided market because it explains the incentive structure of the platform and how the companies orient themselves in terms of product decisions. Continue reading

Wavelets 4 Dummies: Signal Processing, Fourier Transforms and Heisenberg

Wavelets have recently migrated from Maths to Engineering, with Information Engineers starting to explore the potential of this field in signal processing, data compression and noise reduction. What’s interesting about wavelets is that they are starting to undermine a staple mathematical technique in Engineering: the Fourier Transform. In doing this they are opening up a new way to make sense of signals, which is the bread and butter of Information Engineering. Continue reading

What are Fractals and why should I care?

Fractal geometry is a field of maths born in the 1970s and mainly developed by Benoit Mandelbrot. If you’ve already heard of fractals, you’ve probably seen the picture below. It’s called the Mandelbrot Set and is an example of a fractal shape.

[Image: the Mandelbrot Set]

The geometry that you learnt in school was about how to make shapes; fractal geometry is no different. While the shapes that you learnt in classical geometry were ‘smooth’, such as a circle or a triangle, the shapes that come out of fractal geometry are ‘rough’ and infinitely complex. However fractal geometry is still about making shapes, measuring shapes and defining shapes, just like school.

There are two reasons why you should care about fractal geometry: Continue reading

Principal Component Analysis 4 Dummies: Eigenvectors, Eigenvalues and Dimension Reduction

(I recently wrote a new post that you may also find interesting called Principal Component Analysis 4 Philosophers)

Having been in the social sciences for a couple of weeks it seems like a large amount of quantitative analysis relies on Principal Component Analysis (PCA). This is usually referred to in tandem with eigenvalues, eigenvectors and lots of numbers. So what’s going on? Is this just mathematical jargon to get the non-maths scholars to stop asking questions? Maybe, but it’s also a useful tool to use when you have to look at data. This post will give a very broad overview of PCA, describing eigenvectors and eigenvalues (which you need to know about to understand it) and showing how you can reduce the dimensions of data using PCA. As I said it’s a neat tool to use in information theory, and even though the maths is a bit complicated, you only need to get a broad idea of what’s going on to be able to use it effectively. Continue reading
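As a rough sketch of the idea described above, the snippet below finds the eigenvectors and eigenvalues of a covariance matrix and uses them to reduce two-dimensional data to one dimension. It assumes NumPy is available; the function name `pca_reduce` and the synthetic data are hypothetical and made up for illustration, not taken from the post.

```python
import numpy as np

def pca_reduce(data, n_components):
    """Project data onto its top principal components (illustrative helper)."""
    # Centre each column so the covariance is measured around the mean.
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Keep only the directions (eigenvectors) with the most variance.
    components = eigenvectors[:, :n_components]
    return centered @ components, eigenvalues

# Synthetic 2-D data in which the second column is almost a multiple of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

reduced, eigenvalues = pca_reduce(data, 1)
explained = eigenvalues[0] / eigenvalues.sum()
print("reduced shape:", reduced.shape)
print("share of variance kept:", explained)
```

Because the two columns are so strongly correlated, a single principal component keeps almost all of the variance, which is the sense in which PCA reduces the dimensions of the data.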

DNA legislation: how the courts view your identity.

As part of a job application last month I wrote a draft blog post about DNA legislation. I thought I would put it here for posterity’s sake:

Last month the United States Supreme Court made a ruling that was in direct opposition to the European Court of Human Rights. The ruling, in the case ‘Maryland v King’, concerned the collection of DNA from those in custody. It held that ‘taking and analyzing a cheek swab of the arrestee’s DNA is, like fingerprinting and photographing, a legitimate police booking procedure that is reasonable under the Fourth Amendment’. In contrast, the case ‘S and Marper v United Kingdom’, brought to the European Court of Human Rights in 2008, ruled that DNA collection of those in custody was in direct breach of Article 8 of the European Convention on Human Rights, which guarantees ‘the right to respect for his private and family life, his home and his correspondence’.

In the ruling the European Court said Article 8 ‘would be unacceptably weakened if the use of modern scientific techniques in the criminal justice system were allowed at any cost and without carefully balancing the potential benefits of the extensive use of such techniques against important private-life interests’. This disagreement between the courts highlights the ethical ambiguities that have arisen from the widespread adoption of DNA databases in the last 15 years. Where should we draw the line between the state’s duty to maintain law and order and the individual’s right to privacy?

Continue reading

Data Compression: What it is and how it works

Data compression is used everywhere. Mp3, mp4, rar, zip, jpg and png files (along with many others) all use compressed data. Without data compression a 3 minute song would be over 100Mb and a 10 minute video would easily be over 1Gb. Data compression condenses large files into much smaller ones. It does this by getting rid of data that isn’t needed while retaining the information in the file.

Does that mean information is different to data? Yes. Let’s take an example: I ask Bob who won the Arsenal game. He then launches into a 30 minute monologue about the match, detailing every pass, throw-in, tackle etc. Right at the end he tells me Arsenal won. I only wanted to know who won, so all the data Bob gave me about the game was useless. He could have compressed the data he sent into two words, ‘Arsenal won’, because that’s all the information I needed. Data compression works on the same principle. Continue reading
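A small sketch can make the data-versus-information distinction concrete. It uses Python’s standard `zlib` module on a made-up, highly repetitive "monologue" (the text is invented for illustration). Note that `zlib` is lossless, so it only removes redundancy and restores every byte, whereas Bob’s two-word summary throws detail away, which is closer to what lossy formats do.

```python
import zlib

# A long, repetitive "monologue" with one piece of real information at the end.
monologue = ("Arsenal passed the ball. " * 400 + "Arsenal won.").encode("utf-8")

compressed = zlib.compress(monologue)
restored = zlib.decompress(compressed)

# Fraction of the original size that the compressed version takes up.
ratio = len(compressed) / len(monologue)

print("original bytes:  ", len(monologue))
print("compressed bytes:", len(compressed))
print("round trip matches original:", restored == monologue)
```

The repeated sentences carry very little information, so the compressed version is a small fraction of the original while still decompressing to exactly the same bytes.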