Taming Big Data - Interview with Udi Falkson of iknow.io

Udi Falkson

Udi is a co-founder of iknow.io, a service providing big data analytics.

These days it's very easy to bump into the term "big data". What is it like to develop a service around it? That is what we are about to find out, as Udi Falkson of iknow.io tells us a bit about his story and some of the technology choices his team has made.

Hi, @udi. Can you tell us something about yourself?

Hi. I'm Udi, the co-founder and head of product at iknow.io. Earlier in my career, I was one of the first engineers working on Yahoo! Answers and I'm also the creator of isitnormal.com.

Can you describe what big data is about? How does iknow.io relate to it?

The amount of raw, available data out there in the world is growing exponentially. Sites like data.gov and others are giving the public all kinds of great data to play with. However, no good tools have emerged to make all this data accessible and useful to the majority of people that really care about it. This is what we're changing with iknow.io.

Today, unless you're a computer programmer or a scientist, working with anything more complicated than what will fit into a simple spreadsheet is not possible. Most people are shut out of the game. We are taking these complex data sets and making them useful for anyone to explore and analyze. We do this by sourcing, merging and cleaning the data and providing a new breed of intuitive data analysis and comparison tools for people to use.

We currently have detailed data about movies and the US Congress. NBA data is coming very soon, and many more verticals will be added in the future.

What kind of technical challenges have you encountered during its development?

What we are doing is extremely technically challenging. Our amazing back-end team has built systems from the ground up to handle extremely large data sets of drastically varying structures at scale. To account for the different ways that data can be organized (for example, movie data and congressional data have very different structures), we have built our systems around a proprietary graph database. To complement our graph database, we make heavy use of existing technologies like Redis, MongoDB, Sphinx Search and Postgres.
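Editor's note: iknow.io's graph database is proprietary and its design is not described here, so the following is only a minimal sketch of the general idea of storing differently shaped records in one graph. The `TinyGraph` class, the node kinds and the relation names are all hypothetical and chosen purely for illustration.

```python
class TinyGraph:
    """Hypothetical in-memory property graph (illustration only)."""

    def __init__(self):
        self.nodes = {}   # node id -> {"kind": ..., plus free-form properties}
        self.edges = {}   # source id -> list of (relation, target id)

    def add_node(self, node_id, kind, **props):
        # Each node carries its own set of properties, so a movie and a
        # bill do not need to share a schema.
        self.nodes[node_id] = {"kind": kind, **props}

    def add_edge(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, source, relation):
        # Follow only the edges labeled with the requested relation.
        return [t for rel, t in self.edges.get(source, []) if rel == relation]


graph = TinyGraph()
graph.add_node("m1", "movie", title="Example Movie", year=2010)
graph.add_node("p1", "person", name="Example Actor")
graph.add_node("b1", "bill", title="Example Bill", congress=112)
graph.add_node("p2", "person", name="Example Senator")
graph.add_edge("p1", "acted_in", "m1")
graph.add_edge("p2", "sponsored", "b1")

movies_of_p1 = graph.neighbors("p1", "acted_in")
bills_of_p2 = graph.neighbors("p2", "sponsored")
```

The point of the sketch is that movie and congressional records live side by side without a shared table layout; only the edge labels say how things relate.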

The biggest challenge we've encountered is that data is messy, really messy. This fact led us to build a suite of in-house data cleanup and matching tools that have enabled us to efficiently load, update and organize these complex data sets and provide them to our users in a seemingly simple format that hides the real complexity from them.
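Editor's note: the in-house cleanup and matching tools are not described in the interview. As a rough sketch of what matching messy records can involve, the snippet below normalizes names and then picks the closest candidate using Python's standard `difflib`. The function names and the `0.85` threshold are assumptions made for this example, not iknow.io's approach.

```python
import difflib
import re
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def best_match(name, candidates, threshold=0.85):
    """Return the candidate most similar to name, or None if none is close enough."""
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = difflib.SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# The same title spelled differently in two hypothetical sources.
match = best_match("amélie", ["Amelie", "Alien"])
```

Real data usually needs more than string similarity (release years, identifiers, manual review), so a threshold like this would only be a first pass.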

On the front-end, our data analysis tools are rather advanced and it's taken our very small front-end team (just me) quite a bit of work to get them working nicely. There are also a few particular UI touches that required quite a bit of effort to get working across all browsers, such as our data results table view that has persistent horizontal and vertical scrolling headers.

You mentioned you have chosen to use Backbone on the front-end side. The framework has received some criticism lately. New options, such as Angular and Ember.js, have arisen. Why did you choose Backbone? Are you happy with your choice?

Most of our front-end code is standard server-side Python (Django). For our more complex and interactive data analysis tools, we opted to provide our users with a more responsive, client-side experience. We used Backbone.js for this, and the client-side UI communicates with our query engine via REST APIs. I selected Backbone after a lot of research and I'm more or less happy with it. It's relatively lightweight and there's very little magic, but I have run into a few small bugs and the code can get messy if you're not careful. A Backbone program will be only as good as the programmer writing it, because it does very little work for you. What it does do is give you just enough of a structure to work with so you can keep things manageable and maintainable.

That said, I'm very interested in Angular and Ember and have read a lot about them. When I started building iknow.io nearly a year ago both of these other frameworks were not quite as mature and I didn't feel comfortable going with either one. If I were to start over again today, I think I'd give them both serious consideration and I'm leaning towards Angular for my next such project.

How do you see the future of JavaScript development? Are there any particular trends in sight you would like to highlight?

JavaScript is being used more and more on the server, and it has become tempting to use it for the whole stack to be able to share code between the front-end and back-end. In our case, Python is so strong when it comes to number crunching (we make heavy use of pandas) that it didn't make sense for us to do so, but I see this becoming more and more common in the future and I know a few people that are building sites this way today.

Given JSter is a JavaScript catalog, can you please list some of your favorite libraries?

1

D3.js

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.

2

Masonry

Masonry is a dynamic grid layout plugin for jQuery. Think of it as the flip-side of CSS floats. Whereas floating arranges elements horizontally then vertically, Masonry arranges elements vertically, positioning each element in the next open spot in the grid. The result minimizes vertical gaps between elements of varying height, just like a mason fitting stones in a wall.

3

Hopscotch

Hopscotch is a framework to make it easy for developers to add product tours to their pages. Hopscotch accepts a tour JSON object as input and provides an API for the developer to control rendering the tour display and managing the tour progress.

4

Chosen

Chosen is a JavaScript plugin that makes long, unwieldy select boxes much more user-friendly. It is currently available in both jQuery and Prototype flavors.

5

Highcharts JS

Highcharts is a charting library written in pure JavaScript, offering intuitive, interactive charts to your web site or web application. Highcharts currently supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange and polar chart types.

6

jQuery

jQuery is a fast and concise JavaScript Library that simplifies HTML document traversing, event handling, animating, and Ajax interactions for rapid web development. jQuery is designed to change the way that you write JavaScript.

7

jQuery UI

jQuery UI provides interactions like Drag and Drop and widgets like Autocomplete, Tabs and Slider, and makes these as easy to use as jQuery itself.

8

jQuery Infinite Scroll

A lightweight (1kb) jQuery plugin that provides a basic mechanism for triggering more results to be loaded when the bottom of the page is reached. It's simple and designed not to get in the way. In addition to working on all major browsers, it supports iScroll (for scrolling content on iOS devices).

9

Backbone.js

Backbone.js gives structure to web applications by providing models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing API over a RESTful JSON interface.

10

Underscore.js

Underscore is a utility-belt library for JavaScript that provides a lot of the functional programming support that you would expect in Prototype.js (or Ruby), but without extending any of the built-in JavaScript objects. It's the tie to go along with jQuery's tux, and Backbone.js's suspenders.

Thank you for the interview, Udi. It was nice to get an inside view on a service such as iknow.io! Best of luck. :)

Published by bebraw on 2013-09-19 14:18:16
