Rediff CEO Ajit Balakrishnan’s Article - interesting read
World Class: In Our Backyard
It was late at night at the end of a long, long day at work. Even the lights on the fishing boats anchored in the harbour for the night were out. I was fishing in another ocean that night, trawling the internet for any research paper that would give us a method to detect India-related content on the web.
There is a brute-force way to do that: employ hundreds of people to check every major website in the world. But every time the content changed, you’d have to go back and redo the work. Besides, hiring that many people would cost too much. There had to be a more elegant way: for example, write a computer program that would patiently check sites throughout the world and send back snippets of information of interest to Indians.
We understood the first steps in doing this. For instance, you can easily get a computer program to tag an article as ‘India-related’ if the word ‘India’ appears in it. Generalizing from this rule, you could make the program compare every word it encounters against a list of Indian proper nouns to detect ‘Indian-interest content’. Even more generally, you could make the program compare the words it encounters with a corpus known to be relevant to Indians, for example the last five years of Business Standard articles.
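The two rules above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the seed word list `INDIAN_PROPER_NOUNS`, the function names and the toy corpus are all hypothetical, and cosine similarity over raw word counts is just one simple way to "compare words with a corpus", not necessarily what Rediff actually built.

```python
import math
import re
from collections import Counter

# Hypothetical seed list of Indian proper nouns (illustrative, not exhaustive).
INDIAN_PROPER_NOUNS = {"india", "mumbai", "delhi", "bangalore", "rupee", "sensex"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_india(text):
    """Rule 1: tag the text as India-related if any seed word appears in it."""
    return any(word in INDIAN_PROPER_NOUNS for word in tokenize(text))


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def corpus_relevance(text, reference_corpus):
    """Rule 2: score a page by how closely its words match a reference corpus."""
    page_counts = Counter(tokenize(text))
    corpus_counts = Counter()
    for document in reference_corpus:
        corpus_counts.update(tokenize(document))
    return cosine_similarity(page_counts, corpus_counts)


if __name__ == "__main__":
    # Toy stand-in for an archive of Indian business news (hypothetical text).
    corpus = ["The Sensex rose as the rupee gained.", "Mumbai traders cheered."]
    print(mentions_india("Monsoon rains reach Mumbai"))  # True
    print(mentions_india("Spring arrives in Paris"))  # False
    print(corpus_relevance("Sensex and rupee news from Mumbai", corpus))
```

With raw counts, common words such as ‘the’ tend to dominate the similarity, so a practical system would likely weight terms (TF-IDF, for instance) and use a far larger reference corpus.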
This much we had figured out on our own. The trick was how to do it economically. Here again, the brute-force method is to crawl every single English-language site in the world and look for words to compare with our chosen corpus. The elegant way would be to devise a method of inspired guessing about where to look first and where to look next.
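The ‘inspired guessing’ described above is what the literature calls focused crawling. A common baseline is a best-first crawler: keep a priority queue of unvisited links and always fetch the one that looks most promising. The Python sketch below is that generic baseline, not the specific method of any particular paper. Everything in it is an assumption for illustration: the in-memory `TOY_WEB` stands in for real HTTP fetches so the example runs offline, `TOPIC_WORDS` is a hypothetical vocabulary, and the rule that a link inherits the score of the page it was found on is one simple heuristic among many.

```python
import heapq

# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("news from mumbai and delhi", ["b", "c"]),
    "b": ("cricket in india", ["d"]),
    "c": ("weather in paris", ["e"]),
    "d": ("rupee and sensex", []),
    "e": ("french cheese", []),
}

# Hypothetical topic vocabulary used to score pages.
TOPIC_WORDS = {"india", "mumbai", "delhi", "rupee", "sensex"}


def fetch(url):
    """Stand-in for an HTTP fetch: look the page up in the toy web."""
    return TOY_WEB.get(url, ("", []))


def relevance(text):
    """Fraction of words on the page that belong to the topic vocabulary."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in TOPIC_WORDS for word in words) / len(words)


def best_first_crawl(seeds, budget):
    """Visit at most `budget` pages, always expanding the most promising URL.

    Each newly found link inherits the relevance score of the page it was
    found on, so links from on-topic pages are fetched before links from
    off-topic ones.
    """
    # heapq is a min-heap, so scores are stored negated; seeds start at 1.0.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        score = relevance(text)
        visited.append((url, score))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    for url, score in best_first_crawl(["a"], budget=4):
        print(url, round(score, 2))
```

With a budget of four pages, the crawler visits `a`, `b`, `c` and `d` and never fetches `e`, because the only link to `e` came from a page that scored zero. Note that the off-topic page `c` is still fetched, since its link was found on a relevant page; smarter link scoring is exactly where a better guessing strategy pays off.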
That night I was trawling the internet for research papers that described methods of inspired guessing.
Here was one! “Accelerated Focused Crawling through Online Relevance Feedback.” I skimmed through the paper; it dealt with pretty much the problem I had in mind.
It was past midnight. I ran my eyes back up to the start of the paper to check which institution it had come out of. Then came the shock! The research paper was from our own backyard: IIT Bombay. And the lead author, Soumen Chakrabarti, was a Professor of Computer Science there. I was astounded. Anything to do with web crawling was hot; that’s the stuff search engines are built on. And here was someone from IIT Bombay working on it.
I couldn’t wait for the sun to rise to call him. The next day, some colleagues and I trudged to IIT Bombay to meet the professor. He was sitting at his computer in an ice-cold room in a remote corner of the Computer Science Department, which itself was in a remote corner of the IIT Bombay campus. He was very helpful and immediately gave us the computer code we needed.
I was curious about how he got interested in this topic. He pulled out a book from the stack in his room: Mining the Web: Discovering Knowledge from Hypertext Data.
“This is a textbook I have just finished writing for US computer science students,” he said, nonchalantly tossing it back into the stack. “It will be out later this year.”
It turns out that he was at Stanford at the same time as Brin and Page, the Google founders. They went and found a venture capitalist to fund their search engine efforts; Soumen came back to India to work at IIT Bombay “because my mother is old and does not want to leave India”.
Landing the Soumen catch turned out to be the easy part. Getting IIT Bombay to engage in a commercial relationship was to prove a near-impossible task; the process for such an engagement is uncharted territory for Indian academic institutions. We settled on a compromise: we hired two of his star graduate students (or, more accurately, he persuaded them to join us instead of doing what all their classmates did, which was to emigrate).
Since then, we have been happily working together; whenever we run into a really tough computer science problem, we can get to Soumen through his students.
I have felt mildly guilty ever since about this arrangement, which gave us so much know-how for so little payment. That lasted until I met the head of Sarnoff Labs at a conference. Sarnoff Labs has been a hallowed center for innovation since the 1940s and brought the world colour television and the electron microscope, among other inventions. He shocked the audience when he said that the era of corporations maintaining large central research labs staffed with Nobel Prize winners (Bell Labs symbolized this) is long gone. The locus of innovation has moved almost entirely to start-ups. These start-ups staff themselves with recent graduates of top institutions, and these alumni maintain close consultative links with their university professors.
I was much relieved to hear this. So the arrangement we had stumbled into, hiring a professor’s top students and getting the professor for free, seems to be the way R&D is done today. And the irony was that we found it in our own backyard.